Thanks for the details. I see you're using RStudio Package Manager. There was an issue with the binaries that RSPM built for 6.0.0.2; we've been discussing it with them and they should be fixing it on their side, so this should resolve itself soon (if it isn't already resolved).
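
In the meantime, one way to confirm whether a given install is affected is to ask the package directly. This is only a minimal sketch, assuming the installed arrow version exports codec_is_available(); snappy_available() is a hypothetical helper name, and the CRAN mirror URL in the comment is an assumption rather than something taken from this thread.

```r
# Hypothetical helper: reports whether this arrow build can decode snappy.
# Returns NA when the arrow package cannot be loaded at all.
snappy_available <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)
  }
  arrow::codec_is_available("snappy")
}

status <- snappy_available()
if (isTRUE(status)) {
  message("snappy is available; snappy-compressed Parquet files should be readable.")
} else if (is.na(status)) {
  message("arrow is not installed or could not be loaded.")
} else {
  message("snappy is missing from this build.")
  # Possible workaround (assumption: a source build from a CRAN mirror,
  # with the variable named in the error message set beforehand):
  # Sys.setenv(LIBARROW_MINIMAL = "false")
  # install.packages("arrow", repos = "https://cloud.r-project.org")
}
```

Running arrow_info(), as the startup message suggests, should give a fuller picture of which optional features the build includes.
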
Neal

On Mon, Nov 1, 2021 at 1:36 PM Chris Berthiaume <[email protected]> wrote:

> Hi Neal,
>
> Here's a reproducible example using a fresh Docker container for
> bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start
> R, install arrow, attach arrow, then try to read a simple parquet file I
> just now created separately in Rstudio on MacOS with arrow 5.0.0. This
> fails. I stop/start R again, install arrow 5.0.0.2 with
> devtools::install_version(), attach, then verify that I can successfully
> read the same parquet file.
>
> I've changed the R prompt character below from ">" to "$" to prevent any
> text from being interpreted as an email reply.
>
> # Creating the parquet file in Rstudio in MacOS
> $ x <- data.frame(A=seq(0, 2), B=seq(10,12))
> $ x
>   A  B
> 1 0 10
> 2 1 11
> 3 2 12
> $ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")
>
> # Run the test in a docker container
> docker run -it --rm -v ~/Desktop/arrowtest:/data
> bioconductor/bioconductor_docker:RELEASE_3_13 bash
> root@5fa84c3f4a41:/# cd /data
> root@5fa84c3f4a41:/data# R
>
> R version 4.1.1 (2021-08-10) -- "Kick Things"
> Copyright (C) 2021 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> $ install.packages('arrow')
> Installing package into ‘/usr/local/lib/R/site-library’
> (as ‘lib’ is unspecified)
> also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’
>
> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
> Content type 'binary/octet-stream' length 691644 bytes (675 KB)
> ==================================================
> downloaded 675 KB
>
> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
> Content type 'binary/octet-stream' length 52329 bytes (51 KB)
> ==================================================
> downloaded 51 KB
>
> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
> Content type 'binary/octet-stream' length 573106 bytes (559 KB)
> ==================================================
> downloaded 559 KB
>
> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
> Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
> ==================================================
> downloaded 22.6 MB
>
> * installing *binary* package ‘bit’ ...
> * DONE (bit)
> * installing *binary* package ‘assertthat’ ...
> * DONE (assertthat)
> * installing *binary* package ‘bit64’ ...
> * DONE (bit64)
> * installing *binary* package ‘arrow’ ...
> * DONE (arrow)
>
> The downloaded source packages are in
> ‘/tmp/Rtmp8HkDvX/downloaded_packages’
> $ library(arrow)
> See arrow_info() for available features
>
> Attaching package: ‘arrow’
>
> The following object is masked from ‘package:utils’:
>
>     timestamp
>
> $ sessionInfo()
> R version 4.1.1 (2021-08-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.3 LTS
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] arrow_6.0.0.2
>
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.1   magrittr_2.0.1
> [5] assertthat_0.2.1 R6_2.5.1         tools_4.1.1      glue_1.4.2
> [9] bit64_4.0.5      vctrs_0.3.8      rlang_0.4.11     purrr_0.3.4
> $ read_parquet("x.parquet")
> Error: NotImplemented: Support for codec 'snappy' not built
> In order to read this file, you will need to reinstall arrow with
> additional features enabled.
> Set one of these environment variables before installing:
>
> * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
> * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>
> See https://arrow.apache.org/docs/r/articles/install.html for details
>
> root@5fa84c3f4a41:/data# R
>
> R version 4.1.1 (2021-08-10) -- "Kick Things"
> Copyright (C) 2021 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
>
> $ devtools::install_version("arrow", "5.0.0.2")
> Downloading package from url: https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
> These packages have more recent versions available.
> It is recommended to update all of them.
> Which would you like to update?
>
> 1: All
> 2: CRAN packages only
> 3: None
> 4: rlang (0.4.11 -> 0.4.12) [CRAN]
>
> Enter one or more numbers, or an empty line to skip updates:
> Installing package into ‘/usr/local/lib/R/site-library’
> (as ‘lib’ is unspecified)
> * installing *binary* package ‘arrow’ ...
> * DONE (arrow)
> $ library(arrow)
>
> Attaching package: ‘arrow’
>
> The following object is masked from ‘package:utils’:
>
>     timestamp
>
> $ sessionInfo()
> R version 4.1.1 (2021-08-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.3 LTS
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] arrow_5.0.0.2
>
> loaded via a namespace (and not attached):
>  [1] magrittr_2.0.1    usethis_2.0.1     devtools_2.4.2    tidyselect_1.1.1
>  [5] bit_4.0.4         pkgload_1.2.2     R6_2.5.1          rlang_0.4.11
>  [9] fastmap_1.1.0     tools_4.1.1       pkgbuild_1.2.0    sessioninfo_1.1.1
> [13] cli_3.0.1         withr_2.4.2       ellipsis_0.3.2    remotes_2.4.0
> [17] bit64_4.0.5       rprojroot_2.0.2   assertthat_0.2.1  lifecycle_1.0.1
> [21] crayon_1.4.1      processx_3.5.2    purrr_0.3.4       callr_3.7.0
> [25] vctrs_0.3.8       fs_1.5.0          ps_1.6.0          testthat_3.0.4
> [29] memoise_2.0.0     glue_1.4.2        cachem_1.0.6      compiler_4.1.1
> [33] desc_1.3.0        prettyunits_1.1.1
> $ read_parquet("x.parquet")
>   A  B
> 1 0 10
> 2 1 11
> 3 2 12
>
> On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson <[email protected]> wrote:
>
>> Hi Chris,
>> Could you share the output from when you installed the package? Snappy
>> and the other compression libraries should be on in the binaries (see
>> https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
>> for example), so I'm curious if there's anything in the install logs that
>> help us understand what's up.
>>
>> Neal
>>
>> On Sun, Oct 31, 2021 at 7:06 PM Chris Berthiaume <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> After upgrading Arrow 5.0.0.2 to 6.0.0.2 in a Bioconductor 3.13 Docker
>>> container, I started to see some new errors when reading Parquet files that
>>> use snappy compression. I'm using the prebuilt Linux binary by setting
>>> LIBARROW_BINARY=true during installation. Building arrow using the latest
>>> nightly source fixes the issue. Is it possible the 6.0.0.2 prebuilt Linux
>>> binary does not have snappy compression support enabled? The error is
>>> copied below.
>>>
>>> Error: NotImplemented: Support for codec 'snappy' not built
>>> In order to read this file, you will need to reinstall arrow with
>>> additional features enabled.
>>> Set one of these environment variables before installing:
>>>
>>> * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>>> * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>>
>>> See https://arrow.apache.org/docs/r/articles/install.html for details
>>> Backtrace:
>>> 1. popcycle::get.vct.by.file(db, vct_dir,
>>> "2018_176/2018-06-25T20-03-48+00-00") test_files.R:210:2
>>> 4. arrow::read_parquet(...)
>>> 5. base::tryCatch(reader$ReadTable(), error = read_compressed_error)
>>> 6. base:::tryCatchList(expr, classes, parentenv, handlers)
>>> 7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
>>> 8. value[[3L]](cond)
>>>
>>> Thanks,
>>> Chris Berthiaume
>>>
>>
