[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991987#comment-16991987 ] Kouhei Sutou commented on ARROW-6793: - Thanks! > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991985#comment-16991985 ] Neal Richardson commented on ARROW-6793: For reference of how it works for macOS and Windows, see https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingRpackages, which explains and links to various materials about the process. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991975#comment-16991975 ] Kouhei Sutou commented on ARROW-6793: - Thanks. I'll also look into R package installation process later. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991824#comment-16991824 ] Neal Richardson commented on ARROW-6793: R can't install system dependencies mainly per CRAN policy. The actual [policy|https://cran.r-project.org/web/packages/policies.html] says: "Packages should not write in the user’s home filespace (including clipboards), nor anywhere else on the file system apart from the R session’s temporary directory (or during installation in the location pointed to by {{TMPDIR}}: and such usage should be cleaned up). Installing into the system’s R installation (e.g., scripts to its {{bin}} directory) is not allowed." Privileges would be the next reason why we couldn't do it. I'll try your suggestion about using the system packages in that way, but I fear that still won't work because of the same CRAN policy. I think we have to statically link. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991094#comment-16991094 ] Kouhei Sutou commented on ARROW-6793: - > R packages are not allowed to install system dependencies Why? Because root privilege isn't supplied? Ruby packages uses {{sudo}} automatically if it's needed. Another approach: How about extracting .deb/.rpm contents for each distribution instead of creating new "manylinux"-ish binary? We already have binaries for Debian/Ubuntu/CentOS. We can use them. We can extract .deb by {{dpkg -x}}. It doesn't require root privilege. We can download a package by {{apt download ${PACKAGE_NAME}}. We can collect depended packages by {{apt depends ${PACKAGE_NAME}}}. We can extract .rpm by {{rpm2cpio}} and {{cpio}}. It doesn't require root privilege. We can download a package by {{dnf download ${PACKAGE_NAME}}}/{{yum download ${PACKAGE_NAME}}} but we need to install additional packages by {{dnf install 'dnf-command(download)'}}. (We can provide pre-extracted binaries instead of extracting on user environment.) We'll be able to use the extracted binaries by setting suitable {{LD_LIBRARY_PATH}}. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990650#comment-16990650 ] Neal Richardson commented on ARROW-6793: Unfortunately, R packages are not allowed to install system dependencies, so that's why I was looking into the manylinux approach. Not ideal but seems like the second-best solution. I'm less worried about the security risks of static linking because I plan to host nightly packages (like I'm doing for macOS and Windows R packages) so we can handle patches there. (Also, that seems like the tradeoff we're stuck with.) Given that, any suggestions? My hope is to get a "manylinux"-ish binary and set up some CI to see how many distributions that covers. I'm hoping for a reasonable coverage of versions of debian/ubuntu/centos. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990340#comment-16990340 ] Kouhei Sutou commented on ARROW-6793: - Generally, I don't like manylinux approach, one binary for multiple Linux environments. Because it requires static linking or bundling many shared libraries. Static linking isn't good for security. If there is a static linked library in our dependency, we need to release a new version with fixed bundled library as soon as possible. It's difficult because we need to vote for official release. Bundling many shared libraries has the same security problem. It also has conflict problem. If X library is bundled in A library and B library, X library in A and B must be the same version. If X in A and X in B are different version, it may be cause some errors. How about installing our official deb/rpm packages automatically at install time? Ruby (Red Arrow) does so. Which Linux distributions should we support? Here are supported distributions for now: * Debian GNU/Linux 9 * Debian GNU/Linux 10 * Ubuntu 16.04 * Ubuntu 18.04 * Ubuntu 19.10 * CentOS/RHEL 6 * CentOS/RHEL 7 * CentOS/RHEL 8 > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990210#comment-16990210 ] Neal Richardson commented on ARROW-6793: I've picked this back up this week. Now that manylinux2014 is starting to happen (ARROW-7344), I've tried to use that as a source for libarrow et al., hoping that a less-ancient base image would solve the ABI issues I experienced with manylinux2010 wheels. Unfortunately, the behavior with manylinux2014 is the same as with manylinux2010. To recap, in my R build script, I'm first downloading a wheel, unzipping it, and pointing to it for the lib/include dirs (cf. [https://github.com/nealrichardson/sandbox/blob/5ad43525e8d5a9fc25e33fde888408629c421d52/.travis.yml#L11-L20]), and: * Building the R package without {{-D_GLIBCXX_USE_CXX11_ABI=0}}, I get an undefined symbol (e.g. [https://travis-ci.org/nealrichardson/sandbox/builds/621785506#L1603]) * Adding {{-D_GLIBCXX_USE_CXX11_ABI=0}}, the package installs and works _except_ for when {{Rcpp::stop}} is called to raise an exception. This causes a core dump, typically {{bad_alloc}} trying to handle the exception message (backtrace here: [https://travis-ci.org/nealrichardson/sandbox/builds/621402417#L1892]). Symptoms are similar to [https://stackoverflow.com/questions/56494095/rcppstop-crashes-r-under-g] I'm struggling to figure out how to proceed. It looks like using the wheels themselves won't work for R, but maybe using the wheel base image (stripped down as it is) is a good starting point to build a generic library without having to create as much parallel infrastructure. But I'm not sure how exactly to tweak the cmake/flags to make this work–this is not my area of expertise. Or maybe this is the wrong approach entirely. Thoughts [~kou] [~kszucs]? (N.B. in reviewing this ticket you can disregard the previous comment thread, it's unrelated to this problem.) > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949791#comment-16949791 ] Neal Richardson commented on ARROW-6793: You don't need devtools/remotes if you want to install the current version. Just install it from CRAN. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949785#comment-16949785 ] Thomas Schm commented on ARROW-6793: Awesome, that's good news. Everyone following this thread. To install a tagged version from Github run R -e 'remotes::install_github("apache/arrow/r@apache-arrow-0.15.0")' or with CRAN devtools::install_version("arrow", version = "0.15.0", repos = "http://cran.us.r-project.org;) Thanks for all your help on that issue. The documentation on downloading the precompiled libraries is unfortunately slightly outdated. But @kou is already on the case. If I understand the linking process correctly there is no need to specify any version number for the precompiled libraries as debian is given merely access to a software archive and the compiler/linker can pick any library in need. I couldn't agree more with the initial premise of this thread. The experience for people running arrow on Linux relying on this binary packages is not exactly ideal :-) Painful. Thanks again... Note that the documentation is too terse for people not familiar with deep knowledge of debian and the way it can access libraries and/or familiar with devtools. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949770#comment-16949770 ] Neal Richardson commented on ARROW-6793: The binaries available on the install page _are_ in sync with tagged versions on GitHub, but you seem to be installing the head of the master branch (what you get if you do install_github without specifying a tag). If you want to use the built binary libraries for an official release version of the C++ library, you need to use the corresponding R package. You can get that from CRAN–it isn't lagging. In the output you pasted above, you were installing from a CRAN snapshot "https://mran.microsoft.com/snapshot/2019-09-19/;. That's your lag. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949763#comment-16949763 ] Thomas Schm commented on ARROW-6793: Gosh, that's a big can. Is there a chance to keep the precompiled libraries, see https://arrow.apache.org/install/ somewhat in sync with a tagged version from github? At the moment the libraries or all pointing to 0.15.0 etc. but CRAN is lagging and Github is somewhat ahead. Maybe it's a stupid idea in the first place to try to rely on this precomiled libraries? Or maybe one could install slightly outdated libraries to stay in sync with CRAN? > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949489#comment-16949489 ] Wes McKinney commented on ARROW-6793: - If you're building from master, you need to build both the C++ and R libraries from master. In general the git revision of both libraries should be the same > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949227#comment-16949227 ] Thomas Schm commented on ARROW-6793: Yes, I guess this ticket is addressing a subproblem of getting arrow into R on Linux. Solving this problem is unfortunately a huge task and the information is in fragments over Github, Jira and several articles. It's a very unfortunate situation. Trying to install apache/arrow/r from Github worked yesterday but fails today. The problem today relates to a commit you have done yesterday compression.cpp: In function ‘bool util___Codec__IsAvailable(arrow::Compression::type)’: compression.cpp:37:10: error: ‘IsAvailable’ is not a member of ‘arrow::util::Codec’ return arrow::util::Codec::IsAvailable(codec); ^ Are the libraries I link to outdated? I did a fresh pull just a few minutes ago. Is there way to specify a certain tag in the install via github route? > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948774#comment-16948774 ] Neal Richardson commented on ARROW-6793: You're welcome to use [https://github.com/apache/arrow/blob/master/r/Dockerfile]. Though for the record, this ticket is about something different. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948740#comment-16948740 ] Thomas Schm commented on ARROW-6793: The latest greatest version on cran is 0.15.0 from October 7. I really don't understand enough R. Nor the ways it tries to cope with dependency hell. Managed now via remotes::install_github("apache/arrow/r"). Would be amazing if there could be a more official Dockerfile taking R users through the experience you call not ideal. > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948734#comment-16948734 ] Neal Richardson commented on ARROW-6793: > trying URL > '[https://mran.microsoft.com/snapshot/2019-09-19/src/contrib/arrow_0.14.1.1.tar.gz'] That's not the new version of arrow. For development purposes you should be installing from the git repository, not CRAN (and definitely not an old snapshot of CRAN). > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948725#comment-16948725 ] Thomas Schm commented on ARROW-6793: The new version is breaking my Dockerfile. Here's some output: Step 6/6 : RUN install2.r --error --deps TRUE arrow ---> Running in 16553854c478 trying URL 'https://mran.microsoft.com/snapshot/2019-09-19/src/contrib/arrow_0.14.1.1.tar.gz' Content type 'application/octet-stream' length 105910 bytes (103 KB) == downloaded 103 KB * installing *source* package ‘arrow’ ... ** package ‘arrow’ successfully unpacked and MD5 sums checked ** using staged installation Arrow C++ libraries found via pkg-config PKG_CFLAGS=-DNDEBUG -DARROW_R_WITH_ARROW PKG_LIBS=-larrow -lparquet ** libs g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -DNDEBUG -DARROW_R_WITH_ARROW -I"/usr/local/lib/R/site-library/Rcpp/include" -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c array.cpp -o array.o g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -DNDEBUG -DARROW_R_WITH_ARROW -I"/usr/local/lib/R/site-library/Rcpp/include" -I/usr/local/include -fvisibility=hidden -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c array__to_vector.cpp -o array__to_vector.o array__to_vector.cpp:22:35: fatal error: arrow/util/task-group.h: No such file or directory #include ^ compilation terminated. make: *** [array__to_vector.o] Error 1 /usr/local/lib/R/etc/Makeconf:176: recipe for target 'array__to_vector.o' failed ERROR: compilation failed for package ‘arrow’ * removing ‘/usr/local/lib/R/site-library/arrow’ The downloaded source packages are in ‘/tmp/downloaded_packages’ Error: installation of package ‘arrow’ had non-zero exit status In addition: Warning message: In install.packages(pkgs, ...) : installation of package ‘arrow’ had non-zero exit status ERROR: Service 'r' failed to build: The command '/bin/sh -c install2.r --error --deps TRUE arrow' returned a non-zero code: 1 /home/thomas/github/antarctic/r/Makefile:23: recipe for target 'build' failed > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945433#comment-16945433 ] Thomas Schm commented on ARROW-6793: Very recently i had the pleasure to install arrow on Linux. At this stage let me first remark that without the help of @xhochy and @kou I certainly would have failed. I have now managed to install(? still quite a lot of warning messages) in a rocker container. I have published the docker-image here: https://hub.docker.com/r/tschm/rocker-arrow Maybe one of the experts could fix and/or improve it? Many thanks Thomas > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux
[ https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944786#comment-16944786 ] Wes McKinney commented on ARROW-6793: - +1. We should be able to take the manylinux2010 base image and tweak the CXXFLAGS to suit R's requirements. Note that we may have to generate two different libraries, one for pre-gcc5 ABI and one for post. I think manylinux2010 uses the pre-gcc5 ABI in the interest of broad spectrum compatibility. The R build may need to detect which ABI the active configuration needs. Not sure how easy that will be > [R] Arrow C++ binary packaging for Linux > > > Key: ARROW-6793 > URL: https://issues.apache.org/jira/browse/ARROW-6793 > Project: Apache Arrow > Issue Type: Improvement > Components: R >Reporter: Neal Richardson >Assignee: Neal Richardson >Priority: Major > Fix For: 1.0.0 > > > Our current installation experience on Linux isn't ideal. Unless you've > already installed the Arrow C++ library, when you install the R package, you > get a shell that tells you to install the C++ library. That was a useful > approach to allow us to get the package on CRAN, which makes it easy for > macOS and Windows users to install, but it doesn't improve the installation > experience for Linux users. This is an impediment to adoption of arrow not > only by users but also by package maintainers who might want to depend on > arrow. > macOS and Windows have a better experience because at installation time, the > configure scripts download and statically link a prebuilt C++ library. CRAN > bundles the whole thing up and delivers that as a binary R package. > Python wheels do a similar thing: they're binaries that contain all external > dependencies. And there are pyarrow wheels for Linux. This suggests that we > could do something similar for R: build a generic Linux binary of the C++ > library and download it in the R package configure script at install time. > I experimented with using the Arrow C++ binaries included in the Python > wheels in R. See discussion at the end of ARROW-5956. This worked on macOS > (not useful for R, but it proved the concept) and almost worked on Linux, but > it turned out that the "manylinux2010" standard is too archaic to work with > contemporary Rcpp. > Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, > just with slightly more modern compiler/settings. Publish that C++ binary > package to bintray. Then download it in the R configure script if a > local/system package isn't found. > Once we have a basic version working, test against various distros on > [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere > and/or ensure the current fallback behavior when we encounter a distro that > this doesn't work for. If necessary, we can make multiple flavors of this C++ > binary for debian, centos, etc. -- This message was sent by Atlassian Jira (v8.3.4#803005)