[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-09 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991987#comment-16991987
 ] 

Kouhei Sutou commented on ARROW-6793:
-

Thanks!

> [R] Arrow C++ binary packaging for Linux
> 
>
> Key: ARROW-6793
> URL: https://issues.apache.org/jira/browse/ARROW-6793
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Neal Richardson
> Assignee: Neal Richardson
> Priority: Major
> Fix For: 1.0.0
>
>
> Our current installation experience on Linux isn't ideal. Unless you've 
> already installed the Arrow C++ library, when you install the R package, you 
> get a shell that tells you to install the C++ library. That was a useful 
> approach to allow us to get the package on CRAN, which makes it easy for 
> macOS and Windows users to install, but it doesn't improve the installation 
> experience for Linux users. This is an impediment to adoption of arrow not 
> only by users but also by package maintainers who might want to depend on 
> arrow. 
> macOS and Windows have a better experience because at installation time, the 
> configure scripts download and statically link a prebuilt C++ library. CRAN 
> bundles the whole thing up and delivers that as a binary R package. 
> Python wheels do a similar thing: they're binaries that contain all external 
> dependencies. And there are pyarrow wheels for Linux. This suggests that we 
> could do something similar for R: build a generic Linux binary of the C++ 
> library and download it in the R package configure script at install time.
> I experimented with using the Arrow C++ binaries included in the Python 
> wheels in R. See discussion at the end of ARROW-5956. This worked on macOS 
> (not useful for R, but it proved the concept) and almost worked on Linux, but 
> it turned out that the "manylinux2010" standard is too archaic to work with 
> contemporary Rcpp. 
> Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, 
> just with slightly more modern compiler/settings. Publish that C++ binary 
> package to bintray. Then download it in the R configure script if a 
> local/system package isn't found.
> Once we have a basic version working, test against various distros on 
> [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere 
> and/or ensure the current fallback behavior when we encounter a distro that 
> this doesn't work for. If necessary, we can make multiple flavors of this C++ 
> binary for debian, centos, etc.
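
The lookup order in the proposal above (use a local/system library if one is found, otherwise download a prebuilt binary, otherwise keep the current fallback) could be sketched in shell roughly as follows. This is only an illustration, not the actual configure script: the function names are hypothetical, and the pkg-config module name "arrow" is an assumption about how the C++ library is installed.

```sh
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.

# Print "yes" if pkg-config can find an installed Arrow C++ library.
# Assumes the library registers a pkg-config module named "arrow".
have_system_arrow() {
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists arrow; then
    echo "yes"
  else
    echo "no"
  fi
}

# Pick the source of libarrow in the order the proposal describes.
#   $1: "yes" if a local/system library was found
#   $2: "yes" if a prebuilt binary was downloaded successfully
choose_arrow_source() {
  if [ "$1" = "yes" ]; then
    echo "system"
  elif [ "$2" = "yes" ]; then
    echo "prebuilt"
  else
    echo "fallback"
  fi
}
```

A configure script could call `choose_arrow_source "$(have_system_arrow)" "$downloaded"`, where `downloaded` is a placeholder for the result of whatever download step is used, and branch on the printed value.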



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-09 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991985#comment-16991985
 ] 

Neal Richardson commented on ARROW-6793:


For reference on how this works for macOS and Windows, see 
https://cwiki.apache.org/confluence/display/ARROW/Release+Management+Guide#ReleaseManagementGuide-UpdatingRpackages,
 which explains and links to various materials about the process.



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-09 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991975#comment-16991975
 ] 

Kouhei Sutou commented on ARROW-6793:
-

Thanks.
I'll also look into the R package installation process later.



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-09 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991824#comment-16991824
 ] 

Neal Richardson commented on ARROW-6793:


R packages can't install system dependencies, mainly because of CRAN policy. The actual 
[policy|https://cran.r-project.org/web/packages/policies.html] says: "Packages 
should not write in the user’s home filespace (including clipboards), nor 
anywhere else on the file system apart from the R session’s temporary directory 
(or during installation in the location pointed to by {{TMPDIR}}: and such 
usage should be cleaned up). Installing into the system’s R installation (e.g., 
scripts to its {{bin}} directory) is not allowed."

Privileges would be the next reason why we couldn't do it.

I'll try your suggestion about using the system packages in that way, but I 
fear that still won't work because of the same CRAN policy. I think we have to 
statically link.  



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-08 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16991094#comment-16991094
 ] 

Kouhei Sutou commented on ARROW-6793:
-

> R packages are not allowed to install system dependencies

Why? Because root privilege isn't supplied?

Ruby packages use {{sudo}} automatically if it's needed.

Another approach: How about extracting the .deb/.rpm contents for each 
distribution instead of creating a new "manylinux"-ish binary? We already have 
binaries for Debian/Ubuntu/CentOS. We can use them.

We can extract a .deb with {{dpkg -x}}. It doesn't require root privilege. We 
can download a package with {{apt download ${PACKAGE_NAME}}}. We can list the 
packages it depends on with {{apt depends ${PACKAGE_NAME}}}.

We can extract an .rpm with {{rpm2cpio}} and {{cpio}}. It doesn't require root 
privilege. We can download a package with {{dnf download ${PACKAGE_NAME}}}/{{yum 
download ${PACKAGE_NAME}}}, but we first need to install an additional package 
with {{dnf install 'dnf-command(download)'}}.

(We can provide pre-extracted binaries instead of extracting on user 
environment.)

We'll be able to use the extracted binaries by setting a suitable 
{{LD_LIBRARY_PATH}}.
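
As a rough sketch of the .deb variant described above, the following shell functions download a package, unpack it into a private directory without root, and build an {{LD_LIBRARY_PATH}} value. The function names are hypothetical, the package name and destination are left to the caller, and the library directory inside the extracted tree (for example usr/lib/x86_64-linux-gnu on amd64 Debian/Ubuntu) is an assumption that depends on the distribution and architecture.

```sh
#!/bin/sh
# Illustrative sketch only; function names are hypothetical.

# Download a .deb with `apt download` and unpack it with `dpkg -x`.
# Neither step needs root privilege.
#   $1: package name
#   $2: destination directory for the extracted files
extract_deb() {
  pkg=$1
  dest=$2
  status=0
  workdir=$(mktemp -d) || return 1
  mkdir -p "$dest" || return 1
  # `apt download` writes the .deb into the current directory.
  (cd "$workdir" && apt download "$pkg") || status=1
  if [ "$status" -eq 0 ]; then
    for deb in "$workdir"/*.deb; do
      dpkg -x "$deb" "$dest" || status=1
    done
  fi
  rm -rf "$workdir"
  return "$status"
}

# Print a search path with the extracted library directory first.
#   $1: directory to prepend
#   $2: current LD_LIBRARY_PATH value (may be empty)
prepend_lib_path() {
  if [ -n "$2" ]; then
    echo "$1:$2"
  else
    echo "$1"
  fi
}
```

Usage would look something like `extract_deb "$PACKAGE_NAME" "$HOME/arrow-deb"` followed by `LD_LIBRARY_PATH=$(prepend_lib_path "$HOME/arrow-deb/usr/lib/x86_64-linux-gnu" "$LD_LIBRARY_PATH")`. Dependencies reported by {{apt depends}} would need the same treatment.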





[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-07 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990650#comment-16990650
 ] 

Neal Richardson commented on ARROW-6793:


Unfortunately, R packages are not allowed to install system dependencies, so 
that's why I was looking into the manylinux approach. Not ideal but seems like 
the second-best solution. I'm less worried about the security risks of static 
linking because I plan to host nightly packages (like I'm doing for macOS and 
Windows R packages) so we can handle patches there. (Also, that seems like the 
tradeoff we're stuck with.)

Given that, any suggestions?

My hope is to get a "manylinux"-ish binary and set up some CI to see how many 
distributions that covers. I'm hoping for a reasonable coverage of versions of 
debian/ubuntu/centos. 



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-06 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990340#comment-16990340
 ] 

Kouhei Sutou commented on ARROW-6793:
-

Generally, I don't like the manylinux approach, i.e. one binary for multiple 
Linux environments, because it requires static linking or bundling many shared 
libraries.

Static linking isn't good for security. If a statically linked library among 
our dependencies needs a security fix, we need to release a new version with 
the fixed bundled library as soon as possible. That's difficult because an 
official release requires a vote.

Bundling many shared libraries has the same security problem. It also has a 
conflict problem. If library X is bundled in both library A and library B, the 
copies of X in A and B must be the same version. If they are different 
versions, errors may occur.

How about installing our official deb/rpm packages automatically at install 
time? Ruby (Red Arrow) does so.

Which Linux distributions should we support? Here are the distributions we 
support for now:

  * Debian GNU/Linux 9
  * Debian GNU/Linux 10
  * Ubuntu 16.04
  * Ubuntu 18.04
  * Ubuntu 19.10
  * CentOS/RHEL 6
  * CentOS/RHEL 7
  * CentOS/RHEL 8
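
If an installer needed to decide whether the current system is one of the distributions listed above, a check along these lines might work. It is only a sketch: the function name is hypothetical, and it assumes the ID and VERSION_ID values come from /etc/os-release, which may be absent on older releases such as CentOS/RHEL 6.

```sh
#!/bin/sh
# Illustrative sketch only; the function name is hypothetical.

# Return 0 if the distribution is in the supported list above.
#   $1: ID from /etc/os-release (e.g. "ubuntu")
#   $2: VERSION_ID from /etc/os-release (e.g. "18.04")
is_supported_distro() {
  case "$1:$2" in
    debian:9|debian:10) return 0 ;;
    ubuntu:16.04|ubuntu:18.04|ubuntu:19.10) return 0 ;;
    centos:6|centos:6.*|centos:7|centos:7.*|centos:8|centos:8.*) return 0 ;;
    rhel:6|rhel:6.*|rhel:7|rhel:7.*|rhel:8|rhel:8.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller could source the file with `. /etc/os-release` and then run `is_supported_distro "$ID" "$VERSION_ID"`, falling back to other behavior when it returns non-zero.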



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-12-06 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990210#comment-16990210
 ] 

Neal Richardson commented on ARROW-6793:


I've picked this back up this week. Now that manylinux2014 is starting to 
happen (ARROW-7344), I've tried to use that as a source for libarrow et al., 
hoping that a less-ancient base image would solve the ABI issues I experienced 
with manylinux2010 wheels. Unfortunately, the behavior with manylinux2014 is 
the same as with manylinux2010. To recap, in my R build script, I'm first 
downloading a wheel, unzipping it, and pointing to it for the lib/include dirs 
(cf. 
[https://github.com/nealrichardson/sandbox/blob/5ad43525e8d5a9fc25e33fde888408629c421d52/.travis.yml#L11-L20]),
 and:
 * Building the R package without {{-D_GLIBCXX_USE_CXX11_ABI=0}}, I get an 
undefined symbol (e.g. 
[https://travis-ci.org/nealrichardson/sandbox/builds/621785506#L1603])
 * Adding {{-D_GLIBCXX_USE_CXX11_ABI=0}}, the package installs and works 
_except_ for when {{Rcpp::stop}} is called to raise an exception. This causes a 
core dump, typically {{bad_alloc}} trying to handle the exception message 
(backtrace here: 
[https://travis-ci.org/nealrichardson/sandbox/builds/621402417#L1892]). 
Symptoms are similar to 
[https://stackoverflow.com/questions/56494095/rcppstop-crashes-r-under-g]
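
One way to see which side of the {{_GLIBCXX_USE_CXX11_ABI}} switch a prebuilt library is on is to look for the cxx11 marker that libstdc++ puts in the symbol names of the newer ABI. The helper below is a hypothetical sketch that reads {{nm}} output on stdin. Note the limitation: finding the marker indicates the newer ABI is in use, but its absence only means no such symbols were seen, not proof of the old ABI.

```sh
#!/bin/sh
# Illustrative sketch only; the function name is hypothetical.

# Read symbol names on stdin and report whether any carry the
# libstdc++ "cxx11" marker (std::__cxx11 namespace or [abi:cxx11] tag).
detect_cxx_abi() {
  if grep -q 'cxx11'; then
    echo "cxx11"
  else
    echo "no-cxx11-symbols"
  fi
}
```

For example, `nm -D libarrow.so | detect_cxx_abi` could be run against the library unpacked from the wheel and against a locally built one to compare them.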

I'm struggling to figure out how to proceed. It looks like using the wheels 
themselves won't work for R, but maybe using the wheel base image (stripped 
down as it is) is a good starting point to build a generic library without 
having to create as much parallel infrastructure. But I'm not sure how exactly 
to tweak the cmake/flags to make this work; this is not my area of expertise. Or 
maybe this is the wrong approach entirely.

Thoughts [~kou] [~kszucs]? (N.B. in reviewing this ticket you can disregard the 
previous comment thread, it's unrelated to this problem.)



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949791#comment-16949791
 ] 

Neal Richardson commented on ARROW-6793:


You don't need devtools/remotes if you want to install the current version. 
Just install it from CRAN.



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949785#comment-16949785
 ] 

Thomas Schm commented on ARROW-6793:


Awesome, that's good news. For everyone following this thread: to install a 
tagged version from GitHub, run
R -e 'remotes::install_github("apache/arrow/r@apache-arrow-0.15.0")'
or, from CRAN,
devtools::install_version("arrow", version = "0.15.0", repos = 
"http://cran.us.r-project.org")

Thanks for all your help on that issue. The documentation on downloading the 
precompiled libraries is unfortunately slightly outdated, but @kou is already 
on the case. If I understand the linking process correctly, there is no need to 
specify a version number for the precompiled libraries, because Debian is 
merely given access to a software archive and the compiler/linker can pick 
whichever library it needs. I couldn't agree more with the initial premise of 
this thread. The experience for people running arrow on Linux and relying on 
these binary packages is not exactly ideal :-) Painful. Thanks again... Note 
that the documentation is too terse for people without deep knowledge of Debian 
and the way it accesses libraries, or who are unfamiliar with devtools.

> [R] Arrow C++ binary packaging for Linux
> 
>
> Key: ARROW-6793
> URL: https://issues.apache.org/jira/browse/ARROW-6793
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
> Fix For: 1.0.0
>
>
> Our current installation experience on Linux isn't ideal. Unless you've 
> already installed the Arrow C++ library, when you install the R package, you 
> get a shell that tells you to install the C++ library. That was a useful 
> approach to allow us to get the package on CRAN, which makes it easy for 
> macOS and Windows users to install, but it doesn't improve the installation 
> experience for Linux users. This is an impediment to adoption of arrow not 
> only by users but also by package maintainers who might want to depend on 
> arrow. 
> macOS and Windows have a better experience because at installation time, the 
> configure scripts download and statically link a prebuilt C++ library. CRAN 
> bundles the whole thing up and delivers that as a binary R package. 
> Python wheels do a similar thing: they're binaries that contain all external 
> dependencies. And there are pyarrow wheels for Linux. This suggests that we 
> could do something similar for R: build a generic Linux binary of the C++ 
> library and download it in the R package configure script at install time.
> I experimented with using the Arrow C++ binaries included in the Python 
> wheels in R. See discussion at the end of ARROW-5956. This worked on macOS 
> (not useful for R, but it proved the concept) and almost worked on Linux, but 
> it turned out that the "manylinux2010" standard is too archaic to work with 
> contemporary Rcpp. 
> Proposal: do a similar workflow to what the manylinux2010 pyarrow build does, 
> just with slightly more modern compiler/settings. Publish that C++ binary 
> package to bintray. Then download it in the R configure script if a 
> local/system package isn't found.
> Once we have a basic version working, test against various distros on 
> [R-hub|https://builder.r-hub.io/advanced] to make sure we're solid everywhere 
> and/or ensure the current fallback behavior when we encounter a distro that 
> this doesn't work for. If necessary, we can make multiple flavors of this C++ 
> binary for debian, centos, etc.
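
The fallback order proposed above (use a local/system Arrow C++ if one is 
found, otherwise download a prebuilt binary) could be sketched in shell roughly 
as follows. This is only an illustration, not the actual configure script: the 
function name arrow_source and the overridable PKG_CONFIG variable are 
hypothetical, and the download step is left as a comment because no binary URL 
is settled in this ticket. It relies only on "pkg-config --exists", which exits 
with status 0 when the named package is known.

```shell
#!/bin/sh
# Hypothetical sketch: decide where the Arrow C++ library should come from.
# PKG_CONFIG may be overridden (e.g. for testing); it defaults to pkg-config.

arrow_source() {
  if "${PKG_CONFIG:-pkg-config}" --exists arrow 2>/dev/null; then
    # A local/system package is visible to pkg-config: link against it.
    echo "system"
  else
    # Nothing found: a real script would fetch a prebuilt binary here
    # (location deliberately not specified in this sketch).
    echo "download"
  fi
}

echo "Arrow C++ source: $(arrow_source)"
```

Keeping the decision in one small function makes the fallback easy to exercise 
on distros where the prebuilt binary turns out not to work.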



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949770#comment-16949770
 ] 

Neal Richardson commented on ARROW-6793:


The binaries available on the install page _are_ in sync with tagged versions 
on GitHub, but you seem to be installing the head of the master branch (what 
you get if you do install_github without specifying a tag). If you want to use 
the built binary libraries for an official release version of the C++ library, 
you need to use the corresponding R package. You can get that from CRAN; it 
isn't lagging. In the output you pasted above, you were installing from a CRAN 
snapshot "https://mran.microsoft.com/snapshot/2019-09-19/". That's your lag.
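
To illustrate the point about keeping the R package and the C++ library on the 
same release, here is a rough shell sketch of a version comparison. The helper 
names (base_version, is_dev_version, versions_match) are made up for this 
example, and it assumes two conventions that are not stated in this thread: 
that an R package version such as 0.14.1.1 corresponds to C++ release 0.14.1, 
and that a fourth component of 9000 or more marks a development build from 
master.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the arrow R package.

# Keep the first three dot-separated components: 0.14.1.1 -> 0.14.1
base_version() {
  printf '%s\n' "$1" | cut -d. -f1-3
}

# Succeed if the version has a fourth component >= 9000 (assumed dev marker).
is_dev_version() {
  fourth=$(printf '%s\n' "$1" | cut -s -d. -f4)
  [ -n "$fourth" ] && [ "$fourth" -ge 9000 ]
}

# Succeed only if an R package version pairs with a released C++ version.
# Usage: versions_match CPP_VERSION R_VERSION
versions_match() {
  if is_dev_version "$2"; then
    return 1  # a build from master needs a C++ library built from master too
  fi
  [ "$(base_version "$1")" = "$(base_version "$2")" ]
}

if versions_match "0.15.0" "0.15.0"; then
  echo "0.15.0 / 0.15.0: match"
fi
if ! versions_match "0.15.0" "0.15.0.9000"; then
  echo "0.15.0 / 0.15.0.9000: mismatch"
fi
```

In practice the two inputs might come from "pkg-config --modversion arrow" and 
the Version field of the package's DESCRIPTION file.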



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949763#comment-16949763
 ] 

Thomas Schm commented on ARROW-6793:


Gosh, that's a big can. Is there a chance to keep the precompiled libraries 
(see https://arrow.apache.org/install/) somewhat in sync with a tagged version 
on GitHub? At the moment the libraries are all pointing to 0.15.0 etc., but 
CRAN is lagging and GitHub is somewhat ahead. Maybe it's a stupid idea in the 
first place to try to rely on these precompiled libraries? Or maybe one could 
install slightly outdated libraries to stay in sync with CRAN? 



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949489#comment-16949489
 ] 

Wes McKinney commented on ARROW-6793:
-

If you're building from master, you need to build both the C++ and R libraries 
from master. In general, the git revision of both libraries should be the same.



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-11 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16949227#comment-16949227
 ] 

Thomas Schm commented on ARROW-6793:


Yes, I guess this ticket is addressing a subproblem of getting arrow into R on 
Linux. Solving this problem is unfortunately a huge task, and the information 
is scattered in fragments across GitHub, Jira and several articles. It's a very 
unfortunate situation. Installing apache/arrow/r from GitHub worked yesterday 
but fails today. Today's problem relates to a commit you made yesterday:

compression.cpp: In function ‘bool util___Codec__IsAvailable(arrow::Compression::type)’:
compression.cpp:37:10: error: ‘IsAvailable’ is not a member of ‘arrow::util::Codec’
   return arrow::util::Codec::IsAvailable(codec);
          ^

Are the libraries I link to outdated? I did a fresh pull just a few minutes 
ago. Is there a way to specify a certain tag when installing via the GitHub 
route? 



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-10 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948774#comment-16948774
 ] 

Neal Richardson commented on ARROW-6793:


You're welcome to use 
[https://github.com/apache/arrow/blob/master/r/Dockerfile]. Though for the 
record, this ticket is about something different.



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-10 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948740#comment-16948740
 ] 

Thomas Schm commented on ARROW-6793:


The latest version on CRAN is 0.15.0, from October 7. I really don't understand 
enough about R, nor about the ways it tries to cope with dependency hell. 
I have now managed via remotes::install_github("apache/arrow/r"). It would be 
amazing if there were a more official Dockerfile taking R users through the 
experience you call not ideal. 



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-10 Thread Neal Richardson (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948734#comment-16948734
 ] 

Neal Richardson commented on ARROW-6793:


> trying URL 
> 'https://mran.microsoft.com/snapshot/2019-09-19/src/contrib/arrow_0.14.1.1.tar.gz'

That's not the new version of arrow. For development purposes you should be 
installing from the git repository, not CRAN (and definitely not an old 
snapshot of CRAN).



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-10 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948725#comment-16948725
 ] 

Thomas Schm commented on ARROW-6793:


The new version is breaking my Dockerfile.  Here's some output:

Step 6/6 : RUN install2.r --error --deps TRUE arrow
 ---> Running in 16553854c478
trying URL 
'https://mran.microsoft.com/snapshot/2019-09-19/src/contrib/arrow_0.14.1.1.tar.gz'
Content type 'application/octet-stream' length 105910 bytes (103 KB)
==
downloaded 103 KB

* installing *source* package ‘arrow’ ...
** package ‘arrow’ successfully unpacked and MD5 sums checked
** using staged installation
Arrow C++ libraries found via pkg-config
PKG_CFLAGS=-DNDEBUG -DARROW_R_WITH_ARROW
PKG_LIBS=-larrow -lparquet
** libs
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -DNDEBUG 
-DARROW_R_WITH_ARROW -I"/usr/local/lib/R/site-library/Rcpp/include" 
-I/usr/local/include -fvisibility=hidden -fpic  -g -O2 -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
array.cpp -o array.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -DNDEBUG 
-DARROW_R_WITH_ARROW -I"/usr/local/lib/R/site-library/Rcpp/include" 
-I/usr/local/include -fvisibility=hidden -fpic  -g -O2 -fstack-protector-strong 
-Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g  -c 
array__to_vector.cpp -o array__to_vector.o
array__to_vector.cpp:22:35: fatal error: arrow/util/task-group.h: No such file or directory
 #include <arrow/util/task-group.h>
                                   ^
compilation terminated.
make: *** [array__to_vector.o] Error 1
/usr/local/lib/R/etc/Makeconf:176: recipe for target 'array__to_vector.o' failed
ERROR: compilation failed for package ‘arrow’
* removing ‘/usr/local/lib/R/site-library/arrow’

The downloaded source packages are in
‘/tmp/downloaded_packages’
Error: installation of package ‘arrow’ had non-zero exit status
In addition: Warning message:
In install.packages(pkgs, ...) :
  installation of package ‘arrow’ had non-zero exit status
ERROR: Service 'r' failed to build: The command '/bin/sh -c install2.r --error 
--deps TRUE arrow' returned a non-zero code: 1
/home/thomas/github/antarctic/r/Makefile:23: recipe for target 'build' failed



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-06 Thread Thomas Schm (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16945433#comment-16945433
 ] 

Thomas Schm commented on ARROW-6793:


Very recently I had the pleasure of installing arrow on Linux. Let me first 
remark that without the help of @xhochy and @kou I certainly would have failed. 
I have now managed to install it in a rocker container (I think; there are 
still quite a lot of warning messages). I have published the Docker image here:

https://hub.docker.com/r/tschm/rocker-arrow

Maybe one of the experts could fix and/or improve it? Many thanks

Thomas



[jira] [Commented] (ARROW-6793) [R] Arrow C++ binary packaging for Linux

2019-10-04 Thread Wes McKinney (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16944786#comment-16944786
 ] 

Wes McKinney commented on ARROW-6793:
-

+1. We should be able to take the manylinux2010 base image and tweak the 
CXXFLAGS to suit R's requirements. 

Note that we may have to generate two different libraries, one for the pre-gcc5 
ABI and one for the post-gcc5 ABI. I think manylinux2010 uses the pre-gcc5 ABI 
in the interest of broad compatibility. The R build may need to detect which 
ABI the active configuration needs. Not sure how easy that will be.
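
One possible way for a build script to detect which libstdc++ ABI the active 
compiler uses is to inspect the _GLIBCXX_USE_CXX11_ABI macro, which libstdc++ 
from GCC 5 onward defines as 1 (new ABI) or 0 (old ABI). The sketch below is an 
assumption about how such a check could look, not something taken from the 
Arrow or R build; the function names abi_from_macros and detect_abi are 
hypothetical, and CXX is assumed to name a GCC-compatible compiler driver.

```shell
#!/bin/sh
# Hypothetical sketch: classify the libstdc++ ABI from a preprocessor macro dump.

# Reads "#define NAME VALUE" lines on stdin and prints new, old, or undefined.
abi_from_macros() {
  value=$(awk '$1 == "#define" && $2 == "_GLIBCXX_USE_CXX11_ABI" { print $3; exit }')
  case "$value" in
    1) echo "new" ;;
    0) echo "old" ;;
    *) echo "undefined" ;;  # e.g. pre-GCC 5 libstdc++, or another standard library
  esac
}

# Asks the compiler for its macros after including a libstdc++ header.
detect_abi() {
  printf '#include <string>\n' |
    "${CXX:-c++}" -x c++ -dM -E - 2>/dev/null |
    abi_from_macros
}

echo "libstdc++ ABI: $(detect_abi)"
```

A result of "undefined" would still need interpreting by the caller: with a 
pre-GCC 5 libstdc++ it implies the old ABI, while with a different standard 
library the question does not apply.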



