[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369463#comment-16369463 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366789122

Thanks @wesm, @xhochy and @felixcheung! Since it can sometimes take a while to get Spark updated, if we get to the point where this is ready to be put in the nightly builds, maybe I could submit a PR to patch Spark and we could configure the docker build to point to that.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [Java] Add dockerized test setup to validate Spark integration
> --
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Java - Vectors
> Reporter: Wes McKinney
> Assignee: Bryan Cutler
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.9.0
>
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to
> catch any regressions in the Spark integration

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368275#comment-16368275 ]

ASF GitHub Bot commented on ARROW-1579:
---

wesm closed pull request #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/dev/docker-compose.yml b/dev/docker-compose.yml
index a73fd1bfb..b1e593cf4 100644
--- a/dev/docker-compose.yml
+++ b/dev/docker-compose.yml
@@ -33,3 +33,8 @@ services:
       context: dask_integration
     volumes:
       - ../..:/apache-arrow
+  spark_integration:
+    build:
+      context: spark_integration
+    volumes:
+      - ../..:/apache-arrow
diff --git a/dev/spark_integration/Dockerfile b/dev/spark_integration/Dockerfile
new file mode 100644
index 0..d1b3cf89f
--- /dev/null
+++ b/dev/spark_integration/Dockerfile
@@ -0,0 +1,70 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM maven:3.5.2-jdk-8-slim
+
+# Basic OS utilities
+RUN apt-get update && apt-get install -y \
+    wget \
+    git build-essential \
+    software-properties-common
+
+# This will install conda in /home/ubuntu/miniconda
+RUN wget -O /tmp/miniconda.sh \
+    https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+    bash /tmp/miniconda.sh -b -p /home/ubuntu/miniconda && \
+    rm /tmp/miniconda.sh
+
+# Python dependencies
+RUN apt-get install -y \
+    pkg-config
+
+# Create Conda environment
+ENV PATH="/home/ubuntu/miniconda/bin:${PATH}"
+RUN conda create -y -q -n pyarrow-dev \
+    # Python
+    python=2.7 \
+    numpy \
+    pandas \
+    pytest \
+    cython \
+    ipython \
+    matplotlib \
+    six \
+    setuptools \
+    setuptools_scm \
+    # C++
+    boost-cpp \
+    cmake \
+    flatbuffers \
+    rapidjson \
+    thrift-cpp \
+    snappy \
+    zlib \
+    gflags \
+    brotli \
+    jemalloc \
+    lz4-c \
+    zstd \
+    -c conda-forge
+
+ADD . /apache-arrow
+WORKDIR /apache-arrow
+
+CMD arrow/dev/spark_integration/spark_integration.sh
+
+# BUILD: $ docker build -f arrow/dev/spark_integration/Dockerfile -t spark-arrow .
+# RUN: $ docker run -v $HOME/.m2:/root/.m2 spark-arrow
diff --git a/dev/spark_integration/spark_integration.sh b/dev/spark_integration/spark_integration.sh
new file mode 100755
index 0..8ca4dc3ac
--- /dev/null
+++ b/dev/spark_integration/spark_integration.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Exit on any error
+set -e
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_HOME=$(pwd)/arrow
+export ARROW_BUILD_TYPE=release
+export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" -DARROW_PYTHON=on -DA
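The `export` block in `spark_integration.sh` can also be reproduced from Python, for instance in a hypothetical wrapper that launches the build steps with `subprocess`. The sketch below assumes the same `/apache-arrow` layout and takes the conda prefix as an argument instead of reading `$CONDA_PREFIX` after `source activate`; `integration_env` is an illustrative name, not something in the PR. One deliberate difference: the shell line `${ARROW_HOME}/lib:${LD_LIBRARY_PATH}` leaves a trailing `:` when `LD_LIBRARY_PATH` is unset, and the dynamic loader treats that empty entry as the current directory, so the sketch only appends the previous value when there is one.

```python
import os
from typing import Dict, Mapping, Optional


def integration_env(workdir: str, conda_prefix: str,
                    base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the variables that
    spark_integration.sh exports before it builds Arrow C++."""
    env = dict(os.environ if base is None else base)
    arrow_home = os.path.join(workdir, "arrow")  # export ARROW_HOME=$(pwd)/arrow
    env["ARROW_HOME"] = arrow_home
    env["ARROW_BUILD_TYPE"] = "release"
    env["ARROW_BUILD_TOOLCHAIN"] = conda_prefix  # reuse the conda env's libraries
    # Prepend $ARROW_HOME/lib, but do not leave a trailing ':' (an empty entry,
    # which the loader treats as the current directory) if nothing was set.
    lib_dir = os.path.join(arrow_home, "lib")
    previous = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + previous if previous else "")
    env["MAVEN_OPTS"] = "-Xmx2g -XX:ReservedCodeCacheSize=512m"
    return env
```

Used as `subprocess.run(["make"], cwd="/apache-arrow/arrow/cpp/build", env=integration_env("/apache-arrow", os.environ["CONDA_PREFIX"]), check=True)`, it would hand the same variables to a child process.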
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368273#comment-16368273 ]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366453717

Works for me. Merging now.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367938#comment-16367938 ]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366370963

We definitely need an upstream patch for Spark, but I'm happy merging this PR as-is, as it shows that we can now test Spark with it. @wesm, is this OK for you?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367767#comment-16367767 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366333654

That error is from #1490 being merged yesterday, which is what I was talking about in https://github.com/apache/arrow/pull/1319#issuecomment-358726711, but I was hoping to at least get this running for a little while before breaking Spark! I guess we could either rebase this onto the commit just before #1490 to ensure it works, or I can provide a patch for Spark to update it for that breaking change in Java?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367740#comment-16367740 ]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366328470

Sadly I get the following error on master:
```
[warn] Class org.apache.avro.reflect.Stringable not found - continuing with a stub.
[error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:65: not found: type NullableMapVector
[error] case (StructType(_), vector: NullableMapVector) =>
[error]^
[error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:66: value size is not a member of org.apache.arrow.vector.ValueVector
[error] val children = (0 until vector.size()).map { ordinal =>
[error]^
[error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:67: value getChildByOrdinal is not a member of org.apache.arrow.vector.ValueVector
[error] createFieldWriter(vector.getChildByOrdinal(ordinal))
[error]^
[error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:318: not found: type NullableMapVector
[error] val valueVector: NullableMapVector,
[error] ^
[warn] Class org.apache.avro.reflect.Stringable not found - continuing with a stub.
[warn] two warnings found
[error] four errors found
[error] Compile failed at Feb 16, 2018 6:49:56 PM [50.120s]
```
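With a compile log like this, it can help to list the distinct located errors before deciding whether to rebase or patch Spark. A minimal sketch, assuming only the `[error] <path>.scala:<line>: <message>` shape visible in the log; `summarize_errors` is a hypothetical helper, not part of the Spark or Arrow builds.

```python
import re
from typing import List, Tuple

# Matches located compiler errors such as
# "[error] /path/ArrowWriter.scala:65: not found: type NullableMapVector"
ERROR_RE = re.compile(r"^\[error\] (?P<path>\S+\.scala):(?P<line>\d+): (?P<msg>.+)$")


def summarize_errors(log: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, message) for each located error line."""
    results = []
    for raw in log.splitlines():
        match = ERROR_RE.match(raw.strip())
        if match:
            file_name = match.group("path").rsplit("/", 1)[-1]
            results.append((file_name, int(match.group("line")), match.group("msg")))
    return results
```

On the log above it returns four entries, all in `ArrowWriter.scala` (lines 65, 66, 67 and 318). The two `ValueVector` errors are most likely knock-on effects: once `NullableMapVector` no longer resolves, `vector` is typed as a plain `ValueVector`, which has neither `size()` nor `getChildByOrdinal`.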
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367706#comment-16367706 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366321795

I probably should have rebased here earlier. If anyone else is going to try this out, let me know and I can do that first.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367630#comment-16367630 ]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366305004

As this is part of a docker-compose script, this should be run using `docker-compose build && docker-compose run spark_integration`. Then it will use the options declared in the `docker-compose.yml` (mostly the mount of the current working dir).
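As a point of comparison with the `docker run` line in the Dockerfile comment: docker-compose resolves a relative volume source such as `../..` against the directory holding `docker-compose.yml`, so the `spark_integration` service mounts the parent of the `arrow` checkout at `/apache-arrow` (the `/apache-arrow/spark/...` paths in the build logs suggest a `spark` checkout is expected next to it). The sketch below only assembles a roughly equivalent `docker run` argument list and does not execute it; both function names are hypothetical, and the `spark-arrow` tag is borrowed from the Dockerfile's `BUILD` comment as an assumption.

```python
import os
from typing import List


def compose_mount_source(compose_dir: str, relative: str = "../..") -> str:
    """Resolve a relative volume source the way docker-compose does: against
    the directory that contains docker-compose.yml (here, arrow/dev)."""
    return os.path.normpath(os.path.join(os.path.abspath(compose_dir), relative))


def equivalent_docker_run(compose_dir: str, image: str = "spark-arrow") -> List[str]:
    """Approximate `docker-compose run spark_integration` with plain docker.
    The image tag is an assumption taken from the Dockerfile's BUILD comment."""
    source = compose_mount_source(compose_dir)
    return ["docker", "run", "-v", f"{source}:/apache-arrow", image]
```

Passing the returned list to `subprocess.run` would start the container with the source tree bind-mounted over `/apache-arrow`, rather than relying on the copy made by `ADD . /apache-arrow` at build time.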
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367631#comment-16367631 ]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366305165

To get past the build issues I had to rebase locally. Once the script has run through I will report back.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366568#comment-16366568 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366134692

Yeah, we should document it better. I copied how the API docs script was done, and that one is run with docker-compose, but I'm not quite sure how to run it through that. I just used the regular `docker run` command.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366564#comment-16366564 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366134424

Thanks @wesm, I think that is because the brotli libraries were recently updated, and #1554 should fix that. Could you check whether you have the latest master checked out? The docker script just takes what is in your current directory; I'm not sure if that is what we want or not.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366536#comment-16366536 ]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366126289

Build failed for me with
```
Scanning dependencies of target arrow_static
Scanning dependencies of target arrow_shared
make[2]: *** No rule to make target '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/libbrotlidec.a', needed by 'release/libarrow.so.0.0.0'. Stop.
CMakeFiles/Makefile2:693: recipe for target 'src/arrow/CMakeFiles/arrow_shared.dir/all' failed
make[1]: *** [src/arrow/CMakeFiles/arrow_shared.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs
[ 52%] Linking CXX static library ../../release/libarrow.a
[ 52%] Built target arrow_static
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
```

Haven't dug too far into what's wrong. Can we document how to run this someplace other than the Dockerfile?
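`No rule to make target '.../libbrotlidec.a'` means the generated Makefiles list that static library as a link input, but no file exists at that path in the conda environment. A small preflight check on `$CONDA_PREFIX/lib` could report this before a long build. This is only a sketch: `missing_libs` is a hypothetical helper, and apart from `libbrotlidec.a`, which is taken from the log, the set of files worth checking is an assumption left to the caller.

```python
import os
from typing import Iterable, List


def missing_libs(prefix: str, names: Iterable[str]) -> List[str]:
    """Return the entries of `names` that are not regular files in <prefix>/lib,
    preserving the order in which they were given."""
    lib_dir = os.path.join(prefix, "lib")
    return [name for name in names
            if not os.path.isfile(os.path.join(lib_dir, name))]
```

For example, `missing_libs(os.environ["CONDA_PREFIX"], ["libbrotlidec.a"])` returns `["libbrotlidec.a"]` in an environment that lacks the file and an empty list otherwise.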
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365898#comment-16365898 ]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365988801

Sweet, I can take this for a spin later today, or if @cpcloud wants to look, that also works.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364624#comment-16364624 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365710909

This is working with ARROW_BUILD_TOOLCHAIN set to the conda env, and I think this could be merged now. Is somebody else able to verify that it is working before merging?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364585#comment-16364585 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239

This is currently a WIP; the Scala/Java tests are able to run.

Left TODO:
- [x] Run PySpark Tests
- [ ] Verify working with docker-compose and existing volumes in arrow/dev
- [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?
- [x] Speed up pyarrow build using conda prefix as toolchain
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363349#comment-16363349 ]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365466851

Now you need to set the `PYARROW_CXXFLAGS` environment variable with the gcc5 ABI flag.
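The "gcc5 ABI flag" here is presumably the same `-D_GLIBCXX_USE_CXX11_ABI=0` define that `spark_integration.sh` passes to CMake for Arrow C++. If the pyarrow extension is compiled with a gcc5+ compiler's default setting instead, it references new-ABI `std::string` symbols that this libarrow does not export. A minimal sketch of keeping both builds on one setting, assuming a Python driver script; `build_settings` is a hypothetical name, and the CMake arguments are only the subset visible in this thread (the script's `cmake` line is cut off), plus the `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` mentioned in another comment.

```python
import os
from typing import Dict, List, Mapping, Optional, Tuple

# libstdc++ dual-ABI switch: 0 selects the old (pre-gcc5) std::string/std::list ABI.
GLIBCXX_ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def build_settings(arrow_home: str,
                   base: Optional[Mapping[str, str]] = None
                   ) -> Tuple[List[str], Dict[str, str]]:
    """Return (cmake argv for Arrow C++, environment for the pyarrow build),
    both derived from the same ABI define so they cannot drift apart."""
    cmake_args = [
        "cmake",
        "-DCMAKE_CXX_FLAGS=" + GLIBCXX_ABI_FLAG,  # argv form: no shell quoting needed
        "-DARROW_PYTHON=on",
        "-DCMAKE_INSTALL_PREFIX=" + arrow_home,
        # ...remaining options and the source directory, as in the script
    ]
    env = dict(os.environ if base is None else base)
    env["PYARROW_CXXFLAGS"] = GLIBCXX_ABI_FLAG  # the variable suggested in the thread
    return cmake_args, env
```

Deriving both values from `GLIBCXX_ABI_FLAG` means a later switch to the new ABI only has to change one line.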
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363297#comment-16363297 ]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365456994

Well, I'm able to run the build with `ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`, but I do need to keep `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` for the Python build to work (is this right?), and then `LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}` to locate libarrow.so. Now I get an error with `import pyarrow` because of an undefined symbol:

`ImportError: /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/pyarrow-0.8.1.dev116+g7bf7b2e9-py2.7-linux-x86_64.egg/pyarrow/lib.so: undefined symbol: _ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`

I don't think this is from an Arrow library, but I can't tell where it's coming from. Any ideas?
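The `7__cxx11` in that missing symbol is a useful clue. libstdc++ places the gcc5 (C++11-conforming) `std::string` in the inline namespace `std::__cxx11`, so `NSt7__cxx1112basic_string...` means `lib.so` was compiled expecting the new ABI, while `spark_integration.sh` builds libarrow with `-D_GLIBCXX_USE_CXX11_ABI=0`, under which `std::string` mangles as the plain abbreviation `Ss`. The sketch below only classifies a mangled name by those markers; it is a crude heuristic (substring checks for `St7__cxx11`, the `B5cxx11` ABI tag, and `Ss`), not a demangler, and `abi_of_symbol` is a hypothetical name. `c++filt` on the name and `nm -D` on the two libraries give the authoritative answer.

```python
def abi_of_symbol(mangled: str) -> str:
    """Heuristically classify an Itanium-mangled C++ symbol by libstdc++ ABI.

    Returns "cxx11" if it names types from std::__cxx11 or carries the
    [abi:cxx11] tag, "old" if it contains the pre-gcc5 std::string
    abbreviation "Ss", and "unknown" otherwise.
    """
    if "St7__cxx11" in mangled or "B5cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:  # crude: could also match inside an identifier
        return "old"
    return "unknown"
```

For the symbol in the ImportError it returns `"cxx11"`, whereas a libarrow built with the old ABI would presumably export `_ZN5arrow9timestampENS_8TimeUnit4typeERKSs`, which this function classifies as `"old"`; that mismatch is what compiling pyarrow with the same ABI define avoids.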
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363294#comment-16363294 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365456994

Well, I'm able to run the build with `ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`, but I do need to keep `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` for the Python build to work (is this right?), and then `LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}` to locate libarrow.so.

Now I get an error on `import pyarrow` because of an undefined symbol:

`ImportError: /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/pyarrow-0.8.1.dev116+g7bf7b2e9-py2.7-linux-x86_64.egg/pyarrow/lib.so: undefined symbol: _ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`

I don't think this is from an Arrow library, but I can't tell what it's from. Any ideas?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363293#comment-16363293 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365456994

Well, I'm able to run the build with `ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`, but I do need to keep `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` for the Python build to work (is this right?), and then `LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}` to locate libarrow.so.

Now I get an error on `import pyarrow` because of an undefined symbol:

```
ImportError: /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/pyarrow-0.8.1.dev116+g7bf7b2e9-py2.7-linux-x86_64.egg/pyarrow/lib.so: undefined symbol: _ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

I don't think this is from an Arrow library, but I can't tell what it's from. Any ideas?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363292#comment-16363292 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365456994

Well, I'm able to run the build with `ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`, but I do need to keep `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` for the Python build to work (is this right?), and then `LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}` to locate libarrow.so.

Now I get an error on `import pyarrow` because of an undefined symbol:

```
ImportError: /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/pyarrow-0.8.1.dev116+g7bf7b2e9-py2.7-linux-x86_64.egg/pyarrow/lib.so: undefined symbol: _ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
```

I don't think this is from an Arrow library, but I can't tell what it's from. Any ideas?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363173#comment-16363173 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365430455

Thanks @wesm, that seemed to do the trick when added to the cmake command line (it didn't work as an env var):

`cmake -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..`
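When the build is driven from a script, the working cmake command above, together with the `PYARROW_CXXFLAGS` advice elsewhere in this thread, can be expressed as an argument list plus an environment for the pyarrow build. A sketch in Python; the helper names are hypothetical and `/apache-arrow/arrow` is only an example install prefix:

```python
import shlex
from typing import Dict, List

# Selects libstdc++'s pre-GCC 5 ABI for std::string and std::list.
ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def arrow_cpp_cmake_args(arrow_home: str) -> List[str]:
    """cmake invocation from the thread, with the ABI flag in CMAKE_CXX_FLAGS."""
    return [
        "cmake",
        f"-DCMAKE_CXX_FLAGS={ABI_FLAG}",
        "-DARROW_PYTHON=on",
        "-DARROW_HDFS=on",
        "-DCMAKE_BUILD_TYPE=release",
        f"-DCMAKE_INSTALL_PREFIX={arrow_home}",
        "..",
    ]


def pyarrow_build_env(base_env: Dict[str, str]) -> Dict[str, str]:
    """Copy of base_env with PYARROW_CXXFLAGS set so pyarrow uses the same ABI."""
    env = dict(base_env)
    env["PYARROW_CXXFLAGS"] = ABI_FLAG
    return env


if __name__ == "__main__":
    # Show the command as it would be typed in a shell.
    print(shlex.join(arrow_cpp_cmake_args("/apache-arrow/arrow")))
```

Passing each argument as its own list element (for example to `subprocess.run`) means the flag needs no shell quoting.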
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363099#comment-16363099 ] ASF GitHub Bot commented on ARROW-1579:
wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365414411

Looks like that flag needs to be passed through to the ExternalProject declarations. Try using `CMAKE_CXX_FLAGS` instead of `ARROW_CXXFLAGS`? If this doesn't work I can dig in later today.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362957#comment-16362957 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365384766

I still get the linking error after setting `"ARROW_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0"`:

```
[ 78%] Linking CXX executable ../../../release/json-integration-test
CMakeFiles/json-integration-test.dir/json-integration-test.cc.o: In function `_GLOBAL__sub_I__ZN5arrow4test25MakeRandomInt32PoolBufferElPNS_10MemoryPoolEPSt10shared_ptrINS_10PoolBufferEEj':
json-integration-test.cc:(.text.startup+0x1de): undefined reference to `google::FlagRegisterer::FlagRegisterer<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(char const*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
json-integration-test.cc:(.text.startup+0x296): undefined reference to `google::FlagRegisterer::FlagRegisterer<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(char const*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
json-integration-test.cc:(.text.startup+0x34e): undefined reference to `google::FlagRegisterer::FlagRegisterer<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >(char const*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)'
collect2: error: ld returned 1 exit status
```
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362935#comment-16362935 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365380849

Thanks @xhochy, I'll try all this out!
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359572#comment-16359572 ] ASF GitHub Bot commented on ARROW-1579:
xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167406938

## File path: dev/spark_integration/spark_integration.sh
##
@@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory

Review comment:
Call `set -e` here; then any failing command will make the whole script fail, and you can get rid of the `if [[ $? -ne 0 ]]; then` blocks later on.
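The effect of `set -e` can be checked directly by running small bash snippets. A sketch in Python using `subprocess`, assuming `bash` is on `PATH`; `run_bash` is a hypothetical helper:

```python
import subprocess


def run_bash(script: str) -> subprocess.CompletedProcess:
    """Run a bash snippet and capture its exit status and stdout."""
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)


# Without set -e, the failing `false` is ignored and execution continues.
no_errexit = run_bash("false\necho reached")

# With set -e, the first failing command aborts the script with its status.
errexit = run_bash("set -e\nfalse\necho reached")

# The explicit check used in the script under review has the same effect here.
manual_check = run_bash("false\nif [[ $? -ne 0 ]]; then\nexit 1\nfi\necho reached")

if __name__ == "__main__":
    for name, result in [("no set -e", no_errexit), ("set -e", errexit),
                         ("manual check", manual_check)]:
        print(f"{name}: exit={result.returncode} stdout={result.stdout!r}")
```

`set -e` has well-known exceptions: a failing command inside an `if` condition, in a `&&`/`||` list, or anywhere but last in a pipeline (without `set -o pipefail`) does not abort the script.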
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359570#comment-16359570 ] ASF GitHub Bot commented on ARROW-1579:
xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167406979

## File path: dev/spark_integration/spark_integration.sh
##
@@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_BUILD_TYPE=Release
+export ARROW_HOME=$(pwd)/arrow
+#export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export BOOST_ROOT=$CONDA_PREFIX
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace

Review comment:
Don't use `--inplace`; instead run `python setup.py build_ext --build-type=release install` to install the extension into the environment.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359571#comment-16359571 ] ASF GitHub Bot commented on ARROW-1579:
xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167407014

## File path: dev/spark_integration/spark_integration.sh
##
@@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_BUILD_TYPE=Release
+export ARROW_HOME=$(pwd)/arrow
+#export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export BOOST_ROOT=$CONDA_PREFIX
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=git://git.apache.org/spark.git
+SPARK_BRANCH=master
+
+# Get the Spark repo if not in image already
+if [ ! -d "$(pwd)/spark" ]; then
+export GIT_COMMITTER_NAME="Nobody"
+export GIT_COMMITTER_EMAIL="nob...@nowhere.com"
+git clone "$SPARK_REPO"
+fi
+
+pushd spark
+
+# Make sure branch has no modifications
+git checkout "$SPARK_BRANCH"
+git reset --hard HEAD
+
+# Update Spark pom with the Arrow version just installed and build Spark, need package phase for pyspark
+sed -i -e "s/\(.*<arrow.version>\).*\(<\/arrow.version>\)/\1$ARROW_VERSION\2/g" ./pom.xml
+echo "Building Spark with Arrow $ARROW_VERSION"
+build/mvn -DskipTests clean package
+
+# Run Arrow related Scala tests only, NOTE: -Dtest=_NonExist_ is to enable surefire test discovery without running any tests so that Scalatest can run
+SPARK_SCALA_TESTS="org.apache.spark.sql.execution.arrow,org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite,org.apache.spark.sql.execution.vectorized.ArrowColumnVectorSuite"
+echo "Testing Spark: $SPARK_SCALA_TESTS"
+# TODO: should be able to only build spark-sql tests with adding "-pl sql/core" but not currently working
+build/mvn -Dtest=none -DwildcardSuites="$SPARK_SCALA_TESTS" test
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+
+# Run pyarrow related Python tests only
+SPARK_PYTHON_TESTS="ArrowTests PandasUDFTests ScalarPandasUDFTests GroupedMapPandasUDFTests GroupedAggPandasUDFTests"
+echo "Testing PySpark: $SPARK_PYTHON_TESTS"
+SPARK_TESTING=1 bin/pyspark pyspark.sql.tests $SPARK_PYTHON_TESTS
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Clean up
+echo "Cleaning up.."

Review comment:
No need for these two lines; the environment is thrown away at the end anyway.
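The quoted script extracts the Arrow Java version from the `maven-help-plugin` `evaluate` output and rewrites `<arrow.version>` in Spark's `pom.xml` with `sed`. For readers less fluent in `sed`, here is a Python rendering of those two text transformations; it is an illustrative equivalent with hypothetical function names and made-up sample input, not a proposed replacement for the script:

```python
import re
from typing import Optional

# Equivalent of: sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'
LOG_LINE = re.compile(r"\[.*\]")  # Maven log lines such as "[INFO] ..."
STARTS_WITH_DIGIT = re.compile(r"[0-9]")

# Equivalent of: s/\(.*<arrow.version>\).*\(<\/arrow.version>\)/\1$ARROW_VERSION\2/g
# ("." does not match newlines, so this works line by line like sed).
ARROW_VERSION_TAG = re.compile(r"(.*<arrow.version>).*(</arrow.version>)")


def extract_version(mvn_output: str) -> Optional[str]:
    """Return the first non-log line that starts with a digit, like the sed filter."""
    for line in mvn_output.splitlines():
        if LOG_LINE.match(line):
            continue
        if STARTS_WITH_DIGIT.match(line):
            return line
    return None


def set_arrow_version(pom_xml: str, version: str) -> str:
    """Replace the contents of every <arrow.version> element."""
    return ARROW_VERSION_TAG.sub(lambda m: m.group(1) + version + m.group(2), pom_xml)


if __name__ == "__main__":
    # Made-up sample output and pom fragment, for illustration only.
    sample_output = "[INFO] Scanning for projects...\n[INFO] \n0.9.0-SNAPSHOT\n[INFO] BUILD SUCCESS\n"
    sample_pom = "<properties>\n  <arrow.version>0.8.0</arrow.version>\n</properties>\n"
    version = extract_version(sample_output)
    print(version)
    print(set_arrow_version(sample_pom, version))
```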
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359573#comment-16359573 ] ASF GitHub Bot commented on ARROW-1579:
xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167407030

## File path: python/pyarrow/__init__.py
##
@@ -24,7 +24,7 @@
# package is not installed
try:
import setuptools_scm
-__version__ = setuptools_scm.get_version('../')
+__version__ = setuptools_scm.get_version(root='../../', relative_to=__file__)

Review comment:
See the suggestion above to get rid of this line.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359016#comment-16359016 ] ASF GitHub Bot commented on ARROW-1579:
wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364576706

We'll need to add some flags to `ARROW_CXXFLAGS` to disable the gcc5 ABI. I can try to take a look in a bit or this weekend.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357819#comment-16357819 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364309234

@xhochy, I could not get Arrow C++ to build with `export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`; I would get a gflags linking error like "undefined reference google::FlagRegisterer::FlagRegisterer". I thought it might be because I wasn't using g++ 4.9, but I had no luck getting 4.9 installed, since the base image I'm using is Ubuntu 16.04. Have you ever run into this? It seemed like some kind of template constructor that it couldn't find.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357810#comment-16357810 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167120268

## File path: python/pyarrow/__init__.py
##
@@ -24,7 +24,7 @@
# package is not installed
try:
import setuptools_scm
-__version__ = setuptools_scm.get_version('../')
+__version__ = setuptools_scm.get_version(root='../../', relative_to=__file__)

Review comment:
@xhochy and @wesm, I needed to change this because it would only give a version when run from the ARROW_HOME/python directory. So when running the Spark tests, importing pyarrow would give a version of `None`. Making it relative to `__file__` seemed to fix it in all cases. I can make this a separate JIRA if you think that would be better.
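setuptools_scm documents `relative_to` as typically being `__file__`, in which case `root` is resolved against that file's directory instead of the current working directory. The path arithmetic can be sketched with `os.path` alone; `resolve_scm_root` is a hypothetical helper assumed to mirror that behaviour, and the checkout path is made up:

```python
import os
from typing import Optional


def resolve_scm_root(root: str, relative_to: Optional[str] = None) -> str:
    """Hypothetical helper: resolve `root` the way setuptools_scm is assumed to.

    Without `relative_to`, `root` is interpreted against the current working
    directory; with it, against the directory containing `relative_to`.
    """
    if relative_to:
        root = os.path.join(os.path.dirname(relative_to), root)
    return os.path.abspath(root)


if __name__ == "__main__":
    init_py = "/repo/arrow/python/pyarrow/__init__.py"  # made-up checkout location
    # Old call, get_version('../'): depends on where Python was started.
    print(resolve_scm_root("../"))
    # New call, get_version(root='../../', relative_to=__file__): the repo root.
    print(resolve_scm_root("../../", relative_to=init_py))
```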
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357808#comment-16357808 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364306888

Ok, I finally got this to build everything and pass all tests! There are still a couple of issues to work out though; I'll discuss them below.

Btw, to get the correct `pyarrow.__version__` from the dev env, you do need to have all git tags fetched and `setuptools_scm` installed from pip or conda. @xhochy, `setuptools_scm` wasn't listed in any of the developer docs I could find; should it be added to the list of dependent packages for setting up a conda env?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357806#comment-16357806 ] ASF GitHub Bot commented on ARROW-1579:
BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364306888

Ok, I finally got this to build everything and pass all tests! There are still a couple of issues to work out though; I'll discuss them below.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357805#comment-16357805 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239

This is currently a WIP; the Scala/Java tests are able to run.

Left TODO:
- [x] Run PySpark Tests
- [ ] Verify working with docker-compose and existing volumes in arrow/dev
- [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?
- [ ] Speed up pyarrow build using conda prefix as toolchain
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351792#comment-16351792 ]
ASF GitHub Bot commented on ARROW-1579:

xhochy commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-362908291

@BryanCutler The main change I made was to move `source activate pyarrow-dev` to the beginning of the file, set `export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`, and remove `LD_LIBRARY_PATH` and `PYTHONPATH` from the file. This should pick up the conda environment and speed up the Arrow builds a lot. It should not affect the Maven builds.
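The environment changes described above can be modeled as a transformation of the build environment. The helper below is hypothetical and not part of the script. It assumes the conda env is already active (so `CONDA_PREFIX` is set) and mirrors the two changes: it points `ARROW_BUILD_TOOLCHAIN` at the conda prefix, which the C++ build is intended to use when locating third-party dependencies, and it drops the hand-maintained `LD_LIBRARY_PATH`/`PYTHONPATH` overrides. The example prefix path is illustrative.

```python
from typing import Dict, Mapping


def toolchain_env(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env adjusted to build Arrow against the active conda env."""
    if "CONDA_PREFIX" not in env:
        raise RuntimeError("activate the conda environment first (CONDA_PREFIX is unset)")
    result = dict(env)  # never mutate the caller's mapping
    # Build against the dependencies installed in the active conda env.
    result["ARROW_BUILD_TOOLCHAIN"] = env["CONDA_PREFIX"]
    # The manual search-path overrides are no longer needed.
    result.pop("LD_LIBRARY_PATH", None)
    result.pop("PYTHONPATH", None)
    return result


if __name__ == "__main__":
    before = {
        "CONDA_PREFIX": "/home/ubuntu/miniconda/envs/pyarrow-dev",  # illustrative
        "LD_LIBRARY_PATH": "/apache-arrow/arrow/lib",
        "PYTHONPATH": "/apache-arrow/arrow/python",
    }
    print(toolchain_env(before))
```

Returning a new mapping rather than editing `os.environ` in place makes it easy to pass the result as `env=` to `subprocess.run` for a single build step.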
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345909#comment-16345909 ]
ASF GitHub Bot commented on ARROW-1579:

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361753796

@BryanCutler I think all committers on GitBox have write access to your branch, so we can push directly.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345879#comment-16345879 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361747906

@xhochy any pointers on how to get a correct pyarrow version? If you are able to do a PR to this branch for the errors you found, that would be great. I can make you a collaborator if that would be easier.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345872#comment-16345872 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r164890683

##
File path: dev/spark_integration/spark_integration.sh
##
@@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+export ARROW_BUILD_TYPE=release
+export ARROW_HOME=$(pwd)/arrow
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+# Build arrow-cpp and install
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+  exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+  exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=https://github.com/apache/spark.git

Review comment:
Thanks @felixcheung! Yeah, it would probably be best to use Apache; I'll change it.
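The `sed` filter in the `ARROW_VERSION` line skips Maven's bracketed log lines (such as `[INFO] ...`), prints the first remaining line that starts with a digit, and quits. The Python function below is a hypothetical re-implementation of that filter, useful for reasoning about what the script will pick up; the sample lines are illustrative, not captured Maven output.

```python
import re
from typing import Iterable, Optional

LOG_LINE = re.compile(r"^\[.*\]")      # same pattern as /^\[.*\]/ in the sed script
STARTS_WITH_DIGIT = re.compile(r"^[0-9]")  # same pattern as /^[0-9]/


def evaluated_version(mvn_output: Iterable[str]) -> Optional[str]:
    """Return the first non-log line that starts with a digit, like the sed filter."""
    for line in mvn_output:
        line = line.rstrip("\n")
        if LOG_LINE.match(line):
            continue  # skip Maven log lines such as "[INFO] ..."
        if STARTS_WITH_DIGIT.match(line):
            return line  # sed's "p; q": print this line, then stop reading
    return None  # sed -n prints nothing if no line qualifies


if __name__ == "__main__":
    sample = [
        "[INFO] Scanning for projects...",
        "0.9.0-SNAPSHOT",
        "[INFO] BUILD SUCCESS",
    ]
    print(evaluated_version(sample))
```

Because the filter keys only on "starts with a digit", any other unbracketed line that happens to begin with a number would be picked up first, so this extraction is somewhat fragile.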
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344616#comment-16344616 ]
ASF GitHub Bot commented on ARROW-1579:

felixcheung commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r164658521

##
File path: dev/spark_integration/spark_integration.sh
##
@@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+export ARROW_BUILD_TYPE=release
+export ARROW_HOME=$(pwd)/arrow
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+# Build arrow-cpp and install
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+  exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+  exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=https://github.com/apache/spark.git

Review comment:
I think we should pull from apache git?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344175#comment-16344175 ]
ASF GitHub Bot commented on ARROW-1579:

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361420278

Not my area of expertise (@xhochy or @cpcloud would know better) -- usually that relates to not having fetched all the tags.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344066#comment-16344066 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361396699

From `pyarrow/__init__.py`, it looks like maybe I should install setuptools_scm, but then I get the following version:
```
>>> pa.__version__
'0.1.1.dev1231+g0a49022'
```
I would expect something like '0.8.***'. Where is it getting the above number from?
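One way to read that string, assuming `setuptools_scm`'s default `guess-next-dev` version scheme and `node-and-date` local scheme on a clean checkout: the nearest tag it could find parsed as `0.1.0`, the checkout is 1231 commits past that tag, and `0a49022` is the abbreviated commit hash. That reading would be consistent with the missing-tags explanation. The function below is a simplified, hypothetical model of that scheme, not the library's implementation (dirty trees and pre-release tags are ignored).

```python
def guess_next_dev(tag_version: str, distance: int, node: str) -> str:
    """Simplified model of setuptools_scm's default scheme for a clean checkout.

    The last numeric component of the nearest tag is bumped, ".dev<distance>"
    is appended, and the abbreviated commit hash becomes the local label.
    """
    if distance == 0:
        return tag_version  # building exactly at the tagged commit
    parts = tag_version.split(".")
    parts[-1] = str(int(parts[-1]) + 1)
    return "{}.dev{}+g{}".format(".".join(parts), distance, node)


if __name__ == "__main__":
    # Hypothetical inputs chosen to match the version string in the comment.
    print(guess_next_dev("0.1.0", 1231, "0a49022"))
```

With a recent release tag reachable (for example `0.8.0`), the same scheme would produce something in the expected `0.8.x` range instead.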
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344065#comment-16344065 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361396699

From `pyarrow/__init__.py`, it looks like maybe I should install setuptools_scm, but then I get the following version:
```
>>> pa.__version__
'0.1.1.dev1231+g0a49022'
```
I would expect something like '0.8.***'.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344041#comment-16344041 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361390175

Looks like the errors I'm getting are because `pa.__version__` returns `None`:
```
>>> import pyarrow as pa
>>> from distutils.version import LooseVersion
>>> LooseVersion(pa.__version__) < LooseVersion('0.8.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bryan/miniconda2/envs/pyarrow-dev/lib/python2.7/distutils/version.py", line 296, in __cmp__
    return cmp(self.version, other.version)
AttributeError: LooseVersion instance has no attribute 'version'
>>> pa.__version__
>>>
```
Is there something in the build I need to set to get a valid version? cc @wesm
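The confusing `AttributeError` comes from how `distutils`' `LooseVersion` is constructed: it only parses its argument when that argument is truthy, so `LooseVersion(None)` yields an object with no `version` attribute, and the failure surfaces later inside the comparison. A defensive check can fail early with a clearer message. The helpers below are a hypothetical sketch (names are illustrative) that compares only the leading numeric release part and avoids `distutils`, which newer Python versions no longer ship.

```python
import re
from typing import Optional, Tuple


def release_tuple(version: Optional[str]) -> Tuple[int, ...]:
    """Parse the leading numeric release part, e.g. "0.8.1.dev12+gabc" -> (0, 8, 1)."""
    if not version:
        raise ValueError(
            "pyarrow.__version__ is %r; was it built with setuptools_scm "
            "and all git tags fetched?" % (version,)
        )
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unparseable version string: %r" % (version,))
    return tuple(int(part) for part in match.group(0).split("."))


def at_least(version: Optional[str], minimum: str) -> bool:
    """True if version's release part is >= minimum's, padding with zeros."""
    have, need = release_tuple(version), release_tuple(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    print(at_least("0.9.0", "0.8.0"))
    print(at_least("0.1.1.dev1231+g0a49022", "0.8.0"))
```

This ignores dev and pre-release ordering (`0.8.0.dev5` counts as `0.8.0`); `packaging.version.Version` handles those cases properly if the dependency is acceptable.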
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343902#comment-16343902 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361364206

@xhochy that is strange. Until I switched the image to be based on `FROM maven:3.5.2-jdk-8-slim`, I was seeing similar problems with paths containing `?`, but since then it's been working fine and installs under root:

`[INFO] Installing /apache-arrow/arrow/java/tools/target/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar to /root/.m2/repository/org/apache/arrow/arrow-tools/0.9.0-SNAPSHOT/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar`

I did follow this Docker post-installation step; maybe that has something to do with it? https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user

I've also been running with my local .m2 repo mounted in the container with `docker run -v /home/bryan/.m2:/root/.m2 spark-arrow` to avoid downloading dependencies every time, but I tried skipping that and still don't get any errors. Besides the strange path, what is the exact error that it gives you for `mvn install`?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342660#comment-16342660 ]
ASF GitHub Bot commented on ARROW-1579:

xhochy commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361082017

@BryanCutler I had a look into the build and found some errors with the Python/C++ scripts (I will do a PR for that), but I also cannot get `mvn install` to work correctly. Any ideas why it installs to this location?
```
[INFO] Installing /apache-arrow/arrow/java/tools/target/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar to /apache-arrow/arrow/java/?/.m2/repository/org/apache/arrow/arrow-tools/0.9.0-SNAPSHOT/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar
```
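One plausible explanation, offered as an assumption rather than a confirmed diagnosis: when the container runs as a user ID with no entry in `/etc/passwd`, the JVM cannot determine a home directory and (in OpenJDK on Linux) falls back to setting the `user.home` system property to `?`. Maven's default local repository is `${user.home}/.m2/repository`, so that relative `?` ends up resolved against the working directory, which is `arrow/java` after the script's `pushd`. The hypothetical function below only reproduces that path arithmetic to show that it yields exactly the location in the log.

```python
import posixpath


def default_local_repo(user_home: str, working_dir: str) -> str:
    """Model Maven's default ${user.home}/.m2/repository, resolved like a file path."""
    repo = posixpath.join(user_home, ".m2", "repository")
    # An absolute user.home is kept as-is (join discards working_dir);
    # a relative one such as "?" is resolved against the working directory.
    return posixpath.normpath(posixpath.join(working_dir, repo))


if __name__ == "__main__":
    print(default_local_repo("?", "/apache-arrow/arrow/java"))
    print(default_local_repo("/root", "/apache-arrow/arrow/java"))
```

If this is the cause, running the container as a user that has a passwd entry (as with the root user in the `maven` base image), or passing an explicit absolute `-Dmaven.repo.local=...` to `mvn`, would avoid the relative path.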
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340424#comment-16340424 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-360658676

This is pretty much running for me now, but I'm getting a strange test failure when running the PySpark tests:
```
======================================================================
FAIL: test_unsupported_datatype (pyspark.sql.tests.ArrowTests)
----------------------------------------------------------------------
AttributeError: 'LooseVersion' object has no attribute 'version'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/apache-arrow/spark/python/pyspark/sql/tests.py", line 3404, in test_unsupported_datatype
    df.toPandas()
AssertionError: "Unsupported data type" does not match "'LooseVersion' object has no attribute 'version'"
```
Not sure why, because the tests pass normally; maybe something with the conda setup? I'll try some more later.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340422#comment-16340422 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239

This is currently a WIP; the Scala/Java tests are able to run.

Left TODO:
- [x] Run PySpark Tests
- [ ] Verify working with docker-compose and existing volumes in arrow/dev
- [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330878#comment-16330878 ]
ASF GitHub Bot commented on ARROW-1579:

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358730516

I think that's fine. The failing nightly integration test will help create some urgency to get the downstream project fixed.
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330851#comment-16330851 ]
ASF GitHub Bot commented on ARROW-1579:

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358726711

I'll try to pick this up again in the next few days, @wesm. One question, though: when we make an API-breaking change, like renaming MapVector, this will fail until Spark is patched. Do we need to keep a branch with these kinds of patches until they can be applied upstream?
[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration
[ https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330768#comment-16330768 ]
ASF GitHub Bot commented on ARROW-1579:

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358709337

@BryanCutler or @icexelloss could we get this set up? I'd like to look into setting up nightly integration test runs on a server someplace (we have a shared machine that @cpcloud and I have been using for nightly builds, and that the pandas team uses for benchmarks) so we get warned immediately if something breaks.