[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-19 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369463#comment-16369463]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366789122
 
 
   Thanks @wesm, @xhochy and @felixcheung! Since it can sometimes take a while 
to get Spark updated, if we get to the point where this is ready to be put in 
the nightly builds, maybe I could submit a PR to patch Spark and we could 
configure the docker build to point to that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add dockerized test setup to validate Spark integration
> --
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to 
> catch any regressions in the Spark integration



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-17 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368275#comment-16368275]

ASF GitHub Bot commented on ARROW-1579:
---

wesm closed pull request #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/dev/docker-compose.yml b/dev/docker-compose.yml
index a73fd1bfb..b1e593cf4 100644
--- a/dev/docker-compose.yml
+++ b/dev/docker-compose.yml
@@ -33,3 +33,8 @@ services:
       context: dask_integration
     volumes:
      - ../..:/apache-arrow
+  spark_integration:
+    build:
+      context: spark_integration
+    volumes:
+     - ../..:/apache-arrow
diff --git a/dev/spark_integration/Dockerfile b/dev/spark_integration/Dockerfile
new file mode 100644
index 0..d1b3cf89f
--- /dev/null
+++ b/dev/spark_integration/Dockerfile
@@ -0,0 +1,70 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+FROM maven:3.5.2-jdk-8-slim
+
+# Basic OS utilities
+RUN apt-get update && apt-get install -y \
+wget \
+git build-essential \
+software-properties-common
+
+# This will install conda in /home/ubuntu/miniconda
+RUN wget -O /tmp/miniconda.sh \
+https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
+bash /tmp/miniconda.sh -b -p /home/ubuntu/miniconda && \
+rm /tmp/miniconda.sh
+
+# Python dependencies
+RUN apt-get install -y \
+pkg-config
+
+# Create Conda environment
+ENV PATH="/home/ubuntu/miniconda/bin:${PATH}"
+RUN conda create -y -q -n pyarrow-dev \
+# Python
+python=2.7 \
+numpy \
+pandas \
+pytest \
+cython \
+ipython \
+matplotlib \
+six \
+setuptools \
+setuptools_scm \
+# C++
+boost-cpp \
+cmake \
+flatbuffers \
+rapidjson \
+thrift-cpp \
+snappy \
+zlib \
+gflags \
+brotli \
+jemalloc \
+lz4-c \
+zstd \
+-c conda-forge
+
+ADD . /apache-arrow
+WORKDIR /apache-arrow
+
+CMD arrow/dev/spark_integration/spark_integration.sh
+
+# BUILD: $ docker build -f arrow/dev/spark_integration/Dockerfile -t spark-arrow .
+# RUN:   $ docker run -v $HOME/.m2:/root/.m2 spark-arrow
diff --git a/dev/spark_integration/spark_integration.sh b/dev/spark_integration/spark_integration.sh
new file mode 100755
index 0..8ca4dc3ac
--- /dev/null
+++ b/dev/spark_integration/spark_integration.sh
@@ -0,0 +1,92 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Exit on any error
+set -e
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_HOME=$(pwd)/arrow
+export ARROW_BUILD_TYPE=release
+export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${LD_LIBRARY_PATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" -DARROW_PYTHON=on -DA

[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-17 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16368273#comment-16368273]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366453717
 
 
   Works for me. Merging now



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367938#comment-16367938]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366370963
 
 
   We definitely need an upstream patch for Spark but I'm happy with merging 
this PR as-is as it shows that we can now test Spark with it. @wesm is this ok 
for you?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367767#comment-16367767]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366333654
 
 
   That error is from #1490 being merged yesterday - what I was talking about 
in https://github.com/apache/arrow/pull/1319#issuecomment-358726711 but I was 
hoping to at least get this running for a little while first before breaking 
Spark!
   
   I guess we could either rebase this just before #1490 to ensure it works or 
I can provide a patch for Spark to update it for that breaking change in Java?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367740#comment-16367740]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366328470
 
 
   Sadly I get the following error on master: 
   
   ```
   [warn] Class org.apache.avro.reflect.Stringable not found - continuing with a stub.
   [error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:65: not found: type NullableMapVector
   [error]   case (StructType(_), vector: NullableMapVector) =>
   [error]^
   [error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:66: value size is not a member of org.apache.arrow.vector.ValueVector
   [error] val children = (0 until vector.size()).map { ordinal =>
   [error]^
   [error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:67: value getChildByOrdinal is not a member of org.apache.arrow.vector.ValueVector
   [error]   createFieldWriter(vector.getChildByOrdinal(ordinal))
   [error]^
   [error] /apache-arrow/spark/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowWriter.scala:318: not found: type NullableMapVector
   [error] val valueVector: NullableMapVector,
   [error]  ^
   [warn] Class org.apache.avro.reflect.Stringable not found - continuing with a stub.
   [warn] two warnings found
   [error] four errors found
   [error] Compile failed at Feb 16, 2018 6:49:56 PM [50.120s]
   ```



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367706#comment-16367706]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366321795
 
 
   I probably should have rebased here earlier; if anyone else is going to try 
this out, let me know and I can do that first.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367630#comment-16367630]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366305004
 
 
   As this is part of a docker-compose setup, it should be run using 
`docker-compose build && docker-compose run spark_integration`. It will then 
use the options declared in `docker-compose.yml` (mostly the mount of the 
current working dir).
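A minimal sketch of that invocation, wrapped in a shell function and assuming it is run from the `dev/` directory that contains `docker-compose.yml` (so the relative `../..:/apache-arrow` mount resolves to the directory holding the `arrow/` checkout). The function name `run_spark_integration` and the `DOCKER_COMPOSE` override are hypothetical, added only so the commands can be dry-run with `echo`; building just the `spark_integration` service and passing `--rm` are small choices layered on top of the command quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build and run the spark_integration service declared in
# dev/docker-compose.yml. Run it from the dev/ directory so the relative
# volume mount (../..:/apache-arrow) picks up the directory holding arrow/.
run_spark_integration() {
  # DOCKER_COMPOSE is a hypothetical override; it is expanded unquoted on
  # purpose so that e.g. "echo docker-compose" splits into a dry-run command.
  local compose="${DOCKER_COMPOSE:-docker-compose}"
  # Build only this service's image (build context: dev/spark_integration)
  $compose build spark_integration || return 1
  # Start a one-off container; --rm deletes it after spark_integration.sh exits
  $compose run --rm spark_integration
}
```

For example, `cd arrow/dev && run_spark_integration`, or `DOCKER_COMPOSE="echo docker-compose" run_spark_integration` to print the two commands without running them.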



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-16 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367631#comment-16367631]

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366305165
 
 
   To get past the build issues I had to rebase locally. Once the script has 
run through I will report back.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-15 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366568#comment-16366568]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366134692
 
 
   Yeah, we should have it better documented. I copied how the API docs script 
was done and it's run with docker-compose, but I'm not quite sure how to run it 
through that. I just used the regular `docker run` cmd.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-15 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366564#comment-16366564]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366134424
 
 
   Thanks @wesm, I think that is because the brotli libraries were recently 
updated and #1554 should fix that. Could you check if you have the latest 
master checked out? The docker script just takes what is in your current dir; 
not sure if that is what we want or not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add dockerized test setup to validate Spark integration
> --
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to 
> catch any regressions in the Spark integration



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-15 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366535#comment-16366535]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366126289
 
 
   Build failed for me with 
   
   ```
   [ 52%] Building CXX object src/arrow/CMakeFiles/arrow_objlib.dir/ipc/writer.cc.o
   In file included from /apache-arrow/arrow/cpp/src/arrow/buffer.h:29:0,
                    from /apache-arrow/arrow/cpp/src/arrow/array.h:27,
                    from /apache-arrow/arrow/cpp/src/arrow/ipc/json-internal.cc:28:
   /apache-arrow/arrow/cpp/src/arrow/status.h: In function ‘arrow::Status arrow::ipc::internal::json::GetField(const Value&, const arrow::ipc::DictionaryMemo*, std::shared_ptr*)’:
   /apache-arrow/arrow/cpp/src/arrow/status.h:68:19: warning: ‘dictionary_id’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   Status _s = (s); \
     ^
   /apache-arrow/arrow/cpp/src/arrow/ipc/json-internal.cc:869:13: note: ‘dictionary_id’ was declared here
   int64_t dictionary_id;
   ^
   In file included from /apache-arrow/arrow/cpp/src/arrow/buffer.h:29:0,
                    from /apache-arrow/arrow/cpp/src/arrow/array.h:27,
                    from /apache-arrow/arrow/cpp/src/arrow/ipc/json-internal.cc:28:
   /apache-arrow/arrow/cpp/src/arrow/status.h: In function ‘arrow::Status arrow::ipc::internal::json::ReadSchema(const Value&, arrow::MemoryPool*, std::shared_ptr*)’:
   /apache-arrow/arrow/cpp/src/arrow/status.h:68:19: warning: ‘dictionary_id’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   Status _s = (s); \
     ^
   /apache-arrow/arrow/cpp/src/arrow/ipc/json-internal.cc:1349:13: note: ‘dictionary_id’ was declared here
   int64_t dictionary_id;
   ^
   [ 52%] Built target arrow_objlib
   Scanning dependencies of target arrow_static
   Scanning dependencies of target arrow_shared
   make[2]: *** No rule to make target '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/libbrotlidec.a', needed by 'release/libarrow.so.0.0.0'.  Stop.
   CMakeFiles/Makefile2:693: recipe for target 'src/arrow/CMakeFiles/arrow_shared.dir/all' failed
   make[1]: *** [src/arrow/CMakeFiles/arrow_shared.dir/all] Error 2
   make[1]: *** Waiting for unfinished jobs
   [ 52%] Linking CXX static library ../../release/libarrow.a
   [ 52%] Built target arrow_static
   Makefile:140: recipe for target 'all' failed
   make: *** [all] Error 2
   ```
   
   Haven't dug too far into what's wrong. Can we document how to run this 
someplace other than the Dockerfile?
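The `make` failure above means the build was configured to link `libbrotlidec.a` from the conda environment, but that file was not there; Bryan's reply attributes this to a recent brotli library update. A quick check of the toolchain prefix before running `make` would surface this sooner. This is a sketch only: the function `check_static_libs` is hypothetical, and apart from `libbrotlidec.a` (taken from the error) the library names in the usage comment are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: report static libraries missing from a
# toolchain prefix (e.g. $CONDA_PREFIX) before building Arrow C++.
# Usage: check_static_libs PREFIX LIB...
# Returns 0 if every PREFIX/lib/LIB exists, 1 otherwise.
check_static_libs() {
  local prefix="$1"
  shift
  local missing=0 lib
  for lib in "$@"; do
    if [ ! -f "$prefix/lib/$lib" ]; then
      echo "missing: $prefix/lib/$lib" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (only libbrotlidec.a comes from the error above; the others are assumed):
# check_static_libs "$CONDA_PREFIX" libbrotlidec.a libbrotlienc.a libbrotlicommon.a
```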



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-15 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366536#comment-16366536]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-366126289
 
 
   Build failed for me with 
   
   ```
   Scanning dependencies of target arrow_static
   Scanning dependencies of target arrow_shared
   make[2]: *** No rule to make target '/home/ubuntu/miniconda/envs/pyarrow-dev/lib/libbrotlidec.a', needed by 'release/libarrow.so.0.0.0'.  Stop.
   CMakeFiles/Makefile2:693: recipe for target 'src/arrow/CMakeFiles/arrow_shared.dir/all' failed
   make[1]: *** [src/arrow/CMakeFiles/arrow_shared.dir/all] Error 2
   make[1]: *** Waiting for unfinished jobs
   [ 52%] Linking CXX static library ../../release/libarrow.a
   [ 52%] Built target arrow_static
   Makefile:140: recipe for target 'all' failed
   make: *** [all] Error 2
   ```
   
   Haven't dug too far into what's wrong. Can we document how to run this 
someplace other than the Dockerfile?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-15 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365898#comment-16365898]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: ARROW-1579: [Java] Adding containerized Spark 
Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365988801
 
 
   Sweet, I can take this for a spin later today, or if @cpcloud wants to look, 
that also works.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-14 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364624#comment-16364624]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365710909
 
 
   This is working with ARROW_BUILD_TOOLCHAIN set to the conda env, and I 
think this could be merged now. Is somebody else able to verify that it is 
working before merging?
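For reference, the environment that the merged `spark_integration.sh` sets up before building can be wrapped as follows. This is a sketch: the function name `setup_arrow_env` and the `CONDA_PREFIX` guard are additions, while the exported values mirror the `export` lines in the script from the diff.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the exports in dev/spark_integration/spark_integration.sh.
# Run from /apache-arrow (the directory containing the arrow/ checkout) after
# activating the pyarrow-dev conda env.
setup_arrow_env() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "CONDA_PREFIX is not set; run 'source activate pyarrow-dev' first" >&2
    return 1
  fi
  export ARROW_HOME="$(pwd)/arrow"
  export ARROW_BUILD_TYPE=release
  # Point the Arrow C++ build at the conda env for its third-party dependencies
  export ARROW_BUILD_TOOLCHAIN="$CONDA_PREFIX"
}
```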



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-14 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16364585#comment-16364585]

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239
 
 
   This is currently a WIP; the Scala/Java tests are able to run.
   
   Left TODO:
   
   - [x] Run PySpark Tests
   - [ ] Verify working with docker-compose and existing volumes in arrow/dev
   - [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?
   - [x] Speed up pyarrow build using conda prefix as toolchain



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363349#comment-16363349]

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365466851
 
 
   Now you need to set the `PYARROW_CXXFLAGS` environment variable with the 
gcc5 ABI flag.
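A minimal sketch of keeping both builds on the same ABI setting, assuming the flag is `-D_GLIBCXX_USE_CXX11_ABI=0` (the value that worked on the cmake command line elsewhere in this thread) and that the pyarrow build reads `PYARROW_CXXFLAGS` from the environment as described above. `ABI_FLAG`, `CMAKE_ARGS` and the fallback `ARROW_HOME` path are hypothetical, and the script only prints the cmake command instead of running it.

```bash
#!/usr/bin/env bash
# Define the pre-gcc5 libstdc++ ABI flag once so the Arrow C++ libraries and
# the pyarrow extension agree on how std::string symbols are mangled.
ABI_FLAG="-D_GLIBCXX_USE_CXX11_ABI=0"

# Passed to the pyarrow extension build through the environment.
export PYARROW_CXXFLAGS="$ABI_FLAG"

# Hypothetical fallback install prefix, only used if ARROW_HOME is unset.
ARROW_HOME="${ARROW_HOME:-/tmp/arrow-dist}"

# Arrow C++ configure arguments; a bash array keeps each -D option a single
# word when it is expanded with "${CMAKE_ARGS[@]}".
CMAKE_ARGS=(
  "-DCMAKE_CXX_FLAGS=$ABI_FLAG"
  -DARROW_PYTHON=on
  -DARROW_HDFS=on
  -DCMAKE_BUILD_TYPE=release
  "-DCMAKE_INSTALL_PREFIX=$ARROW_HOME"
)

# Dry run: print the command; the real script would run
#   cmake "${CMAKE_ARGS[@]}" ..
echo cmake "${CMAKE_ARGS[@]}" ..
```

Keeping the flag in one variable means the C++ and Python builds cannot silently diverge, which is likely the kind of mismatch behind the undefined-symbol error reported in this thread.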



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363297#comment-16363297
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365456994
 
 
   Well, I'm able to run the build with `ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX` 
but I do need to keep `-DCMAKE_INSTALL_PREFIX=$ARROW_HOME` for the Python build 
to work (is this right?), and then 
`LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}` to 
locate libarrow.so.
   
   Now, I get an error with `import pyarrow` because of some undefined symbol:
   `ImportError: /home/ubuntu/miniconda/envs/pyarrow-dev/lib/python2.7/site-packages/pyarrow-0.8.1.dev116+g7bf7b2e9-py2.7-linux-x86_64.egg/pyarrow/lib.so: undefined symbol: _ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE`
   
   I don't think this is from an arrow library, but I can't tell where it's 
coming from. Any ideas?
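One way to narrow this down: demangling the symbol (for example with binutils' `c++filt`) gives `arrow::timestamp(arrow::TimeUnit::type, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)`, so it appears to be an Arrow C++ function after all. The `__cxx11` tag marks the gcc5 (C++11) `std::string` ABI, which suggests pyarrow's `lib.so` was compiled expecting that ABI while `libarrow.so` was built with `-D_GLIBCXX_USE_CXX11_ABI=0`; that is an inference, not a confirmed diagnosis. The helper below is a hypothetical sketch that classifies a mangled name by that tag.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a mangled C++ symbol by the libstdc++ string
# ABI it refers to. Symbols using the gcc5 (C++11) ABI contain the
# "__cxx11" inline namespace; old-ABI std::string is mangled as "Ss" instead.
string_abi_of() {
  local sym=$1
  if [[ $sym == *__cxx11* ]]; then
    echo "cxx11"
  else
    echo "no-cxx11-tag"
  fi
}

# The symbol pyarrow's lib.so could not resolve, copied from the error above.
missing='_ZN5arrow9timestampENS_8TimeUnit4typeERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE'
string_abi_of "$missing"   # prints: cxx11

# Manual checks against the real library (not run by this sketch):
#   echo "$missing" | c++filt
#   nm -D --defined-only "$ARROW_HOME/lib/libarrow.so" | grep timestamp
```

If `nm` only lists the old-ABI spelling of the same function (ending in `ERKSs`), that would indicate libarrow.so and pyarrow were compiled with different `_GLIBCXX_USE_CXX11_ABI` settings.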
   



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363173#comment-16363173
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365430455
 
 
   Thanks @wesm, that seemed to do the trick when I added it to the cmake 
command line (it didn't work as an env var):
   `cmake -DCMAKE_CXX_FLAGS="-D_GLIBCXX_USE_CXX11_ABI=0" -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..`



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363099#comment-16363099
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365414411
 
 
   Looks like that flag needs to be passed through to the ExternalProject 
declarations. Try using `CMAKE_CXX_FLAGS` instead of `ARROW_CXXFLAGS`? If this 
doesn't work, I can dig in later today.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362957#comment-16362957
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365384766
 
 
   I still get the linking error after setting 
`"ARROW_CXXFLAGS=-D_GLIBCXX_USE_CXX11_ABI=0"`
   ```
   [ 78%] Linking CXX executable ../../../release/json-integration-test
   CMakeFiles/json-integration-test.dir/json-integration-test.cc.o: In function 
`_GLOBAL__sub_I__ZN5arrow4test25MakeRandomInt32PoolBufferElPNS_10MemoryPoolEPSt10shared_ptrINS_10PoolBufferEEj':
   json-integration-test.cc:(.text.startup+0x1de): undefined reference to 
`google::FlagRegisterer::FlagRegisterer, std::allocator > >(char const*, char const*, char 
const*, std::__cxx11::basic_string, 
std::allocator >*, std::__cxx11::basic_string, std::allocator >*)'
   json-integration-test.cc:(.text.startup+0x296): undefined reference to 
`google::FlagRegisterer::FlagRegisterer, std::allocator > >(char const*, char const*, char 
const*, std::__cxx11::basic_string, 
std::allocator >*, std::__cxx11::basic_string, std::allocator >*)'
   json-integration-test.cc:(.text.startup+0x34e): undefined reference to 
`google::FlagRegisterer::FlagRegisterer, std::allocator > >(char const*, char const*, char 
const*, std::__cxx11::basic_string, 
std::allocator >*, std::__cxx11::basic_string, std::allocator >*)'
   collect2: error: ld returned 1 exit status
   ```



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-13 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16362935#comment-16362935
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-365380849
 
 
   Thanks @xhochy, I'll try all this out!



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359572#comment-16359572
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] 
Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167406938
 
 

 ##
 File path: dev/spark_integration/spark_integration.sh
 ##
 @@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
 
 Review comment:
   Call `set -e` here; then any failing command will make the whole script 
fail, and you can get rid of the `if [[ $? -ne 0 ]]; then` blocks later on.
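A minimal sketch of the suggested behaviour, using hypothetical stand-in functions (`build_cpp`, `build_python`, `run_tests`) in place of the real `make` and `python setup.py` steps: with errexit on, the first failing step stops the script, so no per-step `$?` checks are needed. Two caveats worth knowing: `set -e` is ignored inside commands run as the condition of `if`/`while`, as part of `&&`/`||` lists, or after `!`, and a failure in a non-final stage of a pipeline is only caught with `set -o pipefail`.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the build steps in spark_integration.sh.
build_cpp()    { echo "building C++"; }
build_python() { echo "building pyarrow"; return 1; }  # simulate a failed build
run_tests()    { echo "running tests"; }

# The function body is a subshell, so `set -e` only applies inside it.
run_pipeline() (
  set -e          # exit on the first command that returns non-zero
  build_cpp
  build_python    # fails, so the subshell exits here with status 1
  run_tests       # never reached
)

output=$(run_pipeline; echo "exit status: $?")
printf '%s\n' "$output"
# building C++
# building pyarrow
# exit status: 1
```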



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359570#comment-16359570
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] 
Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167406979
 
 

 ##
 File path: dev/spark_integration/spark_integration.sh
 ##
 @@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_BUILD_TYPE=Release
+export ARROW_HOME=$(pwd)/arrow
+#export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export BOOST_ROOT=$CONDA_PREFIX
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release 
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
 
 Review comment:
   Don't use `--inplace`; instead run `python setup.py build_ext 
--build-type=release install` to install the extension into the environment.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359571#comment-16359571
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] 
Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167407014
 
 

 ##
 File path: dev/spark_integration/spark_integration.sh
 ##
 @@ -0,0 +1,107 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+export ARROW_BUILD_TYPE=Release
+export ARROW_HOME=$(pwd)/arrow
+#export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+export BOOST_ROOT=$CONDA_PREFIX
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Build Arrow C++
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=release 
-DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate 
-Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=git://git.apache.org/spark.git
+SPARK_BRANCH=master
+
+# Get the Spark repo if not in image already
+if [ ! -d "$(pwd)/spark" ]; then
+export GIT_COMMITTER_NAME="Nobody"
+export GIT_COMMITTER_EMAIL="nob...@nowhere.com"
+git clone "$SPARK_REPO"
+fi
+
+pushd spark
+
+# Make sure branch has no modifications
+git checkout "$SPARK_BRANCH"
+git reset --hard HEAD
+
+# Update Spark pom with the Arrow version just installed and build Spark, need 
package phase for pyspark
+sed -i -e 
"s/\(.*\).*\(<\/arrow.version>\)/\1$ARROW_VERSION\2/g" ./pom.xml
+echo "Building Spark with Arrow $ARROW_VERSION"
+build/mvn -DskipTests clean package
+
+# Run Arrow related Scala tests only, NOTE: -Dtest=_NonExist_ is to enable 
surefire test discovery without running any tests so that Scalatest can run
+SPARK_SCALA_TESTS="org.apache.spark.sql.execution.arrow,org.apache.spark.sql.execution.vectorized.ColumnarBatchSuite,org.apache.spark.sql.execution.vectorized.ArrowColumnVectorSuite"
+echo "Testing Spark: $SPARK_SCALA_TESTS"
+# TODO: should be able to only build spark-sql tests with adding "-pl 
sql/core" but not currently working
+build/mvn -Dtest=none -DwildcardSuites="$SPARK_SCALA_TESTS" test
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+
+# Run pyarrow related Python tests only
+SPARK_PYTHON_TESTS="ArrowTests PandasUDFTests ScalarPandasUDFTests 
GroupedMapPandasUDFTests GroupedAggPandasUDFTests"
+echo "Testing PySpark: $SPARK_PYTHON_TESTS"
+SPARK_TESTING=1 bin/pyspark pyspark.sql.tests $SPARK_PYTHON_TESTS 
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Clean up
+echo "Cleaning up.."
 
 Review comment:
   No need for these two lines; the environment is thrown away at the end anyway.
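The `ARROW_VERSION` step in the diff above filters `mvn help:evaluate` output with sed, and the version is then substituted into Spark's `pom.xml`. A minimal sketch of both steps, run on hypothetical sample text so that neither Maven nor a Spark checkout is needed. The helper names are hypothetical, and the opening `<arrow.version>` tag is an assumption based on the closing `</arrow.version>` tag visible in the diff.

```bash
#!/usr/bin/env bash
# Written for GNU sed, which accepts the "{ p; q }" grouping used in the script.

# Print the first line that is not a "[...]" log line and starts with a digit,
# i.e. the bare version that `mvn help:evaluate` prints among its log output.
extract_version() {
  sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'
}

# Replace whatever sits between <arrow.version> and </arrow.version> with $1,
# keeping both tags (captured as \1 and \2).
set_arrow_version() {
  sed -e "s/\(<arrow.version>\).*\(<\/arrow.version>\)/\1$1\2/g"
}

# Hypothetical sample of Maven output, one line per argument.
ARROW_VERSION=$(printf '%s\n' \
  '[INFO] Scanning for projects...' \
  '[INFO] ------------------------------------------------------------------------' \
  '0.9.0-SNAPSHOT' \
  '[INFO] BUILD SUCCESS' | extract_version)
echo "Using Arrow version $ARROW_VERSION"   # Using Arrow version 0.9.0-SNAPSHOT

# Hypothetical pom.xml line before the update.
echo '    <arrow.version>0.8.0</arrow.version>' | set_arrow_version "$ARROW_VERSION"
# prints:     <arrow.version>0.9.0-SNAPSHOT</arrow.version>
```

Anchoring the substitution on both the opening and the closing tag limits the change to the `arrow.version` property and replaces the old value rather than appending to it.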



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-10 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359573#comment-16359573
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on a change in pull request #1319: [WIP] ARROW-1579: [Java] 
Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167407030
 
 

 ##
 File path: python/pyarrow/__init__.py
 ##
 @@ -24,7 +24,7 @@
# package is not installed
 try:
 import setuptools_scm
-__version__ = setuptools_scm.get_version('../')
+__version__ = setuptools_scm.get_version(root='../../', 
relative_to=__file__)
 
 Review comment:
   See the suggestion above to get rid of this line.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16359016#comment-16359016
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364576706
 
 
   We'll need to add some flags to `ARROW_CXXFLAGS` to disable the gcc5 ABI. I 
can try to take a look in a bit or this weekend.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357819#comment-16357819
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364309234
 
 
   @xhochy, I could not get Arrow C++ to build with `export 
ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`; I would get a gflags linking error 
like "undefined reference google::FlagRegisterer::FlagRegisterer". I thought 
maybe it was because I wasn't using g++ 4.9, but I had no luck getting 4.9 
installed since the base image I'm using is Ubuntu 16.04. Have you ever run 
into this? It seemed like it was some kind of template constructor that it 
couldn't find.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [Java] Add dockerized test setup to validate Spark integration
> --
>
> Key: ARROW-1579
> URL: https://issues.apache.org/jira/browse/ARROW-1579
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java - Vectors
>Reporter: Wes McKinney
>Assignee: Bryan Cutler
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> cc [~bryanc] -- the goal of this will be to validate master-to-master to 
> catch any regressions in the Spark integration



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357810#comment-16357810
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on a change in pull request #1319: [WIP] ARROW-1579: 
[Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r167120268
 
 

 ##
 File path: python/pyarrow/__init__.py
 ##
 @@ -24,7 +24,7 @@
# package is not installed
 try:
 import setuptools_scm
-__version__ = setuptools_scm.get_version('../')
+__version__ = setuptools_scm.get_version(root='../../', relative_to=__file__)
 
 Review comment:
   @xhochy and @wesm, I needed to change this because it would only give a
version when run from the ARROW_HOME/python directory, so when running the Spark
tests, `pyarrow.__version__` was `None` after importing pyarrow. Making the root
relative to `__file__` seemed to fix it for all cases. I can make this a separate
JIRA if you think that would be better.
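To illustrate why the `relative_to=__file__` change helps, here is a minimal sketch of the path arithmetic only. It assumes setuptools_scm resolves `root` against the directory containing `relative_to` when that argument is given, and against the current working directory otherwise. The helper `resolve_scm_root` and the example working directories are hypothetical, and the sketch does not call setuptools_scm itself.

```python
import os


def resolve_scm_root(root, relative_to=None, cwd=None):
    """Hypothetical helper mimicking how a repository root is located.

    If relative_to is given, root is taken relative to the directory that
    contains that file; otherwise it is taken relative to cwd.
    """
    if relative_to is not None:
        base = os.path.dirname(relative_to)
    else:
        base = cwd if cwd is not None else os.getcwd()
    return os.path.normpath(os.path.join(base, root))


init_py = "/apache-arrow/arrow/python/pyarrow/__init__.py"

# Old call, get_version('../'): the result depends on where Python was started.
print(resolve_scm_root("../", cwd="/apache-arrow/arrow/python"))  # /apache-arrow/arrow
print(resolve_scm_root("../", cwd="/apache-arrow/spark/python"))  # /apache-arrow/spark

# New call, get_version(root='../../', relative_to=__file__): independent of cwd.
print(resolve_scm_root("../../", relative_to=init_py))            # /apache-arrow/arrow
```

With the old call, a process started from the Spark checkout would look for git metadata in the wrong tree. The new call always lands on the Arrow checkout.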



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357808#comment-16357808
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364306888
 
 
   Ok, I finally got this to build and pass all tests! There are still a
couple of issues to work out though; I'll discuss them below.
   
   Btw, to get the correct `pyarrow.__version__` from the dev env, you need
to have all git tags fetched and `setuptools_scm` installed from pip or conda.
@xhochy, `setuptools_scm` wasn't listed in any of the developer docs I could
find; should it be added to the list of dependent packages for setting up a
conda env?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357806#comment-16357806
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-364306888
 
 
   Ok, I finally got this to build and pass all tests! There are still a
couple of issues to work out though; I'll discuss them below.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-08 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16357805#comment-16357805
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239
 
 
   This is currently a WIP; the Scala/Java tests are able to run.
   
   Left TODO:
   
   - [x] Run PySpark Tests
   - [ ] Verify working with docker-compose and existing volumes in arrow/dev
   - [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?
   - [ ] Speed up pyarrow build using conda prefix as toolchain



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-02-04 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16351792#comment-16351792
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-362908291
 
 
   @BryanCutler The main change I made was to move `source activate pyarrow-dev`
to the beginning of the file, set `export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX`,
and remove `LD_LIBRARY_PATH` and `PYTHONPATH` from the file. This should pick
up the conda environment and speed up Arrow builds a lot. It should not affect
Maven builds.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345909#comment-16345909
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361753796
 
 
   @BryanCutler I think all committers on GitBox have write access to your 
branch, so we can push directly



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345879#comment-16345879
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361747906
 
 
   @xhochy any pointers on how to get a correct pyarrow version?  If you are 
able to do a PR to this branch for the errors you found, that would be great - 
I can make you a collaborator if that would be easier.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-30 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16345872#comment-16345872
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on a change in pull request #1319: [WIP] ARROW-1579: 
[Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r164890683
 
 

 ##
 File path: dev/spark_integration/spark_integration.sh
 ##
 @@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+export ARROW_BUILD_TYPE=release
+export ARROW_HOME=$(pwd)/arrow
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+# Build arrow-cpp and install
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=https://github.com/apache/spark.git
 
 Review comment:
   Thanks @felixcheung! Yeah, it would probably be best to pull from the Apache
git repo; I'll change it.
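The `ARROW_VERSION` line in the script quoted above pipes the `maven-help-plugin` output through a sed filter. The address `/^\[.*\]/ !{ ... }` applies the inner commands only to lines that do not look like `[INFO] ...` log lines. Inside it, `/^[0-9]/ { p; q }` prints the first line that starts with a digit and then quits. A rough Python equivalent is below. The function name is hypothetical, and the sample Maven output is made up for illustration rather than captured from a real build.

```python
import re


def first_version_line(mvn_output):
    """Hypothetical Python rendering of the script's sed filter."""
    for line in mvn_output.splitlines():
        # Skip Maven log lines such as "[INFO] ..." (sed address /^\[.*\]/ with "!").
        if re.match(r"\[.*\]", line):
            continue
        # Return the first remaining line that starts with a digit ("p; q").
        if re.match(r"[0-9]", line):
            return line
    # sed would print nothing here, leaving ARROW_VERSION empty.
    return None


sample = "\n".join([
    "[INFO] Scanning for projects...",
    "[INFO] ------------------------------------------------------------------------",
    "0.9.0-SNAPSHOT",
    "[INFO] BUILD SUCCESS",
])
print(first_version_line(sample))  # 0.9.0-SNAPSHOT
```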



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344616#comment-16344616
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

felixcheung commented on a change in pull request #1319: [WIP] ARROW-1579: 
[Java] Adding containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#discussion_r164658521
 
 

 ##
 File path: dev/spark_integration/spark_integration.sh
 ##
 @@ -0,0 +1,106 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Set up environment and working directory
+cd /apache-arrow
+
+export ARROW_BUILD_TYPE=release
+export ARROW_HOME=$(pwd)/arrow
+CONDA_BASE=/home/ubuntu/miniconda
+export LD_LIBRARY_PATH=${ARROW_HOME}/lib:${CONDA_BASE}/lib:${LD_LIBRARY_PATH}
+export PYTHONPATH=${ARROW_HOME}/python:${PYTHONPATH}
+export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
+
+# Activate our pyarrow-dev conda env
+source activate pyarrow-dev
+
+# Build arrow-cpp and install
+pushd arrow/cpp
+rm -rf build/*
+mkdir -p build
+cd build/
+cmake -DARROW_PYTHON=on -DARROW_HDFS=on -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
+make -j4
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+make install
+popd
+
+# Build pyarrow and install inplace
+pushd arrow/python
+python setup.py clean
+python setup.py build_ext --build-type=release --inplace
+if [[ $? -ne 0 ]]; then
+exit 1
+fi
+popd
+
+# Install Arrow to local maven repo and get the version
+pushd arrow/java
+echo "Building and installing Arrow Java"
+mvn -DskipTests -Drat.skip=true clean install
+ARROW_VERSION=`mvn org.apache.maven.plugins:maven-help-plugin:2.1.1:evaluate -Dexpression=project.version | sed -n -e '/^\[.*\]/ !{ /^[0-9]/ { p; q } }'`
+echo "Using Arrow version $ARROW_VERSION"
+popd
+
+# Build Spark with Arrow
+SPARK_REPO=https://github.com/apache/spark.git
 
 Review comment:
   I think we should pull from apache git?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344175#comment-16344175
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361420278
 
 
   Not my area of expertise (@xhochy or @cpcloud would know better) -- usually 
that relates to not having fetched all the tags



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344066#comment-16344066
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361396699
 
 
   From `pyarrow.__init__.py` it looks like maybe I should install
setuptools_scm, but then I get the following version:
   ```
   >>> pa.__version__
   '0.1.1.dev1231+g0a49022'
   ```
   I would expect something like '0.8.***'; where is it getting the above
number from?
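As a sketch of how to read that string, assume setuptools_scm's default scheme for an untagged commit in a clean checkout. That scheme takes the closest tag it recognizes, guesses the next version, appends `.dev<N>` for the number of commits since that tag, and adds a local `+g<hash>` part for the commit. Under that assumption, `0.1.1.dev1231+g0a49022` means the closest tag setuptools_scm found was far older than 0.8.0, which is consistent with the release tags not being fetched. The parser below is a hypothetical helper for this one format, not part of setuptools_scm.

```python
import re

# Hypothetical pattern for "<next>.dev<distance>+g<hash>" version strings.
_SCM_DEV_VERSION = re.compile(
    r"^(?P<next>\d+(?:\.\d+)*)\.dev(?P<distance>\d+)\+g(?P<node>[0-9a-f]+)$"
)


def parse_scm_dev_version(version):
    """Split a dev version into (guessed next version, distance, commit hash)."""
    match = _SCM_DEV_VERSION.match(version)
    if match is None:
        raise ValueError("not a setuptools_scm dev version: %r" % (version,))
    return match.group("next"), int(match.group("distance")), match.group("node")


next_version, distance, node = parse_scm_dev_version("0.1.1.dev1231+g0a49022")
print(next_version)  # 0.1.1   (guessed from an old tag, not from 0.8.0)
print(distance)      # 1231    (commits since that tag)
print(node)          # 0a49022 (abbreviated git commit hash)
```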



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344065#comment-16344065
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361396699
 
 
   From `pyarrow.__init__.py` it looks like maybe I should install
setuptools_scm, but then I get the following version:
   ```
   >>> pa.__version__
   '0.1.1.dev1231+g0a49022'
   ```
   I would expect something like '0.8.***'



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16344041#comment-16344041
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361390175
 
 
   It looks like the errors I'm getting are because `pa.__version__` returns `None`:
   
   ```
   >>> import pyarrow as pa
   >>> from distutils.version import LooseVersion
   >>> LooseVersion(pa.__version__) < LooseVersion('0.8.0')
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/home/bryan/miniconda2/envs/pyarrow-dev/lib/python2.7/distutils/version.py", line 296, in __cmp__
       return cmp(self.version, other.version)
   AttributeError: LooseVersion instance has no attribute 'version'
   >>> pa.__version__
   >>>
   ```
   Is there something in the build I need to set to get a valid version? cc 
@wesm 
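The traceback is a side effect of the missing version. `distutils.version.LooseVersion` only parses its argument when that argument is truthy, so `LooseVersion(None)` never gets a `version` attribute and the comparison fails with a confusing `AttributeError`.

Below is a minimal sketch of a guard that fails with a clearer message instead. The function names and error text are hypothetical. The sketch uses a simplified numeric comparison rather than `LooseVersion`, because `distutils` was deprecated by PEP 632 and removed in Python 3.12.

```python
import re


def _numeric_prefix(version):
    """Leading dotted integers of a version string, e.g. '0.8.1.dev5' -> (0, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("cannot parse version %r" % (version,))
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum_version(version, minimum):
    """Hypothetical check that fails loudly when the version is unknown.

    Simplification: dev/pre-release suffixes are ignored, so '0.9.0.dev1'
    compares equal to '0.9.0'.
    """
    if not version:
        raise RuntimeError(
            "pyarrow.__version__ is %r; the installed version could not be "
            "determined (is setuptools_scm installed and are git tags fetched?)"
            % (version,)
        )
    have, need = _numeric_prefix(version), _numeric_prefix(minimum)
    # Pad with zeros so that '0.8' and '0.8.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


print(meets_minimum_version("0.8.0", "0.8.0"))                   # True
print(meets_minimum_version("0.1.1.dev1231+g0a49022", "0.8.0"))  # False
```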



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343902#comment-16343902
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361364206
 
 
   @xhochy, that is strange. Until I switched the image to be based on
`FROM maven:3.5.2-jdk-8-slim`, I was seeing similar problems with paths containing `?`,
but since then it's been working fine and installs under root:
   
   `[INFO] Installing /apache-arrow/arrow/java/tools/target/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar to /root/.m2/repository/org/apache/arrow/arrow-tools/0.9.0-SNAPSHOT/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar`
   
   I did follow this Docker post-installation step; maybe that has something to
do with it?
https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user
   
   I've also been running with my local .m2 repo mounted in the container with
`docker run -v /home/bryan/.m2:/root/.m2 spark-arrow` to avoid always
downloading dependencies, but I tried skipping that and still don't get any errors.
   
   Besides the strange path, what is the exact error that it gives you for `mvn 
install`?



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-28 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16342660#comment-16342660
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

xhochy commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-361082017
 
 
   @BryanCutler I had a look into the build and found some errors with the
Python/C++ scripts (I will do a PR for that), but I also cannot get `mvn install`
to work correctly. Any ideas why it installs to this location?
   
   ```
   [INFO] Installing /apache-arrow/arrow/java/tools/target/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar to /apache-arrow/arrow/java/?/.m2/repository/org/apache/arrow/arrow-tools/0.9.0-SNAPSHOT/arrow-tools-0.9.0-SNAPSHOT-jar-with-dependencies.jar
   ```
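One possible explanation for the `?` directory is offered here as an assumption, not a confirmed diagnosis. Maven's default local repository is `${user.home}/.m2/repository`. Assume the JVM falls back to `?` for `user.home` when the process's UID has no entry in the container's `/etc/passwd`, for example when the container is started with a host UID via `docker run -u`. That relative `?` would then be resolved against the build's working directory, which matches the path above.

The sketch below checks for a passwd entry and reproduces that path arithmetic. Both helper names are hypothetical, and the sketch is Unix-only because it uses the `pwd` module.

```python
import os
import pwd  # Unix-only


def passwd_home(uid=None):
    """Home directory from the passwd database for uid, or None if no entry."""
    if uid is None:
        uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_dir
    except KeyError:
        return None


def maven_local_repo(user_home, cwd):
    """Where ${user.home}/.m2/repository ends up when resolved from cwd.

    Assumption: user.home becomes "?" when there is no passwd entry.
    """
    home = user_home if user_home else "?"
    return os.path.normpath(os.path.join(cwd, home, ".m2", "repository"))


# With a passwd entry (e.g. root, as in Bryan's run), the path is absolute.
print(maven_local_repo("/root", "/apache-arrow/arrow/java"))
# /root/.m2/repository

# Without one, the "?" fallback lands under the working directory.
print(maven_local_repo(None, "/apache-arrow/arrow/java"))
# /apache-arrow/arrow/java/?/.m2/repository

# Diagnostic for the current process (output depends on the environment).
print("passwd home for this UID:", passwd_home())
```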



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340424#comment-16340424
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-360658676
 
 
   This is pretty much running for me now, but I'm getting a strange test
failure when running the PySpark tests:
   ```
   ==
   FAIL: test_unsupported_datatype (pyspark.sql.tests.ArrowTests)
   --
   AttributeError: 'LooseVersion' object has no attribute 'version'
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File "/apache-arrow/spark/python/pyspark/sql/tests.py", line 3404, in test_unsupported_datatype
       df.toPandas()
   AssertionError: "Unsupported data type" does not match "'LooseVersion' object has no attribute 'version'"
   ```
   
   Not sure why, because the tests pass normally; maybe something with the conda
setup? I'll try some more later.



[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16340422#comment-16340422
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-344361239
 
 
   This is currently a WIP; the Scala/Java tests are able to run.
   
   Left TODO:
   
   - [x] Run PySpark Tests
   - [ ] Verify working with docker-compose and existing volumes in arrow/dev
   - [x] Check why Zinc is unable to run in mvn build, need to enable port 3030?


[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330878#comment-16330878
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358730516
 
 
   I think that's fine. The failing nightly integration test will help create 
some urgency to get the downstream project fixed.


[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330851#comment-16330851
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

BryanCutler commented on issue #1319: [WIP] ARROW-1579: [Java] Adding 
containerized Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358726711
 
 
   I'll try to pick this up again in the next few days, @wesm. One question, 
though: when we make an API-breaking change, like renaming MapVector, this 
will fail until Spark is patched. Do we need to keep a branch with these kinds 
of patches until they can be applied upstream?


[jira] [Commented] (ARROW-1579) [Java] Add dockerized test setup to validate Spark integration

2018-01-18 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/ARROW-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330768#comment-16330768
 ] 

ASF GitHub Bot commented on ARROW-1579:
---

wesm commented on issue #1319: [WIP] ARROW-1579: [Java] Adding containerized 
Spark Integration tests
URL: https://github.com/apache/arrow/pull/1319#issuecomment-358709337
 
 
   @BryanCutler or @icexelloss, could we get this set up? I'd like to look into 
setting up nightly integration test runs on a server someplace (we have a 
shared machine that @cpcloud and I have been using for nightly builds, and that 
the pandas team uses for benchmarks) so we get warned immediately if something 
breaks.

