[jira] [Comment Edited] (ARROW-785) possible issue on writing parquet via pyarrow, subsequently read in Hive
[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976721#comment-16976721 ]

albertoramon edited comment on ARROW-785 at 11/18/19 5:32 PM:
--------------------------------------------------------------

I saw this with SparkSQL 2.4.4 and PyArrow 0.15.

The problem appears when the table is created with INT columns (BIGINT works properly).

Solution: changing INT to BIGINT in the CREATE TABLE statement works fine (I tried DOUBLE, but it didn't work).

In my case, these Parquet files are from the SSB benchmark:
{code:java}
SELECT MAX(LO_CUSTKEY), MAX(LO_PARTKEY), MAX (LO_SUPPKEY) FROM SSB.LINEORDER;
Returns: 2 20 2000
{code}
In my Column_Types I had the following (thus I need to review my Python code :)):
{code:java}
'lo_custkey':'int64',
'lo_partkey':'int64',
'lo_suppkey':'int64',
{code}

was (Author: albertoramon):
I saw this with SparkSQL 2.4.4 and PyArrow 0.15.

The problem appears when the table is created with INT columns (BIGINT works properly).

Solution: changing INT to BIGINT in the CREATE TABLE statement works fine (I tried DOUBLE, but it didn't work).

In my case, these Parquet files are from the SSB benchmark:
{code:java}
SELECT MAX(LO_CUSTKEY), MAX(LO_PARTKEY), MAX (LO_SUPPKEY) FROM SSB.LINEORDER;
Returns: 2 20 2000
{code}
In my Column_Types I had the following, thus I need to review my Python code :) :
{code:java}
'lo_custkey':'int64',
'lo_partkey':'int64',
'lo_suppkey':'int64',
{code}

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
> Key: ARROW-785
> URL: https://issues.apache.org/jira/browse/ARROW-785
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Jeff Reback
> Assignee: Wes McKinney
> Priority: Minor
> Fix For: 0.5.0
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive however.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
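The INT versus BIGINT mismatch described in this comment can be avoided by deriving the table definition from the same column-type mapping used when writing the files. The following is only a minimal sketch under stated assumptions: `DTYPE_TO_HIVE` and `create_table_ddl` are hypothetical names introduced here for illustration (they are not part of PyArrow, Spark, or Hive), and the mapping covers only a handful of dtype names.

```python
# Hypothetical helper (not a PyArrow/Hive API): build a CREATE TABLE statement
# whose column types match the dtypes used when the Parquet file was written.

# Assumed mapping from pandas/NumPy dtype names to Hive / Spark SQL types.
# A 64-bit integer column needs BIGINT; INT is only 32 bits wide.
DTYPE_TO_HIVE = {
    "int32": "INT",
    "int64": "BIGINT",
    "float32": "FLOAT",
    "float64": "DOUBLE",
    "bool": "BOOLEAN",
}


def create_table_ddl(table_name, column_types):
    """Return a CREATE TABLE statement for the given {column: dtype} mapping."""
    columns = []
    for name, dtype in column_types.items():
        if dtype not in DTYPE_TO_HIVE:
            raise ValueError(f"no Hive type known for dtype {dtype!r}")
        columns.append(f"  {name.upper()} {DTYPE_TO_HIVE[dtype]}")
    return (
        f"CREATE TABLE {table_name} (\n"
        + ",\n".join(columns)
        + "\n) STORED AS PARQUET"
    )


# The column types quoted in the comment above.
column_types = {
    "lo_custkey": "int64",
    "lo_partkey": "int64",
    "lo_suppkey": "int64",
}
print(create_table_ddl("SSB.LINEORDER", column_types))
```

With the three `int64` columns from the comment, every generated column is declared as BIGINT, which is the declaration the comment reports as working.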
[jira] [Commented] (ARROW-785) possible issue on writing parquet via pyarrow, subsequently read in Hive
[ https://issues.apache.org/jira/browse/ARROW-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976721#comment-16976721 ]

albertoramon commented on ARROW-785:
------------------------------------

I saw this with SparkSQL 2.4.4 and PyArrow 0.15.

The problem appears when the table is created with INT columns (BIGINT works properly).

Solution: changing INT to BIGINT in the CREATE TABLE statement works fine (I tried DOUBLE, but it didn't work).

In my case, these Parquet files are from the SSB benchmark:
{code:java}
SELECT MAX(LO_CUSTKEY), MAX(LO_PARTKEY), MAX (LO_SUPPKEY) FROM SSB.LINEORDER;
Returns: 2 20 2000
{code}
In my Column_Types I had the following, thus I need to review my Python code :) :
{code:java}
'lo_custkey':'int64',
'lo_partkey':'int64',
'lo_suppkey':'int64',
{code}

> possible issue on writing parquet via pyarrow, subsequently read in Hive
> ------------------------------------------------------------------------
>
> Key: ARROW-785
> URL: https://issues.apache.org/jira/browse/ARROW-785
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Jeff Reback
> Assignee: Wes McKinney
> Priority: Minor
> Fix For: 0.5.0
>
> details here:
> http://stackoverflow.com/questions/43268872/parquet-creation-conversion-from-pandas-dataframe-to-pyarrow-table-not-working-f
> This round trips in pandas->parquet->pandas just fine on released pandas (0.19.2) and pyarrow (0.2).
> OP states that it is not readable in Hive however.
[jira] [Closed] (ARROW-6129) Row_groups duplicate Rows
[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon closed ARROW-6129.
-------------------------------
Resolution: Not A Problem

This is the expected behavior.

> Row_groups duplicate Rows
> -------------------------
>
> Key: ARROW-6129
> URL: https://issues.apache.org/jira/browse/ARROW-6129
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.14.1
> Reporter: albertoramon
> Priority: Major
> Labels: parquetWriter
> Attachments: tes_output.png, test01.py, top10.csv
>
> Using Row_Groups to write Parquet duplicates rows:
> Input: CSV 10 Rows
> Row_Groups=1 --> Output 10 Rows
> Row_Groups=2 --> Output 20 Rows
> !tes_output.png!
> Is this expected?
> Code snippet and CSV attached.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
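The attached test01.py is not reproduced in this thread, so the exact cause of the doubled row count is not known here. One way such a result could arise, offered only as an assumption, is writing the full table once per row group: every `ParquetWriter.write_table` call appends the rows it is given. The sketch below shows the alternative of splitting the rows into non-overlapping slices. `row_group_slices` is a hypothetical helper introduced for illustration and uses only the standard library; each `(offset, length)` pair would be passed to `Table.slice` before writing.

```python
# Hypothetical helper: compute non-overlapping (offset, length) slices so that
# writing one slice per row group covers every input row exactly once.

def row_group_slices(num_rows, num_groups):
    """Split num_rows into num_groups contiguous slices without overlap."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    base, extra = divmod(num_rows, num_groups)
    slices = []
    offset = 0
    for i in range(num_groups):
        # The first `extra` groups take one additional row each.
        length = base + (1 if i < extra else 0)
        if length:
            slices.append((offset, length))
        offset += length
    return slices


# 10 input rows split over 2 row groups, as in the report.
slices = row_group_slices(10, 2)
print(slices)  # [(0, 5), (5, 5)]

# Assumed usage with PyArrow (not executed here):
#   for offset, length in slices:
#       writer.write_table(table.slice(offset, length))
total = sum(length for _, length in slices)
print(total)  # 10
```

Writing one slice per call keeps the output at 10 rows, whereas writing the whole table on each of two calls would produce 20.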
[jira] [Updated] (ARROW-6129) Row_groups duplicate Rows
[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-6129:
--------------------------------
Component/s: C++

> Row_groups duplicate Rows
> -------------------------
>
> Key: ARROW-6129
> URL: https://issues.apache.org/jira/browse/ARROW-6129
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.14.1
> Reporter: albertoramon
> Priority: Major
> Attachments: tes_output.png, test01.py, top10.csv
>
> Using Row_Groups to write Parquet duplicates rows:
> Input: CSV 10 Rows
> Row_Groups=1 --> Output 10 Rows
> Row_Groups=2 --> Output 20 Rows
> !tes_output.png!
> Is this expected?
> Code snippet and CSV attached.
[jira] [Updated] (ARROW-6129) Row_groups duplicate Rows
[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-6129:
--------------------------------
Labels: parquetWriter (was: )

> Row_groups duplicate Rows
> -------------------------
>
> Key: ARROW-6129
> URL: https://issues.apache.org/jira/browse/ARROW-6129
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.14.1
> Reporter: albertoramon
> Priority: Major
> Labels: parquetWriter
> Attachments: tes_output.png, test01.py, top10.csv
>
> Using Row_Groups to write Parquet duplicates rows:
> Input: CSV 10 Rows
> Row_Groups=1 --> Output 10 Rows
> Row_Groups=2 --> Output 20 Rows
> !tes_output.png!
> Is this expected?
> Code snippet and CSV attached.
[jira] [Updated] (ARROW-6129) Row_groups duplicate Rows
[ https://issues.apache.org/jira/browse/ARROW-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-6129:
--------------------------------
Description:
Using Row_Groups to write Parquet duplicates rows:

Input: CSV 10 Rows
Row_Groups=1 --> Output 10 Rows
Row_Groups=2 --> Output 20 Rows

!tes_output.png!

Is this expected?

Code snippet and CSV attached.

was:
Using Row_Groups to write Parquet, duplicate date:

Input: CSV 10 Rows
Row_Groups=1 --> Output 10 Rows

!tes_output.png!

Row_Groups=2 --> Output 20 Rows

Is this the expected?

[^test01.py]

> Row_groups duplicate Rows
> -------------------------
>
> Key: ARROW-6129
> URL: https://issues.apache.org/jira/browse/ARROW-6129
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.14.1
> Reporter: albertoramon
> Priority: Major
> Attachments: tes_output.png, test01.py, top10.csv
>
> Using Row_Groups to write Parquet duplicates rows:
> Input: CSV 10 Rows
> Row_Groups=1 --> Output 10 Rows
> Row_Groups=2 --> Output 20 Rows
> !tes_output.png!
> Is this expected?
> Code snippet and CSV attached.
[jira] [Created] (ARROW-6129) Row_groups duplicate Rows
albertoramon created ARROW-6129:
-----------------------------------

Summary: Row_groups duplicate Rows
Key: ARROW-6129
URL: https://issues.apache.org/jira/browse/ARROW-6129
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.14.1
Reporter: albertoramon
Attachments: tes_output.png, test01.py, top10.csv

Using Row_Groups to write Parquet, duplicate date:

Input: CSV 10 Rows
Row_Groups=1 --> Output 10 Rows

!tes_output.png!

Row_Groups=2 --> Output 20 Rows

Is this the expected?

[^test01.py]
[jira] [Comment Edited] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761829#comment-16761829 ]

albertoramon edited comment on ARROW-3203 at 2/7/19 12:33 PM:
--------------------------------------------------------------

Now both Debian versions fail:
{code:java}
CMake Error at /usr/local/src/arrow/cpp/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-configure-RELEASE.cmake:16 (message):
  Command failed: 1

   './autogen.sh' 'AR=/usr/bin/ar' 'CC=/usr/bin/cc' '--prefix=/usr/local/src/arrow/cpp/jemalloc_ep-prefix/src/jemalloc_ep/dist/' '--with-jemalloc-prefix=je_arrow_' '--with-private-namespace=je_arrow_private_' '--disable-tls'

  See also

    /usr/local/src/arrow/cpp/jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-configure-*.log

make[2]: *** [CMakeFiles/jemalloc_ep.dir/build.make:107: jemalloc_ep-prefix/src/jemalloc_ep-stamp/jemalloc_ep-configure] Error 1
make[1]: *** [CMakeFiles/Makefile2:293: CMakeFiles/jemalloc_ep.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
{code}
Nowadays I'm using Conda to avoid another bug.

If you want, you can close this Jira; if somebody has the same error, they can use this script to reproduce it.

was (Author: albertoramon):
Still? I don't know. I'm using Debian Stretch as the base in my Docker images for now, and it works fine.

I can use Debian Buster only for testing purposes; give me some days and I will report here.

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16761829#comment-16761829 ]

albertoramon commented on ARROW-3203:
-------------------------------------

Still? I don't know. I'm using Debian Stretch as the base in my Docker images for now, and it works fine.

I can use Debian Buster only for testing purposes; give me some days and I will report here.

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Comment Edited] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609409#comment-16609409 ]

albertoramon edited comment on ARROW-3203 at 9/10/18 3:42 PM:
--------------------------------------------------------------

Whether the build works depends on the Debian version:
{code:java}
FROM debian:buster
#FROM debian:stretch
{code}
The first one fails.

The second one works OK.

was (Author: albertoramon):
Whether the build works depends on the Debian version:
{code:java}
FROM debian:buster
#FROM debian:stretch
{code}
The first one fails.
The second one works OK.

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609409#comment-16609409 ]

albertoramon commented on ARROW-3203:
-------------------------------------

Whether the build works depends on the Debian version:
{code:java}
FROM debian:buster
#FROM debian:stretch
{code}
The first one fails.
The second one works OK.

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Commented] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16609404#comment-16609404 ]

albertoramon commented on ARROW-3203:
-------------------------------------

I attached flatbuffers_ep-build-err.log to this JIRA:
{code:java}
CMake Error at /usr/local/src/arrow/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-RELEASE.cmake:16 (message):
  Command failed: 2

   'make'

  See also

    /usr/local/src/arrow/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log

make[2]: *** [CMakeFiles/flatbuffers_ep.dir/build.make:112: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build] Error 1
make[1]: *** [CMakeFiles/Makefile2:418: CMakeFiles/flatbuffers_ep.dir/all] Error 2
make: *** [Makefile:141: all] Error 2
{code}

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Updated] (ARROW-3203) [C++] Build error on Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-3203:
--------------------------------
Attachment: flatbuffers_ep-build-err.log

> [C++] Build error on Debian Buster
> ----------------------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV, flatbuffers_ep-build-err.log
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Created] (ARROW-3206) ARROW_HIVESERVER2, hiveserver2-test
albertoramon created ARROW-3206:
-----------------------------------

Summary: ARROW_HIVESERVER2, hiveserver2-test
Key: ARROW-3206
URL: https://issues.apache.org/jira/browse/ARROW-3206
Project: Apache Arrow
Issue Type: Bug
Reporter: albertoramon
Fix For: 0.10.0

Hello,

Activating support for Hive generates an exception in CMake:
{quote}_CMake Error at src/arrow/dbi/hiveserver2/[CMakeLists.txt:116|https://github.com/apache/arrow/blob/a42d4bf1b0cef37849be0b019c34c96bf56a62f9/cpp/src/arrow/dbi/hiveserver2/CMakeLists.txt#L116] (set_property):_
_set_property could not find TARGET hiveserver2-test. Perhaps it has not_
_yet been created_
{quote}
{code:java}
RUN cmake \
 -DARROW_HIVESERVER2=ON \
 -DCMAKE_BUILD_TYPE=Release \
 -DARROW_BUILD_TESTS=OFF \
 .
{code}
[jira] [Updated] (ARROW-3203) Arrow in Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-3203:
--------------------------------
Description:
There is an error with Debian Buster (Debian Stretch works fine).

You can test it easily by changing the first line of the Dockerfile (attached).

*To reproduce it:*
{code:java}
docker build -f DockerfileRV -t arrow_rw .
docker run -it arrow_rw bash
{code}

was:
There is an error with Debian Buster (Debian Stretch works fine).

You can test it easily by changing the first line of the Dockerfile (attached).
{code:java}
docker build -f DockerfileRV -t arrow_rw .
docker run -it arrow_rw bash
{code}

> Arrow in Debian Buster
> ----------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> *To reproduce it:*
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Updated] (ARROW-3203) Arrow in Debian Buster
[ https://issues.apache.org/jira/browse/ARROW-3203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

albertoramon updated ARROW-3203:
--------------------------------
Description:
There is an error with Debian Buster (Debian Stretch works fine).

You can test it easily by changing the first line of the Dockerfile (attached).
{code:java}
docker build -f DockerfileRV -t arrow_rw .
docker run -it arrow_rw bash
{code}

was:
There is an error with Debian Buster (Debian Stretch works fine).
You can test it easily by changing the first line of the Dockerfile (attached).
{code:java}
docker build -f DockerfileRV -t arrow_rw .
docker run -it arrow_rw bash
{code}

> Arrow in Debian Buster
> ----------------------
>
> Key: ARROW-3203
> URL: https://issues.apache.org/jira/browse/ARROW-3203
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 0.10.0
> Reporter: albertoramon
> Priority: Major
> Attachments: DockerfileRV
>
> There is an error with Debian Buster (Debian Stretch works fine).
> You can test it easily by changing the first line of the Dockerfile (attached).
>
> {code:java}
> docker build -f DockerfileRV -t arrow_rw .
> docker run -it arrow_rw bash
> {code}
[jira] [Created] (ARROW-3203) Arrow in Debian Buster
albertoramon created ARROW-3203:
-----------------------------------

Summary: Arrow in Debian Buster
Key: ARROW-3203
URL: https://issues.apache.org/jira/browse/ARROW-3203
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.10.0
Reporter: albertoramon
Attachments: DockerfileRV

There is an error with Debian Buster (Debian Stretch works fine).
You can test it easily by changing the first line of the Dockerfile (attached).
{code:java}
docker build -f DockerfileRV -t arrow_rw .
docker run -it arrow_rw bash
{code}
[jira] [Created] (ARROW-3202) Make Install: not declared in this scope
albertoramon created ARROW-3202:
-----------------------------------

Summary: Make Install: not declared in this scope
Key: ARROW-3202
URL: https://issues.apache.org/jira/browse/ARROW-3202
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.10.0
Reporter: albertoramon
Fix For: 0.11.0

When executing _make install_, the build step for 'orc_ep' fails with:
* Timezone.cc:748:7: error: 'uint' was not declared in this scope
* Timezone.cc:749:11: error: 'nameStart' was not declared in this scope
* Timezone.cc:756:59: error: 'nameStart' was not declared in this scope

*To reproduce it:*
{code:java}
docker pull python:2.7.15-alpine3.8
docker run -it python:2.7.15-alpine3.8 /bin/sh
apk add wget git
apk add gcc musl-dev cmake make boost-dev g++
apk add unixodbc-dev pybind11
apk add mysql-client postgresql-client
pip install numpy cython
#pip install pandas #Optional
git clone https://github.com/apache/arrow.git
cd arrow/cpp
mkdir build
cd build
cmake .. -DARROW_PYTHON=on -DARROW_ORC=on -DCMAKE_BUILD_TYPE=Release
make install #ERROR
export ARROW_HOME=$PWD
make unittest
pip install pyarrow turbodbc
export LC_ALL="en_US.UTF-8"
{code}