parquet-mr next release with PARQUET-1217?

2018-03-29 Thread Henry Robinson
Hi all -

While using Spark, I got hit by PARQUET-1217 today on some data written by
Impala. This is a pretty nasty bug, and one that affects Apache Spark right
now because, AFAICT, there's no release to move to that contains the fix,
and parquet-mr 1.9.0 is affected. There is a workaround, but it's expensive
in terms of lost performance.

I'm new to the community, so wanted to see if there was a plan to make a
release (1.9.1?) in the near future. I'd rather that than have to build
short-term workarounds into Spark.

Best,
Henry


[jira] [Commented] (PARQUET-1143) Update Java for format 2.4.0 changes

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419833#comment-16419833
 ] 

ASF GitHub Bot commented on PARQUET-1143:
-

scottcarey commented on issue #430: PARQUET-1143: Update to Parquet format 
2.4.0.
URL: https://github.com/apache/parquet-mr/pull/430#issuecomment-377383897
 
 
   FWIW, I tested out the current master code, overriding the version in my 
spark projects.  I could not output zstandard parquet files because spark-sql's 
`ParquetOptions` class intercepts the config strings and maps them to a 
`CompressionCodecName` in parquet-hadoop, rather than just delegating the name 
lookup to parquet-hadoop, so it does not understand the string 'zstd'.
   
   This coupling means that using this from spark will require a new version of 
spark-sql.  Honestly, the code here should be responsible for converting from a 
simple name to the codec, not spark.  Then one could upgrade only the parquet 
version and gain access to new compression codecs without recompiling/releasing 
spark.
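   A minimal sketch of the decoupling suggested above, assuming the library owns a single name-to-codec lookup and callers simply pass the raw option string through. `CodecLookup`, `Codec` and `fromName` are hypothetical names for illustration; they are not the actual parquet-hadoop or spark-sql API.

```java
import java.util.Locale;

class CodecLookup {
    // Hypothetical codec list for illustration; not the real parquet-hadoop enum.
    enum Codec { UNCOMPRESSED, SNAPPY, GZIP, ZSTD }

    // Resolve a config string to a codec, ignoring case and surrounding whitespace.
    // Treating a missing name as UNCOMPRESSED is an assumption of this sketch.
    static Codec fromName(String name) {
        if (name == null || name.trim().isEmpty()) {
            return Codec.UNCOMPRESSED;
        }
        try {
            return Codec.valueOf(name.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException("Unknown compression codec: " + name, e);
        }
    }

    public static void main(String[] args) {
        // A caller passes the raw option through instead of keeping its own name table.
        System.out.println(fromName("zstd"));
    }
}
```

   With a lookup like this living next to the enum, adding a codec to the library would not require any change in the caller's option handling.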


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Update Java for format 2.4.0 changes
> 
>
> Key: PARQUET-1143
> URL: https://issues.apache.org/jira/browse/PARQUET-1143
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-mr
>Affects Versions: 1.9.0, 1.8.2
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1258) Update scm developer connection to github

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419214#comment-16419214
 ] 

ASF GitHub Bot commented on PARQUET-1258:
-

zivanfi closed pull request #462: PARQUET-1258: Update scm developer connection 
to github
URL: https://github.com/apache/parquet-mr/pull/462
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/pom.xml b/pom.xml
index c8c8ccf44..f0117780a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -19,7 +19,7 @@
   
 scm:git:g...@github.com:apache/parquet-mr.git
 scm:git:g...@github.com:apache/parquet-mr.git
-scm:git:https://git-wip-us.apache.org/repos/asf/parquet-mr.git
+scm:git:g...@github.com:apache/parquet-mr.git
 HEAD
   
 


 




> Update scm developer connection to github
> -
>
> Key: PARQUET-1258
> URL: https://issues.apache.org/jira/browse/PARQUET-1258
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format, parquet-mr
>Affects Versions: 1.10.0, format-2.5.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Minor
> Fix For: 1.10.0, format-2.5.0
>
>
> After the move to GitBox, the old Apache repo 
> (https://git-wip-us.apache.org/repos/asf/parquet-format.git) no longer works. 
> The pom.xml should be updated accordingly.





[jira] [Updated] (PARQUET-1262) [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift

2018-03-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-1262:
-
Fix Version/s: cpp-1.5.0

> [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift 
> -
>
> Key: PARQUET-1262
> URL: https://issues.apache.org/jira/browse/PARQUET-1262
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-cpp
>Reporter: Uwe L. Korn
>Priority: Major
> Fix For: cpp-1.5.0
>
>
> When building Thrift using the ExternalProject facility, we do not pass on 
> the variables for a custom Boost variant. Thus, if the user uses a differently 
> flavoured or located Boost, Thrift does not pick it up. Because of this, we 
> explicitly build Thrift during the Arrow OS X wheel build.





[jira] [Updated] (PARQUET-1262) [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift

2018-03-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-1262:
-
Component/s: parquet-cpp



[jira] [Updated] (PARQUET-1262) [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift

2018-03-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-1262:
-
Description: When building Thrift using the ExternalProject facility, we do 
not pass on the variables for a custom Boost variant. Thus, if the user uses a 
differently flavoured or located Boost, Thrift does not pick it up. Because of 
this, we explicitly build Thrift during the Arrow OS X wheel build.



[jira] [Created] (PARQUET-1262) [C++] Use the same BOOST_ROOT and Boost_NAMESPACE as

2018-03-29 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-1262:


 Summary: [C++] Use the same BOOST_ROOT and Boost_NAMESPACE as 
 Key: PARQUET-1262
 URL: https://issues.apache.org/jira/browse/PARQUET-1262
 Project: Parquet
  Issue Type: Improvement
Reporter: Uwe L. Korn








[jira] [Updated] (PARQUET-1262) [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift

2018-03-29 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn updated PARQUET-1262:
-
Summary: [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift   
(was: [C++] Use the same BOOST_ROOT and Boost_NAMESPACE as )

> [C++] Use the same BOOST_ROOT and Boost_NAMESPACE for Thrift 
> -
>
> Key: PARQUET-1262
> URL: https://issues.apache.org/jira/browse/PARQUET-1262
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Uwe L. Korn
>Priority: Major
>






[jira] [Commented] (PARQUET-1255) [C++] Exceptions thrown in some tests

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418985#comment-16418985
 ] 

ASF GitHub Bot commented on PARQUET-1255:
-

majetideepak commented on issue #448: PARQUET-1255: Fix error message when 
PARQUET_TEST_DATA isn't defined
URL: https://github.com/apache/parquet-cpp/pull/448#issuecomment-377234307
 
 
   I missed this step! Thank you!
   
   On Thu, Mar 29, 2018 at 7:40 AM, Uwe L. Korn 
   wrote:
   
   > @majetideepak  Make sure your ASF and
   > github account is correctly linked: https://gitbox.apache.org/ (you will
   > need to activate 2 factor auth in Github).
   
   
   
   -- 
   regards,
   Deepak Majeti
   



[jira] [Commented] (PARQUET-1260) Add Zoltan Ivanfi's code signing key to the KEYS file

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418980#comment-16418980
 ] 

ASF GitHub Bot commented on PARQUET-1260:
-

zivanfi closed pull request #91: PARQUET-1260: Add Zoltan Ivanfi's code signing 
key to the KEYS file
URL: https://github.com/apache/parquet-format/pull/91
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/KEYS b/KEYS
index 1dda8b9b..47079383 100644
--- a/KEYS
+++ b/KEYS
@@ -335,4 +335,64 @@ rEx//hthc5qG2W49kASK+2sK0gIqeHEkCBudcdH8rpfoIXx7cRfR3Pk+3o5GrZVf
 BM83UyGyWEjVQCR3/E/ag0jKwmsnlX6ofGFfS6xSqKK+H/FoLsbI23dS4o6bF4QA
 HT2hxY8ondF9eKU5rnzLGRFYmm1+Pw==
 =gSQT
+
 -END PGP PUBLIC KEY BLOCK-
+pub   4096R/90DE59A3 2018-03-23
+uid  Zoltan Ivanfi (CODE SIGNING KEY) 
+sig 390DE59A3 2018-03-23  Zoltan Ivanfi (CODE SIGNING KEY) 

+sub   4096R/5842E3B5 2018-03-23
+sig  90DE59A3 2018-03-23  Zoltan Ivanfi (CODE SIGNING KEY) 

+
+-BEGIN PGP PUBLIC KEY BLOCK-
+Version: GnuPG v1
+
+mQINBFq1Ew4BEADHh5yEROn9b0g2iVFdNeSNBidHKuErYQReqWWEYfReRL5gu8OX
+AePJyIC94inupY38vt6yxj9oQzoSwbSP9jRJODGH2AMxbZhMHqrfrAJLBVYHmv8x
+J8BP1lG/A0TVkQTTSkysKllWcz+QJB8sz5EksLOOTp/hFjJrGMntzmM94wJorCo7
+9kGksY195WJEYaFGwf5ZRbYksPj8c6il45b5eFxAZ1H3cNoCZDAMxVDayezY81Do
+MBHfdZO6/scZ13KDGO0zHXFHxp44AZIyCbqB09QRz7RPlrrUiHa4oV8gJEav8BqV
+833m0ajfncpeqtyLoQ2bweRPdc7WokhqgwFx/5YIXTE7xrEECxzFv0n2Ekg2na1K
+Z/uf7B5rduoNGNvuf/M6ySdzSfHV0Q7/oYXeUaFRqHlVtH4+HMxKt/oOlAxRsnRf
+6NjtxRd93u2WJarUK2tGyo+KcNck+0/W8s987WwhYXnMq8YgP/YhPD0Zw8A4axOa
+wrhZ8SePEtLTffk3h5uJDQZdzopONVLvmufvbvUL1vqYQ6bTM6C06FurQfI3aJA9
+b3Vlr/JkZI2gmfLmQ4ReJsC1XfZ1IVjibzvyi0njIvlTQhMd5qluBbKlFRcf2S15
+Fn1WRX1gNSeZdpEbR62NcAnqgIycuYPVDhfs9fm+Ogd7mRfCrhpOvIMCFQARAQAB
+tDVab2x0YW4gSXZhbmZpIChDT0RFIFNJR05JTkcgS0VZKSA8eml2YW5maUBhcGFj
+aGUub3JnPokCNwQTAQIAIQIbAwIeAQIXgAUCWrUZrAULCQgHAwUVCgkICwUWAgMB
+AAAKCRDzAcr1kN5Zo4PqEACGahN0HtTbt1kJhtYS3nMwQYTI73PjL5QSWqHlTdNx
+OfjRU5jMjaNpeNwjdx6hxLp/KnI5DZR+19MwA5trUQ3ZEAYkCqU19dmfaIB9rsVv
+JMeLXNLuSv11reOrvLYFs8AcWzwIzhPBNz4q9xZqloVE4aCsRqm25xpJae5a8eDG
+mPZdbjIBSD6Na+hai9l2egNQdYbvzD6Qydb4XDq8Se3RMq05f2RLOTYId8qb4inD
+es1jQi+apUDSZ+WIL7C5UtS6nlzXDnXQtIfHfJsAJl2IW91b6wnoJPMlHtt+3BJg
+82nI8XGIEeDRQGGhLC/ZfkWc5OXapOhDhYykxuGBurvLzq7dPp+iJcs5F1W4PX7I
+xzZD/2x23G/Eg09DmVWYkeKeh3HmwqcDbYN0ApgrUmRuwueAqXvhoEe6kxaZcLrj
+otDSmZD0vECOadhOgst0kYHdFCgQL5MoPQqJNHZDPsciq7WiiAU9aF9DtWJy+6Zb
+0b5TyaCoT4RaqdJj7AU5bR44BYwHwVTy55UEsa8jZxyvK4kGPFgXwqPwW+lxteiv
+k3edHALBEdVZEFs2+xmiz0ns3F4QZHdj9qBG5GGw3jf9iKBDqaerIviEdJS1/yzm
+u980v7jcpOwg2ZsyTKh/PFmUO8tDHszj68RbhPzdBPNXpXhtEYSdOfSOdeK/g87c
+L7kCDQRatRMOARAAzSPx83m+FbeODkApJreD7A14rlT+gMsMaQTapjD5XDHmuS42
+sO4PtV4pGAD4q/KnZzorV2u9tcRxteinALcCoKlP7PoB87tpqUELLkUwgDZjNfNz
+/GipyJFSdcT2waBY+/03bVpthceCxIV3b6xTm2owrJgS0Exd0b21X3zELKiV9UC6
+Pjtd1qLsKgf6N+RvIbT8De2CrFzyy+iISvnZTFMEDE9rnkXuwY93OLtOHjW9rncp
+x2aLYmxuoUh8fKZTcWTXe/uG7/elED08aUwb8JINjSNTYBugs/2OTOpKW3jbti0h
+GOGk/AD+sKNndTG66/nYD5ED6NW0/NleHCDNO+vh0vzjSds08daotj21Z/2sWY06
+qxYGOkTEQy4i0DyTxylxxvPk+c5pTIHupcLsRjmjl3J45vPANnkj4lkNMTdlkabJ
+P2lglwOV+fmW+nxGmW/83AxvNun1dMrHCV5oZXIR5eblyHGMwBpzonl7kOFTIagG
+wcJJK/erJxvFOdAYuiXkq51/DxlK5KNBIT/G1U71EzFRCU/jK+rdI+fAMmoiJ794
+F2PTQwF5NxEr28lM6qOC1QjF5gxVAQU2N6klP5R2Ir1OrIo6RFrhWO+j1AGnUYjE
+zcKLf/DuNzGkO1CTp25Z2mROHSc9vdhSm17EcfCzSPKIrCjkEKeW6Xi7N98AEQEA
+AYkCHwQYAQIACQIbDAUCWrUYgwAKCRDzAcr1kN5Zo5jVD/0UUCdJL4rEQ0PfQoMs
+Gtxx0xMl4ASQQM4ENVBPIzfhXMe3g9iRZkOrNAuRF2KZ3Hr1ekfM4FtcOX4ZGB7t
+TL9ai0QIWJYHj7eWQIpno1sHIQQhx0VpA2Av4gxVdfR7aL3O+rm7QLZU2TPXWd3o
+wiBn3BnWKgv0j6XmvWH1Yn13OpFuWjt+QEcE2W0wNg8MP7J+fz3XjC84BucMnBQv
+hgz7WkFATnWfwwDm+UB3pmibTqC/Kvia/GZzWrwGc/v73XckxnALMfUXV35KHAY4
+YXaLDrHu3h5SnXdoKFnyBkHwFZFlFYWSt47SYpeYvaWDUF1aplMXgH/xYoySeGMt
+2GL0xZKE9SI2xwNblqR2dmTOfTjO9HnkI6fYW4VuulBrp850DAWDaluKGoggQaq0
+t7qTBxOB4xA9tci9x347Oeq1QnBJZJnkOnEqY56GVG/0ACyemVaPNEg+0B/sD4Uq
+3JyQhtn/+UAlyL8Qg98ExOXqVMGK2+wo9P3aZJbR/TCjmNEsPJPWIITVxVHrr5is
+3Y4InJ6F8pt4etNyRtreOA7OpJfL4z2fYgtxPeOeSkKtI8/hU/x7pbJP40PKiNog
+EHa3g1YBk2sRqia3cCVZDEYjLymiJAUnyCWMktGWajs+931V44QSGGM+vWi/DauA
+VHP5p3w+PsIm1Xf2o1gQl2N2rA==
+=a8/z
+-END PGP PUBLIC KEY BLOCK-
+


 




> Add Zoltan Ivanfi's code signing key to the KEYS file
> -
>
> Key: PARQUET-1260
> URL: 

[jira] [Commented] (PARQUET-1261) Parquet-format interns strings when reading filemetadata

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418876#comment-16418876
 ] 

ASF GitHub Bot commented on PARQUET-1261:
-

robert3005 opened a new pull request #92: PARQUET-1261 - Remove string interning
URL: https://github.com/apache/parquet-format/pull/92
 
 
   As explained in the issue, it's questionable whether interning brings any 
benefit, and it can cause harm.




> Parquet-format interns strings when reading filemetadata
> 
>
> Key: PARQUET-1261
> URL: https://issues.apache.org/jira/browse/PARQUET-1261
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.9.0
>Reporter: Robert Kruszewski
>Priority: Major
>
> When deserializing metadata, parquet-format interns strings. References I 
> could find suggest that this was done early on to reduce memory pressure. 
> Java (and the JVM in particular) has come a long way since then, and interning 
> is generally discouraged; see 
> [https://shipilev.net/jvm-anatomy-park/10-string-intern/] for a good 
> explanation. What is more, since Java 8 there is string deduplication 
> implemented at the GC level, per [http://openjdk.java.net/jeps/192]. During our 
> usage and testing we found that interning causes significant GC pressure for 
> long-running applications due to a bigger GC root set.
> This issue proposes removing interning, given that it's questionable whether it 
> should be used in modern JVMs.
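
For readers unfamiliar with the mechanism, here is a small sketch of what interning does at the language level. It is an illustration only, not code from parquet-format; `InternDemo` and `fresh` are hypothetical names. Two strings built at runtime with equal contents are distinct objects until `String.intern()` maps both to one canonical instance held in the JVM's string table.

```java
class InternDemo {
    // Build the string at runtime so it is a fresh heap object,
    // not a compile-time constant that the compiler already shares.
    static String fresh(String s) {
        return new String(s.toCharArray());
    }

    public static void main(String[] args) {
        String a = fresh("column_name");
        String b = fresh("column_name");

        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.equals(b));              // true: same contents
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```

The GC-level alternative described in JEP 192 is opt-in and, as far as I know, is enabled with `-XX:+UseStringDeduplication` when running the G1 collector; it shares the backing character data without requiring the application to call `intern()`.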





[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418874#comment-16418874
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar opened a new pull request #463: PARQUET-1253: Support for new 
logical type representation
URL: https://github.com/apache/parquet-mr/pull/463
 
 
   This PR implements the new logical type representation in parquet-mr which 
is already available in parquet-format. Reviews are welcome!




> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, so there is no way to use parametrized, UTC-normalized 
> timestamp data types. When reading and writing Parquet files, parquet-mr 
> should use the new 'logicalType' field in SchemaElement, in addition to 
> 'converted_type', to determine the current logical type annotation. To maintain 
> backward compatibility, the semantics of converted_type shouldn't change.
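
One way to picture the read-side rule described above is the sketch below. It assumes a simplified, hypothetical `SchemaElement` with just the two fields in question; the real Thrift-generated class and the annotation strings used here are not taken from parquet-mr. The reader prefers the new field when a writer populated it and otherwise falls back to the legacy one, so older files keep resolving as before.

```java
import java.util.Optional;

class LogicalTypeResolution {
    // Hypothetical, simplified stand-in for the Thrift SchemaElement.
    static final class SchemaElement {
        final String convertedType; // legacy annotation, null when absent
        final String logicalType;   // new representation, null when absent

        SchemaElement(String convertedType, String logicalType) {
            this.convertedType = convertedType;
            this.logicalType = logicalType;
        }
    }

    // Prefer the new field; fall back to the legacy one for files from older writers.
    static Optional<String> resolveAnnotation(SchemaElement element) {
        if (element.logicalType != null) {
            return Optional.of(element.logicalType);
        }
        return Optional.ofNullable(element.convertedType);
    }

    public static void main(String[] args) {
        // Illustrative annotation strings only.
        SchemaElement oldFile = new SchemaElement("TIMESTAMP_MILLIS", null);
        SchemaElement newFile = new SchemaElement("TIMESTAMP_MILLIS", "TIMESTAMP(MILLIS, UTC)");
        System.out.println(resolveAnnotation(oldFile).get());
        System.out.println(resolveAnnotation(newFile).get());
    }
}
```

On the write side, the corresponding assumption would be that a writer keeps filling `converted_type` whenever a legacy equivalent exists, so that older readers are unaffected.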





[jira] [Commented] (PARQUET-1255) [C++] Exceptions thrown in some tests

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418818#comment-16418818
 ] 

ASF GitHub Bot commented on PARQUET-1255:
-

xhochy commented on issue #448: PARQUET-1255: Fix error message when 
PARQUET_TEST_DATA isn't defined
URL: https://github.com/apache/parquet-cpp/pull/448#issuecomment-377208606
 
 
   @majetideepak Make sure your ASF and GitHub accounts are correctly linked: 
https://gitbox.apache.org/ (you will need to activate two-factor auth in GitHub).




> [C++] Exceptions thrown in some tests
> -
>
> Key: PARQUET-1255
> URL: https://issues.apache.org/jira/browse/PARQUET-1255
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: cpp-1.5.0
>
>
> Some tests (not all) throw a basic_string exception. Example:
> {code}
> $ ./debug/reader-test 
> Running main() from gtest_main.cc
> [==] Running 11 tests from 4 test cases.
> [--] Global test environment set-up.
> [--] 7 tests from TestAllTypesPlain
> [ RUN  ] TestAllTypesPlain.NoopConstructDestruct
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.NoopConstructDestruct (0 ms)
> [ RUN  ] TestAllTypesPlain.TestBatchRead
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.TestBatchRead (0 ms)
> [ RUN  ] TestAllTypesPlain.TestFlatScannerInt32
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.TestFlatScannerInt32 (0 ms)
> [ RUN  ] TestAllTypesPlain.TestSetScannerBatchSize
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.TestSetScannerBatchSize (0 ms)
> [ RUN  ] TestAllTypesPlain.DebugPrintWorks
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.DebugPrintWorks (0 ms)
> [ RUN  ] TestAllTypesPlain.ColumnSelection
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.ColumnSelection (0 ms)
> [ RUN  ] TestAllTypesPlain.ColumnSelectionOutOfRange
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestAllTypesPlain.ColumnSelectionOutOfRange (0 ms)
> [--] 7 tests from TestAllTypesPlain (0 ms total)
> [--] 2 tests from TestLocalFile
> [ RUN  ] TestLocalFile.FileClosedOnDestruction
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestLocalFile.FileClosedOnDestruction (0 ms)
> [ RUN  ] TestLocalFile.OpenWithMetadata
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in SetUp().
> [  FAILED  ] TestLocalFile.OpenWithMetadata (0 ms)
> [--] 2 tests from TestLocalFile (0 ms total)
> [--] 1 test from TestFileReaderAdHoc
> [ RUN  ] TestFileReaderAdHoc.NationDictTruncatedDataPage
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in the test body.
> [  FAILED  ] TestFileReaderAdHoc.NationDictTruncatedDataPage (1 ms)
> [--] 1 test from TestFileReaderAdHoc (1 ms total)
> [--] 1 test from TestJSONWithLocalFile
> [ RUN  ] TestJSONWithLocalFile.JSONOutput
> unknown file: Failure
> C++ exception with description "basic_string::_S_construct null not valid" 
> thrown in the test body.
> [  FAILED  ] TestJSONWithLocalFile.JSONOutput (0 ms)
> [--] 1 test from TestJSONWithLocalFile (0 ms total)
> [--] Global test environment tear-down
> [==] 11 tests from 4 test cases ran. (1 ms total)
> [  PASSED  ] 0 tests.
> [  FAILED  ] 11 tests, listed below:
> [  FAILED  ] TestAllTypesPlain.NoopConstructDestruct
> [  FAILED  ] TestAllTypesPlain.TestBatchRead
> [  FAILED  ] TestAllTypesPlain.TestFlatScannerInt32
> [  FAILED  ] TestAllTypesPlain.TestSetScannerBatchSize
> [  FAILED  ] 

[jira] [Commented] (PARQUET-1255) [C++] Exceptions thrown in some tests

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418774#comment-16418774
 ] 

ASF GitHub Bot commented on PARQUET-1255:
-

majetideepak commented on issue #448: PARQUET-1255: Fix error message when 
PARQUET_TEST_DATA isn't defined
URL: https://github.com/apache/parquet-cpp/pull/448#issuecomment-377202205
 
 
   @xhochy I tried to merge this commit yesterday and followed the instructions 
in `dev/committers-guide.md`, at step `2. Add your gpg key to the Apache Parquet 
{format,mr,cpp} KEYS file:`.
   I get the following error:
   ```
   $ git clone g...@github.com:apache/parquet-cpp.git
   Cloning into 'parquet-cpp'...
   Permission denied (publickey).
   fatal: Could not read from remote repository.
   
   Please make sure you have the correct access rights
   and the repository exists.
   ```
   Can you help me? Thanks!

