Re: [VOTE] Release Apache Parquet Java 1.10.0 RC0

2018-04-05 Thread Ryan Blue
+1 (binding)

Built, tested, validated signature and checksums, and tested the Iceberg
build with the artifacts.

On Thu, Apr 5, 2018 at 2:15 PM, Ryan Blue wrote:

> Hi everyone,
>
> I propose the following RC to be released as the official Apache Parquet
> Java 1.10.0 release.
>
> The commit id is 031a6654009e3b82020012a18434c582bd74c73a
>
>- This corresponds to the tag: apache-parquet-1.10.0
>- https://github.com/apache/parquet-mr/tree/031a665
>
> The release tarball, signature, and checksums are here:
>
>- https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.10.0-rc0/
>
> You can find the KEYS file here:
>
>- https://dist.apache.org/repos/dist/dev/parquet/KEYS
>
> Binary artifacts are staged in Nexus here:
>
>- https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet/1.10.0/
>
> This release includes:
>
>- The new Parquet command-line tool
>- New APIs to avoid leaking Hadoop classes
>- Fixed sort order for logical types
>- Fixed stats handling for NaN and other floating point edge cases
>
> The full change log is available here:
>
>- https://github.com/apache/parquet-mr/blob/031a665/CHANGES.md
>
> Please download, verify, and test.
>
> Please vote by Tuesday, 10 April 2018.
>
> [ ] +1 Release this as Apache Parquet Java 1.10.0
> [ ] +0
> [ ] -1 Do not release this because…
>
> --
> Ryan Blue
>



-- 
Ryan Blue
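
For readers wondering why NaN matters for column statistics (the last item in the release notes quoted above), here is a minimal C++ sketch. It is not the parquet-mr code; `DoubleStats`, `naive_stats`, `nan_aware_stats`, and `may_contain` are hypothetical names chosen for illustration. It only shows the general hazard: every ordered comparison with NaN is false, so a NaN that reaches the min/max fields stays there, and a reader that uses those fields to skip data can skip values that are actually present.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical min/max statistics for a chunk of doubles (illustration only).
struct DoubleStats {
  double min = std::numeric_limits<double>::infinity();
  double max = -std::numeric_limits<double>::infinity();
  bool has_values = false;
};

// Naive tracking: the first value seeds min and max. If that value is NaN,
// `v < min` and `v > max` are false for every later v, so NaN is never replaced.
DoubleStats naive_stats(const std::vector<double>& values) {
  DoubleStats s;
  for (double v : values) {
    if (!s.has_values) { s.min = s.max = v; s.has_values = true; continue; }
    if (v < s.min) s.min = v;
    if (v > s.max) s.max = v;
  }
  return s;
}

// One possible mitigation (an assumption, not necessarily what 1.10.0 does):
// leave NaN out of the min/max computation entirely.
DoubleStats nan_aware_stats(const std::vector<double>& values) {
  DoubleStats s;
  for (double v : values) {
    if (std::isnan(v)) continue;
    if (!s.has_values) { s.min = s.max = v; s.has_values = true; continue; }
    if (v < s.min) s.min = v;
    if (v > s.max) s.max = v;
  }
  return s;
}

// Filter check a reader might run before deciding to scan the chunk.
bool may_contain(const DoubleStats& s, double x) {
  return s.has_values && x >= s.min && x <= s.max;
}

// Usage: with a leading NaN, the naive stats wrongly rule out a present value.
void nan_stats_demo() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const std::vector<double> values = {nan, 1.0, 5.0};
  assert(!may_contain(naive_stats(values), 1.0));     // wrong: 1.0 is in the data
  assert(may_contain(nan_aware_stats(values), 1.0));  // correct
}
```

The change log linked in the vote email is the place to check what the release actually changed.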


[jira] [Resolved] (PARQUET-1264) Update Javadoc for Java 1.8

2018-04-05 Thread Ryan Blue (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved PARQUET-1264.

Resolution: Fixed

> Update Javadoc for Java 1.8
> ---
>
> Key: PARQUET-1264
> URL: https://issues.apache.org/jira/browse/PARQUET-1264
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.9.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 1.10.0
>
>
> After moving the build to Java 1.8, the release procedure no longer works 
> because Javadoc generation fails.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: parquet-mr next release with PARQUET-1217?

2018-04-05 Thread Ryan Blue
I just sent a vote for this. It took longer than expected because I had to fix
all of the Javadoc warnings for Java 8. Please test it out and vote.

On Fri, Mar 30, 2018 at 10:44 AM, Ryan Blue wrote:

> I have no plan for 1.9.1.
>
> On Fri, Mar 30, 2018 at 10:42 AM, Henry Robinson wrote:
>
>> Great! Do you know of any plans to do a 1.9.1?
>>
>> On 30 March 2018 at 09:35, Ryan Blue wrote:
>>
>>> I'm planning on getting a 1.10.0 rc out today, if I don't find problems
>>> with the stats changes.
>>>
>>> On Thu, Mar 29, 2018 at 4:18 PM, Henry Robinson wrote:
>>>
>>> > Hi all -
>>> >
>>> > While using Spark, I got hit by PARQUET-1217 today on some data written
>>> > by Impala. This is a pretty nasty bug, and one that affects Apache Spark
>>> > right now because, AFAICT, there's no release to move to that contains
>>> > the fix, and parquet-mr 1.9.0 is affected. There is a workaround, but
>>> > it's expensive in terms of lost performance.
>>> >
>>> > I'm new to the community, so wanted to see if there was a plan to make a
>>> > release (1.9.1?) in the near future. I'd rather that than have to build
>>> > short-term workarounds into Spark.
>>> >
>>> > Best,
>>> > Henry
>>> >
>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>>
>> --
>> Henry Robinson
>> Software Engineer
>> Cloudera
>> 415-994-6679
>>
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>



-- 
Ryan Blue
Software Engineer
Netflix


[jira] [Commented] (PARQUET-1265) Segfault on static ApplicationVersion initialization

2018-04-05 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426871#comment-16426871
 ] 

Uwe L. Korn commented on PARQUET-1265:
--

[~llchan] It is a known problem that statically linking {{parquet-cpp}} with 
Boost is a bit brittle and thus we changed the linking of it in all Arrow 
builds to shared Boost libraries. Nevertheless, if the above fixes the problem 
for you, it would be nice if you could make a PR with the change and a short 
comment explaining why this is needed.

> Segfault on static ApplicationVersion initialization
> 
>
> Key: PARQUET-1265
> URL: https://issues.apache.org/jira/browse/PARQUET-1265
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Affects Versions: cpp-1.4.0
>Reporter: Lawrence Chan
>Priority: Major
>
> I'm seeing a segfault when I link/run with a shared libparquet.so with 
> statically linked boost. Given the backtrace, it seems that this is due to 
> the static ApplicationVersion constants, likely due to some static 
> initialization order issue. The problem goes away if I turn those static vars 
> into static funcs returning function-local statics.
> Backtrace:
> {code}
> #0  0x7753cf8b in std::basic_string std::allocator >::basic_string(std::string const&) () from 
> /lib64/libstdc++.so.6
> #1  0x77aeae9c in 
> boost::re_detail_106600::cpp_regex_traits_char_layer::init() () from 
> debug/libparquet.so.1
> #2  0x77adcc2b in 
> boost::object_cache boost::re_detail_106600::cpp_regex_traits_implementation 
> >::do_get(boost::re_detail_106600::cpp_regex_traits_base const&, 
> unsigned long) () from debug/libparquet.so.1
> #3  0x77ae9023 in boost::basic_regex boost::cpp_regex_traits > >::do_assign(char const*, char const*, 
> unsigned int) () from debug/libparquet.so.1
> #4  0x77a5ed98 in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff5580, 
> p1=0x77af66d8 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  p2=0x77af6720 "", f=0) at 
> /tmp/boost-1.66.0/include/boost/regex/v4/basic_regex.hpp:381
> #5  0x77a5b653 in boost::basic_regex boost::cpp_regex_traits > >::assign (this=0x7fff5580, 
> p=0x77af66d8 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0) at /tmp/boost-1.66.0/include/boost/regex/v4/basic_regex.hpp:366
> #6  0x77a57049 in boost::basic_regex boost::cpp_regex_traits > >::basic_regex (this=0x7fff5580, 
> p=0x77af66d8 
> "(.*?)\\s*(?:(version\\s*(?:([^(]*?)\\s*(?:\\(\\s*build\\s*([^)]*?)\\s*\\))?)?)?)",
>  f=0) at /tmp/boost-1.66.0/include/boost/regex/v4/basic_regex.hpp:335
> #7  0x77a4fa1f in parquet::ApplicationVersion::ApplicationVersion 
> (this=0x77ddbfc0 
> , 
> created_by="parquet-mr version 1.8.0") at 
> /tmp/parquet-cpp-apache-parquet-cpp-1.4.0/src/parquet/metadata.cc:477
> #8  0x77a516c5 in __static_initialization_and_destruction_0 
> (__initialize_p=1, __priority=65535) at 
> /tmp/parquet-cpp-apache-parquet-cpp-1.4.0/src/parquet/metadata.cc:58
> #9  0x77a5179e in _GLOBAL__sub_I_metadata.cc(void) () at 
> /tmp/parquet-cpp-apache-parquet-cpp-1.4.0/src/parquet/metadata.cc:913
> #10 0x77dec1e3 in _dl_init_internal () from 
> /lib64/ld-linux-x86-64.so.2
> #11 0x77dde21a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
> #12 0x0001 in ?? ()
> #13 0x7fff5ff5 in ?? ()
> #14 0x in ?? ()
> {code}
> Versions:
> - gcc-4.8.5
> - boost-1.66.0
> - parquet-cpp-1.4.0
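
The workaround described in the report (replacing static variables with static functions that return function-local statics) can be sketched as follows. This is a minimal illustration, not the parquet-cpp source: `AppVersion`, `FixedVersion`, and `kBanner` are hypothetical names, and the real `parquet::ApplicationVersion` constructor parses its argument with a regex, which is omitted here. The point is that a function-local static is constructed on first call, so it cannot be used before it is initialized, whatever order the static initializers happen to run in.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a class with a non-trivial constructor.
class AppVersion {
 public:
  explicit AppVersion(const std::string& created_by) : created_by_(created_by) {}

  const std::string& created_by() const { return created_by_; }

  // Fragile form (shown as a comment only): a static data member defined in
  // a .cc file is built during static initialization, and its order relative
  // to statics in other translation units is unspecified.
  //   static const AppVersion kFixedVersion;

  // Workaround: the object is constructed the first time this function is
  // called. Since C++11 that construction is also thread-safe.
  static const AppVersion& FixedVersion() {
    static const AppVersion version("parquet-mr version 1.8.0");
    return version;
  }

 private:
  std::string created_by_;
};

// Another static can now depend on it safely: calling FixedVersion() forces
// the construction before the value is read.
static const std::string kBanner =
    "created by: " + AppVersion::FixedVersion().created_by();

// Usage: repeated calls return the same, already-constructed object.
void static_init_demo() {
  const AppVersion& first = AppVersion::FixedVersion();
  const AppVersion& second = AppVersion::FixedVersion();
  assert(&first == &second);
  assert(kBanner == "created by: parquet-mr version 1.8.0");
}
```

One trade-off of this pattern is that every access goes through a function call and a guard check, which is usually negligible for version constants like these.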





[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426863#comment-16426863
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378924606
 
 
   @costimuraru @lukasnalezenec do we have an agreement? Can we merge the 2 
PRs? 
   These changes have been in prod here for some time, and I would like them to 
be merged so that we can start to work on other tickets that would otherwise 
require an expensive rebase after this one is merged.
   
   @virtualluke These changes are backward compatible for reading, so you won't 
need to set any flag when reading. Once you have read the data, you will be 
able to write it with the "specs-compliant" schemas by setting the 
`parquet.proto.writeSpecsCompliant` flag.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Add Hive/Presto support in ProtoParquet
> ---
>
> Key: PARQUET-968
> URL: https://issues.apache.org/jira/browse/PARQUET-968
> Project: Parquet
>  Issue Type: Task
>Reporter: Constantin Muraru
>Priority: Major
>






[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426861#comment-16426861
 ] 

ASF GitHub Bot commented on PARQUET-968:


BenoitHanotte commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378924606
 
 
   @costimuraru do we have an agreement? Can we merge the 2 PRs? 
   These changes have been in prod here for some time, and I would like them to 
be merged so that we can start to work on other tickets that would otherwise 
require an expensive rebase after this one is merged.
   
   @virtualluke These changes are backward compatible for reading, so you won't 
need to set any flag when reading. Once you have read the data, you will be 
able to write it with the "specs-compliant" schemas by setting the 
`parquet.proto.writeSpecsCompliant` flag.










[jira] [Commented] (PARQUET-968) Add Hive/Presto support in ProtoParquet

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426797#comment-16426797
 ] 

ASF GitHub Bot commented on PARQUET-968:


virtualluke commented on issue #411: PARQUET-968 Add Hive/Presto support in 
ProtoParquet
URL: https://github.com/apache/parquet-mr/pull/411#issuecomment-378911550
 
 
   Once this is implemented, will there be a flag that lets me read Parquet 
files that use the 2-level repetition style and convert them to the 3-level 
style, so that they are compatible with the current Parquet parsing libraries 
built on parquet-cpp (thinking of pyarrow here)? Does this look like it will 
be merged soon?










[jira] [Updated] (PARQUET-1266) LogicalTypes union in parquet-format doesn't include UUID

2018-04-05 Thread Nandor Kollar (JIRA)

 [ 
https://issues.apache.org/jira/browse/PARQUET-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar updated PARQUET-1266:
---
Priority: Minor  (was: Major)

> LogicalTypes union in parquet-format doesn't include UUID
> -
>
> Key: PARQUET-1266
> URL: https://issues.apache.org/jira/browse/PARQUET-1266
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Minor
>
> The new logical type representation in parquet-format doesn't include the UUID type.





[jira] [Commented] (PARQUET-1266) LogicalTypes union in parquet-format doesn't include UUID

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426604#comment-16426604
 ] 

ASF GitHub Bot commented on PARQUET-1266:
-

nandorKollar opened a new pull request #93: PARQUET-1266: LogicalTypes union in 
parquet-format doesn't include UUID
URL: https://github.com/apache/parquet-format/pull/93
 
 
   




> LogicalTypes union in parquet-format doesn't include UUID
> -
>
> Key: PARQUET-1266
> URL: https://issues.apache.org/jira/browse/PARQUET-1266
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> The new logical type representation in parquet-format doesn't include the UUID type.





[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-05 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426584#comment-16426584
 ] 

Nandor Kollar commented on PARQUET-1253:


Thanks Ryan for clarifying my questions!

> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. This is not yet supported in 
> parquet-mr, so there is no way to use parameterized, UTC-normalized 
> timestamp types. When reading and writing Parquet files, parquet-mr should 
> use the new 'logicalType' field in SchemaElement, in addition to 
> 'converted_type', to determine the current logical type annotation. To 
> maintain backward compatibility, the semantics of converted_type shouldn't change.
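
One way to read the requirement above is sketched below in C++. This is an assumption about the intended behaviour, not code from parquet-mr or parquet-format: `SchemaElementSketch` is a hypothetical, heavily simplified stand-in for the Thrift `SchemaElement`, and the annotation names are plain strings used for illustration. The reader prefers the new field and falls back to the legacy one; the writer fills in both so that older readers, which only know `converted_type`, keep seeing what they saw before.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-in for a schema element (illustration only).
struct SchemaElementSketch {
  bool has_converted_type = false;
  std::string converted_type;    // legacy annotation, e.g. "TIMESTAMP_MILLIS"
  bool has_logical_type = false;
  std::string logical_type;      // new annotation, e.g. "TIMESTAMP"
  bool adjusted_to_utc = false;  // example parameter only the new form carries
};

// Reader side: use the new field when present, otherwise fall back to the
// legacy field, whose meaning is left unchanged.
std::string resolve_annotation(const SchemaElementSketch& e) {
  if (e.has_logical_type) return e.logical_type;
  if (e.has_converted_type) return e.converted_type;
  return "NONE";
}

// Writer side (assumed policy): set both fields for a UTC-normalized
// millisecond timestamp so that old and new readers agree.
SchemaElementSketch annotate_utc_timestamp_millis() {
  SchemaElementSketch e;
  e.has_logical_type = true;
  e.logical_type = "TIMESTAMP";
  e.adjusted_to_utc = true;
  e.has_converted_type = true;
  e.converted_type = "TIMESTAMP_MILLIS";
  return e;
}

// Usage: a file written by an older writer still resolves through the
// legacy field, while a newly written element resolves through the new one.
void logical_type_demo() {
  SchemaElementSketch legacy;
  legacy.has_converted_type = true;
  legacy.converted_type = "TIMESTAMP_MILLIS";
  assert(resolve_annotation(legacy) == "TIMESTAMP_MILLIS");
  assert(resolve_annotation(annotate_utc_timestamp_millis()) == "TIMESTAMP");
}
```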





[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426582#comment-16426582
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r179373718
 
 

 ##
 File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java
 ##
 @@ -41,6 +40,10 @@
 
   private static final ObjectMapper objectMapper = new ObjectMapper();
 
+  static {
+    objectMapper.configure(SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS, false);
 
 Review comment:
   Sure, no problem, I will.






