[jira] [Assigned] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again

2019-12-02 Thread Zoltan Ivanfi (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1353:
--

Assignee: (was: Zoltan Ivanfi)

> The random data generator used for tests repeats the same value over and over 
> again
> ---
>
> Key: PARQUET-1353
> URL: https://issues.apache.org/jira/browse/PARQUET-1353
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Priority: Minor
>  Labels: pull-request-available
>
> The RandomValues class returns references to its internal buffer as random 
> values. The buffer is overwritten with a new random value every time one is 
> requested, and since all earlier returned values reference the same internal 
> buffer, they change to the new value as well. So even though successive calls 
> return different values, a list accumulated from these values will always 
> consist of a single value repeated multiple times. For example:
> ||n-th call||returned value||accumulated list expected||accumulated list actual||
> |1|6C|6C|6C|
> |2|8F|6C 8F|8F 8F|
> |3|52|6C 8F 52|52 52 52|
> |4|B8|6C 8F 52 B8|B8 B8 B8 B8|
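>
> A minimal sketch of this aliasing pattern (illustrative names only, not the 
> actual RandomValues API):
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.Random;
>
> public class AliasingDemo {
>   // The generator mutates and returns the SAME array on every call.
>   static final byte[] BUFFER = new byte[1];
>   static final Random RANDOM = new Random();
>
>   static byte[] next() {
>     RANDOM.nextBytes(BUFFER); // overwrites the shared buffer in place
>     return BUFFER;            // the caller receives a reference, not a copy
>   }
>
>   public static void main(String[] args) {
>     List<byte[]> values = new ArrayList<>();
>     for (int i = 0; i < 4; i++) {
>       values.add(next()); // every element aliases the same array
>     }
>     // All four entries print the byte written by the LAST call.
>     for (byte[] v : values) {
>       System.out.printf("%02X%n", v[0] & 0xFF);
>     }
>   }
> }
> {code}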



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (PARQUET-1337) Current block alignment logic may lead to several row groups per block

2019-09-26 Thread Zoltan Ivanfi (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1337:
--

Assignee: (was: Zoltan Ivanfi)

> Current block alignment logic may lead to several row groups per block
> --
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> When the size of the buffered data gets near the desired row group size, 
> Parquet flushes the data to a row group. However, at this point the data for 
> the last page is not yet encoded or compressed, so the row group may end up 
> being significantly smaller than intended.
> If the row group ends up being so small that it is farther away from the next 
> disk block boundary than the maximum padding, Parquet will try to create a 
> new row group in the same disk block, this time targeting the remaining 
> space. This one may also be flushed prematurely, leading to the creation of 
> an even smaller row group, which may lead to an even smaller one... This gets 
> repeated until we get sufficiently close to the block boundary for padding to 
> finally be applied. The resulting superfluous row groups can lead to bad read 
> performance.
> An example of the structure of a Parquet file suffering from this problem can 
> be seen below. For easier interpretation, the row groups are visually grouped 
> by disk blocks:
> {noformat}
> row group 1:  RC:18774 TS:22182960 OFFSET:   4
> row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
> row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
> row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
> {noformat}
> {noformat}
> row group 5:  RC:18808 TS:8560 OFFSET:1000
> row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
> row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
> row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
> {noformat}
> {noformat}
> row group 9:  RC:18852 TS:22275520 OFFSET:2000
> row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
> row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
> row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
> {noformat}
> {noformat}
> row group 13: RC:18841 TS:22263360 OFFSET:3000
> row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
> row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
> row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
> {noformat}
> {noformat}
> row group 17: RC: 1466 TS: 1740320 OFFSET:4000
> {noformat}
> In this example, both the disk block size and the row group size were set to 
> 1000. The data would fit in 5 row groups of this size, but instead, each 
> of the disk blocks (except the last) is split into 4 row groups of 
> progressively decreasing size.
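>
> The premature-flush loop can be modelled in a few lines (a simplified sketch 
> with illustrative constants and names, not the actual parquet-mr logic):
> {code:java}
> public class AlignmentSketch {
>   public static void main(String[] args) {
>     final long blockSize = 100_000_000L;  // illustrative sizes
>     final long maxPadding = 1_000_000L;
>     long offset = 65_000_000L;            // first row group flushed short
>
>     long remaining = blockSize - (offset % blockSize);
>     while (remaining > maxPadding) {
>       // Too far from the boundary to pad, so a new row group targets the
>       // remaining space. Suppose it also flushes prematurely, at half its
>       // target; the loop then repeats with progressively smaller groups.
>       long flushed = remaining / 2;
>       offset += flushed;
>       remaining = blockSize - (offset % blockSize);
>       System.out.printf("extra row group of %,d bytes, %,d left in block%n",
>           flushed, remaining);
>     }
>     System.out.printf("pad %,d bytes to the next block boundary%n", remaining);
>   }
> }
> {code}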



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (PARQUET-1628) Accept local timestamps annotated with the legacy timestamp types

2019-07-18 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1628:
--

 Summary: Accept local timestamps annotated with the legacy 
timestamp types
 Key: PARQUET-1628
 URL: https://issues.apache.org/jira/browse/PARQUET-1628
 Project: Parquet
  Issue Type: Task
  Components: parquet-mr
Reporter: Zoltan Ivanfi
Assignee: Nandor Kollar


The rules for TIMESTAMP forward-compatibility were created based on the 
assumption that TIMESTAMP_MILLIS and TIMESTAMP_MICROS have only been used in 
the instant (a.k.a. UTC-normalized) semantics so far.

From this false premise it followed that TIMESTAMPs with local semantics were 
a new type and did not need to be annotated with the old types to maintain 
compatibility. In fact, annotating them with the old types was considered 
harmful, since it would have misled older readers into thinking that they 
can read TIMESTAMPs with local semantics, when in reality they would have 
misinterpreted them as TIMESTAMPs with instant semantics. This would have led 
to a difference of several hours, corresponding to the time zone offset.

In reality, however, this misinterpretation of timestamps has already been 
going on for a while, since Arrow annotates local timestamps with 
TIMESTAMP_MILLIS or TIMESTAMP_MICROS.

To maintain forward compatibility of local timestamps, Arrow annotates them 
with the legacy timestamp logical types. However, the Java library considers 
these logical types to be incompatible and discards the new type in favour of 
the legacy ones (since doing it the other way around would change the 
behaviour). Parquet-mr should be updated so that it accepts this combination 
of new and old logical types.
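
For illustration, the combination to be accepted would look roughly like this 
at the metadata level (field names follow parquet-format's Thrift definitions; 
the concrete values are only an example):
{noformat}
SchemaElement {
  type           = INT64
  converted_type = TIMESTAMP_MILLIS                          // legacy annotation
  logicalType    = TIMESTAMP(isAdjustedToUTC=false, MILLIS)  // new annotation
}
{noformat}
Older readers only see converted_type and will (mis)interpret the values as 
UTC-normalized; newer readers should accept this pair instead of rejecting the 
new logical type as incompatible.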



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (PARQUET-1627) Update specification so that legacy timestamp logical types can be written for local semantics as well

2019-07-18 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1627:
--

 Summary: Update specification so that legacy timestamp logical 
types can be written for local semantics as well
 Key: PARQUET-1627
 URL: https://issues.apache.org/jira/browse/PARQUET-1627
 Project: Parquet
  Issue Type: Task
  Components: parquet-format
Reporter: Zoltan Ivanfi
Assignee: Nandor Kollar


The rules for TIMESTAMP forward-compatibility were created based on the 
assumption that TIMESTAMP_MILLIS and TIMESTAMP_MICROS have only been used in 
the instant (a.k.a. UTC-normalized) semantics so far.

From this false premise it followed that TIMESTAMPs with local semantics were 
a new type and did not need to be annotated with the old types to maintain 
compatibility. In fact, annotating them with the old types was considered 
harmful, since it would have misled older readers into thinking that they 
can read TIMESTAMPs with local semantics, when in reality they would have 
misinterpreted them as TIMESTAMPs with instant semantics. This would have led 
to a difference of several hours, corresponding to the time zone offset.

In reality, however, this misinterpretation of timestamps has already been 
going on for a while, since Arrow annotates local timestamps with 
TIMESTAMP_MILLIS or TIMESTAMP_MICROS.

To maintain forward compatibility of local timestamps, the specification 
should allow annotating them with the legacy timestamp logical types.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (PARQUET-1222) Specify a well-defined sorting order for float and double types

2019-07-10 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1222:
---
Description: 
Currently parquet-format specifies the sort order for floating point numbers as 
follows:
{code:java}
   *   FLOAT - signed comparison of the represented value
   *   DOUBLE - signed comparison of the represented value
{code}
The problem is that the comparison of floating point numbers is only a partial 
ordering with strange behaviour in specific corner cases. For example, 
according to IEEE 754, -0 is neither less nor more than \+0 and comparing NaN 
to anything always returns false. This ordering is not suitable for statistics. 
Additionally, the Java implementation already uses a different (total) ordering 
that handles these cases correctly but differently than the C\+\+ 
implementations, which leads to interoperability problems.

TypeDefinedOrder for doubles and floats should be deprecated and a new 
TotalFloatingPointOrder should be introduced. The default for writing doubles 
and floats would be the new TotalFloatingPointOrder. This ordering should be 
effective and easy to implement in all programming languages.
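
Java's {{Double.compare}} already implements such a total ordering. A small 
demonstration of the discrepancy (standard library behaviour, shown for 
illustration):
{code:java}
public class FloatOrderDemo {
  public static void main(String[] args) {
    // IEEE 754 comparison: -0 and +0 are equal, NaN compares false to everything.
    System.out.println(-0.0 < 0.0);        // false
    System.out.println(Double.NaN > 1.0);  // false
    System.out.println(Double.NaN < 1.0);  // false

    // Java's total ordering: -0.0 sorts before 0.0 and NaN sorts after
    // everything else, so statistics remain well defined.
    System.out.println(Double.compare(-0.0, 0.0));                            // -1
    System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // 1
  }
}
{code}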

  was:
Currently parquet-format specifies the sort order for floating point numbers as 
follows:
{code:java}
   *   FLOAT - signed comparison of the represented value
   *   DOUBLE - signed comparison of the represented value
{code}
The problem is that the comparison of floating point numbers is only a partial 
ordering with strange behaviour in specific corner cases. For example, 
according to IEEE 754, -0 is neither less nor more than +0 and comparing NaN to 
anything always returns false. This ordering is not suitable for statistics. 
Additionally, the Java implementation already uses a different (total) ordering 
that handles these cases correctly but differently than the C++ 
implementations, which leads to interoperability problems.

TypeDefinedOrder for doubles and floats should be deprecated and a new 
TotalFloatingPointOrder should be introduced. The default for writing doubles 
and floats would be the new TotalFloatingPointOrder. This ordering should be 
effective and easy to implement in all programming languages.


> Specify a well-defined sorting order for float and double types
> ---
>
> Key: PARQUET-1222
> URL: https://issues.apache.org/jira/browse/PARQUET-1222
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Priority: Critical
>
> Currently parquet-format specifies the sort order for floating point numbers 
> as follows:
> {code:java}
>*   FLOAT - signed comparison of the represented value
>*   DOUBLE - signed comparison of the represented value
> {code}
> The problem is that the comparison of floating point numbers is only a 
> partial ordering with strange behaviour in specific corner cases. For 
> example, according to IEEE 754, -0 is neither less nor more than \+0 and 
> comparing NaN to anything always returns false. This ordering is not suitable 
> for statistics. Additionally, the Java implementation already uses a 
> different (total) ordering that handles these cases correctly but differently 
> than the C\+\+ implementations, which leads to interoperability problems.
> TypeDefinedOrder for doubles and floats should be deprecated and a new 
> TotalFloatingPointOrder should be introduced. The default for writing doubles 
> and floats would be the new TotalFloatingPointOrder. This ordering should be 
> effective and easy to implement in all programming languages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format

2019-06-12 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1588.

Resolution: Fixed

> Bump Apache Thrift to 0.12.0 in parquet-format
> --
>
> Key: PARQUET-1588
> URL: https://issues.apache.org/jira/browse/PARQUET-1588
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.7.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format

2019-06-12 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1588:
---
Fix Version/s: format-2.7.0

> Bump Apache Thrift to 0.12.0 in parquet-format
> --
>
> Key: PARQUET-1588
> URL: https://issues.apache.org/jira/browse/PARQUET-1588
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.7.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format

2019-06-12 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861932#comment-16861932
 ] 

Zoltan Ivanfi commented on PARQUET-1588:


It already existed, just not as "2.7.0" but as "format-2.7.0" instead.

> Bump Apache Thrift to 0.12.0 in parquet-format
> --
>
> Key: PARQUET-1588
> URL: https://issues.apache.org/jira/browse/PARQUET-1588
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.7.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (PARQUET-1588) Bump Apache Thrift to 0.12.0

2019-06-12 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reopened PARQUET-1588:


As we discussed, let's stick to your original approach of separate JIRAs for 
parquet-mr and parquet-format to better track what gets released in which 
version.

> Bump Apache Thrift to 0.12.0
> 
>
> Key: PARQUET-1588
> URL: https://issues.apache.org/jira/browse/PARQUET-1588
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format

2019-06-12 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1588:
---
Summary: Bump Apache Thrift to 0.12.0 in parquet-format  (was: Bump Apache 
Thrift to 0.12.0)

> Bump Apache Thrift to 0.12.0 in parquet-format
> --
>
> Key: PARQUET-1588
> URL: https://issues.apache.org/jira/browse/PARQUET-1588
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1590) [parquet-format] Add Java 11 to Travis

2019-06-11 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1590:
---
Summary: [parquet-format] Add Java 11 to Travis  (was: Build against Java 
11)

> [parquet-format] Add Java 11 to Travis
> --
>
> Key: PARQUET-1590
> URL: https://issues.apache.org/jira/browse/PARQUET-1590
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (PARQUET-1590) Build against Java 11

2019-06-11 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reopened PARQUET-1590:


> Build against Java 11
> -
>
> Key: PARQUET-1590
> URL: https://issues.apache.org/jira/browse/PARQUET-1590
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1499) [parquet-mr] Add Java 11 to Travis

2019-06-11 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1499:
---
Summary: [parquet-mr] Add Java 11 to Travis  (was: Add Java 11 build to the 
repository)

> [parquet-mr] Add Java 11 to Travis
> --
>
> Key: PARQUET-1499
> URL: https://issues.apache.org/jira/browse/PARQUET-1499
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1585) Update old external links in the code base

2019-05-24 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1585.

   Resolution: Fixed
Fix Version/s: 1.11.0

> Update old external links in the code base
> --
>
> Key: PARQUET-1585
> URL: https://issues.apache.org/jira/browse/PARQUET-1585
> Project: Parquet
>  Issue Type: Task
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1585) Update old external links in the code base

2019-05-24 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1585:
--

 Summary: Update old external links in the code base
 Key: PARQUET-1585
 URL: https://issues.apache.org/jira/browse/PARQUET-1585
 Project: Parquet
  Issue Type: Task
Reporter: Zoltan Ivanfi
Assignee: Zoltan Ivanfi






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1572) Clarify the definition of timestamp types

2019-05-09 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1572:
--

 Summary: Clarify the definition of timestamp types
 Key: PARQUET-1572
 URL: https://issues.apache.org/jira/browse/PARQUET-1572
 Project: Parquet
  Issue Type: Task
  Components: parquet-format
Reporter: Zoltan Ivanfi
Assignee: Zoltan Ivanfi


The current definition only makes sense for the isUtcAdjusted=true case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility

2019-05-03 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910
 ] 

Zoltan Ivanfi edited comment on PARQUET-1496 at 5/3/19 3:01 PM:


There seems to be an unresolvable circular incompatibility issue here:
 * Java 11 is incompatible with Scala 2.10 and needs a newer version, like 
Scala 2.12.
 * Scala 2.12 is incompatible with Scrooge 4 and needs a newer version, like 
Scrooge 19.
 * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons:
 ** It doesn't handle one of our empty structs correctly. Update: This turned 
out to be due to using a javadoc-style comment in the empty struct.
 ** It doesn't handle the {{String}} logical type correctly, because the code 
it generates does not use fully qualified names. Since the name of this 
logical type shadows the stock String type, this leads to a compilation 
failure in the generated {{LogicalType.scala}} file.


was (Author: zi):
There seems to be an unresolvable circular incompatibility issue here:
 * Java 11 is incompatible with Scala 2.10 and needs a newer version, like 
Scala 2.12.
 * Scala 2.12 is incompatible with Scrooge 4 and needs a newer version, like 
Scrooge 19.
 * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons:
 ** It doesn't handle empty structs correctly. For further experimentation, 
this can be hacked around by changing each
{noformat}
struct whatever {}
{noformat}
to
{noformat}
struct whatever {32767: optional i32 dummy;}
{noformat}
 ** It doesn't handle the {{String}} logical type correctly, because the code 
it generates does not use fully qualified names. Since the name of this 
logical type shadows the stock String type, this leads to a compilation 
failure in the generated {{LogicalType.scala}} file.
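
The shadowing issue can be illustrated with a Java analogue (hypothetical 
names; the real clash happens in the Scala code generated by Scrooge):
{code:java}
package generated;

// A generated type named "String" shadows java.lang.String for every
// unqualified reference inside this package.
class String {}

class Uses {
  String logical;        // resolves to generated.String, not java.lang.String
  java.lang.String real; // only the fully qualified name avoids the clash

  java.lang.String describe() {
    // Declaring this return type as an unqualified String would refer to
    // generated.String and fail to compile with a string literal.
    return "fully qualified names keep the stock type reachable";
  }
}
{code}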

> [Java] Update Scala for JDK 11 compatibility
> 
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
> [INFO] at 
> 

[jira] [Updated] (PARQUET-1556) Problem with Maven repo specifications in POMs of dependencies in some development environments

2019-04-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1556:
---
Description: 
Running {{mvn verify}} based on the instructions in the README results in this 
error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
include the twitter maven repo:

{code:java}
<repository>
  <id>twitter</id>
  <name>twitter</name>
  <url>http://maven.twttr.com</url>
</repository>
{code}

After adding this, {{mvn verify}} works. This should not be necessary though, 
since the artifact is a transitive dependency and the POM of the direct 
dependency (elephant-bird) contains the repo specification, which works in most 
environments.


  was:
Running {{mvn verify}} based on the instructions in the README results in this 
error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
include the twitter maven repo:

{code:java}
<repository>
  <id>twitter</id>
  <name>twitter</name>
  <url>http://maven.twttr.com</url>
</repository>
{code}

After adding this, {{mvn verify}} works. The proper solution, however, is to 
include this repo in the POM files.



> Problem with Maven repo specifications in POMs of dependencies in some 
> development environments
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running {{mvn verify}} based on the instructions in the README results in 
> this error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
> include the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. This should not be necessary though, 
> since the artifact is a transitive dependency and the POM of the direct 
> dependency (elephant-bird) contains the repo specification, which works in 
> most environments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1556) Problem with Maven repo specifications in POMs of dependencies in some development environments

2019-04-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1556:
---
Summary: Problem with Maven repo specifications in POMs of dependencies in 
some development environments  (was: Add twitter maven repo to POM for 
hadoop-lzo dependency)

> Problem with Maven repo specifications in POMs of dependencies in some 
> development environments
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running {{mvn verify}} based on the instructions in the README results in 
> this error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
> include the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-04 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809841#comment-16809841
 ] 

Zoltan Ivanfi commented on PARQUET-1556:


I came to the conclusion that the only possible source of the twitter repo is 
the [POM of the elephant-bird 
dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. 
However, I have no idea why this doesn't happen for you, [~andygrove]. I have 
tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the 
transitive dependency. What version do you use?

> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running {{mvn verify}} based on the instructions in the README results in 
> this error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
> include the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-04 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809841#comment-16809841
 ] 

Zoltan Ivanfi edited comment on PARQUET-1556 at 4/4/19 1:16 PM:


I came to the conclusion that the only possible source of the twitter repo is 
the [POM of the elephant-bird 
dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. 
However, I have no idea why this doesn't work for you, [~andygrove]. I have 
tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the 
transitive dependency. What version do you use?


was (Author: zi):
I came to the conclusion that the only possible source of the twitter repo is 
the [POM of the elephant-bird 
dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. 
However, I have no idea why this doesn't happen for you, [~andygrove]. I have 
tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the 
transitive dependency. What version do you use?

> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running {{mvn verify}} based on the instructions in the README results in 
> this error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
> include the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-03 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1556:
---
Description: 
Running {{mvn verify}} based on the instructions in the README results in this 
error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
include the twitter maven repo:

{code:java}
<repository>
  <id>twitter</id>
  <name>twitter</name>
  <url>http://maven.twttr.com</url>
</repository>
{code}

After adding this, {{mvn verify}} works. The proper solution, however, is to 
include this repo in the POM files.


  was:
Running mvn verify based on the instructions in the README results in this error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
As a workaround, the local ~/.m2/settings.xml file can be modified to include 
the twitter maven repo:

{code:java}
<repository>
  <id>twitter</id>
  <name>twitter</name>
  <url>http://maven.twttr.com</url>
</repository>
{code}

After adding this, {{mvn verify}} works. The proper solution, however, is to 
include this repo in the POM files.



> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running {{mvn verify}} based on the instructions in the README results in 
> this error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local {{~/.m2/settings.xml}} file can be modified to 
> include the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-03 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808877#comment-16808877
 ] 

Zoltan Ivanfi commented on PARQUET-1556:


That's a very good point, thanks for raising it. We don't use Hadoop-LZO 
ourselves. Running {{mvn dependency:tree}} shows that this is a compile-time 
transitive dependency:
{code}
[INFO] org.apache.parquet:parquet-thrift:jar:1.12.0-SNAPSHOT
[INFO] +- com.twitter.elephantbird:elephant-bird-core:jar:4.4:compile
[INFO] |  \- com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16:compile
{code}

Before adding it to the POM we have to understand:
* Why it can be downloaded for most people even without a corresponding repo 
entry.
* Why it fails for others.
* What it would mean to add the repo to the POM (would it lead to shipping a 
GPL dependency?).
* Whether we can avoid pulling this in altogether.

> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running mvn verify based on the instructions in the README results in this 
> error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local ~/.m2/settings.xml file can be modified to include 
> the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-03 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1556:
---
Description: 
Running mvn verify based on the instructions in the README results in this error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
As a workaround, the local ~/.m2/settings.xml file can be modified to include 
the twitter maven repo:

{code:java}
<repository>
  <id>twitter</id>
  <name>twitter</name>
  <url>http://maven.twttr.com</url>
</repository>
{code}

After adding this, {{mvn verify}} works. The proper solution, however, is to 
include this repo in the POM files.


  was:
Running mvn verify based on the instructions in the README results in this error
{code:java}
Could not resolve dependencies for project 
org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
To fix this, it was necessary to configure my local ~/.m2/settings.xml to 
include the twitter maven repo:
{code:java}
<repository>
<id>twitter</id>
<name>twitter</name>
<url>http://maven.twttr.com</url>
</repository>
{code}
After adding this, mvn verify worked.

We should add these instructions to the README.


> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running mvn verify based on the instructions in the README results in this 
> error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> As a workaround, the local ~/.m2/settings.xml file can be modified to include 
> the twitter maven repo:
> {code:java}
> <repository>
>   <id>twitter</id>
>   <name>twitter</name>
>   <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, {{mvn verify}} works. The proper solution, however, is to 
> include this repo in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency

2019-04-03 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1556:
---
Summary: Add twitter maven repo to POM for hadoop-lzo dependency  (was: 
Instructions are missing for configuring twitter maven repo for hadoop-lzo 
dependency)

> Add twitter maven repo to POM for hadoop-lzo dependency
> ---
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running mvn verify based on the instructions in the README results in this 
> error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> To fix this, it was necessary to configure my local ~/.m2/settings.xml to 
> include the twitter maven repo:
> {code:java}
> <repository>
> <id>twitter</id>
> <name>twitter</name>
> <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, mvn verify worked.
> We should add these instructions to the README.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1556) Instructions are missing for configuring twitter maven repo for hadoop-lzo dependency

2019-04-03 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808851#comment-16808851
 ] 

Zoltan Ivanfi commented on PARQUET-1556:


Now that is strange. If I issue this command:

{code}
mvn dependency:get -Dartifact=com.hadoop.gplcompression:hadoop-lzo:0.4.16
{code}

I get the same error. But it is still able to download the artifact somehow 
when I run the following:

{code}
mvn -Dmaven.repo.local=/tmp/fresh-clean-empty-local-repo clean install 
-DskipTests | grep hadoop-lzo
Downloading from jitpack.io: 
https://jitpack.io/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom
Downloading from central: 
https://repo.maven.apache.org/maven2/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom
Downloading from twitter: 
http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom
Downloaded from twitter: 
http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom
 (1.3 kB at 1.3 kB/s)
Downloading from jitpack.io: 
https://jitpack.io/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar
Downloading from central: 
https://repo.maven.apache.org/maven2/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar
Downloading from twitter: 
http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar
Downloaded from twitter: 
http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar
 (63 kB at 86 kB/s)
{code}

> Instructions are missing for configuring twitter maven repo for hadoop-lzo 
> dependency
> -
>
> Key: PARQUET-1556
> URL: https://issues.apache.org/jira/browse/PARQUET-1556
> Project: Parquet
>  Issue Type: Improvement
>Affects Versions: 1.11.0
>Reporter: Andy Grove
>Assignee: Andy Grove
>Priority: Major
> Fix For: 1.12.0
>
>
> Running mvn verify based on the instructions in the README results in this 
> error
> {code:java}
> Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact 
> com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code}
> To fix this, it was necessary to configure my local ~/.m2/settings.xml to 
> include the twitter maven repo:
> {code:java}
> <repository>
> <id>twitter</id>
> <name>twitter</name>
> <url>http://maven.twttr.com</url>
> </repository>
> {code}
> After adding this, mvn verify worked.
> We should add these instructions to the README.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility

2019-03-27 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803012#comment-16803012
 ] 

Zoltan Ivanfi edited comment on PARQUET-1496 at 3/27/19 4:55 PM:
-

According to 
[https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html], Scala 
2.10.7 also supports JDK 11, which may provide a resolution for this problem. 
Update: No, it doesn't. Parquet-scrooge doesn't compile with Scala 2.10.7 
either.


was (Author: zi):
According to 
[https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html], Scala 
2.10.7 also supports JDK 11, which may provide a resolution for this problem.

> [Java] Update Scala for JDK 11 compatibility
> 
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
> [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290)
> [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
> [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79)
> [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54)
> [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67)
> [INFO] at scala.tools.nsc.Main.main(Main.scala)
> [INFO] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [INFO] at 
> 

[jira] [Updated] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility

2019-03-27 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1496:
---
Summary: [Java] Update Scala for JDK 11 compatibility  (was: [Java] Update 
Scala to 2.12)

> [Java] Update Scala for JDK 11 compatibility
> 
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
> [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290)
> [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
> [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79)
> [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54)
> [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67)
> [INFO] at scala.tools.nsc.Main.main(Main.scala)
> [INFO] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [INFO] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [INFO] at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [INFO] at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> [INFO] at 
> org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
> [INFO] at 
> org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26){code}
> This 

[jira] [Commented] (PARQUET-1496) [Java] Update Scala to 2.12

2019-03-27 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803012#comment-16803012
 ] 

Zoltan Ivanfi commented on PARQUET-1496:


According to 
[https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html], Scala 
2.10.7 also supports JDK 11, which may provide a resolution for this problem.

> [Java] Update Scala to 2.12
> ---
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
> [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290)
> [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
> [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79)
> [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54)
> [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67)
> [INFO] at scala.tools.nsc.Main.main(Main.scala)
> [INFO] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [INFO] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [INFO] at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [INFO] at java.base/java.lang.reflect.Method.invoke(Method.java:564)
> [INFO] at 
> org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161)
> [INFO] at 
> 

[jira] [Resolved] (PARQUET-1497) [Java] javax annotations dependency missing for Java 11

2019-03-27 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1497.

   Resolution: Fixed
Fix Version/s: 1.11.0

> [Java] javax annotations dependency missing for Java 11
> ---
>
> Key: PARQUET-1497
> URL: https://issues.apache.org/jira/browse/PARQUET-1497
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-thrift
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> When trying to build with OpenJDK 11, I get errors due to the Generated 
> annotation not being resolved:
> {code:java}
> [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
> parquet-format-structures ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 51 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/classes
> [INFO] -
> [WARNING] COMPILATION WARNING :
> [INFO] -
> [WARNING] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java:
>  
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java
>  uses or overrides a deprecated API.
> [WARNING] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java:
>  Recompile with -Xlint:deprecation for details.
> [INFO] 2 warnings
> [INFO] -
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[37,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[40,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[43,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[41,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[40,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[42,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimeUnit.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/MilliSeconds.java:[32,24]
>  package javax.annotation does not exist
> 

[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala to 2.12

2019-03-26 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910
 ] 

Zoltan Ivanfi edited comment on PARQUET-1496 at 3/26/19 4:35 PM:
-

There seems to be an unresolvable circular incompatibility issue here:
 * Java 11 is incompatible with Scala 2.10 and needs a newer version, such as Scala 2.12.
 * Scala 2.12 is incompatible with Scrooge 4 and needs a newer version, such as Scrooge 19.
 * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons:
 ** It doesn't handle empty structs correctly. For further experimentation, 
this can be hacked around by changing each
{noformat}
struct whatever {}
{noformat}
to
{noformat}
struct whatever {32767: optional i32 dummy;}
{noformat}
 ** It doesn't handle the {{String}} logical type correctly, because in the 
code it generates it does not use fully qualified names. Since the name of this 
logical type shadows the stock String type, this leads to a compilation failure 
in the generated {{LogicalType.scala}} file.


was (Author: zi):
There seems to be an unresolvable circular incompatibility issue here:
 * Java 11 is incompatible with Scala 2.10 and needs a newer version, such as Scala 2.12.
 * Scala 2.12 is incompatible with Scrooge 4 and needs a newer version, such as Scrooge 19.
 * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons:
 ** It doesn't handle empty structs correctly. For further experimentation, 
this can be hacked around by changing each
{noformat}
struct whatever {}
{noformat}
to
{noformat}
struct whatever {32767: optional i32 dummy;}
{noformat}

 ** It doesn't handle the {{String}} logical type correctly, because in the 
code it generates it does not use fully qualified names. Since the name of this 
logical type shadows the stock String type, this leads to a compilation failure 
in the generated {{LogicalType.scala}} file.

> [Java] Update Scala to 2.12
> ---
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> 

[jira] [Commented] (PARQUET-1496) [Java] Update Scala to 2.12

2019-03-26 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910
 ] 

Zoltan Ivanfi commented on PARQUET-1496:


There seems to be an unresolvable circular incompatibility issue here:
 * Java 11 is incompatible with Scala 2.10 and needs a newer version, such as Scala 2.12.
 * Scala 2.12 is incompatible with Scrooge 4 and needs a newer version, such as Scrooge 19.
 * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons:
 ** It doesn't handle empty structs correctly. For further experimentation, 
this can be hacked around by changing each {{struct whatever {}}} to {{struct 
whatever {32767: optional i32 dummy;}}}
 ** It doesn't handle the {{String}} logical type correctly, because in the 
code it generates it does not use fully qualified names. Since the name of this 
logical type shadows the stock String type, this leads to a compilation failure 
in the generated {{LogicalType.scala}} file.

> [Java] Update Scala to 2.12
> ---
>
> Key: PARQUET-1496
> URL: https://issues.apache.org/jira/browse/PARQUET-1496
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, 
> the build fails for me in {{parquet-scala}} with:
> {code:java}
> [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 
> ---
> [INFO] Checking for multiple versions of scala
> [INFO] includes = [**/*.java,**/*.scala,]
> [INFO] excludes = []
> [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: 
> info: compiling
> [INFO] Compiling 1 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at 
> 1547922718010
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class)
> [ERROR] error: error while loading package, Missing dependency 'object 
> java.lang.Object in compiler mirror', required by 
> /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class)
> [ERROR] error: scala.reflect.internal.MissingRequirementError: object 
> java.lang.Object in compiler mirror not found.
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
> [ERROR] at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99)
> [INFO] at 
> scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196)
> [INFO] at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261)
> [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290)
> [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32)
> [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79)
> [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54)

[jira] [Updated] (PARQUET-1497) [Java] javax annotations dependency missing for Java 11

2019-03-26 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1497:
---
Summary: [Java] javax annotations dependency missing for Java 11  (was: 
[Java] Building on OSX fails with OpenJDK 11)

> [Java] javax annotations dependency missing for Java 11
> ---
>
> Key: PARQUET-1497
> URL: https://issues.apache.org/jira/browse/PARQUET-1497
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-thrift
>Affects Versions: 1.10.0
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
>
> When trying to build with OpenJDK 11, I get errors due to the Generated 
> annotation not being resolved:
> {code:java}
> [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ 
> parquet-format-structures ---
> [INFO] Changes detected - recompiling the module!
> [INFO] Compiling 51 source files to 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/classes
> [INFO] -
> [WARNING] COMPILATION WARNING :
> [INFO] -
> [WARNING] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java:
>  
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java
>  uses or overrides a deprecated API.
> [WARNING] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java:
>  Recompile with -Xlint:deprecation for details.
> [INFO] 2 warnings
> [INFO] -
> [INFO] -
> [ERROR] COMPILATION ERROR :
> [INFO] -
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[37,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[40,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[43,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[41,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[40,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[42,2]
>  cannot find symbol
>   symbol: class Generated
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimeUnit.java:[32,24]
>  package javax.annotation does not exist
> [ERROR] 
> /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/MilliSeconds.java:[32,24]
>  

[jira] [Created] (PARQUET-1551) Support Java 11 - top-level JIRA

2019-03-26 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1551:
--

 Summary: Support Java 11 - top-level JIRA
 Key: PARQUET-1551
 URL: https://issues.apache.org/jira/browse/PARQUET-1551
 Project: Parquet
  Issue Type: Task
Reporter: Zoltan Ivanfi


This JIRA groups all other JIRAs related to Java 11.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1550) CleanUtil does not work in Java 11

2019-03-26 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1550:
---
Issue Type: Bug  (was: Task)

> CleanUtil does not work in Java 11
> --
>
> Key: PARQUET-1550
> URL: https://issues.apache.org/jira/browse/PARQUET-1550
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> I'm trying to run the tests with Java 11 using the {{mvn clean install}} 
> command. After various dependency updates and some workarounds, the tests are 
> green, but the output is littered with warnings about swallowed 
> IllegalAccessExceptions caused by CleanUtil. One example of many identical 
> ones:
> {code}
> 2019-03-26 15:07:34 WARN CleanUtil - Clean failed for buffer DirectByteBuffer
> java.lang.IllegalAccessException: class 
> org.apache.parquet.hadoop.codec.CleanUtil cannot access class 
> jdk.internal.ref.Cleaner (in module java.base) because module java.base does 
> not export jdk.internal.ref to unnamed module @413f69cc
>   at 
> java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:361)
>   at 
> java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:591)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:558)
>   at org.apache.parquet.hadoop.codec.CleanUtil.clean(CleanUtil.java:64)
>   at 
> org.apache.parquet.hadoop.codec.SnappyDecompressor.setInput(SnappyDecompressor.java:109)
>   at 
> org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:46)
>   at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200)
>   at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170)
>   at 
> org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279)
>   at 
> org.apache.parquet.hadoop.TestDirectCodecFactory.test(TestDirectCodecFactory.java:114)
>   at 
> org.apache.parquet.hadoop.TestDirectCodecFactory.compressionCodecs(TestDirectCodecFactory.java:168)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
>   at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>   at 
> org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
>   at 
> org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
>   at 
> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
>   at 
> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
>   at 
> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
> {code}



--
This message was sent by Atlassian JIRA

[jira] [Created] (PARQUET-1550) CleanUtil does not work in Java 11

2019-03-26 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1550:
--

 Summary: CleanUtil does not work in Java 11
 Key: PARQUET-1550
 URL: https://issues.apache.org/jira/browse/PARQUET-1550
 Project: Parquet
  Issue Type: Task
Reporter: Zoltan Ivanfi


I'm trying to run the tests with Java 11 using the {{mvn clean install}} 
command. After various dependency updates and some workarounds, the tests are 
green, but the output is littered with warnings about swallowed 
IllegalAccessExceptions caused by CleanUtil. One example of many identical 
ones:

{code}
2019-03-26 15:07:34 WARN CleanUtil - Clean failed for buffer DirectByteBuffer
java.lang.IllegalAccessException: class 
org.apache.parquet.hadoop.codec.CleanUtil cannot access class 
jdk.internal.ref.Cleaner (in module java.base) because module java.base does 
not export jdk.internal.ref to unnamed module @413f69cc
at 
java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:361)
at 
java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:591)
at java.base/java.lang.reflect.Method.invoke(Method.java:558)
at org.apache.parquet.hadoop.codec.CleanUtil.clean(CleanUtil.java:64)
at 
org.apache.parquet.hadoop.codec.SnappyDecompressor.setInput(SnappyDecompressor.java:109)
at 
org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:46)
at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200)
at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170)
at 
org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279)
at 
org.apache.parquet.hadoop.TestDirectCodecFactory.test(TestDirectCodecFactory.java:114)
at 
org.apache.parquet.hadoop.TestDirectCodecFactory.compressionCodecs(TestDirectCodecFactory.java:168)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at 
org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
at 
org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at 
org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at 
org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at 
org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at 
org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
{code}
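
For context, a minimal sketch of the JDK 9+ approach that avoids this warning: sun.misc.Unsafe (exported by the jdk.unsupported module) offers invokeCleaner(ByteBuffer), which does not require access to jdk.internal.ref. This is an illustrative workaround, not the actual parquet-mr fix; the class name below is hypothetical:

{code:java}
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

public final class BufferCleaner {
  private static final Object UNSAFE;
  private static final Method INVOKE_CLEANER;

  static {
    Object unsafe = null;
    Method invokeCleaner = null;
    try {
      // sun.misc.Unsafe lives in jdk.unsupported, which is accessible to unnamed modules
      Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
      Field theUnsafe = unsafeClass.getDeclaredField("theUnsafe");
      theUnsafe.setAccessible(true);
      unsafe = theUnsafe.get(null);
      // invokeCleaner(ByteBuffer) exists on JDK 9 and later
      invokeCleaner = unsafeClass.getMethod("invokeCleaner", ByteBuffer.class);
    } catch (ReflectiveOperationException e) {
      // cleaning unavailable; direct buffers will be reclaimed by GC eventually
    }
    UNSAFE = unsafe;
    INVOKE_CLEANER = invokeCleaner;
  }

  public static void clean(ByteBuffer buffer) {
    if (INVOKE_CLEANER == null || !buffer.isDirect()) {
      return;
    }
    try {
      INVOKE_CLEANER.invoke(UNSAFE, buffer);
    } catch (ReflectiveOperationException e) {
      // best-effort, mirroring CleanUtil's swallow-and-warn contract
    }
  }
}
{code}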




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1543) Execute the TIMESTAMP types roadmap

2019-02-28 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1543.

Resolution: Not A Problem

Accidentally opened JIRA for the wrong project.

> Execute the TIMESTAMP types roadmap
> ---
>
> Key: PARQUET-1543
> URL: https://issues.apache.org/jira/browse/PARQUET-1543
> Project: Parquet
>  Issue Type: Task
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> This is the top-level JIRA for tracking the addition and/or alteration of 
> different TIMESTAMP types in order to eventually reach the desired state as 
> specified in [the design doc for TIMESTAMP 
> types|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1543) Execute the TIMESTAMP types roadmap

2019-02-28 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1543:
--

 Summary: Execute the TIMESTAMP types roadmap
 Key: PARQUET-1543
 URL: https://issues.apache.org/jira/browse/PARQUET-1543
 Project: Parquet
  Issue Type: Task
Reporter: Zoltan Ivanfi


This is the top-level JIRA for tracking the addition and/or alteration of 
different TIMESTAMP types in order to eventually reach the desired state as 
specified in [the design doc for TIMESTAMP 
types|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1533) TestSnappy() throws OOM exception with Parquet-1485 change

2019-02-19 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1533:
---
Description: 
PARQUET-1485 initializes the buffer sizes (inputBuffer and outputBuffer) from 0 
to 128M in total. This causes the unit test TestSnappy() to fail with an OOM 
exception. This is on my Mac laptop. 

To solve the unit test failure, we can increase the size of -Xmx from 512m to 
1024m like below. However, we need to evaluate whether the increased initial 
direct memory usage for inputBuffer and outputBuffer could cause OOM in real 
Parquet applications that do not have a big enough -Xmx size. 

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  ...
  <argLine>-Xmx1024m</argLine>
  ...
</plugin>

For details of the exception, the commit page 
([https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b])
 has the details. 

  was:
PARQUET-1485 initializes the buffer sizes (inputBuffer and outputBuffer) from 0 
to 128M in total. This causes the unit test TestSnappy() to fail with an OOM 
exception. This is on my Mac laptop. 

To solve the unit test failure, we can increase the size of -Xmx from 512m to 
1024m like below. However, we need to evaluate whether the increased initial 
direct memory usage for inputBuffer and outputBuffer could cause OOM in real 
Parquet applications that do not have a big enough -Xmx size. 

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  ...
  <argLine>-Xmx1024m</argLine>
  ...
</plugin>

For details of the exception, the pull request 
(https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b)
 has the details. 


> TestSnappy() throws OOM exception with Parquet-1485 change 
> ---
>
> Key: PARQUET-1533
> URL: https://issues.apache.org/jira/browse/PARQUET-1533
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
> Environment: Mac OS 10.14.1
>Reporter: Xinli Shang
>Priority: Minor
>
> PARQUET-1485 initializes the buffer sizes (inputBuffer and outputBuffer) from 0 
> to 128M in total. This causes the unit test TestSnappy() to fail with an OOM 
> exception. This is on my Mac laptop. 
> To solve the unit test failure, we can increase the size of -Xmx from 512m to 
> 1024m like below. However, we need to evaluate whether the increased initial 
> direct memory usage for inputBuffer and outputBuffer could cause OOM in real 
> Parquet applications that do not have a big enough -Xmx size. 
> <plugin>
>   <groupId>org.apache.maven.plugins</groupId>
>   <artifactId>maven-surefire-plugin</artifactId>
>   ...
>   <argLine>-Xmx1024m</argLine>
>   ...
> </plugin>
> For details of the exception, the commit page 
> ([https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b])
>  has the details. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1491) Conditional debug logging in InternalParquetRecordReader to reduce GC

2019-01-14 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1491.

Resolution: Not A Problem

> Conditional debug logging in InternalParquetRecordReader to reduce GC
> -
>
> Key: PARQUET-1491
> URL: https://issues.apache.org/jira/browse/PARQUET-1491
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-avro
>Reporter: Artavazd Balaian
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2019-01-12-04-03-48-005.png, 
> image-2019-01-12-04-09-18-359.png, image-2019-01-12-04-10-49-230.png
>
>
> Currently there is no check for the log level in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L249], 
> which causes a lot of memory allocation and performance degradation.
> Link to parquet file which was used: 
> [https://drive.google.com/open?id=1xCMZrUPWvlS4KOFO8m9EmtkvDy-SiRHq]
> Screenshot of Java Mission Control comparison with and without the fix (link to 
> the JFR files: 
> [https://drive.google.com/open?id=1blSeF-AyAhQyRYaqVsihyzy7pJCJt7U3]):
> !image-2019-01-12-04-03-48-005.png|width=956,height=538!
> !image-2019-01-12-04-10-49-230.png|width=1403,height=760!
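
The guard the report asks for is the standard slf4j idiom; below is a minimal illustrative sketch (the class and message are hypothetical, not the actual InternalParquetRecordReader code):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class RecordReadLogging {
  private static final Logger LOG = LoggerFactory.getLogger(RecordReadLogging.class);

  void logProgress(long current, long total) {
    // the guard skips building the message (string concat, boxing) when debug is off
    if (LOG.isDebugEnabled()) {
      LOG.debug("read value " + current + " of " + total);
    }
    // parameterized logging achieves the same deferral without an explicit guard
    LOG.debug("read value {} of {}", current, total);
  }
}
{code}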



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1490) Add branch-specific Travis steps

2019-01-09 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1490:
---
Description: 
The script for the main branch has to make sure that POM files in the master 
branch do not refer to SNAPSHOT versions.

The possibility of scripts for feature branches will allow building a SNAPSHOT 
version of parquet-format and depending on it in the POM files.

  was:
The script for the main branch has to make sure that POM files in the master 
branch do not refer to SNAPSHOT versions.

The script for feature branches will allow building a SNAPSHOT version of 
parquet-mr and depending on it in the POM files.


> Add branch-specific Travis steps
> 
>
> Key: PARQUET-1490
> URL: https://issues.apache.org/jira/browse/PARQUET-1490
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> The script for the main branch has to make sure that POM files in the master 
> branch do not refer to SNAPSHOT versions.
> The possibility of scripts for feature branches will allow building a SNAPSHOT 
> version of parquet-format and depending on it in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1490) Add branch-specific Travis steps

2019-01-09 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1490:
--

 Summary: Add branch-specific Travis steps
 Key: PARQUET-1490
 URL: https://issues.apache.org/jira/browse/PARQUET-1490
 Project: Parquet
  Issue Type: Improvement
Reporter: Zoltan Ivanfi


The script for the main branch has to make sure that POM files in the master 
branch do not refer to SNAPSHOT versions.

The script for feature branches will allow building a SNAPSHOT version of 
parquet-mr and depending on it in the POM files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1461) Third party code does not compile after parquet-mr minor version update

2019-01-09 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1461.

Resolution: Fixed

> Third party code does not compile after parquet-mr minor version update
> ---
>
> Key: PARQUET-1461
> URL: https://issues.apache.org/jira/browse/PARQUET-1461
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Zoltan Ivanfi
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Third party code implemented public void initFromPage(int valueCount, 
> ByteBuffer page, int offset), but the new version has public abstract void 
> initFromPage(int valueCount, ByteBufferInputStream in) instead.
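
A hedged sketch of one migration path for affected third-party readers: implement the new stream-based overload and funnel the old buffer-based entry point into it. The class and method structure below are illustrative, and ByteBufferInputStream.wrap is assumed to be available from parquet-common:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.parquet.bytes.ByteBufferInputStream;

abstract class MigratedValuesReader {
  // the new abstract method introduced by the API change
  public abstract void initFromPage(int valueCount, ByteBufferInputStream in) throws IOException;

  // keep the old signature as a thin adapter for existing call sites
  public void initFromPage(int valueCount, ByteBuffer page, int offset) throws IOException {
    ByteBuffer slice = page.duplicate();
    slice.position(offset); // start reading at the old-style page offset
    initFromPage(valueCount, ByteBufferInputStream.wrap(slice));
  }
}
{code}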



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1487) Do not write original type for timezone-agnostic timestamps

2019-01-07 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1487:
--

 Summary: Do not write original type for timezone-agnostic 
timestamps
 Key: PARQUET-1487
 URL: https://issues.apache.org/jira/browse/PARQUET-1487
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Affects Versions: 1.11.0
Reporter: Zoltan Ivanfi
Assignee: Nandor Kollar
 Fix For: 1.11.0


Historically, the TIMESTAMP_MILLIS and TIMESTAMP_MICROS original types used for 
the INT64 physical type were always UTC-normalized.

The new TIMESTAMP logical type allows both UTC-normalized and timezone-agnostic 
timestamps and writes the legacy original types for compatibility reasons. 
However, the latter should only be written for UTC-normalized timestamps, 
because legacy readers are not prepared to handle timezone-agnostic timestamps 
correctly and the original type would just be misleading.
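
To illustrate the distinction with the new logical types API (a sketch assuming parquet-mr 1.11, where the boolean parameter of timestampType flags UTC normalization; field names are illustrative):

{code:java}
import org.apache.parquet.schema.LogicalTypeAnnotation;
import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Type;
import org.apache.parquet.schema.Types;

class TimestampSchemas {
  // UTC-normalized: writing the legacy TIMESTAMP_MILLIS original type is safe
  static final Type UTC_MILLIS = Types.required(PrimitiveTypeName.INT64)
      .as(LogicalTypeAnnotation.timestampType(true, TimeUnit.MILLIS))
      .named("utc_ts");

  // timezone-agnostic: per this issue, no original type should be written
  static final Type LOCAL_MILLIS = Types.required(PrimitiveTypeName.INT64)
      .as(LogicalTypeAnnotation.timestampType(false, TimeUnit.MILLIS))
      .named("local_ts");
}
{code}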



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1478) Can't read spec compliant, 3-level lists via parquet-proto

2019-01-03 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1478:
---
Affects Version/s: 1.11.0

> Can't read spec compliant, 3-level lists via parquet-proto
> --
>
> Key: PARQUET-1478
> URL: https://issues.apache.org/jira/browse/PARQUET-1478
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Nandor Kollar
>Priority: Major
>  Labels: pull-request-available
>
> I noticed that ProtoInputOutputFormatTest doesn't test the following case 
> properly: when lists are written using the spec-compliant 3-level structure. 
> The test actually doesn't write 3-level lists, because the passed 
> configuration is not used at all; a new one is created each time. See 
> attached PR.
> When I fixed this test, it turned out that it is failing: now it writes the 
> correct 3-level structure, but it looks like the read path is broken. Is it 
> indeed a bug, or am I doing something wrong?
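
For reference, the spec-compliant 3-level list structure in question looks like this (per the parquet-format LogicalTypes specification):

{noformat}
<list-repetition> group <name> (LIST) {
  repeated group list {
    <element-repetition> <element-type> element;
  }
}
{noformat}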



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1478) Can't read spec compliant, 3-level lists via parquet-proto

2019-01-03 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1478:
---
Fix Version/s: 1.11.0

> Can't read spec compliant, 3-level lists via parquet-proto
> --
>
> Key: PARQUET-1478
> URL: https://issues.apache.org/jira/browse/PARQUET-1478
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Nandor Kollar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> I noticed that ProtoInputOutputFormatTest doesn't test the following case 
> properly: when lists are written using the spec-compliant 3-level structure. 
> The test actually doesn't write 3-level lists, because the passed 
> configuration is not used at all; a new one is created each time. See 
> attached PR.
> When I fixed this test, it turned out that it is failing: now it writes the 
> correct 3-level structure, but it looks like the read path is broken. Is it 
> indeed a bug, or am I doing something wrong?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1462) Allow specifying new development version in prepare-release.sh

2018-12-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1462.

   Resolution: Fixed
Fix Version/s: format-2.7.0
   1.12.0

> Allow specifying new development version in prepare-release.sh
> --
>
> Key: PARQUET-1462
> URL: https://issues.apache.org/jira/browse/PARQUET-1462
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format, parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0, format-2.7.0
>
>
> Currently prepare-release.sh only takes the release version as a parameter; 
> the new development version is asked for interactively for each individual 
> pom.xml file, which makes answering the prompts tedious.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1462) Allow specifying new development version in prepare-release.sh

2018-11-23 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1462:
--

 Summary: Allow specifying new development version in 
prepare-release.sh
 Key: PARQUET-1462
 URL: https://issues.apache.org/jira/browse/PARQUET-1462
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-format, parquet-mr
Reporter: Zoltan Ivanfi
Assignee: Zoltan Ivanfi


Currently prepare-release.sh only takes the release version as a parameter; the 
new development version is asked for interactively for each individual pom.xml 
file, which makes answering the prompts tedious.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1461) Third party code does not compile after parquet-mr minor version update

2018-11-22 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1461:
--

 Summary: Third party code does not compile after parquet-mr minor 
version update
 Key: PARQUET-1461
 URL: https://issues.apache.org/jira/browse/PARQUET-1461
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.11.0
Reporter: Zoltan Ivanfi
Assignee: Gabor Szadovszky
 Fix For: 1.11.0


Third party code implemented public void initFromPage(int valueCount, 
ByteBuffer page, int offset), but the new version has public abstract void 
initFromPage(int valueCount, ByteBufferInputStream in) instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks

2018-11-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1460:
--

Assignee: Gabor Szadovszky  (was: Zoltan Ivanfi)

> Fix javadoc errors and include javadoc checking in Travis checks
> 
>
> Key: PARQUET-1460
> URL: https://issues.apache.org/jira/browse/PARQUET-1460
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Zoltan Ivanfi
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Javadoc generation fails with various errors, preventing us from running the 
> release script.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks

2018-11-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1460.

Resolution: Fixed

> Fix javadoc errors and include javadoc checking in Travis checks
> 
>
> Key: PARQUET-1460
> URL: https://issues.apache.org/jira/browse/PARQUET-1460
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Zoltan Ivanfi
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Javadoc generation fails with various errors, preventing us from running the 
> release script.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks

2018-11-21 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1460:
--

 Summary: Fix javadoc errors and include javadoc checking in Travis 
checks
 Key: PARQUET-1460
 URL: https://issues.apache.org/jira/browse/PARQUET-1460
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.10.0
Reporter: Zoltan Ivanfi
Assignee: Zoltan Ivanfi
 Fix For: 1.11.0


Javadoc generation fails with various errors, preventing us from running the 
release script.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1388) Nanosecond precision time and timestamp - parquet-mr

2018-11-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1388:
---
Issue Type: New Feature  (was: Improvement)

> Nanosecond precision time and timestamp - parquet-mr
> 
>
> Key: PARQUET-1388
> URL: https://issues.apache.org/jira/browse/PARQUET-1388
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1253) Support for new logical type representation

2018-11-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1253:
---
Issue Type: New Feature  (was: Improvement)

> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 1.11.0
>
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parameterized, UTC-normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantics of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1365) Don't write page level statistics

2018-11-19 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1365.

Resolution: Fixed

> Don't write page level statistics
> -
>
> Key: PARQUET-1365
> URL: https://issues.apache.org/jira/browse/PARQUET-1365
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Page-level statistics are never used in production, and after adding column 
> indexes they are completely useless. Fortunately, statistics are optional in 
> both the v1 and v2 pages; therefore, we can safely stop writing them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1451) Deprecate old logical types API

2018-10-29 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1451.

Resolution: Duplicate

JIRA was not responding, accidentally created the issue twice.

> Deprecate old logical types API
> ---
>
> Key: PARQUET-1451
> URL: https://issues.apache.org/jira/browse/PARQUET-1451
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Nandor Kollar
>Priority: Major
> Fix For: 1.11.0
>
>
> Now that the new logical types API is ready, we should deprecate the old one 
> because new types will not support it (in fact, nano precision has already 
> been added without support in the old API).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1452) Deprecate old logical types API

2018-10-29 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1452:
--

 Summary: Deprecate old logical types API
 Key: PARQUET-1452
 URL: https://issues.apache.org/jira/browse/PARQUET-1452
 Project: Parquet
  Issue Type: Task
  Components: parquet-mr
Reporter: Zoltan Ivanfi
Assignee: Nandor Kollar
 Fix For: 1.11.0


Now that the new logical types API is ready, we should deprecate the old one 
because new types will not support it (in fact, nano precision has already been 
added without support in the old API).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1451) Deprecate old logical types API

2018-10-29 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1451:
--

 Summary: Deprecate old logical types API
 Key: PARQUET-1451
 URL: https://issues.apache.org/jira/browse/PARQUET-1451
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Reporter: Zoltan Ivanfi
Assignee: Nandor Kollar
 Fix For: 1.11.0


Now that the new logical types API is ready, we should deprecate the old one 
because new types will not support it (in fact, nano precision has already been 
added without support in the old API).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale

2018-10-10 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1440:
--

Assignee: Ryan Gardner

> Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file 
> aren't displayed with their proper scale
> --
>
> Key: PARQUET-1440
> URL: https://issues.apache.org/jira/browse/PARQUET-1440
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ryan Gardner
>Assignee: Ryan Gardner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> When working with the parquet-tools, I noticed that decimal values that were 
> stored with int32 or int64 were not being displayed properly.
> I opened up a pull request to fix this:
> https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066
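
A minimal sketch of the underlying fix (illustrative values, not the actual parquet-tools code): DECIMAL values backed by int32/int64 store the unscaled integer, so the scale from the type annotation must be applied before display:

{code:java}
import java.math.BigDecimal;

class DecimalDisplay {
  public static void main(String[] args) {
    long unscaled = 123456L; // hypothetical int64 value read from the file
    int scale = 2;           // from the column's DECIMAL(precision, scale) annotation
    // BigDecimal.valueOf applies the scale: 123456 with scale 2 is 1234.56
    System.out.println(BigDecimal.valueOf(unscaled, scale)); // prints 1234.56
  }
}
{code}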



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale

2018-10-10 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1440.

Resolution: Fixed

> Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file 
> aren't displayed with their proper scale
> --
>
> Key: PARQUET-1440
> URL: https://issues.apache.org/jira/browse/PARQUET-1440
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ryan Gardner
>Assignee: Ryan Gardner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> When working with the parquet-tools, I noticed that decimal values that were 
> stored with int32 or int64 were not being displayed properly.
> I opened up a pull request to fix this:
> https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale

2018-10-10 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1440:
---
Fix Version/s: 1.11.0

> Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file 
> aren't displayed with their proper scale
> --
>
> Key: PARQUET-1440
> URL: https://issues.apache.org/jira/browse/PARQUET-1440
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ryan Gardner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> When working with the parquet-tools, I noticed that decimal values that were 
> stored with int32 or int64 were not being displayed properly.
> I opened up a pull request to fix this:
> https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale

2018-10-10 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1440:
---
Summary: Parquet-tools: Decimal values stored in an int32 or int64 in the 
parquet file aren't displayed with their proper scale  (was: Decimal values 
stored in an int32 or int64 in the parquet file aren't displayed with their 
proper scale)

> Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file 
> aren't displayed with their proper scale
> --
>
> Key: PARQUET-1440
> URL: https://issues.apache.org/jira/browse/PARQUET-1440
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ryan Gardner
>Priority: Major
>  Labels: pull-request-available
>
> When working with the parquet-tools, I noticed that decimal values that were 
> stored with int32 or int64 were not being displayed properly.
> I opened up a pull request to fix this:
> https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1437) Misleading comment in parquet.thrift

2018-10-08 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1437:
---
Fix Version/s: 2.7.0

> Misleading comment in parquet.thrift
> 
>
> Key: PARQUET-1437
> URL: https://issues.apache.org/jira/browse/PARQUET-1437
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Priority: Major
> Fix For: format-2.7.0
>
>
> The documentation for {{list<ColumnOrder> column_orders}} states that "Each 
> sort order corresponds to one column, determined by its position in the list, 
> matching the position of the column in the schema."
> However, in reality, while the order of elements in these two lists (schema 
> and sort order) is the same, only leaf nodes are represented in the list of 
> sort orders, so the positions do *not* match.
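
To make the mismatch concrete: group nodes get no entry in {{column_orders}},
so index i in that list refers to the i-th *leaf* column. A small sketch using
parquet-mr's schema API (the schema itself is made up):

{code:java}
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class ColumnOrderIndexes {
  public static void main(String[] args) {
    // A nested schema: the group node "b" has no entry in column_orders,
    // only the leaves a, b.x and b.y do.
    MessageType schema = MessageTypeParser.parseMessageType(
        "message doc { required int32 a; required group b {"
        + " required int32 x; required binary y; } }");
    // getColumns() enumerates leaf columns in schema order; the i-th
    // element of column_orders corresponds to the i-th leaf here.
    for (int i = 0; i < schema.getColumns().size(); i++) {
      System.out.println("column_orders[" + i + "] -> "
          + String.join(".", schema.getColumns().get(i).getPath()));
    }
  }
}
{code}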



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1437) Misleading comment in parquet.thrift

2018-10-08 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1437:
---
Fix Version/s: (was: 2.7.0)
   format-2.7.0

> Misleading comment in parquet.thrift
> 
>
> Key: PARQUET-1437
> URL: https://issues.apache.org/jira/browse/PARQUET-1437
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Priority: Major
> Fix For: format-2.7.0
>
>
> The documentation for {{list<ColumnOrder> column_orders}} states that "Each 
> sort order corresponds to one column, determined by its position in the list, 
> matching the position of the column in the schema."
> However, in reality, while the order of elements in these two lists (schema 
> and sort order) is the same, only leaf nodes are represented in the list of 
> sort orders, so the positions do *not* match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1437) Misleading comment in parquet.thrift

2018-10-08 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1437:
--

 Summary: Misleading comment in parquet.thrift
 Key: PARQUET-1437
 URL: https://issues.apache.org/jira/browse/PARQUET-1437
 Project: Parquet
  Issue Type: Bug
  Components: parquet-format
Reporter: Zoltan Ivanfi


The documentation for {{list<ColumnOrder> column_orders}} states that "Each 
sort order corresponds to one column, determined by its position in the list, 
matching the position of the column in the schema."

However, in reality, while the order of elements in these two lists (schema and 
sort order) is the same, only leaf nodes are represented in the list of sort 
orders, so the positions do *not* match.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1436) TimestampMicrosStringifier shows wrong microseconds for timestamps before 1970

2018-10-03 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1436:
--

 Summary: TimestampMicrosStringifier shows wrong microseconds for 
timestamps before 1970
 Key: PARQUET-1436
 URL: https://issues.apache.org/jira/browse/PARQUET-1436
 Project: Parquet
  Issue Type: Task
  Components: parquet-mr
Reporter: Zoltan Ivanfi
 Fix For: 1.11.0


testTimestampMicrosStringifier takes the timestamp 1848-03-15T09:23:59.765 and 
subtracts 1 microsecond from it. The result (both expected and actual) is 
1848-03-15T09:23:59.765001, but it should be 1848-03-15T09:23:59.764999 instead.
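
This is the classic sign error when splitting a negative epoch value into
seconds and a positive fractional part; whether the stringifier performs
exactly this arithmetic is an assumption, but the failure mode is the same.
A minimal sketch:

{code:java}
public class MicrosBeforeEpoch {
  public static void main(String[] args) {
    long micros = -1L; // 1 microsecond before 1970-01-01T00:00:00
    // Truncating division pairs second 0 with -1 micros: wrong.
    long badSec = micros / 1_000_000, badFrac = micros % 1_000_000;
    // Floor division pairs second -1 with 999999 micros: correct.
    long sec = Math.floorDiv(micros, 1_000_000);
    long frac = Math.floorMod(micros, 1_000_000);
    System.out.printf("truncating: %d s, %d us%n", badSec, badFrac);
    System.out.printf("flooring:   %d s, %d us%n", sec, frac);
  }
}
{code}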



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1381) Add merge blocks command to parquet-tools

2018-10-02 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1381:
--

Assignee: Ekaterina Galieva

> Add merge blocks command to parquet-tools
> -
>
> Key: PARQUET-1381
> URL: https://issues.apache.org/jira/browse/PARQUET-1381
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ekaterina Galieva
>Assignee: Ekaterina Galieva
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> The current implementation of the merge command in parquet-tools doesn't 
> merge row groups, it just places one after the other. Add an API and a 
> command option to be able to merge small blocks into larger ones up to a 
> specified size limit.
> h6. Implementation details:
> Blocks are not reordered, so as not to break any initial predicate pushdown 
> optimizations.
> Blocks are not divided to fit the upper bound perfectly. 
> This is an intentional performance optimization. 
> It gives an opportunity to form new blocks by copying the full content of 
> smaller blocks by column rather than by row (see the sketch after the 
> examples below).
> h6. Examples:
>  # Input files with blocks sizes:
> {code:java}
> [128 | 35], [128 | 40], [120]{code}
> Expected output file blocks sizes:
> {{merge }}
> {code:java}
> [128 | 35 | 128 | 40 | 120]
> {code}
> {{merge -b}}
> {code:java}
> [128 | 35 | 128 | 40 | 120]
> {code}
> {{merge -b -l 256 }}
> {code:java}
> [163 | 168 | 120]
> {code}
>  # Input files with blocks sizes:
> {code:java}
> [128 | 35], [40], [120], [6] {code}
> Expected output file blocks sizes:
> {{merge}}
> {code:java}
> [128 | 35 | 40 | 120 | 6] 
> {code}
> {{merge -b}}
> {code:java}
> [128 | 75 | 126] 
> {code}
> {{merge -b -l 256}}
> {code:java}
> [203 | 126]{code}
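
The expected outputs above are consistent with a single greedy pass that keeps
concatenating consecutive blocks while the running sum stays within the limit.
A minimal sketch of that packing, not the actual parquet-tools implementation:

{code:java}
import java.util.ArrayList;
import java.util.List;

public class GreedyBlockMerge {
  // Greedily sums consecutive block sizes without exceeding 'limit';
  // blocks are never reordered or split, matching the examples above.
  static List<Integer> merge(int[] blocks, int limit) {
    List<Integer> out = new ArrayList<>();
    int current = 0;
    for (int b : blocks) {
      if (current > 0 && current + b > limit) {
        out.add(current);
        current = 0;
      }
      current += b;
    }
    if (current > 0) out.add(current);
    return out;
  }

  public static void main(String[] args) {
    // Mirrors example 1: [128 | 35], [128 | 40], [120] with -l 256
    System.out.println(merge(new int[]{128, 35, 128, 40, 120}, 256));
    // prints [163, 168, 120]
  }
}
{code}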



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1368) ParquetFileReader should close its input stream for the failure in constructor

2018-10-02 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1368:
--

Assignee: Hyukjin Kwon

> ParquetFileReader should close its input stream for the failure in constructor
> --
>
> Key: PARQUET-1368
> URL: https://issues.apache.org/jira/browse/PARQUET-1368
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> I was trying to replace the deprecated usage of {{readFooter}} with 
> {{ParquetFileReader.open}} according to the note:
> {code}
> [warn] 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:368:
>  method readFooter in object ParquetFileReader is deprecated: see 
> corresponding Javadoc for more information.
> [warn] ParquetFileReader.readFooter(sharedConf, filePath, 
> SKIP_ROW_GROUPS).getFileMetaData
> [warn]   ^
> [warn] 
> /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:545:
>  method readFooter in object ParquetFileReader is deprecated: see 
> corresponding Javadoc for more information.
> [warn] ParquetFileReader.readFooter(
> [warn]   ^
> {code}
> Then, I realised some test suites report resource leaks:
> {code}
> java.lang.Throwable
>   at 
> org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36)
>   at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70)
>   at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
>   at 
> org.apache.parquet.hadoop.util.HadoopInputFile.newStream(HadoopInputFile.java:65)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:687)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.createParquetReader(ParquetUtils.scala:67)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.readFooter(ParquetUtils.scala:46)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:544)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:539)
>   at 
> scala.collection.parallel.AugmentedIterableIterator$class.flatmap2combiner(RemainsIterator.scala:132)
>   at 
> scala.collection.parallel.immutable.ParVector$ParVectorIterator.flatmap2combiner(ParVector.scala:62)
>   at 
> scala.collection.parallel.ParIterableLike$FlatMap.leaf(ParIterableLike.scala:1072)
>   at 
> scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49)
>   at 
> scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
>   at 
> scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48)
>   at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51)
>   at 
> scala.collection.parallel.ParIterableLike$FlatMap.tryLeaf(ParIterableLike.scala:1068)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:159)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443)
>   at 
> scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341)
>   at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673)
>   at 
> scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378)
>   at 
> scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443)
>   at 
> scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426)
>   at 
> scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56)
>   at 
> scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:958)
>   at 
> 
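
(The stack trace is truncated in the archive.) The usual remedy for this class
of leak is to close the freshly opened stream when the remainder of the
constructor throws. A hedged sketch of the pattern, not the actual parquet-mr
change:

{code:java}
import java.io.Closeable;
import java.io.IOException;

public class SafeReader implements Closeable {
  private final Closeable stream;

  // If initialization after opening the stream fails, close the stream
  // before propagating, so no file handle is leaked.
  public SafeReader(Closeable stream) throws IOException {
    this.stream = stream;
    try {
      initialize(); // e.g. reading the footer may throw
    } catch (IOException | RuntimeException e) {
      try {
        stream.close();
      } catch (IOException suppressed) {
        e.addSuppressed(suppressed);
      }
      throw e;
    }
  }

  private void initialize() throws IOException { /* read footer etc. */ }

  @Override public void close() throws IOException { stream.close(); }
}
{code}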

[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block

2018-09-28 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1337:
---
Description: 
When the size of buffered data gets near the desired row group size, Parquet 
flushes the data to a row group. However, at this point the data for the last 
page is not yet encoded or compressed, so the row group may end up 
significantly smaller than intended.

If the row group ends up being so small that it is farther away from the next 
disk block boundary than the maximum padding, Parquet will try to create a new 
group in the same disk block, this time targeting the remaining space. This may 
also be flushed prematurely, leading to the creation of an even smaller row 
group, which may lead to an even smaller one... This gets repeated until we get 
sufficiently close to the block boundary so that padding can finally be 
applied. The resulting superfluous row groups can lead to bad read performance.

An example of the structure of a Parquet file suffering from this problem can 
be seen below. For easier interpretation, the row groups are visually grouped 
by disk blocks:
{noformat}
row group 1:  RC:18774 TS:22182960 OFFSET:   4
row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
{noformat}
{noformat}
row group 5:  RC:18808 TS:8560 OFFSET:1000
row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
{noformat}
{noformat}
row group 9:  RC:18852 TS:22275520 OFFSET:2000
row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
{noformat}
{noformat}
row group 13: RC:18841 TS:22263360 OFFSET:3000
row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
{noformat}
{noformat}
row group 17: RC: 1466 TS: 1740320 OFFSET:4000
{noformat}
In this example, both the disk block size and the row group size were set to 
1000. The data would fit in 5 row groups of this size, but instead, each of 
the disk blocks (except the last) is split into 4 row groups of progressively 
decreasing size.

  was:
When the size of buffered data gets near the desired row group size, Parquet 
flushes the data to a row group. However, at this point the data for the last 
page is not yet encoded or compressed, so the row group may end up 
significantly smaller than intended.

If the row group ends up being so small that it is farther away from the next 
disk block boundary than the maximum padding, Parquet will try to create a new 
group in the same disk block, this time targeting the remaining space. This may 
also be flushed prematurely, leading to the creation of an even smaller row 
group, which may lead to an even smaller one... This gets repeated until we get 
sufficiently close to the block boundary so that padding can finally be 
applied. The resulting superfluous row groups can lead to bad performance.

An example of the structure of a Parquet file suffering from this problem can 
be seen below. For easier interpretation, the row groups are visually grouped 
by disk blocks:

{noformat}
row group 1:  RC:18774 TS:22182960 OFFSET:   4
row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
{noformat}
{noformat}
row group 5:  RC:18808 TS:8560 OFFSET:1000
row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
{noformat}
{noformat}
row group 9:  RC:18852 TS:22275520 OFFSET:2000
row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
{noformat}
{noformat}
row group 13: RC:18841 TS:22263360 OFFSET:3000
row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
{noformat}
{noformat}
row group 17: RC: 1466 TS: 1740320 OFFSET:4000
{noformat}

In this example, both the disk block size and the row group size were set to 
1000. The data would fit in 5 row groups of this size, but instead, each of 
the disk blocks (except the last) is split into 4 row groups of progressively 
decreasing size.


> Current block alignment logic may lead to several row groups per block
> --
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>   

[jira] [Commented] (PARQUET-1201) Column indexes

2018-09-28 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631463#comment-16631463
 ] 

Zoltan Ivanfi commented on PARQUET-1201:


[~rdblue] That is the branch indeed. Our idea with the feature branch was that 
each commit to the feature branch would happen via a PR and would go through 
the regular thorough review process. We wanted to avoid opening a giant PR at 
the end, which would be very hard to review anyway due to its sheer size. The 
community accepted this approach on the Parquet sync where we discussed it. 
Could you please review the individual PRs instead? Thanks!

> Column indexes
> --
>
> Key: PARQUET-1201
> URL: https://issues.apache.org/jira/browse/PARQUET-1201
> Project: Parquet
>  Issue Type: New Feature
>Affects Versions: 1.10.0
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
> Fix For: format-2.5.0
>
>
> Write the column indexes described in PARQUET-922.
>  This is the first phase of implementing the whole feature. The 
> implementation is done in the following steps:
>  * Utility to read/write indexes in parquet-format
>  * Writing indexes in the parquet file
>  * Extend parquet-tools and parquet-cli to show the indexes
>  * Limit index size based on parquet properties
>  * Trim min/max values where possible based on parquet properties
>  * Filtering based on column indexes
> The work is done on the feature branch {{column-indexes}}. This JIRA will be 
> resolved after the branch has been merged to {{master}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1400) Deprecate parquet-mr related code in parquet-format

2018-09-24 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1400.

   Resolution: Fixed
Fix Version/s: format-2.6.0

> Deprecate parquet-mr related code in parquet-format
> ---
>
> Key: PARQUET-1400
> URL: https://issues.apache.org/jira/browse/PARQUET-1400
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-format
>Reporter: Gabor Szadovszky
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: format-2.6.0
>
>
> There are Java classes in the 
> [parquet-format|https://github.com/apache/parquet-format] repo that should 
> live in the [parquet-mr|https://github.com/apache/parquet-mr] repo instead: 
> [java classes|https://github.com/apache/parquet-format/tree/master/src/main] 
> and [test classes|https://github.com/apache/parquet-format/tree/master/src/test].
> These classes should be deprecated with a note that they will be moved to the 
> [parquet-mr|https://github.com/apache/parquet-mr] repo.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1381) Add merge blocks command to parquet-tools

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1381.

   Resolution: Fixed
Fix Version/s: (was: 1.10.1)
   1.11.0

> Add merge blocks command to parquet-tools
> -
>
> Key: PARQUET-1381
> URL: https://issues.apache.org/jira/browse/PARQUET-1381
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Ekaterina Galieva
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> The current implementation of the merge command in parquet-tools doesn't 
> merge row groups, it just places one after the other. Add an API and a 
> command option to be able to merge small blocks into larger ones up to a 
> specified size limit.
> h6. Implementation details:
> Blocks are not reordered, so as not to break any initial predicate pushdown 
> optimizations.
> Blocks are not divided to fit the upper bound perfectly. 
> This is an intentional performance optimization. 
> It gives an opportunity to form new blocks by copying the full content of 
> smaller blocks by column rather than by row.
> h6. Examples:
>  # Input files with blocks sizes:
> {code:java}
> [128 | 35], [128 | 40], [120]{code}
> Expected output file blocks sizes:
> {{merge }}
> {code:java}
> [128 | 35 | 128 | 40 | 120]
> {code}
> {{merge -b}}
> {code:java}
> [128 | 35 | 128 | 40 | 120]
> {code}
> {{merge -b -l 256 }}
> {code:java}
> [163 | 168 | 120]
> {code}
>  # Input files with blocks sizes:
> {code:java}
> [128 | 35], [40], [120], [6] {code}
> Expected output file blocks sizes:
> {{merge}}
> {code:java}
> [128 | 35 | 40 | 120 | 6] 
> {code}
> {{merge -b}}
> {code:java}
> [128 | 75 | 126] 
> {code}
> {{merge -b -l 256}}
> {code:java}
> [203 | 126]{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1417) BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with the different length

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1417:
---
Fix Version/s: 1.11.0

> BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with 
> the different length
> -
>
> Key: PARQUET-1417
> URL: https://issues.apache.org/jira/browse/PARQUET-1417
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Volodymyr Vysotskyi
>Assignee: Volodymyr Vysotskyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> {{BINARY_AS_SIGNED_INTEGER_COMPARATOR}} fails when byte arrays that encode 
> the same value but have a different number of leading zeros are compared:
> {code:java}
> BINARY_AS_SIGNED_INTEGER_COMPARATOR.compare(
> Binary.fromConstantByteBuffer(ByteBuffer.wrap(new byte[] { 0, 0, -108 
> })),
> Binary.fromConstantByteBuffer(ByteBuffer.wrap(new byte[] { 0, -108 
> })));
> {code}
> Error is:
> {noformat}
> java.lang.IndexOutOfBoundsException
>   at java.nio.Buffer.checkIndex(Buffer.java:540)
>   at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
>   at 
> org.apache.parquet.schema.PrimitiveComparator$9.compare(PrimitiveComparator.java:280)
>   at 
> org.apache.parquet.schema.PrimitiveComparator$9.compare(PrimitiveComparator.java:262)
>   at 
> org.apache.parquet.schema.PrimitiveComparator$BinaryComparator.compareNotNulls(PrimitiveComparator.java:186)
>   at 
> org.apache.parquet.schema.PrimitiveComparator$BinaryComparator.compareNotNulls(PrimitiveComparator.java:183)
>   at 
> org.apache.parquet.schema.PrimitiveComparator.compare(PrimitiveComparator.java:63)
> {noformat}
> The problem is that the 
> {{BINARY_AS_SIGNED_INTEGER_COMPARATOR.compare(ByteBuffer b1, ByteBuffer 
> b2)}} method passes the length of the first {{ByteBuffer}}, but it should 
> pass the smaller of the two lengths, since the padding was calculated for 
> the longer {{ByteBuffer}} before being passed to the {{compare(int length, 
> ByteBuffer b1, int p1, ByteBuffer b2, int p2)}} method.
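
One way to compare two's-complement big-endian arrays of different lengths
without indexing past the shorter one is to virtually sign-extend it. A
minimal sketch of the technique, not the patched comparator itself:

{code:java}
public class SignedBigEndianCompare {
  // Compares two's-complement big-endian byte arrays of possibly
  // different lengths by virtually sign-extending the shorter one.
  static int compare(byte[] a, byte[] b) {
    int signA = a.length == 0 ? 0 : (a[0] >> 7); // 0 or -1
    int signB = b.length == 0 ? 0 : (b[0] >> 7);
    if (signA != signB) return signA - signB; // negative < non-negative
    int n = Math.max(a.length, b.length);
    for (int i = 0; i < n; i++) {
      // Pad the shorter array on the left with its sign byte.
      int ba = i < n - a.length ? (signA & 0xFF) : (a[i - (n - a.length)] & 0xFF);
      int bb = i < n - b.length ? (signB & 0xFF) : (b[i - (n - b.length)] & 0xFF);
      if (ba != bb) return ba - bb;
    }
    return 0;
  }

  public static void main(String[] args) {
    // The two arrays from the report encode the same integer, so the
    // comparator should return 0 instead of throwing IOBE.
    System.out.println(compare(new byte[]{0, 0, -108}, new byte[]{0, -108}));
  }
}
{code}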



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1418) Run integration tests in Travis

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1418:
---
Fix Version/s: 1.11.0

> Run integration tests in Travis
> ---
>
> Key: PARQUET-1418
> URL: https://issues.apache.org/jira/browse/PARQUET-1418
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Currently Travis only runs the unit tests. It should run the integration 
> tests as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1337:
---
Component/s: parquet-mr

> Current block alignment logic may lead to several row groups per block
> --
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Gabor Szadovszky
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> When the size of buffered data gets near the desired row group size, Parquet 
> flushes the data to a row group. However, at this point the data for the last 
> page is not yet encoded or compressed, so the row group may end up 
> significantly smaller than intended.
> If the row group ends up being so small that it is farther away from the next 
> disk block boundary than the maximum padding, Parquet will try to create a 
> new group in the same disk block, this time targeting the remaining space. 
> This may also be flushed prematurely, leading to the creation of an even 
> smaller row group, which may lead to an even smaller one... This gets 
> repeated until we get sufficiently close to the block boundary so that 
> padding can finally be applied. The resulting superfluous row groups can lead 
> to bad performance.
> An example of the structure of a Parquet file suffering from this problem can 
> be seen below. For easier interpretation, the row groups are visually grouped 
> by disk blocks:
> {noformat}
> row group 1:  RC:18774 TS:22182960 OFFSET:   4
> row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
> row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
> row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
> {noformat}
> {noformat}
> row group 5:  RC:18808 TS:8560 OFFSET:1000
> row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
> row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
> row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
> {noformat}
> {noformat}
> row group 9:  RC:18852 TS:22275520 OFFSET:2000
> row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
> row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
> row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
> {noformat}
> {noformat}
> row group 13: RC:18841 TS:22263360 OFFSET:3000
> row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
> row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
> row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
> {noformat}
> {noformat}
> row group 17: RC: 1466 TS: 1740320 OFFSET:4000
> {noformat}
> In this example, both the disk block size and the row group size were set to 
> 1000. The data would fit in 5 row groups of this size, but instead, each 
> of the disk blocks (except the last) is split into 4 row groups of 
> progressively decreasing size.
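
The cascade is easy to reproduce in miniature: after each premature flush the
writer re-targets the shrinking remainder of the block until the gap finally
fits within the maximum padding. A sketch with illustrative numbers (10 MB
blocks, 1 MB maximum padding, and flushes that land at roughly a third of
their target, similar to the ratios in the listing above):

{code:java}
public class BlockAlignment {
  // After a row group ends at 'pos', either pad to the next disk block
  // boundary (if the gap fits within maxPadding) or target another row
  // group at the remaining space of the current block.
  static long nextTarget(long pos, long blockSize, long maxPadding) {
    long remaining = blockSize - (pos % blockSize);
    return remaining <= maxPadding ? 0 : remaining;
  }

  public static void main(String[] args) {
    long pos = 6_574_564, block = 10_000_000, maxPad = 1_000_000;
    long target;
    // Each flush lands well short of its target (encoding shrank the
    // buffered data), so ever smaller row groups chase the remainder.
    while ((target = nextTarget(pos, block, maxPad)) > 0) {
      System.out.println("next row group target: " + target);
      pos += (long) (target * 0.32); // premature flush at ~32%
    }
    System.out.println("gap " + (block - pos % block) + " <= maxPadding: pad");
  }
}
{code}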



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1421.

   Resolution: Fixed
Fix Version/s: 1.11.0

> InternalParquetRecordWriter logs debug messages at the INFO level
> -
>
> Key: PARQUET-1421
> URL: https://issues.apache.org/jira/browse/PARQUET-1421
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> The superfluous log messages clutter the output and may make the Travis 
> build fail due to the overly long output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1421:
---
Component/s: parquet-mr

> InternalParquetRecordWriter logs debug messages at the INFO level
> -
>
> Key: PARQUET-1421
> URL: https://issues.apache.org/jira/browse/PARQUET-1421
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> The superfluous log messages clutter the output and may make the Travis 
> build fail due to the overly long output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1418) Run integration tests in Travis

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1418:
---
Component/s: parquet-mr

> Run integration tests in Travis
> ---
>
> Key: PARQUET-1418
> URL: https://issues.apache.org/jira/browse/PARQUET-1418
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> Currently Travis only runs the unit tests. It should run the integration 
> tests as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1418) Run integration tests in Travis

2018-09-21 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1418.

Resolution: Fixed

> Run integration tests in Travis
> ---
>
> Key: PARQUET-1418
> URL: https://issues.apache.org/jira/browse/PARQUET-1418
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> Currently Travis only runs the unit tests. It should run the integration 
> tests as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level

2018-09-20 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1421:
--

Assignee: Zoltan Ivanfi

> InternalParquetRecordWriter logs debug messages at the INFO level
> -
>
> Key: PARQUET-1421
> URL: https://issues.apache.org/jira/browse/PARQUET-1421
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> The superfluous log messages clutter the output and may make the Travis 
> build fail due to the overly long output.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level

2018-09-20 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1421:
--

 Summary: InternalParquetRecordWriter logs debug messages at the 
INFO level
 Key: PARQUET-1421
 URL: https://issues.apache.org/jira/browse/PARQUET-1421
 Project: Parquet
  Issue Type: Bug
Reporter: Zoltan Ivanfi


The superfluous log messages clutter the output and may make the Travis build 
fail due to the overly long output.
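
The fix is the standard SLF4J demotion of per-flush diagnostics from INFO to
DEBUG; a minimal sketch of the pattern (the message text is illustrative, not
the actual parquet-mr diff):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class FlushLogging {
  private static final Logger LOG = LoggerFactory.getLogger(FlushLogging.class);

  void flushRowGroup(long records, long bytes) {
    // Per-row-group details are developer diagnostics: emit them at
    // DEBUG so INFO-level runs (like Travis builds) stay quiet.
    LOG.debug("flushed {} records ({} bytes) to a row group", records, bytes);
  }
}
{code}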



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1418) Run integration tests in Travis

2018-09-20 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1418:
--

Assignee: Zoltan Ivanfi

> Run integration tests in Travis
> ---
>
> Key: PARQUET-1418
> URL: https://issues.apache.org/jira/browse/PARQUET-1418
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> Currently Travis only runs the unit tests. It should run the integration 
> tests as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1418) Run integration tests in Travis

2018-09-20 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1418:
--

 Summary: Run integration tests in Travis
 Key: PARQUET-1418
 URL: https://issues.apache.org/jira/browse/PARQUET-1418
 Project: Parquet
  Issue Type: Improvement
Reporter: Zoltan Ivanfi


Currently Travis only runs the unit tests. It should run the integration tests 
as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-899) Add metadata field describing the application that wrote the file

2018-08-13 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-899.
---
Resolution: Duplicate

Quoting from the commit for PARQUET-352:

WriteSupport now has a getName getter method that is added to the footer
if it returns a non-null string as writer.model.name. This is intended
to help identify files written by object models incorrectly.

So writer.model.name is already there for this purpose, albeit undocumented.
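
{{WriteSupport#getName}} does exist in parquet-mr for this purpose; a minimal
sketch of an object model opting in (the class and the returned name are made
up):

{code:java}
import org.apache.parquet.hadoop.api.WriteSupport;

// Illustrative object model: a non-null getName() is recorded in the
// file footer as writer.model.name, identifying the writing model.
public abstract class MyModelWriteSupport<T> extends WriteSupport<T> {
  @Override
  public String getName() {
    return "my-model";
  }
}
{code}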

> Add metadata field describing the application that wrote the file
> -
>
> Key: PARQUET-899
> URL: https://issues.apache.org/jira/browse/PARQUET-899
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Zoltan Ivanfi
>Priority: Major
>
> Although the Parquet library should behave the same regardless of what 
> application uses it, occasionally serious interoperability bugs are 
> introduced in specific applications. For example, data written by a specific 
> application may be unnecessarily adjusted or the calculated statistics may be 
> invalid (both of these are actual problems that have occurred).
> Unfortunately, currently it is not possible to recognize Parquet files 
> affected by application problems because the metadata does not contain any 
> information about the application using the Parquet library. (The name and 
> version number of the Parquet library is recorded, but that only has limited 
> use, because apart from Impala, the most widespread Parquet writers all use 
> the same Java library.)
> To allow creating workarounds for future known issues, we should introduce 
> new metadata fields that applications can populate. The simplest approach is 
> to have one field for the application name and another for its version 
> number. A more sophisticated approach suggested by [~julienledem] could also 
> reference a list of earlier issues that are known to be fixed in the 
> application that wrote the Parquet file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block

2018-07-23 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1337:
---
Description: 
When the size of buffered data gets near the desired row group size, Parquet 
flushes the data to a row group. However, at this point the data for the last 
page is not yet encoded or compressed, so the row group may end up 
significantly smaller than intended.

If the row group ends up being so small that it is farther away from the next 
disk block boundary than the maximum padding, Parquet will try to create a new 
group in the same disk block, this time targeting the remaining space. This may 
also be flushed prematurely, leading to the creation of an even smaller row 
group, which may lead to an even smaller one... This gets repeated until we get 
sufficiently close to the block boundary so that padding can finally be 
applied. The resulting superfluous row groups can lead to bad performance.

An example of the structure of a Parquet file suffering from this problem can 
be seen below. For easier interpretation, the row groups are visually grouped 
by disk blocks:

{noformat}
row group 1:  RC:18774 TS:22182960 OFFSET:   4
row group 2:  RC: 2896 TS: 3428160 OFFSET: 6574564
row group 3:  RC: 1964 TS: 2322560 OFFSET: 7679844
row group 4:  RC: 1074 TS: 1268880 OFFSET: 8732964
{noformat}
{noformat}
row group 5:  RC:18808 TS:8560 OFFSET:1000
row group 6:  RC: 2872 TS: 3389520 OFFSET:16612640
row group 7:  RC: 1930 TS: 2284960 OFFSET:17716800
row group 8:  RC: 1040 TS: 1233520 OFFSET:18768240
{noformat}
{noformat}
row group 9:  RC:18852 TS:22275520 OFFSET:2000
row group 10: RC: 2831 TS: 3345680 OFFSET:26656320
row group 11: RC: 1893 TS: 2244640 OFFSET:27757200
row group 12: RC: 1008 TS: 1195520 OFFSET:28806560
{noformat}
{noformat}
row group 13: RC:18841 TS:22263360 OFFSET:3000
row group 14: RC: 2835 TS: 3350480 OFFSET:36652000
row group 15: RC: 1900 TS: 2249040 OFFSET:37753600
row group 16: RC: 1016 TS: 1198640 OFFSET:38803600
{noformat}
{noformat}
row group 17: RC: 1466 TS: 1740320 OFFSET:4000
{noformat}

In this example, both the disk block size and the row group size were set to 
1000. The data would fit in 5 row groups of this size, but instead, each of 
the disk blocks (except the last) is split into 4 row groups of progressively 
decreasing size.

  was:
If there are many columns with RLE+bitpacking encoding (e.g. dictionary 
encoding) where the value variance is low, the estimated size of the open 
pages (which are not encoded yet) is much larger than the final page size. 
Because of that, parquet-mr fails to create row groups whose size is close to 
{{parquet.block.size}}, which causes performance issues while reading.

A hint from Ryan to solve this issue:
{quote}
We could probably get a better estimate by using the amount of buffered
data and how large other pages in a column were after fully encoding and
compressing. So if you have 5 pages compressed and buffered, and another
1000 values, use the compression ratio of the 5 pages to estimate the final
size. We'd probably want to use some overhead value for the header. And,
we'd want to separate the amount of buffered data from our row group size
estimate, which are currently the same thing.
{quote}

(So, it is not only about RLE+bitpacking but any kind of encoding which is done 
only after "closing" a page.)


> Current block alignment logic may lead to several row groups per block
> --
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Gabor Szadovszky
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> When the size of buffered data gets near the desired row group size, Parquet 
> flushes the data to a row group. However, at this point the data for the last 
> page is not yet encoded or compressed, so the row group may end up 
> significantly smaller than intended.
> If the row group ends up being so small that it is farther away from the next 
> disk block boundary than the maximum padding, Parquet will try to create a 
> new group in the same disk block, this time targeting the remaining space. 
> This may also be flushed prematurely, leading to the creation of an even 
> smaller row group, which may lead to an even smaller one... This gets 
> repeated until we get sufficiently close to the block boundary so that 
> padding can finally be applied. The resulting superfluous row groups can lead 
> to bad performance.
> An example of the structure of a Parquet file suffering from this problem can 
> be seen below. For easier interpretation, the row groups are visually grouped 
> by disk blocks:
> {noformat}
> row group 1:  RC:18774 

[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block

2018-07-23 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1337:
---
Summary: Current block alignment logic may lead to several row groups per 
block  (was: Implement better estimate of page size for RLE+bitpacking)

> Current block alignment logic may lead to several row groups per block
> --
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Gabor Szadovszky
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> If there are many columns with RLE+bitpacking encoding (e.g. dictionary 
> encoding) where the value variance is low, the estimated size of the open 
> pages (which are not encoded yet) is much larger than the final page size. 
> Because of that, parquet-mr fails to create row groups whose size is close 
> to {{parquet.block.size}}, which causes performance issues while reading.
> A hint from Ryan to solve this issue:
> {quote}
> We could probably get a better estimate by using the amount of buffered
> data and how large other pages in a column were after fully encoding and
> compressing. So if you have 5 pages compressed and buffered, and another
> 1000 values, use the compression ratio of the 5 pages to estimate the final
> size. We'd probably want to use some overhead value for the header. And,
> we'd want to separate the amount of buffered data from our row group size
> estimate, which are currently the same thing.
> {quote}
> (So, it is not only about RLE+bitpacking but any kind of encoding which is 
> done only after "closing" a page.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1337) Implement better estimate of page size for RLE+bitpacking

2018-07-23 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1337:
--

Assignee: Zoltan Ivanfi

> Implement better estimate of page size for RLE+bitpacking
> -
>
> Key: PARQUET-1337
> URL: https://issues.apache.org/jira/browse/PARQUET-1337
> Project: Parquet
>  Issue Type: Improvement
>Reporter: Gabor Szadovszky
>Assignee: Zoltan Ivanfi
>Priority: Major
>
> If there are many columns with RLE+bitpacking encoding (e.g. dictionary 
> encoding) where the value variance is low, the estimated size of the open 
> pages (which are not encoded yet) is much larger than the final page size. 
> Because of that, parquet-mr fails to create row groups whose size is close 
> to {{parquet.block.size}}, which causes performance issues while reading.
> A hint from Ryan to solve this issue:
> {quote}
> We could probably get a better estimate by using the amount of buffered
> data and how large other pages in a column were after fully encoding and
> compressing. So if you have 5 pages compressed and buffered, and another
> 1000 values, use the compression ratio of the 5 pages to estimate the final
> size. We'd probably want to use some overhead value for the header. And,
> we'd want to separate the amount of buffered data from our row group size
> estimate, which are currently the same thing.
> {quote}
> (So, it is not only about RLE+bitpacking but any kind of encoding which is 
> done only after "closing" a page.)
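
Ryan's hint, reduced to arithmetic: scale the buffered raw bytes by the
compression ratio observed on already-flushed pages and add a header
allowance. A sketch under those assumptions (the names and the overhead
constant are made up):

{code:java}
public class PageSizeEstimate {
  // Estimate the final size of not-yet-encoded buffered values from the
  // compression ratio observed on the pages flushed so far.
  static long estimate(long encodedBytesSoFar, long rawBytesSoFar,
                       long bufferedRawBytes, long perPageHeaderOverhead) {
    if (rawBytesSoFar == 0) {
      return bufferedRawBytes; // no history yet: assume no shrinkage
    }
    double ratio = (double) encodedBytesSoFar / rawBytesSoFar;
    return (long) (bufferedRawBytes * ratio) + perPageHeaderOverhead;
  }

  public static void main(String[] args) {
    // 5 flushed pages shrank 1,000,000 raw bytes to 50,000, so 200,000
    // buffered raw bytes are estimated at ~10,000 plus header overhead.
    System.out.println(estimate(50_000, 1_000_000, 200_000, 64)); // 10064
  }
}
{code}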



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again

2018-07-23 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1353:
---
Component/s: parquet-mr

> The random data generator used for tests repeats the same value over and over 
> again
> ---
>
> Key: PARQUET-1353
> URL: https://issues.apache.org/jira/browse/PARQUET-1353
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Minor
>  Labels: pull-request-available
>
> The RandomValues class returns references to its internal buffer as random 
> values. This buffer gets a random value every time a new random value is 
> requested, but since earlier values reference the same internal buffer, they 
> get changed to the same value as well. So even if successive calls return 
> different values each time, the actual list of these values will always 
> consist of a single value repeated multiple times. For example:
> ||n-th call||returned value||accumulated list expected||accumulated list actual||
> |1|6C|6C|6C|
> |2|8F|6C 8F|8F 8F|
> |3|52|6C 8F 52|52 52 52|
> |4|B8|6C 8F 52 B8|B8 B8 B8 B8|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again

2018-07-23 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1353:
--

 Summary: The random data generator used for tests repeats the same 
value over and over again
 Key: PARQUET-1353
 URL: https://issues.apache.org/jira/browse/PARQUET-1353
 Project: Parquet
  Issue Type: Bug
Reporter: Zoltan Ivanfi


The RandomValues class returns references to its internal buffer as random 
values. This buffer gets a random value every time a new random value is 
requested, but since earlier values reference the same internal buffer, they 
get changed to the same value as well. So even if successive calls return 
different values each time, the actual list of these values will always consist 
of a single value repeated multiple times. For example:

||n-th call||returned value||accumulated list expected||accumulated list actual||
|1|6C|6C|6C|
|2|8F|6C 8F|8F 8F|
|3|52|6C 8F 52|52 52 52|
|4|B8|6C 8F 52 B8|B8 B8 B8 B8|
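
The bug in miniature (the real RandomValues code differs; this only
demonstrates the aliasing and the copy-based fix):

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class BufferAliasing {
  private static final Random RANDOM = new Random();
  private static final byte[] BUFFER = new byte[1];

  // Buggy: every call returns the same array object, so previously
  // returned "values" silently change when the buffer is refilled.
  static byte[] nextShared() {
    RANDOM.nextBytes(BUFFER);
    return BUFFER;
  }

  // Fixed: hand out a copy so each returned value stays stable.
  static byte[] nextCopy() {
    RANDOM.nextBytes(BUFFER);
    return BUFFER.clone();
  }

  public static void main(String[] args) {
    List<byte[]> values = new ArrayList<>();
    for (int i = 0; i < 4; i++) {
      values.add(nextShared()); // switch to nextCopy() to see the fix
    }
    // All four entries alias BUFFER, so only its last content is shown.
    for (byte[] v : values) {
      System.out.printf("%02X ", v[0] & 0xFF);
    }
    System.out.println();
  }
}
{code}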




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again

2018-07-23 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1353:
--

Assignee: Zoltan Ivanfi

> The random data generator used for tests repeats the same value over and over 
> again
> ---
>
> Key: PARQUET-1353
> URL: https://issues.apache.org/jira/browse/PARQUET-1353
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Minor
>
> The RandomValues class returns references to its internal buffer as random 
> values. This buffer gets a random value every time a new random value is 
> requested, but since earlier values reference the same internal buffer, they 
> get changed to the same value as well. So even if successive calls return 
> different values each time, the actual list of these values will always 
> consist of a single value repeated multiple times. For example:
> ||n-th call||returned value||accumulated list expected||accumulated list actual||
> |1|6C|6C|6C|
> |2|8F|6C 8F|8F 8F|
> |3|52|6C 8F 52|52 52 52|
> |4|B8|6C 8F 52 B8|B8 B8 B8 B8|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1347) [parquet-tools] dump command shows binary values differently than the cat or head commands

2018-07-12 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1347:
---
Description: 
{{parquet-tools dump}} shows binary values as strings, if they are valid UTF-8 
sequences.

{{parquet-tools cat}} and {{parquet-tools head}} show binary values 
base64-encoded, regardless of whether they are valid UTF-8 sequences or not.

(If the type is annotated as UTF-8, the values are shown as strings by all of 
these commands.)

  was:
{{parquet-tools dump}} shows binary values as strings, if they are valid UTF-8 
sequences.

{{parquet-tools cat}} and {{parquet-tools head}} show binary values 
base64-encoded, regardless of whether they are valid UTF-8 sequences or not.


> [parquet-tools] dump command shows binary values differently than the cat or 
> head commands
> -
>
> Key: PARQUET-1347
> URL: https://issues.apache.org/jira/browse/PARQUET-1347
> Project: Parquet
>  Issue Type: Bug
>Reporter: Zoltan Ivanfi
>Priority: Minor
>
> {{parquet-tools dump}} shows binary values as strings, if they are valid 
> UTF-8 sequences.
> {{parquet-tools cat}} and {{parquet-tools head}} show binary values 
> base64-encoded, regardless of whether they are valid UTF-8 sequences or not.
> (If the type is annotated as UTF-8, the values are shown as strings by all of 
> these commands.)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1347) [parquet-tools] dump command shows binary values differently than the cat or head commands

2018-07-12 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1347:
--

 Summary: [parquet-tools] dump command shows binary values 
differently than the cat or head commands
 Key: PARQUET-1347
 URL: https://issues.apache.org/jira/browse/PARQUET-1347
 Project: Parquet
  Issue Type: Bug
Reporter: Zoltan Ivanfi


{{parquet-tools dump}} shows binary values as strings, if they are valid UTF-8 
sequences.

{{parquet-tools cat}} and {{parquet-tools head}} show binary values 
base64-encoded, regardless of whether they are valid UTF-8 sequences or not.
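
A hedged sketch of the two display strategies the commands disagree on,
assuming plain Java and not parquet-tools' actual code paths:

{code:java}
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BinaryDisplay {
  // dump-style: show the bytes as a string when they are valid UTF-8.
  static String displayLikeDump(byte[] value) {
    try {
      // A fresh decoder reports malformed input instead of replacing it,
      // so invalid UTF-8 falls through to base64.
      return StandardCharsets.UTF_8.newDecoder()
          .decode(ByteBuffer.wrap(value)).toString();
    } catch (CharacterCodingException e) {
      return Base64.getEncoder().encodeToString(value);
    }
  }

  // cat/head-style: always base64, valid UTF-8 or not.
  static String displayLikeCat(byte[] value) {
    return Base64.getEncoder().encodeToString(value);
  }

  public static void main(String[] args) {
    byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
    System.out.println(displayLikeDump(text)); // hello
    System.out.println(displayLikeCat(text));  // aGVsbG8=
  }
}
{code}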



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1317) ParquetMetadataConverter throw NPE

2018-06-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1317:
---
Affects Version/s: (was: 1.10.1)
   1.11.0

> ParquetMetadataConverter throw NPE
> --
>
> Key: PARQUET-1317
> URL: https://issues.apache.org/jira/browse/PARQUET-1317
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 1.11.0
>
>
> How to reproduce:
> {code:scala}
> $ bin/spark-shell 
> scala> spark.range(10).selectExpr("cast(id as string) as 
> id").coalesce(1).write.parquet("/tmp/parquet-1317")
> scala> 
> java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head 
> --debug 
> file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet
> {code}
> {noformat}
> java.io.IOException: Could not read footer: java.lang.NullPointerException
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354)
> at 
> org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88)
> at org.apache.parquet.tools.Main.main(Main.java:223)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> java.io.IOException: Could not read footer: 
> java.lang.NullPointerException{noformat}
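
The NPE surfaces while converting schema metadata whose optional annotation
was never set; a hedged illustration of the failure mode and a null guard,
using stand-in enums rather than the actual parquet-format types or the
actual fix:

{code:java}
// Stand-in enums: the real ones live in parquet-format / parquet-mr.
enum ConvertedType { UTF8, DECIMAL }
enum OriginalType { UTF8, DECIMAL }

public class NullSafeConversion {
  // Absent optional metadata arrives as null; returning null instead of
  // dereferencing it avoids the NPE seen in the stack trace above.
  static OriginalType getOriginalType(ConvertedType type) {
    if (type == null) {
      return null;
    }
    switch (type) {
      case UTF8:
        return OriginalType.UTF8;
      case DECIMAL:
        return OriginalType.DECIMAL;
      default:
        throw new IllegalArgumentException("unknown converted type " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(getOriginalType(null)); // prints null, no NPE
  }
}
{code}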



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (PARQUET-1317) ParquetMetadataConverter throw NPE

2018-06-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi resolved PARQUET-1317.

   Resolution: Fixed
Fix Version/s: 1.11.0

> ParquetMetadataConverter throw NPE
> --
>
> Key: PARQUET-1317
> URL: https://issues.apache.org/jira/browse/PARQUET-1317
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 1.11.0
>
>
> How to reproduce:
> {code:scala}
> $ bin/spark-shell 
> scala> spark.range(10).selectExpr("cast(id as string) as 
> id").coalesce(1).write.parquet("/tmp/parquet-1317")
> scala> 
> java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head 
> --debug 
> file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet
> {code}
> {noformat}
> java.io.IOException: Could not read footer: java.lang.NullPointerException
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354)
> at 
> org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88)
> at org.apache.parquet.tools.Main.main(Main.java:223)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058)
> at 
> org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> java.io.IOException: Could not read footer: 
> java.lang.NullPointerException{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1317) ParquetMetadataConverter throw NPE

2018-06-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi updated PARQUET-1317:
---
Component/s: parquet-mr

> ParquetMetadataConverter throw NPE
> --
>
> Key: PARQUET-1317
> URL: https://issues.apache.org/jira/browse/PARQUET-1317
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.1
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:scala}
> $ bin/spark-shell 
> scala> spark.range(10).selectExpr("cast(id as string) as 
> id").coalesce(1).write.parquet("/tmp/parquet-1317")
> scala> 
> java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head 
> --debug 
> file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet
> {code}
> {noformat}
> java.io.IOException: Could not read footer: java.lang.NullPointerException
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271)
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354)
> at org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88)
> at org.apache.parquet.tools.Main.main(Main.java:223)
> Caused by: java.lang.NullPointerException
> at org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> java.io.IOException: Could not read footer: java.lang.NullPointerException{noformat}
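
The trace bottoms out in ParquetMetadataConverter.getOriginalType while the footer schema is being converted, which suggests getOriginalType received a value it could not handle, for instance a converted type missing from its mapping. As a minimal sketch of the defensive pattern involved (hypothetical helper and names, not the actual parquet-mr fix), the mapping can yield null for absent or unknown values instead of dereferencing them:

{code:java}
// Hypothetical helper, illustrative only; not the actual parquet-mr fix.
// Maps the Thrift-generated ConvertedType defensively so that an absent or
// unmapped value becomes null instead of crashing footer conversion.
import org.apache.parquet.format.ConvertedType;
import org.apache.parquet.schema.OriginalType;

final class SafeConvertedTypes {
  private SafeConvertedTypes() {}

  static OriginalType toOriginalTypeOrNull(ConvertedType type) {
    if (type == null) {
      return null; // the schema element carries no converted type at all
    }
    switch (type) {
      case UTF8:    return OriginalType.UTF8;
      case MAP:     return OriginalType.MAP;
      case LIST:    return OriginalType.LIST;
      case DECIMAL: return OriginalType.DECIMAL;
      // ... remaining known values would be mapped the same way ...
      default:      return null; // unknown enum values must not crash reads
    }
  }
}
{code}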



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1317) ParquetMetadataConverter throw NPE

2018-06-04 Thread Zoltan Ivanfi (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499891#comment-16499891
 ] 

Zoltan Ivanfi commented on PARQUET-1317:


Hi [~q79969786], thanks for reporting and investigating this issue. Since you 
wrote that you are working on it (but couldn't assign it to yourself due to 
insufficient access rights), I added you to the list of contributors and 
assigned the JIRA to you. In the future, you can freely assign tickets to 
yourself (and of course you can unassign them as well if you stop working on 
them).

> ParquetMetadataConverter throw NPE
> --
>
> Key: PARQUET-1317
> URL: https://issues.apache.org/jira/browse/PARQUET-1317
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:scala}
> $ bin/spark-shell 
> scala> spark.range(10).selectExpr("cast(id as string) as id").coalesce(1).write.parquet("/tmp/parquet-1317")
> scala>
> java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head --debug file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet
> {code}
> {noformat}
> java.io.IOException: Could not read footer: java.lang.NullPointerException
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271)
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354)
> at org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88)
> at org.apache.parquet.tools.Main.main(Main.java:223)
> Caused by: java.lang.NullPointerException
> at org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> java.io.IOException: Could not read footer: java.lang.NullPointerException{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (PARQUET-1317) ParquetMetadataConverter throw NPE

2018-06-04 Thread Zoltan Ivanfi (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltan Ivanfi reassigned PARQUET-1317:
--

Assignee: Yuming Wang

> ParquetMetadataConverter throw NPE
> --
>
> Key: PARQUET-1317
> URL: https://issues.apache.org/jira/browse/PARQUET-1317
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> How to reproduce:
> {code:scala}
> $ bin/spark-shell 
> scala> spark.range(10).selectExpr("cast(id as string) as id").coalesce(1).write.parquet("/tmp/parquet-1317")
> scala>
> java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head --debug file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet
> {code}
> {noformat}
> java.io.IOException: Could not read footer: java.lang.NullPointerException
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271)
> at org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354)
> at org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88)
> at org.apache.parquet.tools.Main.main(Main.java:223)
> Caused by: java.lang.NullPointerException
> at org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058)
> at org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499)
> at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261)
> at org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> java.io.IOException: Could not read footer: java.lang.NullPointerException{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-22 Thread Zoltan Ivanfi (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483843#comment-16483843
 ] 

Zoltan Ivanfi edited comment on PARQUET-1295 at 5/22/18 12:12 PM:
--

I agree with [~vrozov], in fact I used the same argument [on the mailing 
list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E]:

{quote}Parquet uses semantic versioning. As a library, it should take extra
care not to break its public API in minor releases. This also applies
to publicly accessible classes and methods that are considered
internal if this "internalness" is not properly documented. It is
tempting to dismiss these cases with the reasoning that they were not
intended to be public in the first place, but from an API consumer's
point of view, this "leaked" API is indistinguishable from the "real"
API. Currently the information of what is public and what is internal
is undocumented and only known to a few Parquet developers.

Until API consumers have a way to determine the intended target
audience of our classes and methods, we should pay more attention to
keeping our leaked internal API backwards-compatible as well.
{quote}

[~vrozov], regarding your comment:

{quote}
That information is hidden somewhere in the pom file.
{quote}

It's actually even worse than that. The exclusions are only added to the pom 
file when a breaking change is made, so even that list is unsuitable for 
determining whether something is considered internal, as it only contains those 
parts of the internal API that we already broke.


was (Author: zi):
I agree with [~vrozov], in fact I used the same argument [on the mailing 
list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E].

{quote}That information is hidden somewhere in the pom file.{quote}

It's actually even worse than that. The exclusions are only added to the pom 
file when a breaking change is made, so even that list is unsuitable for 
determining whether something is considered internal, as it only contains those 
parts of the internal API that we already broke.

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning

2018-05-22 Thread Zoltan Ivanfi (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483843#comment-16483843
 ] 

Zoltan Ivanfi commented on PARQUET-1295:


I agree with [~vrozov], in fact I used the same argument [on the mailing 
list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E].

{quote}That information is hidden somewhere in the pom file.{quote}

It's actually even worse than that. The exclusions are only added to the pom 
file when a breaking change is made, so even that list is unsuitable for 
determining whether something is considered internal, as it only contains those 
parts of the internal API that we already broke.
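
One way to make the intended audience explicit and machine-checkable would be an annotation in the style of Hadoop's InterfaceAudience. A minimal sketch, assuming nothing like it exists in parquet-mr today (the annotation and class names below are invented for illustration):

{code:java}
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical audience marker, modeled on Hadoop's InterfaceAudience;
// parquet-mr does not ship this annotation.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface ParquetInternal {
  // Free-form note on why the element is internal despite being public.
  String value() default "";
}

// Usage: a public-looking class that is internal by intent, and therefore
// exempt from the semantic-versioning guarantees of the public API.
@ParquetInternal("exposed only for parquet-mr's own modules")
class InternalBufferUtil {
  static int roundUp(int size, int alignment) {
    return ((size + alignment - 1) / alignment) * alignment;
  }
}
{code}

With such a marker in place, a compatibility checker could be configured to verify only unannotated elements, turning the currently undocumented public/internal distinction into something API consumers can rely on.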

> Parquet libraries do not follow proper semantic versioning
> --
>
> Key: PARQUET-1295
> URL: https://issues.apache.org/jira/browse/PARQUET-1295
> Project: Parquet
>  Issue Type: Bug
>Reporter: Vlad Rozov
>Priority: Major
>
> There are changes between 1.8.0 and 1.10.0 that break API compatibility. A 
> minor version change is supposed to be backward compatible with 1.9.0 and 
> 1.8.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1304) Release 1.10 contains breaking changes for Hive

2018-05-18 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1304:
--

 Summary: Release 1.10 contains breaking changes for Hive
 Key: PARQUET-1304
 URL: https://issues.apache.org/jira/browse/PARQUET-1304
 Project: Parquet
  Issue Type: Bug
  Components: parquet-mr
Affects Versions: 1.10.0
Reporter: Zoltan Ivanfi


Hive uses the initFromPage(int valueCount, ByteBuffer page, int offset) method 
that [got 
removed|https://github.com/apache/parquet-mr/commit/8bbc6cb95fd9b4b9e86c924ca1e40fd555ecac1d#diff-175c27f5147df0043ac57c7685629934L574]
 in PARQUET-787. As a result, Hive does not compile with Parquet 1.10.
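
A common remedy for this kind of break is to restore the removed overload as a deprecated shim that delegates to the new, stream-based signature introduced by PARQUET-787. A sketch of that pattern (the class name is invented, the delegation assumes the ByteBufferInputStream.wrap factory available in parquet-mr 1.10, and this is not necessarily how the issue was actually resolved):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.parquet.bytes.ByteBufferInputStream;

// Hypothetical compatibility layer, illustrative only.
abstract class ValuesReaderCompat {
  // New-style entry point (the shape introduced by PARQUET-787).
  public abstract void initFromPage(int valueCount, ByteBufferInputStream in)
      throws IOException;

  // The removed overload, restored as a deprecated shim so callers such as
  // Hive keep compiling while they migrate to the stream-based API.
  @Deprecated
  public void initFromPage(int valueCount, ByteBuffer page, int offset)
      throws IOException {
    ByteBuffer slice = page.duplicate(); // do not disturb the caller's buffer
    slice.position(offset);
    initFromPage(valueCount, ByteBufferInputStream.wrap(slice));
  }
}
{code}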



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


  1   2   >