[jira] [Assigned] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again
[ https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1353: -- Assignee: (was: Zoltan Ivanfi) > The random data generator used for tests repeats the same value over and over > again > --- > > Key: PARQUET-1353 > URL: https://issues.apache.org/jira/browse/PARQUET-1353 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Zoltan Ivanfi >Priority: Minor > Labels: pull-request-available > > The RandomValues class returns references to its internal buffer as random > values. This buffer gets a random value every time a new random value is > requested, but since earlier values reference the same internal buffer, they > get changed to the same value as well. So even if successive calls return > different values each time, the actual list of these values will always > consist of a single value repeated multiple times. For example: > ||n-th call||returned value||accumulated list expected||accumulated list > actual|| > |1|6C|6C|6C| > |2|8F|6C 8F|8F 8F| > |3|52|6C 8F 52|52 52 52| > |4|B8|6C 8F 52 B8|B8 B8 B8 B8| -- This message was sent by Atlassian Jira (v8.3.4#803005)
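The buffer-reuse problem described above is a classic aliasing bug. A minimal, hypothetical reproduction of the pattern (the class and method names below are illustrative stand-ins, not the actual RandomValues code) looks like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the PARQUET-1353 bug pattern: the generator hands
// out a reference to its single internal buffer, so every previously
// returned "value" silently changes on the next call.
class BuggyRandomBinaryGenerator {
    private final Random random = new Random(42);
    private final byte[] buffer = new byte[4]; // one shared internal buffer

    byte[] nextValue() {
        random.nextBytes(buffer); // overwrites the data behind earlier references
        return buffer;            // bug: caller receives the shared buffer itself
    }

    byte[] nextValueFixed() {
        random.nextBytes(buffer);
        return Arrays.copyOf(buffer, buffer.length); // fix: defensive copy
    }
}

public class AliasingDemo {
    public static void main(String[] args) {
        BuggyRandomBinaryGenerator gen = new BuggyRandomBinaryGenerator();
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            values.add(gen.nextValue());
        }
        // Every list entry aliases the same array, so the accumulated list
        // holds a single value repeated four times, as in the table above.
        System.out.println(values.get(0) == values.get(3)); // true
    }
}
```

Returning a defensive copy (or allocating a fresh buffer per call) breaks the aliasing.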
[jira] [Assigned] (PARQUET-1337) Current block alignment logic may lead to several row groups per block
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1337: -- Assignee: (was: Zoltan Ivanfi) > Current block alignment logic may lead to several row groups per block > -- > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > When the size of the buffered data gets near the desired row group size, Parquet > flushes the data to a row group. However, at this point the data for the last > page is not yet encoded or compressed, so the row group may end up > being significantly smaller than intended. > If the row group ends up being so small that it is farther away from the next > disk block boundary than the maximum padding, Parquet will try to create a > new row group in the same disk block, this time targeting the remaining space. > This may also be flushed prematurely, leading to the creation of an even > smaller row group, which may lead to an even smaller one, and so on. This gets > repeated until we get sufficiently close to the block boundary that > padding can finally be applied. The resulting superfluous row groups can lead > to bad read performance. > An example of the structure of a Parquet file suffering from this problem can > be seen below. 
For easier interpretation, the row groups are visually grouped > by disk blocks: > {noformat} > row group 1: RC:18774 TS:22182960 OFFSET: 4 > row group 2: RC: 2896 TS: 3428160 OFFSET: 6574564 > row group 3: RC: 1964 TS: 2322560 OFFSET: 7679844 > row group 4: RC: 1074 TS: 1268880 OFFSET: 8732964 > {noformat} > {noformat} > row group 5: RC:18808 TS:8560 OFFSET:1000 > row group 6: RC: 2872 TS: 3389520 OFFSET:16612640 > row group 7: RC: 1930 TS: 2284960 OFFSET:17716800 > row group 8: RC: 1040 TS: 1233520 OFFSET:18768240 > {noformat} > {noformat} > row group 9: RC:18852 TS:22275520 OFFSET:2000 > row group 10: RC: 2831 TS: 3345680 OFFSET:26656320 > row group 11: RC: 1893 TS: 2244640 OFFSET:27757200 > row group 12: RC: 1008 TS: 1195520 OFFSET:28806560 > {noformat} > {noformat} > row group 13: RC:18841 TS:22263360 OFFSET:3000 > row group 14: RC: 2835 TS: 3350480 OFFSET:36652000 > row group 15: RC: 1900 TS: 2249040 OFFSET:37753600 > row group 16: RC: 1016 TS: 1198640 OFFSET:38803600 > {noformat} > {noformat} > row group 17: RC: 1466 TS: 1740320 OFFSET:4000 > {noformat} > In this example, both the disk block size and the row group size were set to > 1000. The data would fit in 5 row groups of this size, but instead, each > of the disk blocks (except the last) is split into 4 row groups of > progressively decreasing size. -- This message was sent by Atlassian Jira (v8.3.4#803005)
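The cascade described above can be illustrated with a toy simulation. All numbers below (block size, padding limit, shrink ratio) are illustrative assumptions, not parquet-mr's actual defaults:

```java
// Toy simulation of the premature-flush cascade: the writer targets the
// remaining space in the disk block, but the flushed row group comes out
// smaller than targeted (modeled here by a fixed shrink ratio), so the
// writer retries with an ever smaller target until the leftover space
// fits within the maximum padding.
public class CascadingRowGroups {
    static int simulate(long blockSize, long maxPadding, double shrinkRatio) {
        long remaining = blockSize;
        int rowGroups = 0;
        while (remaining > maxPadding) {
            long flushed = (long) (remaining * shrinkRatio); // undershoots the target
            remaining -= flushed;
            rowGroups++;
        }
        return rowGroups; // the leftover <= maxPadding is covered by padding
    }

    public static void main(String[] args) {
        long blockSize = 128L * 1024 * 1024;              // assumed disk block size
        int n = simulate(blockSize, blockSize / 20, 0.75); // assumed padding & ratio
        System.out.println(n + " row groups per disk block"); // several, not one
    }
}
```

With these assumed parameters the loop produces several progressively smaller row groups per block, mirroring the file structure shown above.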
[jira] [Created] (PARQUET-1628) Accept local timestamps annotated with the legacy timestamp types
Zoltan Ivanfi created PARQUET-1628: -- Summary: Accept local timestamps annotated with the legacy timestamp types Key: PARQUET-1628 URL: https://issues.apache.org/jira/browse/PARQUET-1628 Project: Parquet Issue Type: Task Components: parquet-mr Reporter: Zoltan Ivanfi Assignee: Nandor Kollar The rules for TIMESTAMP forward-compatibility were created based on the assumption that TIMESTAMP_MILLIS and TIMESTAMP_MICROS have only been used in the instant (a.k.a. UTC-normalized) semantics so far. From this false premise it followed that TIMESTAMPs with local semantics were a new type and did not need to be annotated with the old types to maintain compatibility. In fact, annotating them with the old types was considered harmful, since it would have misled older readers into thinking that they can read TIMESTAMPs with local semantics, when in reality they would have misinterpreted them as TIMESTAMPs with instant semantics. This would have led to a difference of several hours, corresponding to the time zone offset. In reality, however, this misinterpretation of timestamps has already been going on for a while, since Arrow annotates local timestamps with TIMESTAMP_MILLIS or TIMESTAMP_MICROS. To maintain forward compatibility of local timestamps, Arrow annotates them with the legacy timestamp logical types. However, the Java library considers these logical types to be incompatible and discards the new type in favour of the legacy ones (since doing it the other way around would change the behaviour). Parquet-mr should be updated so that it accepts this combination of new and old logical types. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
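A small model may make the compatibility situation easier to follow. The types and method names below are hypothetical stand-ins for illustration, not the actual parquet-mr classes:

```java
// Hypothetical model of the decision described above (not the parquet-mr API).
// The legacy converted types can only express instant (UTC-normalized)
// semantics, while the new logical type carries an explicit isAdjustedToUTC flag.
enum ConvertedType { TIMESTAMP_MILLIS, TIMESTAMP_MICROS, NONE }

record TimestampType(boolean isAdjustedToUTC, String unit) {}

class TimestampResolver {
    // An old reader only understands the converted type, so a local-semantics
    // timestamp annotated with TIMESTAMP_MICROS is misread as an instant.
    static TimestampType asSeenByOldReader(ConvertedType ct) {
        return switch (ct) {
            case TIMESTAMP_MILLIS -> new TimestampType(true, "MILLIS");
            case TIMESTAMP_MICROS -> new TimestampType(true, "MICROS");
            case NONE -> null;
        };
    }

    // The requested behaviour: when a file carries both annotations, a new
    // reader should accept the pair and trust the new logical type, even when
    // it says isAdjustedToUTC=false (the combination Arrow already writes).
    static TimestampType resolve(ConvertedType legacy, TimestampType modern) {
        return modern != null ? modern : asSeenByOldReader(legacy);
    }
}
```

The point of the sketch: rejecting the (legacy, modern) pair as incompatible loses the local semantics, while trusting the modern annotation preserves it for new readers without changing what old readers see.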
[jira] [Created] (PARQUET-1627) Update specification so that legacy timestamp logical types can be written for local semantics as well
Zoltan Ivanfi created PARQUET-1627: -- Summary: Update specification so that legacy timestamp logical types can be written for local semantics as well Key: PARQUET-1627 URL: https://issues.apache.org/jira/browse/PARQUET-1627 Project: Parquet Issue Type: Task Components: parquet-format Reporter: Zoltan Ivanfi Assignee: Nandor Kollar The rules for TIMESTAMP forward-compatibility were created based on the assumption that TIMESTAMP_MILLIS and TIMESTAMP_MICROS have only been used in the instant (a.k.a. UTC-normalized) semantics so far. From this false premise it followed that TIMESTAMPs with local semantics were a new type and did not need to be annotated with the old types to maintain compatibility. In fact, annotating them with the old types was considered harmful, since it would have misled older readers into thinking that they can read TIMESTAMPs with local semantics, when in reality they would have misinterpreted them as TIMESTAMPs with instant semantics. This would have led to a difference of several hours, corresponding to the time zone offset. In reality, however, this misinterpretation of timestamps has already been going on for a while, since Arrow annotates local timestamps with TIMESTAMP_MILLIS or TIMESTAMP_MICROS. To maintain forward compatibility of local timestamps, the specification should allow annotating them with the legacy timestamp logical types. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (PARQUET-1222) Specify a well-defined sorting order for float and double types
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1222: --- Description: Currently parquet-format specifies the sort order for floating point numbers as follows: {code:java} * FLOAT - signed comparison of the represented value * DOUBLE - signed comparison of the represented value {code} The problem is that the comparison of floating point numbers is only a partial ordering with strange behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less nor more than \+0 and comparing NaN to anything always returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly but differently than the C\+\+ implementations, which leads to interoperability problems. TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages. was: Currently parquet-format specifies the sort order for floating point numbers as follows: {code:java} * FLOAT - signed comparison of the represented value * DOUBLE - signed comparison of the represented value {code} The problem is that the comparison of floating point numbers is only a partial ordering with strange behaviour in specific corner cases. For example, according to IEEE 754, -0 is neither less nor more than +0 and comparing NaN to anything always returns false. This ordering is not suitable for statistics. Additionally, the Java implementation already uses a different (total) ordering that handles these cases correctly but differently than the C++ implementations, which leads to interoperability problems. 
TypeDefinedOrder for doubles and floats should be deprecated and a new TotalFloatingPointOrder should be introduced. The default for writing doubles and floats would be the new TotalFloatingPointOrder. This ordering should be effective and easy to implement in all programming languages. > Specify a well-defined sorting order for float and double types > --- > > Key: PARQUET-1222 > URL: https://issues.apache.org/jira/browse/PARQUET-1222 > Project: Parquet > Issue Type: Bug > Components: parquet-format >Reporter: Zoltan Ivanfi >Priority: Critical > > Currently parquet-format specifies the sort order for floating point numbers > as follows: > {code:java} >* FLOAT - signed comparison of the represented value >* DOUBLE - signed comparison of the represented value > {code} > The problem is that the comparison of floating point numbers is only a > partial ordering with strange behaviour in specific corner cases. For > example, according to IEEE 754, -0 is neither less nor more than \+0 and > comparing NaN to anything always returns false. This ordering is not suitable > for statistics. Additionally, the Java implementation already uses a > different (total) ordering that handles these cases correctly but differently > than the C\+\+ implementations, which leads to interoperability problems. > TypeDefinedOrder for doubles and floats should be deprecated and a new > TotalFloatingPointOrder should be introduced. The default for writing doubles > and floats would be the new TotalFloatingPointOrder. This ordering should be > effective and easy to implement in all programming languages. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
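The corner cases mentioned in the description are easy to demonstrate. Java's built-in Double.compare already implements one such total ordering (-0.0 below +0.0, NaN above everything), which is why the Java and C++ implementations can disagree:

```java
public class FloatOrdering {
    public static void main(String[] args) {
        // IEEE 754 comparison is only a partial order:
        System.out.println(-0.0 == 0.0);              // true: -0 compares equal to +0
        System.out.println(Double.NaN == Double.NaN); // false
        System.out.println(Double.NaN < 1.0);         // false
        System.out.println(Double.NaN > 1.0);         // false: NaN is unordered

        // Double.compare is a total order suitable for min/max statistics:
        System.out.println(Double.compare(-0.0, 0.0)); // -1: -0.0 sorts first
        System.out.println(
            Double.compare(Double.NaN, Double.POSITIVE_INFINITY)); // 1: NaN sorts last
    }
}
```

A TotalFloatingPointOrder along these lines is cheap to implement in any language: compare the raw sign-flipped bit patterns of the values instead of the values themselves.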
[jira] [Resolved] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format
[ https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1588. Resolution: Fixed > Bump Apache Thrift to 0.12.0 in parquet-format > -- > > Key: PARQUET-1588 > URL: https://issues.apache.org/jira/browse/PARQUET-1588 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: format-2.7.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format
[ https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1588: --- Fix Version/s: format-2.7.0 > Bump Apache Thrift to 0.12.0 in parquet-format > -- > > Key: PARQUET-1588 > URL: https://issues.apache.org/jira/browse/PARQUET-1588 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: format-2.7.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format
[ https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16861932#comment-16861932 ] Zoltan Ivanfi commented on PARQUET-1588: It already existed, just not as "2.7.0" but as "format-2.7.0" instead. > Bump Apache Thrift to 0.12.0 in parquet-format > -- > > Key: PARQUET-1588 > URL: https://issues.apache.org/jira/browse/PARQUET-1588 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: format-2.7.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (PARQUET-1588) Bump Apache Thrift to 0.12.0
[ https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reopened PARQUET-1588: As we discussed, let's stick to your original approach of separate JIRA-s for parquet-mr and parquet-format to better track what gets released in which version. > Bump Apache Thrift to 0.12.0 > > > Key: PARQUET-1588 > URL: https://issues.apache.org/jira/browse/PARQUET-1588 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1588) Bump Apache Thrift to 0.12.0 in parquet-format
[ https://issues.apache.org/jira/browse/PARQUET-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1588: --- Summary: Bump Apache Thrift to 0.12.0 in parquet-format (was: Bump Apache Thrift to 0.12.0) > Bump Apache Thrift to 0.12.0 in parquet-format > -- > > Key: PARQUET-1588 > URL: https://issues.apache.org/jira/browse/PARQUET-1588 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1590) [parquet-format] Add Java 11 to Travis
[ https://issues.apache.org/jira/browse/PARQUET-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1590: --- Summary: [parquet-format] Add Java 11 to Travis (was: Build against Java 11) > [parquet-format] Add Java 11 to Travis > -- > > Key: PARQUET-1590 > URL: https://issues.apache.org/jira/browse/PARQUET-1590 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (PARQUET-1590) Build against Java 11
[ https://issues.apache.org/jira/browse/PARQUET-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reopened PARQUET-1590: > Build against Java 11 > - > > Key: PARQUET-1590 > URL: https://issues.apache.org/jira/browse/PARQUET-1590 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1499) [parquet-mr] Add Java 11 to Travis
[ https://issues.apache.org/jira/browse/PARQUET-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1499: --- Summary: [parquet-mr] Add Java 11 to Travis (was: Add Java 11 build to the repository) > [parquet-mr] Add Java 11 to Travis > -- > > Key: PARQUET-1499 > URL: https://issues.apache.org/jira/browse/PARQUET-1499 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1585) Update old external links in the code base
[ https://issues.apache.org/jira/browse/PARQUET-1585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1585. Resolution: Fixed Fix Version/s: 1.11.0 > Update old external links in the code base > -- > > Key: PARQUET-1585 > URL: https://issues.apache.org/jira/browse/PARQUET-1585 > Project: Parquet > Issue Type: Task >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1585) Update old external links in the code base
Zoltan Ivanfi created PARQUET-1585: -- Summary: Update old external links in the code base Key: PARQUET-1585 URL: https://issues.apache.org/jira/browse/PARQUET-1585 Project: Parquet Issue Type: Task Reporter: Zoltan Ivanfi Assignee: Zoltan Ivanfi -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1572) Clarify the definition of timestamp types
Zoltan Ivanfi created PARQUET-1572: -- Summary: Clarify the definition of timestamp types Key: PARQUET-1572 URL: https://issues.apache.org/jira/browse/PARQUET-1572 Project: Parquet Issue Type: Task Components: parquet-format Reporter: Zoltan Ivanfi Assignee: Zoltan Ivanfi The current definition only makes sense for the isUtcAdjusted=true case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910 ] Zoltan Ivanfi edited comment on PARQUET-1496 at 5/3/19 3:01 PM: There seems to be an unresolvable circular incompatibility issue here: * Java 11 is incompatible with Scala 2.10, needs newer version, like Scala 2.12 * Scala 2.12 is incompatible with Scrooge 4, needs newer version, like Scrooge 19. * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons: ** It doesn't handle one of our empty structs correctly. Update: This turned out to be due to using a javadoc-style comment in the empty struct. ** It doesn't handle the {{String}} logical type correctly, because in the code it generates it does not use fully qualified names. Since the name of this logical type shadows the stock String type, this leads to a compilation failure in the generated {{LogicalType.scala}} file. was (Author: zi): There seems to be an unresolvable circular incompatibility issue here: * Java 11 is incompatible with Scala 2.10, needs newer version, like Scala 2.12 * Scala 2.12 is incompatible with Scrooge 4, needs newer version, like Scrooge 19. * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons: ** It doesn't handle empty structs correctly. For further experimentation, this can be hacked around by changing each {noformat} struct whatever {} {noformat} to {noformat} struct whatever {32767: optional i32 dummy;} {noformat} ** It doesn't handle the {{String}} logical type correctly, because in the code it generates it does not use fully qualified names. Since the name of this logical type shadows the stock String type, this leads to a compilation failure in the generated {{LogicalType.scala}} file. 
> [Java] Update Scala for JDK 11 compatibility > > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. 
> [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407) > [INFO] at >
[jira] [Updated] (PARQUET-1556) Problem with Maven repo specifications in POMs of dependencies in some development environments
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1556: --- Description: Running {{mvn verify}} based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} As a workaround, the local {{~/.m2/settings.xml}} file can be modified to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, {{mvn verify}} works. This should not be necessary though, since the artifact is a transitive dependency and the POM of the direct dependency (elephant-bird) contains the repo specification, which works in most environments. was: Running {{mvn verify}} based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} As a workaround, the local {{~/.m2/settings.xml}} file can be modified to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, {{mvn verify}} works. The proper solution, however, is to include this repo in the POM files. 
> Problem with Maven repo specifications in POMs of dependencies in some > development environments > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running {{mvn verify}} based on the instructions in the README results in > this error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local {{~/.m2/settings.xml}} file can be modified to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. This should not be necessary though, > since the artifact is a transitive dependency and the POM of the direct > dependency (elephant-bird) contains the repo specification, which works in > most environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1556) Problem with Maven repo specifications in POMs of dependencies in some development environments
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1556: --- Summary: Problem with Maven repo specifications in POMs of dependencies in some development environments (was: Add twitter maven repo to POM for hadoop-lzo dependency) > Problem with Maven repo specifications in POMs of dependencies in some > development environments > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running {{mvn verify}} based on the instructions in the README results in > this error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local {{~/.m2/settings.xml}} file can be modified to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809841#comment-16809841 ] Zoltan Ivanfi commented on PARQUET-1556: I came to the conclusion that the only possible source of the twitter repo is the [POM of the elephant-bird dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. However, I have no idea why this doesn't happen for you, [~andygrove]. I have tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the transitive dependency. What version do you use? > Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running {{mvn verify}} based on the instructions in the README results in > this error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local {{~/.m2/settings.xml}} file can be modified to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16809841#comment-16809841 ] Zoltan Ivanfi edited comment on PARQUET-1556 at 4/4/19 1:16 PM: I came to the conclusion that the only possible source of the twitter repo is the [POM of the elephant-bird dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. However, I have no idea why this doesn't work for you, [~andygrove]. I have tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the transitive dependency. What version do you use? was (Author: zi): I came to the conclusion that the only possible source of the twitter repo is the [POM of the elephant-bird dependency|https://github.com/twitter/elephant-bird/blob/master/pom.xml#L94]. However, I have no idea why this doesn't happen for you, [~andygrove]. I have tried it with both Maven 3.5.2 and 3.6.0 and both are able to download the transitive dependency. What version do you use? > Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running {{mvn verify}} based on the instructions in the README results in > this error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local {{~/.m2/settings.xml}} file can be modified to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1556: --- Description: Running {{mvn verify}} based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} As a workaround, the local {{~/.m2/settings.xml}} file can be modified to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, {{mvn verify}} works. The proper solution, however, is to include this repo in the POM files. was: Running mvn verify based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} As a workaround, the local ~/.m2/settings.xml file can be modified to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, {{mvn verify}} works. The proper solution, however, is to include this repo in the POM files. 
> Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running {{mvn verify}} based on the instructions in the README results in > this error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local {{~/.m2/settings.xml}} file can be modified to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808877#comment-16808877 ] Zoltan Ivanfi commented on PARQUET-1556: That's a very good point, thanks for raising it. We don't use Hadoop-LZO ourselves. Running {{mvn dependency:tree}} shows that this is a compile-time transitive dependency: {code} [INFO] org.apache.parquet:parquet-thrift:jar:1.12.0-SNAPSHOT [INFO] +- com.twitter.elephantbird:elephant-bird-core:jar:4.4:compile [INFO] | \- com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16:compile {code} Before adding it to the POM we have to understand: * Why it can be downloaded for most people even without a corresponding repo entry. * Why it fails for others. * What it would mean to add the repo to the POM (would it lead to shipping a GPL dependency). * Whether we can avoid pulling this in altogether. > Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running mvn verify based on the instructions in the README results in this > error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local ~/.m2/settings.xml file can be modified to include > the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
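For reference, the change being weighed in the comment above would amount to roughly the following addition to the relevant POM. This is a sketch only; whether shipping a build that points at a repository hosting a GPL artifact is acceptable is exactly the open question raised:

{code:xml}
<!-- Hypothetical POM addition being discussed; not the merged fix -->
<repositories>
  <repository>
    <id>twitter</id>
    <name>twitter</name>
    <url>http://maven.twttr.com</url>
  </repository>
</repositories>
{code}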
[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1556: --- Description: Running mvn verify based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} As a workaround, the local ~/.m2/settings.xml file can be modified to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, {{mvn verify}} works. The proper solution, however, is to include this repo in the POM files. was: Running mvn verify based on the instructions in the README results in this error {code:java} Could not resolve dependencies for project org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} To fix this, it was necessary to configure my local ~/.m2/settings.xml to include the twitter maven repo: {code:xml} <repository> <id>twitter</id> <name>twitter</name> <url>http://maven.twttr.com</url> </repository> {code} After adding this, mvn verify worked. We should add these instructions to the README. 
> Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running mvn verify based on the instructions in the README results in this > error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > As a workaround, the local ~/.m2/settings.xml file can be modified to include > the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, {{mvn verify}} works. The proper solution, however, is to > include this repo in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1556) Add twitter maven repo to POM for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1556: --- Summary: Add twitter maven repo to POM for hadoop-lzo dependency (was: Instructions are missing for configuring twitter maven repo for hadoop-lzo dependency) > Add twitter maven repo to POM for hadoop-lzo dependency > --- > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running mvn verify based on the instructions in the README results in this > error > {code:java} > Could not resolve dependencies for project > org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > To fix this, it was necessary to configure my local ~/.m2/settings.xml to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, mvn verify worked. > We should add these instructions to the README. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1556) Instructions are missing for configuring twitter maven repo for hadoop-lzo dependency
[ https://issues.apache.org/jira/browse/PARQUET-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808851#comment-16808851 ] Zoltan Ivanfi commented on PARQUET-1556: Now that is strange. If I issue this command: {code} mvn dependency:get -Dartifact=com.hadoop.gplcompression:hadoop-lzo:0.4.16 {code} I get the same error. But it is still able to download the artifact somehow when I run the following: {code} mvn -Dmaven.repo.local=/tmp/fresh-clean-empty-local-repo clean install -DskipTests | grep hadoop-lzo Downloading from jitpack.io: https://jitpack.io/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom Downloading from central: https://repo.maven.apache.org/maven2/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom Downloading from twitter: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom Downloaded from twitter: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.pom (1.3 kB at 1.3 kB/s) Downloading from jitpack.io: https://jitpack.io/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar Downloading from central: https://repo.maven.apache.org/maven2/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar Downloading from twitter: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar Downloaded from twitter: http://maven.twttr.com/com/hadoop/gplcompression/hadoop-lzo/0.4.16/hadoop-lzo-0.4.16.jar (63 kB at 86 kB/s) {code} > Instructions are missing for configuring twitter maven repo for hadoop-lzo > dependency > - > > Key: PARQUET-1556 > URL: https://issues.apache.org/jira/browse/PARQUET-1556 > Project: Parquet > Issue Type: Improvement >Affects Versions: 1.11.0 >Reporter: Andy Grove >Assignee: Andy Grove >Priority: Major > Fix For: 1.12.0 > > > Running mvn verify based on the instructions in the README results in this > error > {code:java} > Could not resolve dependencies for project 
> org.apache.parquet:parquet-thrift:jar:1.11.0: Could not find artifact > com.hadoop.gplcompression:hadoop-lzo:jar:0.4.16{code} > To fix this, it was necessary to configure my local ~/.m2/settings.xml to > include the twitter maven repo: > {code:xml} > <repository> > <id>twitter</id> > <name>twitter</name> > <url>http://maven.twttr.com</url> > </repository> > {code} > After adding this, mvn verify worked. > We should add these instructions to the README. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
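One possible explanation for the puzzling behaviour in the comment above (this is a hypothesis, not something established in the thread): a bare {{mvn dependency:get}} resolves only against repositories configured in {{settings.xml}} plus any passed on the command line, whereas a full build also consults repositories declared in the POMs it encounters during resolution. The maven-dependency-plugin's {{remoteRepositories}} parameter could be used to test this; if the command below succeeds while the bare invocation fails, the twitter repo is only reachable through a POM-declared repository:

{code}
# Explicitly add the twitter repo for this one resolution
# (format is id::layout::url)
mvn dependency:get \
  -Dartifact=com.hadoop.gplcompression:hadoop-lzo:0.4.16 \
  -DremoteRepositories=twitter::default::http://maven.twttr.com
{code}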
[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803012#comment-16803012 ] Zoltan Ivanfi edited comment on PARQUET-1496 at 3/27/19 4:55 PM: - According to [https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html,] Scala 2.10.7 also supports JDK 11, which may provide a resolution for this problem. Update: No, it doesn't. Parquet-scrooge doesn't compile with Scala 2.10.7 either. was (Author: zi): According to [https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html,] Scala 2.10.7 also supports JDK 11, which may provide a resolution for this problem. > [Java] Update Scala for JDK 11 compatibility > > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > 
/Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. > [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152) > [INFO] at > 
scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261) > [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290) > [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32) > [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79) > [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54) > [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67) > [INFO] at scala.tools.nsc.Main.main(Main.scala) > [INFO] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [INFO] at >
[jira] [Updated] (PARQUET-1496) [Java] Update Scala for JDK 11 compatibility
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1496: --- Summary: [Java] Update Scala for JDK 11 compatibility (was: [Java] Update Scala to 2.12) > [Java] Update Scala for JDK 11 compatibility > > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. 
> [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196) > [INFO] 
at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261) > [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290) > [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32) > [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79) > [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54) > [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67) > [INFO] at scala.tools.nsc.Main.main(Main.scala) > [INFO] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [INFO] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [INFO] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [INFO] at java.base/java.lang.reflect.Method.invoke(Method.java:564) > [INFO] at > org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161) > [INFO] at > org_scala_tools_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26){code} > This
[jira] [Commented] (PARQUET-1496) [Java] Update Scala to 2.12
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803012#comment-16803012 ] Zoltan Ivanfi commented on PARQUET-1496: According to [https://docs.scala-lang.org/overviews/jdk-compatibility/overview.html,] Scala 2.10.7 also supports JDK 11, which may provide a resolution for this problem. > [Java] Update Scala to 2.12 > --- > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. 
> [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196) > [INFO] 
at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261) > [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290) > [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32) > [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79) > [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54) > [INFO] at scala.tools.nsc.Driver.main(Driver.scala:67) > [INFO] at scala.tools.nsc.Main.main(Main.scala) > [INFO] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > [INFO] at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > [INFO] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [INFO] at java.base/java.lang.reflect.Method.invoke(Method.java:564) > [INFO] at > org_scala_tools_maven_executions.MainHelper.runMain(MainHelper.java:161) > [INFO] at >
[jira] [Resolved] (PARQUET-1497) [Java] javax annotations dependency missing for Java 11
[ https://issues.apache.org/jira/browse/PARQUET-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1497. Resolution: Fixed Fix Version/s: 1.11.0 > [Java] javax annotations dependency missing for Java 11 > --- > > Key: PARQUET-1497 > URL: https://issues.apache.org/jira/browse/PARQUET-1497 > Project: Parquet > Issue Type: Bug > Components: parquet-thrift >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > When trying to build with OpenJDK 11, I get errors due to the Generated > annotation not being resolved: > {code:java} > [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ > parquet-format-structures --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 51 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/classes > [INFO] - > [WARNING] COMPILATION WARNING : > [INFO] - > [WARNING] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java: > > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java > uses or overrides a deprecated API. > [WARNING] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java: > Recompile with -Xlint:deprecation for details. 
> [INFO] 2 warnings > [INFO] - > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[37,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[40,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[43,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[41,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[32,24] > package javax.annotation does not exist > [ERROR] > 
/Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[40,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[42,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimeUnit.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/MilliSeconds.java:[32,24] > package javax.annotation does not exist >
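Background on the failure above: JDK 11 removed the {{java.xml.ws.annotation}} module (JEP 320), so {{javax.annotation.Generated}} is no longer provided by the JDK and Thrift-generated sources fail to compile. The merged fix may differ in detail, but the standard remedy is to declare the annotations artifact explicitly, for example:

{code:xml}
<!-- Sketch of the usual remedy: restores javax.annotation.* on JDK 9+,
     where the JDK no longer ships these classes -->
<dependency>
  <groupId>javax.annotation</groupId>
  <artifactId>javax.annotation-api</artifactId>
  <version>1.3.2</version>
</dependency>
{code}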
[jira] [Comment Edited] (PARQUET-1496) [Java] Update Scala to 2.12
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910 ] Zoltan Ivanfi edited comment on PARQUET-1496 at 3/26/19 4:35 PM: - There seems to be an unresolvable circular incompatibility issue here: * Java 11 is incompatible with Scala 2.10, needs newer version, like Scala 2.12 * Scala 2.12 is incompatible with Scrooge 4, needs newer version, like Scrooge 19. * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons: ** It doesn't handle empty structs correctly. For further experimentation, this can be hacked around by changing each {noformat} struct whatever {} {noformat} to {noformat} struct whatever {32767: optional i32 dummy;} {noformat} ** It doesn't handle the {{String}} logical type correctly, because in the code it generates it does not use fully qualified names. Since the name of this logical type shadows the stock String type, this leads to a compilation failure in the generated {{LogicalType.scala}} file. was (Author: zi): There seems to be an unresolvable circular incompatibility issue here: * Java 11 is incompatible with Scala 2.10, needs newer version, like Scala 2.12 * Scala 2.12 is incompatible with Scrooge 4, needs newer version, like Scrooge 19. * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons: ** It doesn't handle empty structs correctly. For further experimentation, this can be hacked around by changing each {noformat} struct whatever {} {noformat} to {noformat} struct whatever {32767: optional i32 dummy;} {noformat} ** It doesn't handle the {{String}} logical type correctly, because in the code it generates it does not use fully qualified names. Since the name of this logical type shadows the stock String type, this leads to a compilation failure in the generated {{LogicalType.scala}} file. 
> [Java] Update Scala to 2.12 > --- > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. 
> [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at >
[jira] [Commented] (PARQUET-1496) [Java] Update Scala to 2.12
[ https://issues.apache.org/jira/browse/PARQUET-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16801910#comment-16801910 ] Zoltan Ivanfi commented on PARQUET-1496: There seems to be an unresolvable circular incompatibility issue here: * Java 11 is incompatible with Scala 2.10, needs newer version, like Scala 2.12 * Scala 2.12 is incompatible with Scrooge 4, needs newer version, like Scrooge 19. * Scrooge 19 is incompatible with our {{parquet.thrift}} file for two reasons: ** It doesn't handle empty structs correctly. For further experimentation, this can be hacked around by changing each {{struct whatever {}}} to {{struct whatever {32767: optional i32 dummy;}}} ** It doesn't handle the {{String}} logical type correctly, because in the code it generates it does not use fully qualified names. Since the name of this logical type shadows the stock String type, this leads to a compilation failure in the generated {{LogicalType.scala}} file. > [Java] Update Scala to 2.12 > --- > > Key: PARQUET-1496 > URL: https://issues.apache.org/jira/browse/PARQUET-1496 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. 
Korn >Priority: Major > Labels: pull-request-available > > When trying to build the parquet-mr code on OSX Mojave with OpenJDK 10 and 9, > the build fails for me in {{parquet-scala}} with: > {code:java} > [INFO] --- maven-scala-plugin:2.15.2:compile (default) @ parquet-scala_2.10 > --- > [INFO] Checking for multiple versions of scala > [INFO] includes = [**/*.java,**/*.scala,] > [INFO] excludes = [] > [INFO] /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/src/main/scala:-1: > info: compiling > [INFO] Compiling 1 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-scala/target/classes at > 1547922718010 > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/package.class) > [ERROR] error: error while loading package, Missing dependency 'object > java.lang.Object in compiler mirror', required by > /Users/uwe/.m2/repository/org/scala-lang/scala-library/2.10.6/scala-library-2.10.6.jar(scala/runtime/package.class) > [ERROR] error: scala.reflect.internal.MissingRequirementError: object > java.lang.Object in compiler mirror not found. 
> [ERROR] at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) > [ERROR] at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getClassByName(Mirrors.scala:99) > [INFO] at > scala.reflect.internal.Mirrors$RootsBase.getRequiredClass(Mirrors.scala:102) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass$lzycompute(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.ObjectClass(Definitions.scala:264) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass$lzycompute(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.AnyRefClass(Definitions.scala:263) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.specialPolyClass(Definitions.scala:1120) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass$lzycompute(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.RepeatedParamClass(Definitions.scala:407) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses$lzycompute(Definitions.scala:1154) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.syntheticCoreClasses(Definitions.scala:1152) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode$lzycompute(Definitions.scala:1196) > [INFO] at > scala.reflect.internal.Definitions$DefinitionsClass.symbolsNotPresentInBytecode(Definitions.scala:1196) > [INFO] 
at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1261) > [INFO] at scala.tools.nsc.Global$Run.(Global.scala:1290) > [INFO] at scala.tools.nsc.Driver.doCompile(Driver.scala:32) > [INFO] at scala.tools.nsc.Main$.doCompile(Main.scala:79) > [INFO] at scala.tools.nsc.Driver.process(Driver.scala:54)
[jira] [Updated] (PARQUET-1497) [Java] javax annotations dependency missing for Java 11
[ https://issues.apache.org/jira/browse/PARQUET-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1497: --- Summary: [Java] javax annotations dependency missing for Java 11 (was: [Java] Building on OSX fails with OpenJDK 11) > [Java] javax annotations dependency missing for Java 11 > --- > > Key: PARQUET-1497 > URL: https://issues.apache.org/jira/browse/PARQUET-1497 > Project: Parquet > Issue Type: Bug > Components: parquet-thrift >Affects Versions: 1.10.0 >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > > When trying to build with OpenJDK 11, I get errors due to the Generated > annotation not being resolved: > {code:java} > [INFO] --- maven-compiler-plugin:3.1:compile (default-compile) @ > parquet-format-structures --- > [INFO] Changes detected - recompiling the module! > [INFO] Compiling 51 source files to > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/classes > [INFO] - > [WARNING] COMPILATION WARNING : > [INFO] - > [WARNING] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java: > > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java > uses or overrides a deprecated API. > [WARNING] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/src/main/java/org/apache/parquet/format/event/Consumers.java: > Recompile with -Xlint:deprecation for details. 
> [INFO] 2 warnings > [INFO] - > [INFO] - > [ERROR] COMPILATION ERROR : > [INFO] - > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/NanoSeconds.java:[37,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/StringType.java:[40,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/DataPageHeaderV2.java:[43,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/Statistics.java:[41,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[32,24] > package javax.annotation does not exist > [ERROR] > 
/Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/SortingColumn.java:[40,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimestampType.java:[42,2] > cannot find symbol > symbol: class Generated > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/TimeUnit.java:[32,24] > package javax.annotation does not exist > [ERROR] > /Users/uwe/tmp/apache-parquet-1.11.0/parquet-format-structures/target/generated-sources/thrift/org/apache/parquet/format/MilliSeconds.java:[32,24] >
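A common remedy for this class of failure (a sketch only; the actual fix in the linked pull request may differ) is to declare an explicit dependency on the javax.annotation API, since Java 11 removed the {{java.xml.ws.annotation}} module that used to provide {{javax.annotation.Generated}}:

```xml
<!-- Sketch of a pom.xml addition for parquet-format-structures; the
     coordinates are those of the javax.annotation-api artifact on Maven
     Central, but the version chosen in the actual fix may differ. -->
<dependency>
  <groupId>javax.annotation</groupId>
  <artifactId>javax.annotation-api</artifactId>
  <version>1.3.2</version>
</dependency>
```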
[jira] [Created] (PARQUET-1551) Support Java 11 - top-level JIRA
Zoltan Ivanfi created PARQUET-1551: -- Summary: Support Java 11 - top-level JIRA Key: PARQUET-1551 URL: https://issues.apache.org/jira/browse/PARQUET-1551 Project: Parquet Issue Type: Task Reporter: Zoltan Ivanfi This JIRA groups all other JIRAs related to Java 11. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1550) CleanUtil does not work in Java 11
[ https://issues.apache.org/jira/browse/PARQUET-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1550: --- Issue Type: Bug (was: Task) > CleanUtil does not work in Java 11 > -- > > Key: PARQUET-1550 > URL: https://issues.apache.org/jira/browse/PARQUET-1550 > Project: Parquet > Issue Type: Bug >Reporter: Zoltan Ivanfi >Priority: Major > > I'm trying to run the tests with Java 11 using the {{mvn clean install}} > command. After various dependency updates and some workarounds, the tests are > green, but the output is littered with warnings about swallowed > IllegalAccessExceptions caused by CleanUtil. One example of many identical > ones: > {code} > 2019-03-26 15:07:34 WARN CleanUtil - Clean failed for buffer DirectByteBuffer > java.lang.IllegalAccessException: class > org.apache.parquet.hadoop.codec.CleanUtil cannot access class > jdk.internal.ref.Cleaner (in module java.base) because module java.base does > not export jdk.internal.ref to unnamed module @413f69cc > at > java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:361) > at > java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:591) > at java.base/java.lang.reflect.Method.invoke(Method.java:558) > at org.apache.parquet.hadoop.codec.CleanUtil.clean(CleanUtil.java:64) > at > org.apache.parquet.hadoop.codec.SnappyDecompressor.setInput(SnappyDecompressor.java:109) > at > org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:46) > at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200) > at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170) > at > org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279) > at > org.apache.parquet.hadoop.TestDirectCodecFactory.test(TestDirectCodecFactory.java:114) > at >
org.apache.parquet.hadoop.TestDirectCodecFactory.compressionCodecs(TestDirectCodecFactory.java:168) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) > at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) > at org.junit.runners.ParentRunner.run(ParentRunner.java:363) > at > org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) > at > org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) > at > org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175) > at > org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107) > at > org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68) > {code} -- This message was sent by Atlassian JIRA
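On Java 9+, the direct-buffer cleanup that CleanUtil attempts via jdk.internal.ref.Cleaner can instead go through sun.misc.Unsafe.invokeCleaner, which stays reachable through the jdk.unsupported module. The sketch below is illustrative only, not the actual parquet-mr patch; the class name CleanSketch is made up:

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import sun.misc.Unsafe;

public class CleanSketch {
    // Frees a direct buffer eagerly on Java 9+ without reflecting into
    // jdk.internal.ref.Cleaner, which java.base no longer exports.
    public static void clean(ByteBuffer buffer) throws Exception {
        if (!buffer.isDirect()) {
            return; // heap buffers have no cleaner to invoke
        }
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true); // sun.misc is opened by jdk.unsupported
        Unsafe unsafe = (Unsafe) f.get(null);
        unsafe.invokeCleaner(buffer);
    }

    public static void main(String[] args) throws Exception {
        ByteBuffer buf = ByteBuffer.allocateDirect(1024);
        clean(buf); // should not throw on Java 9 or later
        System.out.println("cleaned");
    }
}
```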
[jira] [Created] (PARQUET-1550) CleanUtil does not work in Java 11
Zoltan Ivanfi created PARQUET-1550: -- Summary: CleanUtil does not work in Java 11 Key: PARQUET-1550 URL: https://issues.apache.org/jira/browse/PARQUET-1550 Project: Parquet Issue Type: Task Reporter: Zoltan Ivanfi I'm trying to run the tests with Java 11 using the {{mvn clean install}} command. After various dependency updates and some workarounds, the tests are green, but the output is littered with warnings about swallowed IllegalAccessExceptions caused by CleanUtil. One example of many identical ones: {code} 2019-03-26 15:07:34 WARN CleanUtil - Clean failed for buffer DirectByteBuffer java.lang.IllegalAccessException: class org.apache.parquet.hadoop.codec.CleanUtil cannot access class jdk.internal.ref.Cleaner (in module java.base) because module java.base does not export jdk.internal.ref to unnamed module @413f69cc at java.base/jdk.internal.reflect.Reflection.newIllegalAccessException(Reflection.java:361) at java.base/java.lang.reflect.AccessibleObject.checkAccess(AccessibleObject.java:591) at java.base/java.lang.reflect.Method.invoke(Method.java:558) at org.apache.parquet.hadoop.codec.CleanUtil.clean(CleanUtil.java:64) at org.apache.parquet.hadoop.codec.SnappyDecompressor.setInput(SnappyDecompressor.java:109) at org.apache.parquet.hadoop.codec.NonBlockedDecompressorStream.read(NonBlockedDecompressorStream.java:46) at java.base/java.io.DataInputStream.readFully(DataInputStream.java:200) at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170) at org.apache.parquet.bytes.BytesInput$StreamBytesInput.toByteArray(BytesInput.java:279) at org.apache.parquet.hadoop.TestDirectCodecFactory.test(TestDirectCodecFactory.java:114) at org.apache.parquet.hadoop.TestDirectCodecFactory.compressionCodecs(TestDirectCodecFactory.java:168) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) at org.junit.runners.ParentRunner.run(ParentRunner.java:363) at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:566) at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164) at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110) at 
org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175) at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1543) Execute the TIMESTAMP types roadmap
[ https://issues.apache.org/jira/browse/PARQUET-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1543. Resolution: Not A Problem Accidentally opened JIRA for the wrong project. > Execute the TIMESTAMP types roadmap > --- > > Key: PARQUET-1543 > URL: https://issues.apache.org/jira/browse/PARQUET-1543 > Project: Parquet > Issue Type: Task >Reporter: Zoltan Ivanfi >Priority: Major > > This is the top-level JIRA for tracking the addition and/or alteration of > different TIMESTAMP types in order to eventually reach the desired state as > specified in [the design doc for TIMESTAMP > types|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1543) Execute the TIMESTAMP types roadmap
Zoltan Ivanfi created PARQUET-1543: -- Summary: Execute the TIMESTAMP types roadmap Key: PARQUET-1543 URL: https://issues.apache.org/jira/browse/PARQUET-1543 Project: Parquet Issue Type: Task Reporter: Zoltan Ivanfi This is the top-level JIRA for tracking the addition and/or alteration of different TIMESTAMP types in order to eventually reach the desired state as specified in [the design doc for TIMESTAMP types|https://cwiki.apache.org/confluence/display/Hive/Different+TIMESTAMP+types]. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1533) TestSnappy() throws OOM exception with Parquet-1485 change
[ https://issues.apache.org/jira/browse/PARQUET-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1533: --- Description: PARQUET-1485 increased the initial buffer sizes (inputBuffer and outputBuffer) from 0 to 128M in total. This causes the unit test TestSnappy() to fail with an OOM exception. This is on my Mac laptop. To solve the unit test failure, we can increase -Xmx from 512m to 1024m like below. However, we need to evaluate whether the increased initial direct-memory usage for inputBuffer and outputBuffer will cause OOM in real Parquet applications that do not run with a big enough -Xmx. org.apache.maven.plugins maven-surefire-plugin ... -Xmx1024m ... For details of the exception, the commit page ([https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b]) has the details. was: PARQUET-1485 increased the initial buffer sizes (inputBuffer and outputBuffer) from 0 to 128M in total. This causes the unit test TestSnappy() to fail with an OOM exception. This is on my Mac laptop. To solve the unit test failure, we can increase -Xmx from 512m to 1024m like below. However, we need to evaluate whether the increased initial direct-memory usage for inputBuffer and outputBuffer will cause OOM in real Parquet applications that do not run with a big enough -Xmx. org.apache.maven.plugins maven-surefire-plugin ... -Xmx1024m ... For details of the exception, the pull request (https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b) has the details.
> TestSnappy() throws OOM exception with Parquet-1485 change > --- > > Key: PARQUET-1533 > URL: https://issues.apache.org/jira/browse/PARQUET-1533 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 > Environment: Mac OS 10.14.1 >Reporter: Xinli Shang >Priority: Minor > > PARQUET-1485 increased the initial buffer sizes (inputBuffer and outputBuffer) from 0 > to 128M in total. This causes the unit test TestSnappy() to fail with an OOM > exception. This is on my Mac laptop. > To solve the unit test failure, we can increase -Xmx from 512m to > 1024m like below. However, we need to evaluate whether the increased > initial direct-memory usage for inputBuffer and outputBuffer will > cause OOM in real Parquet applications that do not run with a > big enough -Xmx. > org.apache.maven.plugins > maven-surefire-plugin > ... > -Xmx1024m > ... > For details of the exception, the commit page > ([https://github.com/apache/parquet-mr/commit/7dcdcdcf0eb5e91618c443d4a84973bf7883d79b]) > has the details. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
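The configuration snippet quoted above lost its XML markup in transit; a plausible reconstruction of the intended surefire setting (the plugin wrapper and heap value are inferred from the surrounding description, not taken from the actual patch) is:

```xml
<!-- Reconstructed sketch of the surefire heap bump described above -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>-Xmx1024m</argLine>
  </configuration>
</plugin>
```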
[jira] [Resolved] (PARQUET-1491) Conditional debug logging in InternalParquetRecordReader to reduce GC
[ https://issues.apache.org/jira/browse/PARQUET-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1491. Resolution: Not A Problem > Conditional debug logging in InternalParquetRecordReader to reduce GC > - > > Key: PARQUET-1491 > URL: https://issues.apache.org/jira/browse/PARQUET-1491 > Project: Parquet > Issue Type: Improvement > Components: parquet-avro >Reporter: Artavazd Balaian >Priority: Minor > Labels: pull-request-available > Attachments: image-2019-01-12-04-03-48-005.png, > image-2019-01-12-04-09-18-359.png, image-2019-01-12-04-10-49-230.png > > > Currently there is no check for the log level in > [InternalParquetRecordReader.java#L249|https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/InternalParquetRecordReader.java#L249], > which causes a lot of memory allocation and performance degradation. > Link to the Parquet file that was used: > [https://drive.google.com/open?id=1xCMZrUPWvlS4KOFO8m9EmtkvDy-SiRHq] > Screenshot of Java Mission Control comparison with and without the fix (link to > the JFR files: > [https://drive.google.com/open?id=1blSeF-AyAhQyRYaqVsihyzy7pJCJt7U3]): > !image-2019-01-12-04-03-48-005.png|width=956,height=538! > !image-2019-01-12-04-10-49-230.png|width=1403,height=760! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
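The fix pattern proposed in PARQUET-1491 is to guard the debug statement with a level check so the log message is never built when debug output is disabled. A sketch of the pattern using java.util.logging (parquet-mr itself logs through slf4j, so the real change looks slightly different):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger("example");

    // Counts how many times the expensive message is actually built.
    static int formatCalls = 0;

    static String expensiveMessage() {
        formatCalls++;
        return "read value " + System.nanoTime(); // allocates per call
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // debug-level (FINE) output is disabled

        // Unguarded: the message string is built even though it is discarded.
        LOG.fine(expensiveMessage());

        // Guarded: the level check skips the allocation entirely.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(expensiveMessage());
        }

        System.out.println("formatCalls=" + formatCalls);
    }
}
```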
[jira] [Updated] (PARQUET-1490) Add branch-specific Travis steps
[ https://issues.apache.org/jira/browse/PARQUET-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1490: --- Description: The script for the main branch has to make sure that POM files in the master branch do not refer to SNAPSHOT versions. The possibility of scripts for feature branches will allow building a SNAPSHOT version of parquet-format and depending on it in the POM files. was: The script for the main branch has to make sure that POM files in the master branch do not refer to SNAPSHOT versions. The script for feature branches will allow building a SNAPSHOT version of parquet-mr and depending on it in the POM files. > Add branch-specific Travis steps > > > Key: PARQUET-1490 > URL: https://issues.apache.org/jira/browse/PARQUET-1490 > Project: Parquet > Issue Type: Improvement >Reporter: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > > The script for the main branch has to make sure that POM files in the master > branch do not refer to SNAPSHOT versions. > The possibility of scripts for feature branches will allow building a SNAPSHOT > version of parquet-format and depending on it in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1490) Add branch-specific Travis steps
Zoltan Ivanfi created PARQUET-1490: -- Summary: Add branch-specific Travis steps Key: PARQUET-1490 URL: https://issues.apache.org/jira/browse/PARQUET-1490 Project: Parquet Issue Type: Improvement Reporter: Zoltan Ivanfi The script for the main branch has to make sure that POM files in the master branch do not refer to SNAPSHOT versions. The script for feature branches will allow building a SNAPSHOT version of parquet-mr and depending on it in the POM files. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1461) Third party code does not compile after parquet-mr minor version update
[ https://issues.apache.org/jira/browse/PARQUET-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1461. Resolution: Fixed > Third party code does not compile after parquet-mr minor version update > --- > > Key: PARQUET-1461 > URL: https://issues.apache.org/jira/browse/PARQUET-1461 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Zoltan Ivanfi >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > Third party code implemented {{public void initFromPage(int valueCount, > ByteBuffer page, int offset)}}, but the new version has {{public abstract > void initFromPage(int valueCount, ByteBufferInputStream in)}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1487) Do not write original type for timezone-agnostic timestamps
Zoltan Ivanfi created PARQUET-1487: -- Summary: Do not write original type for timezone-agnostic timestamps Key: PARQUET-1487 URL: https://issues.apache.org/jira/browse/PARQUET-1487 Project: Parquet Issue Type: Improvement Components: parquet-mr Affects Versions: 1.11.0 Reporter: Zoltan Ivanfi Assignee: Nandor Kollar Fix For: 1.11.0 Historically, the TIMESTAMP_MILLIS and TIMESTAMP_MICROS original types used for the INT64 physical type were always UTC-normalized. The new TIMESTAMP logical type allows both UTC-normalized and timezone-agnostic timestamps and writes the legacy original types for compatibility reasons. However, these legacy original types should only be written for UTC-normalized timestamps, because legacy readers are not prepared to handle timezone-agnostic timestamps correctly and the original type would just be misleading. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1478) Can't read spec compliant, 3-level lists via parquet-proto
[ https://issues.apache.org/jira/browse/PARQUET-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1478: --- Affects Version/s: 1.11.0 > Can't read spec compliant, 3-level lists via parquet-proto > -- > > Key: PARQUET-1478 > URL: https://issues.apache.org/jira/browse/PARQUET-1478 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Nandor Kollar >Priority: Major > Labels: pull-request-available > > I noticed that ProtoInputOutputFormatTest doesn't test the following case > properly: when lists are written using the spec compliant 3-level structure. > The test actually doesn't write a 3-level list, because the passed > configuration is not used at all; a new one is created each time. See > attached PR. > When I fixed this test, it turned out that it is failing: now it writes the > correct 3-level structure, but it looks like the read path is broken. Is it > indeed a bug, or am I doing something wrong? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1478) Can't read spec compliant, 3-level lists via parquet-proto
[ https://issues.apache.org/jira/browse/PARQUET-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1478: --- Fix Version/s: 1.11.0 > Can't read spec compliant, 3-level lists via parquet-proto > -- > > Key: PARQUET-1478 > URL: https://issues.apache.org/jira/browse/PARQUET-1478 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Nandor Kollar >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > I noticed that ProtoInputOutputFormatTest doesn't test the following case > properly: when lists are written using the spec compliant 3-level structure. > The test actually doesn't write a 3-level list, because the passed > configuration is not used at all; a new one is created each time. See > attached PR. > When I fixed this test, it turned out that it is failing: now it writes the > correct 3-level structure, but it looks like the read path is broken. Is it > indeed a bug, or am I doing something wrong? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1462) Allow specifying new development version in prepare-release.sh
[ https://issues.apache.org/jira/browse/PARQUET-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1462. Resolution: Fixed Fix Version/s: format-2.7.0 1.12.0 > Allow specifying new development version in prepare-release.sh > -- > > Key: PARQUET-1462 > URL: https://issues.apache.org/jira/browse/PARQUET-1462 > Project: Parquet > Issue Type: Improvement > Components: parquet-format, parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0, format-2.7.0 > > > Currently prepare-release.sh only takes the release version as a parameter; > the new development version is asked for interactively for each individual > pom.xml file, which makes answering the prompts tedious. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1462) Allow specifying new development version in prepare-release.sh
Zoltan Ivanfi created PARQUET-1462: -- Summary: Allow specifying new development version in prepare-release.sh Key: PARQUET-1462 URL: https://issues.apache.org/jira/browse/PARQUET-1462 Project: Parquet Issue Type: Improvement Components: parquet-format, parquet-mr Reporter: Zoltan Ivanfi Assignee: Zoltan Ivanfi Currently prepare-release.sh only takes the release version as a parameter; the new development version is asked for interactively for each individual pom.xml file, which makes answering the prompts tedious. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1461) Third party code does not compile after parquet-mr minor version update
Zoltan Ivanfi created PARQUET-1461: -- Summary: Third party code does not compile after parquet-mr minor version update Key: PARQUET-1461 URL: https://issues.apache.org/jira/browse/PARQUET-1461 Project: Parquet Issue Type: Bug Components: parquet-mr Affects Versions: 1.11.0 Reporter: Zoltan Ivanfi Assignee: Gabor Szadovszky Fix For: 1.11.0 Third party code implemented {{public void initFromPage(int valueCount, ByteBuffer page, int offset)}}, but the new version has {{public abstract void initFromPage(int valueCount, ByteBufferInputStream in)}} instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks
[ https://issues.apache.org/jira/browse/PARQUET-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1460: -- Assignee: Gabor Szadovszky (was: Zoltan Ivanfi) > Fix javadoc errors and include javadoc checking in Travis checks > > > Key: PARQUET-1460 > URL: https://issues.apache.org/jira/browse/PARQUET-1460 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Zoltan Ivanfi >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > Javadoc generation fails with various errors, preventing us from running the > release script. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks
[ https://issues.apache.org/jira/browse/PARQUET-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1460. Resolution: Fixed > Fix javadoc errors and include javadoc checking in Travis checks > > > Key: PARQUET-1460 > URL: https://issues.apache.org/jira/browse/PARQUET-1460 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Zoltan Ivanfi >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > Javadoc generation fails with various errors, preventing us from running the > release script. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1460) Fix javadoc errors and include javadoc checking in Travis checks
Zoltan Ivanfi created PARQUET-1460: -- Summary: Fix javadoc errors and include javadoc checking in Travis checks Key: PARQUET-1460 URL: https://issues.apache.org/jira/browse/PARQUET-1460 Project: Parquet Issue Type: Bug Components: parquet-mr Affects Versions: 1.10.0 Reporter: Zoltan Ivanfi Assignee: Zoltan Ivanfi Fix For: 1.11.0 Javadoc generation fails with various errors, preventing us from running the release script. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1388) Nanosecond precision time and timestamp - parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1388: --- Issue Type: New Feature (was: Improvement) > Nanosecond precision time and timestamp - parquet-mr > > > Key: PARQUET-1388 > URL: https://issues.apache.org/jira/browse/PARQUET-1388 > Project: Parquet > Issue Type: New Feature > Components: parquet-mr >Reporter: Nandor Kollar >Assignee: Nandor Kollar >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1253) Support for new logical type representation
[ https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1253: --- Issue Type: New Feature (was: Improvement) > Support for new logical type representation > --- > > Key: PARQUET-1253 > URL: https://issues.apache.org/jira/browse/PARQUET-1253 > Project: Parquet > Issue Type: New Feature > Components: parquet-mr >Reporter: Nandor Kollar >Assignee: Nandor Kollar >Priority: Major > Fix For: 1.11.0 > > > The latest parquet-format > [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252] > a new representation for logical types. As of now this is not yet supported > in parquet-mr; thus there's no way to use parameterized UTC-normalized > timestamp data types. When reading and writing Parquet files, besides > 'converted_type', parquet-mr should use the new 'logicalType' field in > SchemaElement to indicate the current logical type annotation. To maintain > backward compatibility, the semantics of converted_type shouldn't change. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1365) Don't write page level statistics
[ https://issues.apache.org/jira/browse/PARQUET-1365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1365. Resolution: Fixed > Don't write page level statistics > - > > Key: PARQUET-1365 > URL: https://issues.apache.org/jira/browse/PARQUET-1365 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > Page level statistics are never used in production, and after adding column > indexes they are completely useless. Fortunately, statistics are optional in > both the v1 and v2 pages; therefore, we can safely stop writing them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1451) Deprecate old logical types API
[ https://issues.apache.org/jira/browse/PARQUET-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1451. Resolution: Duplicate JIRA was not responding, so the issue was accidentally created twice. > Deprecate old logical types API > --- > > Key: PARQUET-1451 > URL: https://issues.apache.org/jira/browse/PARQUET-1451 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Nandor Kollar >Priority: Major > Fix For: 1.11.0 > > > Now that the new logical types API is ready, we should deprecate the old one > because new types will not support it (in fact, nano precision has already > been added without support in the old API). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1452) Deprecate old logical types API
Zoltan Ivanfi created PARQUET-1452: -- Summary: Deprecate old logical types API Key: PARQUET-1452 URL: https://issues.apache.org/jira/browse/PARQUET-1452 Project: Parquet Issue Type: Task Components: parquet-mr Reporter: Zoltan Ivanfi Assignee: Nandor Kollar Fix For: 1.11.0 Now that the new logical types API is ready, we should deprecate the old one because new types will not support it (in fact, nano precision has already been added without support in the old API). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1451) Deprecate old logical types API
Zoltan Ivanfi created PARQUET-1451: -- Summary: Deprecate old logical types API Key: PARQUET-1451 URL: https://issues.apache.org/jira/browse/PARQUET-1451 Project: Parquet Issue Type: Bug Components: parquet-mr Reporter: Zoltan Ivanfi Assignee: Nandor Kollar Fix For: 1.11.0 Now that the new logical types API is ready, we should deprecate the old one because new types will not support it (in fact, nano precision has already been added without support in the old API). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
[ https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1440: -- Assignee: Ryan Gardner > Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file > aren't displayed with their proper scale > -- > > Key: PARQUET-1440 > URL: https://issues.apache.org/jira/browse/PARQUET-1440 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ryan Gardner >Assignee: Ryan Gardner >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > When working with the parquet-tools, I noticed that decimal values that were > stored with int32 or int64 were not being displayed properly. > I opened up a pull request to fix this: > https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
[ https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1440. Resolution: Fixed > Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file > aren't displayed with their proper scale > -- > > Key: PARQUET-1440 > URL: https://issues.apache.org/jira/browse/PARQUET-1440 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ryan Gardner >Assignee: Ryan Gardner >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > When working with the parquet-tools, I noticed that decimal values that were > stored with int32 or int64 were not being displayed properly. > I opened up a pull request to fix this: > https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
[ https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1440: --- Fix Version/s: 1.11.0 > Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file > aren't displayed with their proper scale > -- > > Key: PARQUET-1440 > URL: https://issues.apache.org/jira/browse/PARQUET-1440 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ryan Gardner >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > When working with the parquet-tools, I noticed that decimal values that were > stored with int32 or int64 were not being displayed properly. > I opened up a pull request to fix this: > https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1440) Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale
[ https://issues.apache.org/jira/browse/PARQUET-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1440: --- Summary: Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale (was: Decimal values stored in an int32 or int64 in the parquet file aren't displayed with their proper scale) > Parquet-tools: Decimal values stored in an int32 or int64 in the parquet file > aren't displayed with their proper scale > -- > > Key: PARQUET-1440 > URL: https://issues.apache.org/jira/browse/PARQUET-1440 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ryan Gardner >Priority: Major > Labels: pull-request-available > > When working with the parquet-tools, I noticed that decimal values that were > stored with int32 or int64 were not being displayed properly. > I opened up a pull request to fix this: > https://github.com/apache/parquet-mr/pull/530#issuecomment-428137066 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1437) Misleading comment in parquet.thrift
[ https://issues.apache.org/jira/browse/PARQUET-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1437: --- Fix Version/s: 2.7.0 > Misleading comment in parquet.thrift > > > Key: PARQUET-1437 > URL: https://issues.apache.org/jira/browse/PARQUET-1437 > Project: Parquet > Issue Type: Bug > Components: parquet-format >Reporter: Zoltan Ivanfi >Priority: Major > Fix For: format-2.7.0 > > > The documentation for {{list<ColumnOrder> column_orders}} states that "Each > sort order corresponds to one column, determined by its position in the list, > matching the position of the column in the schema." > However, in reality, while the order of elements in these two lists (schema > and sort order) is the same, only leaf nodes are represented in the list of > sort orders, so the positions do *not* match. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1437) Misleading comment in parquet.thrift
[ https://issues.apache.org/jira/browse/PARQUET-1437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1437: --- Fix Version/s: (was: 2.7.0) format-2.7.0 > Misleading comment in parquet.thrift > > > Key: PARQUET-1437 > URL: https://issues.apache.org/jira/browse/PARQUET-1437 > Project: Parquet > Issue Type: Bug > Components: parquet-format >Reporter: Zoltan Ivanfi >Priority: Major > Fix For: format-2.7.0 > > > The documentation for {{list<ColumnOrder> column_orders}} states that "Each > sort order corresponds to one column, determined by its position in the list, > matching the position of the column in the schema." > However, in reality, while the order of elements in these two lists (schema > and sort order) is the same, only leaf nodes are represented in the list of > sort orders, so the positions do *not* match. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1437) Misleading comment in parquet.thrift
Zoltan Ivanfi created PARQUET-1437: -- Summary: Misleading comment in parquet.thrift Key: PARQUET-1437 URL: https://issues.apache.org/jira/browse/PARQUET-1437 Project: Parquet Issue Type: Bug Components: parquet-format Reporter: Zoltan Ivanfi The documentation for {{list<ColumnOrder> column_orders}} states that "Each sort order corresponds to one column, determined by its position in the list, matching the position of the column in the schema." However, in reality, while the order of elements in these two lists (schema and sort order) is the same, only leaf nodes are represented in the list of sort orders, so the positions do *not* match. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1436) TimestampMicrosStringifier shows wrong microseconds for timestamps before 1970
Zoltan Ivanfi created PARQUET-1436: -- Summary: TimestampMicrosStringifier shows wrong microseconds for timestamps before 1970 Key: PARQUET-1436 URL: https://issues.apache.org/jira/browse/PARQUET-1436 Project: Parquet Issue Type: Task Components: parquet-mr Reporter: Zoltan Ivanfi Fix For: 1.11.0 testTimestampMicrosStringifier takes the timestamp 1848-03-15T09:23:59.765 and subtracts 1 microsecond from it. The result (both expected and actual) is 1848-03-15T09:23:59.765001, but it should be 1848-03-15T09:23:59.764999 instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
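The symptom described above matches the usual negative-remainder pitfall: Java's % yields a negative remainder for negative microsecond counts, so the sub-second digits come out wrong for pre-1970 timestamps. A hypothetical sketch of the correct decomposition (not the actual TimestampMicrosStringifier code), using floor division so the fraction is always non-negative:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

class MicrosStringify {
    // Render microseconds since the epoch with microsecond precision.
    // floorDiv/floorMod keep the fractional part in [0, 999999] even for
    // negative (pre-1970) inputs, where plain / and % would misbehave.
    static String stringify(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long fraction = Math.floorMod(micros, 1_000_000L);
        return Instant.ofEpochSecond(seconds)
                .atOffset(ZoneOffset.UTC)
                .format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss"))
                + String.format(".%06d", fraction);
    }

    public static void main(String[] args) {
        // 1 microsecond before the epoch: the fraction must be 999999
        System.out.println(stringify(-1L)); // 1969-12-31T23:59:59.999999
    }
}
```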
[jira] [Assigned] (PARQUET-1381) Add merge blocks command to parquet-tools
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1381: -- Assignee: Ekaterina Galieva > Add merge blocks command to parquet-tools > - > > Key: PARQUET-1381 > URL: https://issues.apache.org/jira/browse/PARQUET-1381 > Project: Parquet > Issue Type: New Feature > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ekaterina Galieva >Assignee: Ekaterina Galieva >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > The current implementation of the merge command in parquet-tools doesn't merge row > groups, it just places one after the other. Add an API and command option to be > able to merge small blocks into larger ones up to a specified size limit. > h6. Implementation details: > Blocks are not reordered, so as not to break possible initial predicate pushdown > optimizations. > Blocks are not divided to fit the upper bound perfectly. > This is an intentional performance optimization. > This gives an opportunity to form new blocks by copying the full content of > smaller blocks by column, not by row. > h6. Examples: > # Input files with blocks sizes: > {code:java} > [128 | 35], [128 | 40], [120]{code} > Expected output file blocks sizes: > {{merge }} > {code:java} > [128 | 35 | 128 | 40 | 120] > {code} > {{merge -b}} > {code:java} > [128 | 35 | 128 | 40 | 120] > {code} > {{merge -b -l 256 }} > {code:java} > [163 | 168 | 120] > {code} > # Input files with blocks sizes: > {code:java} > [128 | 35], [40], [120], [6] {code} > Expected output file blocks sizes: > {{merge}} > {code:java} > [128 | 35 | 40 | 120 | 6] > {code} > {{merge -b}} > {code:java} > [128 | 75 | 126] > {code} > {{merge -b -l 256}} > {code:java} > [203 | 126]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
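The expected outputs listed above are consistent with a greedy left-to-right coalescing pass that never reorders or splits blocks. A minimal sketch of that logic (illustrative only, not the parquet-tools implementation; `limit` stands in for the -l bound):

```java
import java.util.ArrayList;
import java.util.List;

class BlockMerger {
    // Greedily coalesce consecutive block sizes without reordering or
    // splitting: start a new output block whenever adding the next input
    // block would exceed the limit. An oversized input block is kept whole.
    static List<Integer> merge(List<Integer> blocks, int limit) {
        List<Integer> merged = new ArrayList<>();
        int current = 0;
        for (int size : blocks) {
            if (current > 0 && current + size > limit) {
                merged.add(current);
                current = 0;
            }
            current += size;
        }
        if (current > 0) {
            merged.add(current);
        }
        return merged;
    }
}
```

Running this on the second example reproduces the listed results: with a limit of 128 the sizes [128, 35, 40, 120, 6] collapse to [128, 75, 126], and with a limit of 256 to [203, 126].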
[jira] [Assigned] (PARQUET-1368) ParquetFileReader should close its input stream for the failure in constructor
[ https://issues.apache.org/jira/browse/PARQUET-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1368: -- Assignee: Hyukjin Kwon > ParquetFileReader should close its input stream for the failure in constructor > -- > > Key: PARQUET-1368 > URL: https://issues.apache.org/jira/browse/PARQUET-1368 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > I was trying to replace the deprecated usage {{readFooter}} with > {{ParquetFileReader.open}} according to the note: > {code} > [warn] > /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:368: > method readFooter in object ParquetFileReader is deprecated: see > corresponding Javadoc for more information. > [warn] ParquetFileReader.readFooter(sharedConf, filePath, > SKIP_ROW_GROUPS).getFileMetaData > [warn] ^ > [warn] > /home/jenkins/workspace/spark-master-test-maven-hadoop-2.7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala:545: > method readFooter in object ParquetFileReader is deprecated: see > corresponding Javadoc for more information. 
> [warn] ParquetFileReader.readFooter( > [warn] ^ > {code} > Then, I realised some test suites report resource leaks: > {code} > java.lang.Throwable > at > org.apache.spark.DebugFilesystem$.addOpenStream(DebugFilesystem.scala:36) > at org.apache.spark.DebugFilesystem.open(DebugFilesystem.scala:70) > at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766) > at > org.apache.parquet.hadoop.util.HadoopInputFile.newStream(HadoopInputFile.java:65) > at > org.apache.parquet.hadoop.ParquetFileReader.<init>(ParquetFileReader.java:687) > at > org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:595) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.createParquetReader(ParquetUtils.scala:67) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetUtils$.readFooter(ParquetUtils.scala:46) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:544) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$readParquetFootersInParallel$1.apply(ParquetFileFormat.scala:539) > at > scala.collection.parallel.AugmentedIterableIterator$class.flatmap2combiner(RemainsIterator.scala:132) > at > scala.collection.parallel.immutable.ParVector$ParVectorIterator.flatmap2combiner(ParVector.scala:62) > at > scala.collection.parallel.ParIterableLike$FlatMap.leaf(ParIterableLike.scala:1072) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply$mcV$sp(Tasks.scala:49) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48) > at > scala.collection.parallel.Task$$anonfun$tryLeaf$1.apply(Tasks.scala:48) > at scala.collection.parallel.Task$class.tryLeaf(Tasks.scala:51) > at > scala.collection.parallel.ParIterableLike$FlatMap.tryLeaf(ParIterableLike.scala:1068) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.internal(Tasks.scala:159) > at > 
scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.internal(Tasks.scala:443) > at > scala.collection.parallel.AdaptiveWorkStealingTasks$WrappedTask$class.compute(Tasks.scala:149) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.compute(Tasks.scala:443) > at > scala.concurrent.forkjoin.RecursiveAction.exec(RecursiveAction.java:160) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at scala.concurrent.forkjoin.ForkJoinTask.doJoin(ForkJoinTask.java:341) > at scala.concurrent.forkjoin.ForkJoinTask.join(ForkJoinTask.java:673) > at > scala.collection.parallel.ForkJoinTasks$WrappedTask$class.sync(Tasks.scala:378) > at > scala.collection.parallel.AdaptiveWorkStealingForkJoinTasks$WrappedTask.sync(Tasks.scala:443) > at > scala.collection.parallel.ForkJoinTasks$class.executeAndWaitResult(Tasks.scala:426) > at > scala.collection.parallel.ForkJoinTaskSupport.executeAndWaitResult(TaskSupport.scala:56) > at > scala.collection.parallel.ParIterableLike$ResultMapping.leaf(ParIterableLike.scala:958) > at >
[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1337: --- Description: When the size of buffered data gets near the desired row group size, Parquet flushes the data to a row group. However, at this point the data for the last page is not yet encoded or compressed, so the row group may end up being significantly smaller than intended. If the row group ends up being so small that it is farther away from the next disk block boundary than the maximum padding, Parquet will try to create a new group in the same disk block, this time targeting the remaining space. This may also be flushed prematurely, leading to the creation of an even smaller row group, which may lead to an even smaller one... This gets repeated until we get sufficiently close to the block boundary so that padding can finally be applied. The resulting superfluous row groups can lead to bad read performance. An example of the structure of a Parquet file suffering from this problem can be seen below. 
For easier interpretation, the row groups are visually grouped by disk blocks: {noformat} row group 1: RC:18774 TS:22182960 OFFSET: 4 row group 2: RC: 2896 TS: 3428160 OFFSET: 6574564 row group 3: RC: 1964 TS: 2322560 OFFSET: 7679844 row group 4: RC: 1074 TS: 1268880 OFFSET: 8732964 {noformat} {noformat} row group 5: RC:18808 TS:8560 OFFSET:1000 row group 6: RC: 2872 TS: 3389520 OFFSET:16612640 row group 7: RC: 1930 TS: 2284960 OFFSET:17716800 row group 8: RC: 1040 TS: 1233520 OFFSET:18768240 {noformat} {noformat} row group 9: RC:18852 TS:22275520 OFFSET:2000 row group 10: RC: 2831 TS: 3345680 OFFSET:26656320 row group 11: RC: 1893 TS: 2244640 OFFSET:27757200 row group 12: RC: 1008 TS: 1195520 OFFSET:28806560 {noformat} {noformat} row group 13: RC:18841 TS:22263360 OFFSET:3000 row group 14: RC: 2835 TS: 3350480 OFFSET:36652000 row group 15: RC: 1900 TS: 2249040 OFFSET:37753600 row group 16: RC: 1016 TS: 1198640 OFFSET:38803600 {noformat} {noformat} row group 17: RC: 1466 TS: 1740320 OFFSET:4000 {noformat} In this example, both the disk block size and the row group size were set to 1000. The data would fit in 5 row groups of this size, but instead, each of the disk blocks (except the last) is split into 4 row groups of progressively decreasing size. was: When the size of buffered data gets near the desired row group size, Parquet flushes the data to a row group. However, at this point the data for the last page is not yet encoded or compressed, so the row group may end up being significantly smaller than intended. If the row group ends up being so small that it is farther away from the next disk block boundary than the maximum padding, Parquet will try to create a new group in the same disk block, this time targeting the remaining space. This may also be flushed prematurely, leading to the creation of an even smaller row group, which may lead to an even smaller one... 
This gets repeated until we get sufficiently close to the block boundary so that padding can finally be applied. The resulting superfluous row groups can lead to bad performance. An example of the structure of a Parquet file suffering from this problem can be seen below. For easier interpretation, the row groups are visually grouped by disk blocks: {noformat} row group 1: RC:18774 TS:22182960 OFFSET: 4 row group 2: RC: 2896 TS: 3428160 OFFSET: 6574564 row group 3: RC: 1964 TS: 2322560 OFFSET: 7679844 row group 4: RC: 1074 TS: 1268880 OFFSET: 8732964 {noformat} {noformat} row group 5: RC:18808 TS:8560 OFFSET:1000 row group 6: RC: 2872 TS: 3389520 OFFSET:16612640 row group 7: RC: 1930 TS: 2284960 OFFSET:17716800 row group 8: RC: 1040 TS: 1233520 OFFSET:18768240 {noformat} {noformat} row group 9: RC:18852 TS:22275520 OFFSET:2000 row group 10: RC: 2831 TS: 3345680 OFFSET:26656320 row group 11: RC: 1893 TS: 2244640 OFFSET:27757200 row group 12: RC: 1008 TS: 1195520 OFFSET:28806560 {noformat} {noformat} row group 13: RC:18841 TS:22263360 OFFSET:3000 row group 14: RC: 2835 TS: 3350480 OFFSET:36652000 row group 15: RC: 1900 TS: 2249040 OFFSET:37753600 row group 16: RC: 1016 TS: 1198640 OFFSET:38803600 {noformat} {noformat} row group 17: RC: 1466 TS: 1740320 OFFSET:4000 {noformat} In this example, both the disk block size and the row group size were set to 1000. The data would fit in 5 row groups of this size, but instead, each of the disk blocks (except the last) is split into 4 row groups of progressively decreasing size. > Current block alignment logic may lead to several row groups per block > -- > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet >
[jira] [Commented] (PARQUET-1201) Column indexes
[ https://issues.apache.org/jira/browse/PARQUET-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631463#comment-16631463 ] Zoltan Ivanfi commented on PARQUET-1201: [~rdblue] That is the branch indeed. Our idea with the feature branch was that each commit to the feature branch would happen via a PR and would go through the regular thorough reviewing process. We wanted to avoid opening a giant PR at the end, which would be very hard to review anyway due to its sheer size. The community accepted this approach on the Parquet sync where we discussed it. Could you please review the individual PRs instead? Thanks! > Column indexes > -- > > Key: PARQUET-1201 > URL: https://issues.apache.org/jira/browse/PARQUET-1201 > Project: Parquet > Issue Type: New Feature >Affects Versions: 1.10.0 >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky >Priority: Major > Fix For: format-2.5.0 > > > Write the column indexes described in PARQUET-922. > This is the first phase of implementing the whole feature. The > implementation is done in the following steps: > * Utility to read/write indexes in parquet-format > * Writing indexes in the parquet file > * Extend parquet-tools and parquet-cli to show the indexes > * Limit index size based on parquet properties > * Trim min/max values where possible based on parquet properties > * Filtering based on column indexes > The work is done on the feature branch {{column-indexes}}. This JIRA will be > resolved after the branch has been merged to {{master}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1400) Deprecate parquet-mr related code in parquet-format
[ https://issues.apache.org/jira/browse/PARQUET-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1400. Resolution: Fixed Fix Version/s: format-2.6.0 > Deprecate parquet-mr related code in parquet-format > --- > > Key: PARQUET-1400 > URL: https://issues.apache.org/jira/browse/PARQUET-1400 > Project: Parquet > Issue Type: Improvement > Components: parquet-format >Reporter: Gabor Szadovszky >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: format-2.6.0 > > > There are Java classes in the > [parquet-format|https://github.com/apache/parquet-format] repo that should be > in the [parquet-mr|https://github.com/apache/parquet-mr] repo instead: [java > classes|https://github.com/apache/parquet-format/tree/master/src/main] and > [test classes|https://github.com/apache/parquet-format/tree/master/src/test] > These classes should be deprecated, noting that they will be moved to the > [parquet-mr|https://github.com/apache/parquet-mr] repo. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1381) Add merge blocks command to parquet-tools
[ https://issues.apache.org/jira/browse/PARQUET-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1381. Resolution: Fixed Fix Version/s: (was: 1.10.1) 1.11.0 > Add merge blocks command to parquet-tools > - > > Key: PARQUET-1381 > URL: https://issues.apache.org/jira/browse/PARQUET-1381 > Project: Parquet > Issue Type: New Feature > Components: parquet-mr >Affects Versions: 1.10.0 >Reporter: Ekaterina Galieva >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > The current implementation of the merge command in parquet-tools doesn't merge row > groups, it just places one after the other. Add an API and command option to be > able to merge small blocks into larger ones up to a specified size limit. > h6. Implementation details: > Blocks are not reordered, so as not to break possible initial predicate pushdown > optimizations. > Blocks are not divided to fit the upper bound perfectly. > This is an intentional performance optimization. > This gives an opportunity to form new blocks by copying the full content of > smaller blocks by column, not by row. > h6. Examples: > # Input files with blocks sizes: > {code:java} > [128 | 35], [128 | 40], [120]{code} > Expected output file blocks sizes: > {{merge }} > {code:java} > [128 | 35 | 128 | 40 | 120] > {code} > {{merge -b}} > {code:java} > [128 | 35 | 128 | 40 | 120] > {code} > {{merge -b -l 256 }} > {code:java} > [163 | 168 | 120] > {code} > # Input files with blocks sizes: > {code:java} > [128 | 35], [40], [120], [6] {code} > Expected output file blocks sizes: > {{merge}} > {code:java} > [128 | 35 | 40 | 120 | 6] > {code} > {{merge -b}} > {code:java} > [128 | 75 | 126] > {code} > {{merge -b -l 256}} > {code:java} > [203 | 126]{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1417) BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with the different length
[ https://issues.apache.org/jira/browse/PARQUET-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1417: --- Fix Version/s: 1.11.0 > BINARY_AS_SIGNED_INTEGER_COMPARATOR fails with IOBE for the same arrays with > the different length > - > > Key: PARQUET-1417 > URL: https://issues.apache.org/jira/browse/PARQUET-1417 > Project: Parquet > Issue Type: Bug >Affects Versions: 1.10.0 >Reporter: Volodymyr Vysotskyi >Assignee: Volodymyr Vysotskyi >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > {{BINARY_AS_SIGNED_INTEGER_COMPARATOR}} fails when byte arrays representing > the same value but with a different number of leading zeros are compared: > {code:java} > BINARY_AS_SIGNED_INTEGER_COMPARATOR.compare( > Binary.fromConstantByteBuffer(ByteBuffer.wrap(new byte[] { 0, 0, -108 > })), > Binary.fromConstantByteBuffer(ByteBuffer.wrap(new byte[] { 0, -108 > }))); > {code} > Error is: > {noformat} > java.lang.IndexOutOfBoundsException > at java.nio.Buffer.checkIndex(Buffer.java:540) > at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139) > at > org.apache.parquet.schema.PrimitiveComparator$9.compare(PrimitiveComparator.java:280) > at > org.apache.parquet.schema.PrimitiveComparator$9.compare(PrimitiveComparator.java:262) > at > org.apache.parquet.schema.PrimitiveComparator$BinaryComparator.compareNotNulls(PrimitiveComparator.java:186) > at > org.apache.parquet.schema.PrimitiveComparator$BinaryComparator.compareNotNulls(PrimitiveComparator.java:183) > at > org.apache.parquet.schema.PrimitiveComparator.compare(PrimitiveComparator.java:63) > {noformat} > The problem is that the {{BINARY_AS_SIGNED_INTEGER_COMPARATOR.compare(ByteBuffer > b1, ByteBuffer b2)}} method passes the length of the first {{ByteBuffer}}, > but it should pass the smaller length, since padding was calculated and passed > for the {{ByteBuffer}} with the greater length to the {{compare(int length, > ByteBuffer b1, int p1, ByteBuffer b2, int p2)}} method. 
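For context, comparing two big-endian two's complement encodings of different lengths requires sign-extending the shorter one rather than indexing with the longer operand's length (which is what triggers the IOBE). A standalone sketch of the correct rule on plain byte arrays (not the actual PrimitiveComparator code):

```java
class SignedBinaryCompare {
    // Compare big-endian two's complement byte arrays as signed integers,
    // tolerating different numbers of leading sign-extension bytes.
    static int compare(byte[] a, byte[] b) {
        boolean negA = a.length > 0 && a[0] < 0;
        boolean negB = b.length > 0 && b[0] < 0;
        if (negA != negB) {
            return negA ? -1 : 1; // a negative value sorts before a non-negative one
        }
        byte pad = negA ? (byte) 0xFF : 0; // virtual sign-extension byte
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // align both arrays on the right; pad the shorter one on the left
            byte x = i < n - a.length ? pad : a[i - (n - a.length)];
            byte y = i < n - b.length ? pad : b[i - (n - b.length)];
            // signs are equal at this point, so unsigned byte order is correct
            int cmp = Integer.compare(x & 0xFF, y & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }
}
```

With this rule, {0, 0, -108} and {0, -108} compare as equal, matching the example from the report; java.math.BigInteger's byte[] constructor applies the same two's complement interpretation and can serve as a cross-check.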
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1418) Run integration tests in Travis
[ https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1418: --- Fix Version/s: 1.11.0 > Run integration tests in Travis > --- > > Key: PARQUET-1418 > URL: https://issues.apache.org/jira/browse/PARQUET-1418 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > Currently Travis only runs the unit tests. It should run the integration > tests as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1337: --- Component/s: parquet-mr > Current block alignment logic may lead to several row groups per block > -- > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Gabor Szadovszky >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > > When the size of the buffered data gets near the desired row group size, Parquet > flushes the data to a row group. However, at this point the data for the last > page is not yet encoded or compressed, so the row group may end up > being significantly smaller than intended. > If the row group ends up being so small that it is farther away from the next > disk block boundary than the maximum padding, Parquet will try to create a > new group in the same disk block, this time targeting the remaining space. > This may also be flushed prematurely, leading to the creation of an even > smaller row group, which may lead to an even smaller one, and so on. This gets > repeated until we get sufficiently close to the block boundary for > padding to finally be applied. The resulting superfluous row groups can lead > to bad performance. > An example of the structure of a Parquet file suffering from this problem can > be seen below. 
For easier interpretation, the row groups are visually grouped > by disk blocks: > {noformat} > row group 1: RC:18774 TS:22182960 OFFSET: 4 > row group 2: RC: 2896 TS: 3428160 OFFSET: 6574564 > row group 3: RC: 1964 TS: 2322560 OFFSET: 7679844 > row group 4: RC: 1074 TS: 1268880 OFFSET: 8732964 > {noformat} > {noformat} > row group 5: RC:18808 TS:8560 OFFSET:1000 > row group 6: RC: 2872 TS: 3389520 OFFSET:16612640 > row group 7: RC: 1930 TS: 2284960 OFFSET:17716800 > row group 8: RC: 1040 TS: 1233520 OFFSET:18768240 > {noformat} > {noformat} > row group 9: RC:18852 TS:22275520 OFFSET:2000 > row group 10: RC: 2831 TS: 3345680 OFFSET:26656320 > row group 11: RC: 1893 TS: 2244640 OFFSET:27757200 > row group 12: RC: 1008 TS: 1195520 OFFSET:28806560 > {noformat} > {noformat} > row group 13: RC:18841 TS:22263360 OFFSET:3000 > row group 14: RC: 2835 TS: 3350480 OFFSET:36652000 > row group 15: RC: 1900 TS: 2249040 OFFSET:37753600 > row group 16: RC: 1016 TS: 1198640 OFFSET:38803600 > {noformat} > {noformat} > row group 17: RC: 1466 TS: 1740320 OFFSET:4000 > {noformat} > In this example, both the disk block size and the row group size were set to > 1000. The data would fit in 5 row groups of this size, but instead, each > of the disk blocks (except the last) is split into 4 row groups of > progressively decreasing size. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
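The alignment decision the description refers to can be sketched as follows. This is illustrative only: the method name, parameters, and numbers are hypothetical, not the actual ParquetFileWriter logic.

```java
// Illustrative sketch of the block alignment decision described above.
// Names and numbers are hypothetical; not the actual parquet-mr code.
public class BlockAlignSketch {

    // Returns the byte target for the next row group, or 0 when the writer
    // is close enough to the disk block boundary to pad instead.
    static long nextRowGroupTarget(long offset, long blockSize,
                                   long maxPadding, long rowGroupSize) {
        long remaining = blockSize - (offset % blockSize);
        if (remaining <= maxPadding) {
            return 0; // pad to the boundary, start fresh in the next block
        }
        // Otherwise the next row group targets the space left in the block.
        // If that flush also undershoots (the last page is still unencoded),
        // this decision is made again with a smaller remainder, which is
        // how the progressively shrinking row groups arise.
        return Math.min(rowGroupSize, remaining);
    }

    public static void main(String[] args) {
        long blockSize = 10_000_000, maxPadding = 1_000_000;
        // After an undershooting flush, the next group targets the remainder:
        System.out.println(nextRowGroupTarget(6_574_564, blockSize, maxPadding, blockSize));
        // Close to the boundary, the writer pads instead:
        System.out.println(nextRowGroupTarget(9_500_000, blockSize, maxPadding, blockSize));
    }
}
```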
[jira] [Resolved] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level
[ https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1421. Resolution: Fixed Fix Version/s: 1.11.0 > InternalParquetRecordWriter logs debug messages at the INFO level > - > > Key: PARQUET-1421 > URL: https://issues.apache.org/jira/browse/PARQUET-1421 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > The superfluous log messages clutter the output and may make the Travis build > fail due to overly long output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level
[ https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1421: --- Component/s: parquet-mr > InternalParquetRecordWriter logs debug messages at the INFO level > - > > Key: PARQUET-1421 > URL: https://issues.apache.org/jira/browse/PARQUET-1421 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > Fix For: 1.11.0 > > > The superfluous log messages clutter the output and may make the Travis build > fail due to overly long output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1418) Run integration tests in Travis
[ https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1418: --- Component/s: parquet-mr > Run integration tests in Travis > --- > > Key: PARQUET-1418 > URL: https://issues.apache.org/jira/browse/PARQUET-1418 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > > Currently Travis only runs the unit tests. It should run the integration > tests as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1418) Run integration tests in Travis
[ https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1418. Resolution: Fixed > Run integration tests in Travis > --- > > Key: PARQUET-1418 > URL: https://issues.apache.org/jira/browse/PARQUET-1418 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > Labels: pull-request-available > > Currently Travis only runs the unit tests. It should run the integration > tests as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level
[ https://issues.apache.org/jira/browse/PARQUET-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1421: -- Assignee: Zoltan Ivanfi > InternalParquetRecordWriter logs debug messages at the INFO level > - > > Key: PARQUET-1421 > URL: https://issues.apache.org/jira/browse/PARQUET-1421 > Project: Parquet > Issue Type: Bug >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > > The superfluous log messages clutter the output and may make the Travis build > fail due to overly long output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1421) InternalParquetRecordWriter logs debug messages at the INFO level
Zoltan Ivanfi created PARQUET-1421: -- Summary: InternalParquetRecordWriter logs debug messages at the INFO level Key: PARQUET-1421 URL: https://issues.apache.org/jira/browse/PARQUET-1421 Project: Parquet Issue Type: Bug Reporter: Zoltan Ivanfi The superfluous log messages clutter the output and may make the Travis build fail due to overly long output. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1418) Run integration tests in Travis
[ https://issues.apache.org/jira/browse/PARQUET-1418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1418: -- Assignee: Zoltan Ivanfi > Run integration tests in Travis > --- > > Key: PARQUET-1418 > URL: https://issues.apache.org/jira/browse/PARQUET-1418 > Project: Parquet > Issue Type: Improvement >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Major > > Currently Travis only runs the unit tests. It should run the integration > tests as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1418) Run integration tests in Travis
Zoltan Ivanfi created PARQUET-1418: -- Summary: Run integration tests in Travis Key: PARQUET-1418 URL: https://issues.apache.org/jira/browse/PARQUET-1418 Project: Parquet Issue Type: Improvement Reporter: Zoltan Ivanfi Currently Travis only runs the unit tests. It should run the integration tests as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-899) Add metadata field describing the application that wrote the file
[ https://issues.apache.org/jira/browse/PARQUET-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-899. --- Resolution: Duplicate Quoting from the commit for PARQUET-352: WriteSupport now has a getName getter method that is added to the footer if it returns a non-null string as writer.model.name. This is intended to help identify files written by object models incorrectly. So writer.model.name is already there for this purpose, albeit undocumented. > Add metadata field describing the application that wrote the file > - > > Key: PARQUET-899 > URL: https://issues.apache.org/jira/browse/PARQUET-899 > Project: Parquet > Issue Type: Improvement >Reporter: Zoltan Ivanfi >Priority: Major > > Although the Parquet library should behave the same regardless of what > application uses it, occasionally serious interoperability bugs are > introduced in specific applications. For example, data written by a specific > application may be unnecessarily adjusted or the calculated statistics may be > invalid (both actual problems). > Unfortunately, currently it is not possible to recognize Parquet files > affected by application problems because the metadata does not contain any > information about the application using the Parquet library. (The name and > version number of the Parquet library is recorded, but that only has limited > use, because apart from Impala, the most widespread Parquet writers all use > the same Java library.) > To allow creating workarounds for future known issues, we should introduce > new metadata fields that applications can populate. The simplest approach is > to have one field for the application name and another for its version > number. A more sophisticated approach suggested by [~julienledem] could also > reference a list of earlier issues that are known to be fixed in the > application that wrote the Parquet file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
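Since the {{writer.model.name}} entry mentioned in the resolution already serves this purpose, a reader-side workaround could key off it. The following sketch only illustrates the idea; the helper and the map contents are hypothetical, and in parquet-mr the key/value metadata would come from the file footer.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: decide how to treat a file based on the
// writer.model.name entry in the footer's key/value metadata.
// The helper and map contents are hypothetical.
public class WriterModelCheck {

    static String writerModel(Map<String, String> keyValueMetadata) {
        // Files written before PARQUET-352, or by models that return a
        // null name, simply lack the entry.
        return keyValueMetadata.getOrDefault("writer.model.name", "unknown");
    }

    public static void main(String[] args) {
        Map<String, String> footerMeta = new HashMap<>();
        footerMeta.put("writer.model.name", "example-object-model");
        System.out.println(writerModel(footerMeta)); // prints example-object-model
        System.out.println(writerModel(new HashMap<>())); // prints unknown
    }
}
```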
[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1337: --- Description: When the size of the buffered data gets near the desired row group size, Parquet flushes the data to a row group. However, at this point the data for the last page is not yet encoded or compressed, so the row group may end up being significantly smaller than intended. If the row group ends up being so small that it is farther away from the next disk block boundary than the maximum padding, Parquet will try to create a new group in the same disk block, this time targeting the remaining space. This may also be flushed prematurely, leading to the creation of an even smaller row group, which may lead to an even smaller one, and so on. This gets repeated until we get sufficiently close to the block boundary for padding to finally be applied. The resulting superfluous row groups can lead to bad performance. An example of the structure of a Parquet file suffering from this problem can be seen below. 
For easier interpretation, the row groups are visually grouped by disk blocks: {noformat} row group 1: RC:18774 TS:22182960 OFFSET: 4 row group 2: RC: 2896 TS: 3428160 OFFSET: 6574564 row group 3: RC: 1964 TS: 2322560 OFFSET: 7679844 row group 4: RC: 1074 TS: 1268880 OFFSET: 8732964 {noformat} {noformat} row group 5: RC:18808 TS:8560 OFFSET:1000 row group 6: RC: 2872 TS: 3389520 OFFSET:16612640 row group 7: RC: 1930 TS: 2284960 OFFSET:17716800 row group 8: RC: 1040 TS: 1233520 OFFSET:18768240 {noformat} {noformat} row group 9: RC:18852 TS:22275520 OFFSET:2000 row group 10: RC: 2831 TS: 3345680 OFFSET:26656320 row group 11: RC: 1893 TS: 2244640 OFFSET:27757200 row group 12: RC: 1008 TS: 1195520 OFFSET:28806560 {noformat} {noformat} row group 13: RC:18841 TS:22263360 OFFSET:3000 row group 14: RC: 2835 TS: 3350480 OFFSET:36652000 row group 15: RC: 1900 TS: 2249040 OFFSET:37753600 row group 16: RC: 1016 TS: 1198640 OFFSET:38803600 {noformat} {noformat} row group 17: RC: 1466 TS: 1740320 OFFSET:4000 {noformat} In this example, both the disk block size and the row group size were set to 1000. The data would fit in 5 row groups of this size, but instead, each of the disk blocks (except the last) is split into 4 row groups of progressively decreasing size. was: If there are many columns with encoding RLE+bitpacking (e.g. dictionary encoding) where the value variance is low, the estimated size of the open pages (which are not yet encoded) is much larger than the final page size. Because of that, parquet-mr fails to create row groups whose size is close to {{parquet.block.size}}, which causes performance issues while reading. A hint from Ryan to solve this issue: {quote} We could probably get a better estimate by using the amount of buffered data and how large other pages in a column were after fully encoding and compressing. So if you have 5 pages compressed and buffered, and another 1000 values, use the compression ratio of the 5 pages to estimate the final size. 
We'd probably want to use some overhead value for the header. And, we'd want to separate the amount of buffered data from our row group size estimate, which are currently the same thing. {quote} (So it is not only about RLE+bitpacking but about any kind of encoding that is applied only after "closing" a page.) > Current block alignment logic may lead to several row groups per block > -- > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet > Issue Type: Improvement >Reporter: Gabor Szadovszky >Assignee: Zoltan Ivanfi >Priority: Major > > When the size of the buffered data gets near the desired row group size, Parquet > flushes the data to a row group. However, at this point the data for the last > page is not yet encoded or compressed, so the row group may end up > being significantly smaller than intended. > If the row group ends up being so small that it is farther away from the next > disk block boundary than the maximum padding, Parquet will try to create a > new group in the same disk block, this time targeting the remaining space. > This may also be flushed prematurely, leading to the creation of an even > smaller row group, which may lead to an even smaller one, and so on. This gets > repeated until we get sufficiently close to the block boundary for > padding to finally be applied. The resulting superfluous row groups can lead > to bad performance. > An example of the structure of a Parquet file suffering from this problem can > be seen below. For easier interpretation, the row groups are visually grouped > by disk blocks: > {noformat} > row group 1: RC:18774
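Ryan's hint amounts to extrapolating from pages that are already encoded and compressed. A rough sketch of such an estimate (the method, parameters, and overhead constant are all hypothetical, not the actual parquet-mr implementation):

```java
// Illustrative sketch of the size estimate suggested in the quote above:
// use the compression ratio observed on already closed pages to predict
// the final size of the still-open (unencoded) data. Hypothetical names.
public class PageSizeEstimate {

    static long estimateFinalSize(long closedCompressedBytes, long closedRawBytes,
                                  long openRawBytes, long headerOverhead) {
        if (closedRawBytes == 0) {
            // No history yet: fall back to the raw size, i.e. the
            // overestimating behavior the ticket complains about.
            return openRawBytes + headerOverhead;
        }
        double ratio = (double) closedCompressedBytes / closedRawBytes;
        return closedCompressedBytes + (long) (openRawBytes * ratio) + headerOverhead;
    }

    public static void main(String[] args) {
        // 5 closed pages: 5 MB raw compressed down to 500 KB (ratio 0.1),
        // plus 1 MB of raw buffered values still open.
        System.out.println(estimateFinalSize(500_000, 5_000_000, 1_000_000, 0)); // prints 600000
    }
}
```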
[jira] [Updated] (PARQUET-1337) Current block alignment logic may lead to several row groups per block
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1337: --- Summary: Current block alignment logic may lead to several row groups per block (was: Implement better estimate of page size for RLE+bitpacking) > Current block alignment logic may lead to several row groups per block > -- > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet > Issue Type: Improvement >Reporter: Gabor Szadovszky >Assignee: Zoltan Ivanfi >Priority: Major > > If there are many columns with encoding RLE+bitpacking (e.g. dictionary > encoding) where the value variance is low, the estimated size of the > open pages (which are not yet encoded) is much larger than the final page > size. Because of that, parquet-mr fails to create row groups whose size is > close to {{parquet.block.size}}, which causes performance issues while reading. > A hint from Ryan to solve this issue: > {quote} > We could probably get a better estimate by using the amount of buffered > data and how large other pages in a column were after fully encoding and > compressing. So if you have 5 pages compressed and buffered, and another > 1000 values, use the compression ratio of the 5 pages to estimate the final > size. We'd probably want to use some overhead value for the header. And, > we'd want to separate the amount of buffered data from our row group size > estimate, which are currently the same thing. > {quote} > (So it is not only about RLE+bitpacking but about any kind of encoding that is > applied only after "closing" a page.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1337) Implement better estimate of page size for RLE+bitpacking
[ https://issues.apache.org/jira/browse/PARQUET-1337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1337: -- Assignee: Zoltan Ivanfi > Implement better estimate of page size for RLE+bitpacking > - > > Key: PARQUET-1337 > URL: https://issues.apache.org/jira/browse/PARQUET-1337 > Project: Parquet > Issue Type: Improvement >Reporter: Gabor Szadovszky >Assignee: Zoltan Ivanfi >Priority: Major > > If there are many columns with encoding RLE+bitpacking (e.g. dictionary > encoding) where the value variance is low, the estimated size of the > open pages (which are not yet encoded) is much larger than the final page > size. Because of that, parquet-mr fails to create row groups whose size is > close to {{parquet.block.size}}, which causes performance issues while reading. > A hint from Ryan to solve this issue: > {quote} > We could probably get a better estimate by using the amount of buffered > data and how large other pages in a column were after fully encoding and > compressing. So if you have 5 pages compressed and buffered, and another > 1000 values, use the compression ratio of the 5 pages to estimate the final > size. We'd probably want to use some overhead value for the header. And, > we'd want to separate the amount of buffered data from our row group size > estimate, which are currently the same thing. > {quote} > (So it is not only about RLE+bitpacking but about any kind of encoding that is > applied only after "closing" a page.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again
[ https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1353: --- Component/s: parquet-mr > The random data generator used for tests repeats the same value over and over > again > --- > > Key: PARQUET-1353 > URL: https://issues.apache.org/jira/browse/PARQUET-1353 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Minor > Labels: pull-request-available > > The RandomValues class returns references to its internal buffer as random > values. This buffer gets a random value every time a new random value is > requested, but since earlier values reference the same internal buffer, they > get changed to the same value as well. So even if successive calls return > different values each time, the actual list of these values will always > consist of a single value repeated multiple times. For example: > ||n-th call||returned value||accumulated list expected||accumulated list > actual|| > |1|6C|6C|6C| > |2|8F|6C 8F|8F 8F| > |3|52|6C 8F 52|52 52 52| > |4|B8|6C 8F 52 B8|B8 B8 B8 B8| -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again
Zoltan Ivanfi created PARQUET-1353: -- Summary: The random data generator used for tests repeats the same value over and over again Key: PARQUET-1353 URL: https://issues.apache.org/jira/browse/PARQUET-1353 Project: Parquet Issue Type: Bug Reporter: Zoltan Ivanfi The RandomValues class returns references to its internal buffer as random values. This buffer gets a random value every time a new random value is requested, but since earlier values reference the same internal buffer, they get changed to the same value as well. So even if successive calls return different values each time, the actual list of these values will always consist of a single value repeated multiple times. For example: ||n-th call||returned value||accumulated list expected||accumulated list actual|| |1|6C|6C|6C| |2|8F|6C 8F|8F 8F| |3|52|6C 8F 52|52 52 52| |4|B8|6C 8F 52 B8|B8 B8 B8 B8| -- This message was sent by Atlassian JIRA (v7.6.3#76005)
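The aliasing bug described above can be demonstrated in isolation. This is a minimal reconstruction with hypothetical names, not the actual RandomValues code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal reconstruction of the RandomValues aliasing bug described in
// PARQUET-1353; names are hypothetical, not the actual test-helper code.
public class BufferReuseDemo {

    private static final Random RND = new Random();
    private static final byte[] SHARED = new byte[4];

    // Buggy variant: returns a reference to the shared internal buffer,
    // so every previously returned "value" changes on the next call.
    static byte[] brokenNext() {
        RND.nextBytes(SHARED);
        return SHARED;
    }

    // Fixed variant: hand out a copy so earlier values stay intact.
    static byte[] fixedNext() {
        RND.nextBytes(SHARED);
        return SHARED.clone();
    }

    public static void main(String[] args) {
        List<byte[]> values = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            values.add(brokenNext());
        }
        // All four entries alias the same array, so the accumulated list
        // degenerates to one value repeated four times.
        System.out.println(values.get(0) == values.get(3)); // prints true
        System.out.println(fixedNext() == fixedNext());     // prints false
    }
}
```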
[jira] [Assigned] (PARQUET-1353) The random data generator used for tests repeats the same value over and over again
[ https://issues.apache.org/jira/browse/PARQUET-1353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1353: -- Assignee: Zoltan Ivanfi > The random data generator used for tests repeats the same value over and over > again > --- > > Key: PARQUET-1353 > URL: https://issues.apache.org/jira/browse/PARQUET-1353 > Project: Parquet > Issue Type: Bug >Reporter: Zoltan Ivanfi >Assignee: Zoltan Ivanfi >Priority: Minor > > The RandomValues class returns references to its internal buffer as random > values. This buffer gets a random value every time a new random value is > requested, but since earlier values reference the same internal buffer, they > get changed to the same value as well. So even if successive calls return > different values each time, the actual list of these values will always > consist of a single value repeated multiple times. For example: > ||n-th call||returned value||accumulated list expected||accumulated list > actual|| > |1|6C|6C|6C| > |2|8F|6C 8F|8F 8F| > |3|52|6C 8F 52|52 52 52| > |4|B8|6C 8F 52 B8|B8 B8 B8 B8| -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1347) [parquet-tools] dump command shows binary values differently than the cat or head commands
[ https://issues.apache.org/jira/browse/PARQUET-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1347: --- Description: {{parquet-tools dump}} shows binary values as strings if they are valid UTF-8 sequences. {{parquet-tools cat}} and {{parquet-tools head}} show binary values base64-encoded, regardless of whether they are valid UTF-8 sequences or not. (If the type is annotated as UTF-8, the values are shown as strings by all of these commands.) was: {{parquet-tools dump}} shows binary values as strings if they are valid UTF-8 sequences. {{parquet-tools cat}} and {{parquet-tools head}} show binary values base64-encoded, regardless of whether they are valid UTF-8 sequences or not. > [parquet-tools] dump command shows binary values differently than the cat or > head commands > - > > Key: PARQUET-1347 > URL: https://issues.apache.org/jira/browse/PARQUET-1347 > Project: Parquet > Issue Type: Bug >Reporter: Zoltan Ivanfi >Priority: Minor > > {{parquet-tools dump}} shows binary values as strings if they are valid > UTF-8 sequences. > {{parquet-tools cat}} and {{parquet-tools head}} show binary values > base64-encoded, regardless of whether they are valid UTF-8 sequences or not. > (If the type is annotated as UTF-8, the values are shown as strings by all of > these commands.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
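The two display strategies described above can be sketched like this. The method names are hypothetical and this is not the actual parquet-tools code; in particular, the base64 fallback in the dump-like variant for invalid UTF-8 is an assumption, since the description does not say what dump does in that case.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative sketch of the two display strategies; hypothetical names,
// not the actual parquet-tools code.
public class BinaryDisplay {

    // dump-like: show the bytes as a string when they form valid UTF-8.
    // The base64 fallback for invalid sequences is an assumption.
    static String dumpStyle(byte[] value) {
        try {
            // A fresh CharsetDecoder reports malformed input instead of
            // silently replacing it, so validity is actually checked.
            return StandardCharsets.UTF_8.newDecoder()
                    .decode(ByteBuffer.wrap(value)).toString();
        } catch (CharacterCodingException e) {
            return Base64.getEncoder().encodeToString(value);
        }
    }

    // cat/head-like: always base64, regardless of UTF-8 validity.
    static String catStyle(byte[] value) {
        return Base64.getEncoder().encodeToString(value);
    }

    public static void main(String[] args) {
        byte[] ascii = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(dumpStyle(ascii)); // prints abc
        System.out.println(catStyle(ascii));  // prints YWJj
    }
}
```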
[jira] [Created] (PARQUET-1347) [parquet-tools] dump command shows binary values differently than the cat or head commands
Zoltan Ivanfi created PARQUET-1347: -- Summary: [parquet-tools] dump command shows binary values differently than the cat or head commands Key: PARQUET-1347 URL: https://issues.apache.org/jira/browse/PARQUET-1347 Project: Parquet Issue Type: Bug Reporter: Zoltan Ivanfi {{parquet-tools dump}} shows binary values as strings if they are valid UTF-8 sequences. {{parquet-tools cat}} and {{parquet-tools head}} show binary values base64-encoded, regardless of whether they are valid UTF-8 sequences or not. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1317) ParquetMetadataConverter throw NPE
[ https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1317: --- Affects Version/s: (was: 1.10.1) 1.11.0 > ParquetMetadataConverter throw NPE > -- > > Key: PARQUET-1317 > URL: https://issues.apache.org/jira/browse/PARQUET-1317 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 1.11.0 > > > How to reproduce: > {code:scala} > $ bin/spark-shell > scala> spark.range(10).selectExpr("cast(id as string) as > id").coalesce(1).write.parquet("/tmp/parquet-1317") > scala> > java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head > --debug > file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet > {code} > {noformat} > java.io.IOException: Could not read footer: java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271) > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354) > at > org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88) > at org.apache.parquet.tools.Main.main(Main.java:223) > Caused by: java.lang.NullPointerException > at > org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058) > at > 
org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > java.io.IOException: Could not read footer: > java.lang.NullPointerException{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (PARQUET-1317) ParquetMetadataConverter throw NPE
[ https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1317. Resolution: Fixed Fix Version/s: 1.11.0 > ParquetMetadataConverter throw NPE > -- > > Key: PARQUET-1317 > URL: https://issues.apache.org/jira/browse/PARQUET-1317 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 1.11.0 > > > How to reproduce: > {code:scala} > $ bin/spark-shell > scala> spark.range(10).selectExpr("cast(id as string) as > id").coalesce(1).write.parquet("/tmp/parquet-1317") > scala> > java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head > --debug > file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet > {code} > {noformat} > java.io.IOException: Could not read footer: java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271) > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354) > at > org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88) > at org.apache.parquet.tools.Main.main(Main.java:223) > Caused by: java.lang.NullPointerException > at > org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058) > at > 
org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > java.io.IOException: Could not read footer: > java.lang.NullPointerException{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (PARQUET-1317) ParquetMetadataConverter throw NPE
[ https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1317: --- Component/s: parquet-mr > ParquetMetadataConverter throw NPE > -- > > Key: PARQUET-1317 > URL: https://issues.apache.org/jira/browse/PARQUET-1317 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Affects Versions: 1.10.1 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > How to reproduce: > {code:scala} > $ bin/spark-shell > scala> spark.range(10).selectExpr("cast(id as string) as > id").coalesce(1).write.parquet("/tmp/parquet-1317") > scala> > java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head > --debug > file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet > {code} > {noformat} > java.io.IOException: Could not read footer: java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271) > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354) > at > org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88) > at org.apache.parquet.tools.Main.main(Main.java:223) > Caused by: java.lang.NullPointerException > at > org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058) > at > 
org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > java.io.IOException: Could not read footer: > java.lang.NullPointerException{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1317) ParquetMetadataConverter throw NPE
[ https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499891#comment-16499891 ] Zoltan Ivanfi commented on PARQUET-1317: Hi [~q79969786], Thanks for reporting and investigating this issue. Since you wrote that you are working on it (but couldn't assign it to yourself due to insufficient access rights), I added you to the list of contributors and assigned the JIRA to you. In the future you can freely assign tickets to yourself (and of course you can unassign them as well if you stop working on them). > ParquetMetadataConverter throw NPE > -- > > Key: PARQUET-1317 > URL: https://issues.apache.org/jira/browse/PARQUET-1317 > Project: Parquet > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > How to reproduce: > {code:scala} > $ bin/spark-shell > scala> spark.range(10).selectExpr("cast(id as string) as > id").coalesce(1).write.parquet("/tmp/parquet-1317") > scala> > java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head > --debug > file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet > {code} > {noformat} > java.io.IOException: Could not read footer: java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271) > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354) > at > org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88) > at org.apache.parquet.tools.Main.main(Main.java:223) > Caused by: java.lang.NullPointerException > at > org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828) > at > 
org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > java.io.IOException: Could not read footer: > java.lang.NullPointerException{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (PARQUET-1317) ParquetMetadataConverter throw NPE
[ https://issues.apache.org/jira/browse/PARQUET-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi reassigned PARQUET-1317: -- Assignee: Yuming Wang > ParquetMetadataConverter throw NPE > -- > > Key: PARQUET-1317 > URL: https://issues.apache.org/jira/browse/PARQUET-1317 > Project: Parquet > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > How to reproduce: > {code:scala} > $ bin/spark-shell > scala> spark.range(10).selectExpr("cast(id as string) as > id").coalesce(1).write.parquet("/tmp/parquet-1317") > scala> > java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar head > --debug > file:///tmp/parquet-1317/part-0-6cfafbdd-fdeb-4861-8499-8583852ba437-c000.snappy.parquet > {code} > {noformat} > java.io.IOException: Could not read footer: java.lang.NullPointerException > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:271) > at > org.apache.parquet.hadoop.ParquetFileReader.readAllFootersInParallelUsingSummaryFiles(ParquetFileReader.java:202) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooters(ParquetFileReader.java:354) > at > org.apache.parquet.tools.command.RowCountCommand.execute(RowCountCommand.java:88) > at org.apache.parquet.tools.Main.main(Main.java:223) > Caused by: java.lang.NullPointerException > at > org.apache.parquet.format.converter.ParquetMetadataConverter.getOriginalType(ParquetMetadataConverter.java:828) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.buildChildren(ParquetMetadataConverter.java:1173) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetSchema(ParquetMetadataConverter.java:1124) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.fromParquetMetadata(ParquetMetadataConverter.java:1058) > at > org.apache.parquet.format.converter.ParquetMetadataConverter.readParquetMetadata(ParquetMetadataConverter.java:1052) > 
at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:532) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:505) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:499) > at > org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:476) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:261) > at > org.apache.parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:257) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > java.io.IOException: Could not read footer: > java.lang.NullPointerException{noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
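The stack traces above all bottom out in ParquetMetadataConverter.getOriginalType. A common way such an NPE arises is a switch over an enum reference that is null: in Java, the switch throws NullPointerException before any case label is considered. The following is a minimal self-contained illustration of that failure mode only, not parquet-mr's actual code; the enum and method names are hypothetical.

```java
public class NullEnumSwitch {
    // Hypothetical stand-in for a converted-type enum.
    enum ConvertedType { UTF8, MAP, LIST }

    // Switching on a null enum reference throws NullPointerException
    // before any case (including default) is reached.
    static String describe(ConvertedType t) {
        switch (t) {
            case UTF8: return "string";
            case MAP:  return "map";
            case LIST: return "list";
            default:   return "unknown";
        }
    }

    public static void main(String[] args) {
        try {
            describe(null);
            System.out.println("no exception");
        } catch (NullPointerException e) {
            System.out.println("NPE on null enum switch");
        }
    }
}
```

The usual defensive fix is an explicit null check (or `Objects.requireNonNull` with a descriptive message) before the switch, so the caller sees which value was missing rather than a bare NPE deep in the converter.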
[jira] [Comment Edited] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning
[ https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483843#comment-16483843 ] Zoltan Ivanfi edited comment on PARQUET-1295 at 5/22/18 12:12 PM: -- I agree with [~vrozov], in fact I used the same argument [on the mailing list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E]: {quote}Parquet uses semantic versioning. As a library, it should take extra care not to break its public API in minor releases. This also applies to publicly accessible classes and methods that are considered internal if this "internalness" is not properly documented. It is tempting to dismiss these cases with the reasoning that they were not intended to be public in the first place, but from an API consumer's point of view, this "leaked" API is indistinguishable from the "real" API. Currently the information of what is public and what is internal is undocumented and only known to a few Parquet developers. Until API consumers have a way to determine the intended target audience of our classes and methods, we should pay more attention to keeping our leaked internal API backwards-compatible as well. {quote} [~vrozov], regarding your comment: {quote} That information is hidden somewhere in the pom file. {quote} It's actually even worse than that. The exclusions are only added to the pom file when a breaking change is made, so even that list is unsuitable for determining whether something is considered internal, as it only contains those parts of the internal API that we already broke. was (Author: zi): I agree with [~vrozov], in fact I used the same argument [on the mailing list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E]. {quote}That information is hidden somewhere in the pom file.{quote} It's actually even worse than that. 
The exclusions are only added to the pom file when a breaking change is made, so even that list is unsuitable for determining whether something is considered internal, as it only contains those parts of the internal API that we already broke. > Parquet libraries do not follow proper semantic versioning > -- > > Key: PARQUET-1295 > URL: https://issues.apache.org/jira/browse/PARQUET-1295 > Project: Parquet > Issue Type: Bug >Reporter: Vlad Rozov >Priority: Major > > There are changes between 1.8.0 and 1.10.0 that break API compatibility. A > minor version change is supposed to be backward compatible with 1.9.0 and > 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PARQUET-1295) Parquet libraries do not follow proper semantic versioning
[ https://issues.apache.org/jira/browse/PARQUET-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483843#comment-16483843 ] Zoltan Ivanfi commented on PARQUET-1295: I agree with [~vrozov], in fact I used the same argument [on the mailing list|https://lists.apache.org/thread.html/7db8ec906b29c917d70120fab78382cdfb2406c4188f10829933ed87@%3Cdev.parquet.apache.org%3E]. {quote}That information is hidden somewhere in the pom file.{quote} It's actually even worse than that. The exclusions are only added to the pom file when a breaking change is made, so even that list is unsuitable for determining whether something is considered internal, as it only contains those parts of the internal API that we already broke. > Parquet libraries do not follow proper semantic versioning > -- > > Key: PARQUET-1295 > URL: https://issues.apache.org/jira/browse/PARQUET-1295 > Project: Parquet > Issue Type: Bug >Reporter: Vlad Rozov >Priority: Major > > There are changes between 1.8.0 and 1.10.0 that break API compatibility. A > minor version change is supposed to be backward compatible with 1.9.0 and > 1.8.0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (PARQUET-1304) Release 1.10 contains breaking changes for Hive
Zoltan Ivanfi created PARQUET-1304: -- Summary: Release 1.10 contains breaking changes for Hive Key: PARQUET-1304 URL: https://issues.apache.org/jira/browse/PARQUET-1304 Project: Parquet Issue Type: Bug Components: parquet-mr Affects Versions: 1.10.0 Reporter: Zoltan Ivanfi Hive uses the initFromPage(int valueCount, ByteBuffer page, int offset) method that [got removed|https://github.com/apache/parquet-mr/commit/8bbc6cb95fd9b4b9e86c924ca1e40fd555ecac1d#diff-175c27f5147df0043ac57c7685629934L574] in PARQUET-787. As a result, Hive does not compile with Parquet 1.10. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
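One way a consumer can detect this kind of removed-overload breakage at runtime is to probe for the old signature via reflection. The sketch below uses a mock class standing in for the real parquet-mr reader (only the mock is defined here; it is not parquet-mr's actual class hierarchy), and checks for the (int, ByteBuffer, int) overload that Hive called and PARQUET-787 removed.

```java
import java.lang.reflect.Method;
import java.nio.ByteBuffer;

// Mock standing in for a parquet-mr values reader after the 1.10 change:
// only a newer-style overload exists, the old (int, ByteBuffer, int) one is gone.
class MockValuesReader {
    public void initFromPage(int valueCount, ByteBuffer page) {
        // no-op: only the signature matters for this illustration
    }
}

public class CompatProbe {
    // Returns true if the class still exposes the pre-1.10 overload
    // initFromPage(int, ByteBuffer, int) that Hive was compiled against.
    static boolean hasOldOverload(Class<?> cls) {
        try {
            Method m = cls.getMethod("initFromPage",
                    int.class, ByteBuffer.class, int.class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("old overload present: "
                + hasOldOverload(MockValuesReader.class));
    }
}
```

Against a 1.10-style class, the probe reports the overload as absent; a consumer could branch on that result to call whichever overload exists, instead of failing to compile or link.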