[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018953#comment-15018953 ] Ryan Blue commented on PARQUET-390: --- [~julienledem], what do you think about getting this done for 1.9.0? Seems pretty reasonable to me, but it needs tests. > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: newbie, parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997083#comment-14997083 ] Ryan Blue commented on PARQUET-390: --- You're right that my suggestion is a much larger issue. For this problem, I'm fine with fixing the union function, though I'd like to see it fixed and tested rather than just tweaked, if that sounds reasonable. > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: newbie, parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990275#comment-14990275 ] Michael Allman commented on PARQUET-390: Hi Ryan, The approach you're suggesting sounds prudent but ambitious at the same time. Does it make sense to commit this change as a simple bug fix for the functionality that currently exists (however limited)? Thanks. > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: newbie, parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989977#comment-14989977 ] Michael Allman commented on PARQUET-390: Hi Ryan, I ran into this when I was trying to test whether two different but similar parquet message types would be compatible. Ultimately, these are derived from Hive table schema of two tables---an older table and a newer table. The message types are not union-compatible from a strict perspective, but I got a message about not being able to convert an optional int64 to an optional int32 when I passed "strict = false" and it made me scratch my head. It seems this particular source code is behind the reason I get that error even when I try non-strict type union. Cheers. > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: newbie, parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989939#comment-14989939 ] Ryan Blue commented on PARQUET-390: --- Thanks for the bug report, Michael. I think you're right about this. Could you share with us what you're using this for? This was originally used for building an overall schema for the files in a job, but it isn't necessary to do that and we mostly removed the need to in PARQUET-139. I'd like to see what your use case for it is. Thanks! > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
[ https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990046#comment-14990046 ] Michael Allman commented on PARQUET-390: Hi Ryan, Correct, I want to test compatibility between the two schemas modulo some basic primitive type conversions. FWIW, all of our fields are optional so for our use case that's not an issue. To answer your question, the object model is the Spark SQL Catalyst data types, which are converted to parquet types and hive schema. I think I came up with a tentative solution this morning for testing compatibility using the catalyst types. I'm going to give that a shot. Cheers. > GroupType.union(Type toMerge, boolean strict) does not honor strict parameter > - > > Key: PARQUET-390 > URL: https://issues.apache.org/jira/browse/PARQUET-390 > Project: Parquet > Issue Type: Bug > Components: parquet-mr >Reporter: Michael Allman > Labels: newbie, parquet > > This is the code as it currently stands in master: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType())); > } > {code} > Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe > the code should be: > {code} > @Override > protected Type union(Type toMerge, boolean strict) { > if (toMerge.isPrimitive()) { > throw new IncompatibleSchemaModificationException("can not merge > primitive type " + toMerge + " into group type " + this); > } > return new GroupType(toMerge.getRepetition(), getName(), > mergeFields(toMerge.asGroupType(), strict)); > } > {code} > Note the call to {{mergeFields}} includes the {{strict}} parameter. > I would work on this myself, but I'm having considerable trouble working with > the codebase (see e.g. > http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure). > Given the (assumed) simplicity of the fix, can a seasoned Parquet > contributor take this up? Cheers. -- This message was sent by Atlassian JIRA (v6.3.4#6332)