[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-20 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15018953#comment-15018953
 ] 

Ryan Blue commented on PARQUET-390:
---

[~julienledem], what do you think about getting this done for 1.9.0? Seems 
pretty reasonable to me, but it needs tests.

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: newbie, parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-09 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997083#comment-14997083
 ] 

Ryan Blue commented on PARQUET-390:
---

You're right that my suggestion is a much larger issue. For this problem, I'm 
fine with fixing the union function, though I'd like to see it fixed and tested 
rather than just tweaked, if that sounds reasonable.

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: newbie, parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-04 Thread Michael Allman (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990275#comment-14990275
 ] 

Michael Allman commented on PARQUET-390:


Hi Ryan,

The approach you're suggesting sounds prudent but ambitious at the same time. 
Does it make sense to commit this change as a simple bug fix for the 
functionality that currently exists (however limited)?

Thanks.

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: newbie, parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-04 Thread Michael Allman (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989977#comment-14989977
 ] 

Michael Allman commented on PARQUET-390:


Hi Ryan,

I ran into this when I was trying to test whether two different but similar 
parquet message types would be compatible. Ultimately, these are derived from 
Hive table schema of two tables---an older table and a newer table. The message 
types are not union-compatible from a strict perspective, but I got a message 
about not being able to convert an optional int64 to an optional int32 when I 
passed "strict = false" and it made me scratch my head. It seems this 
particular source code is behind the reason I get that error even when I try 
non-strict type union.

Cheers.

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: newbie, parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-04 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14989939#comment-14989939
 ] 

Ryan Blue commented on PARQUET-390:
---

Thanks for the bug report, Michael. I think you're right about this.

Could you share with us what you're using this for? This was originally used 
for building an overall schema for the files in a job, but it isn't necessary 
to do that and we mostly removed the need to in PARQUET-139. I'd like to see 
what your use case for it is. Thanks!

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PARQUET-390) GroupType.union(Type toMerge, boolean strict) does not honor strict parameter

2015-11-04 Thread Michael Allman (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990046#comment-14990046
 ] 

Michael Allman commented on PARQUET-390:


Hi Ryan,

Correct, I want to test compatibility between the two schemas modulo some basic 
primitive type conversions. FWIW, all of our fields are optional so for our use 
case that's not an issue.

To answer your question, the object model is the Spark SQL Catalyst data types, 
which are converted to parquet types and hive schema.

I think I came up with a tentative solution this morning for testing 
compatibility using the catalyst types. I'm going to give that a shot.

Cheers.

> GroupType.union(Type toMerge, boolean strict) does not honor strict parameter
> -
>
> Key: PARQUET-390
> URL: https://issues.apache.org/jira/browse/PARQUET-390
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Reporter: Michael Allman
>  Labels: newbie, parquet
>
> This is the code as it currently stands in master:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType()));
> }
> {code}
> Note the call to {{mergeFields}} omits the {{strict}} parameter. I believe 
> the code should be:
> {code}
> @Override
> protected Type union(Type toMerge, boolean strict) {
>   if (toMerge.isPrimitive()) {
> throw new IncompatibleSchemaModificationException("can not merge 
> primitive type " + toMerge + " into group type " + this);
>   }
>   return new GroupType(toMerge.getRepetition(), getName(), 
> mergeFields(toMerge.asGroupType(), strict));
> }
> {code}
> Note the call to {{mergeFields}} includes the {{strict}} parameter.
> I would work on this myself, but I'm having considerable trouble working with 
> the codebase (see e.g. 
> http://stackoverflow.com/questions/31229445/build-failure-apache-parquet-mr-source-mvn-install-failure).
>  Given the (assumed) simplicity of the fix, can a seasoned Parquet 
> contributor take this up? Cheers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)