[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types

2015-11-11 Thread Cheng Lian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000203#comment-15000203
 ] 

Cheng Lian commented on SPARK-10113:


I think emitting a clear error message is the more reasonable option, since 
Spark SQL only handles signed integral types and cannot represent the full 
value ranges of the unsigned integral types.
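
To illustrate the range problem, here is a minimal Scala sketch. It is not Spark code: the object {{UnsignedRanges}} and the helper {{fitsSigned}} are names made up for this example. It compares the maximum value of each unsigned logical type with the signed type of the same width.

```scala
// Illustration only; UnsignedRanges and fitsSigned are hypothetical names.
object UnsignedRanges {
  // Largest values of the unsigned logical types, as exact integers.
  val maxUInt16: BigInt = (BigInt(1) << 16) - 1
  val maxUInt32: BigInt = (BigInt(1) << 32) - 1
  val maxUInt64: BigInt = (BigInt(1) << 64) - 1

  // True when `v` is representable by a signed two's-complement
  // integer with the given bit width.
  def fitsSigned(v: BigInt, bits: Int): Boolean = {
    val min = -(BigInt(1) << (bits - 1))
    val max = (BigInt(1) << (bits - 1)) - 1
    v >= min && v <= max
  }

  // Keeping only the low 64 bits reinterprets the value as a signed Long.
  def reinterpretAsLong(v: BigInt): Long = v.toLong
}
```

Under these assumptions, no unsigned maximum fits in the signed type of the same width, and the UINT_64 maximum does not fit in any signed 64-bit value; reinterpreting its bit pattern as a {{Long}} silently yields a negative number.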

> Support for unsigned Parquet logical types
> --
>
> Key: SPARK-10113
> URL: https://issues.apache.org/jira/browse/SPARK-10113
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.5.0
> Reporter: Jordan Thomas
>
> Add support for unsigned Parquet logical types UINT_16, UINT_32 and UINT_64.
> {code}
> org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (UINT_64);
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.illegalType$1(CatalystSchemaConverter.scala:130)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertPrimitiveField(CatalystSchemaConverter.scala:169)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:115)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:97)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:94)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.org$apache$spark$sql$parquet$CatalystSchemaConverter$$convert(CatalystSchemaConverter.scala:94)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$convertGroupField$1.apply(CatalystSchemaConverter.scala:200)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$convertGroupField$1.apply(CatalystSchemaConverter.scala:200)
>   at scala.Option.fold(Option.scala:158)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertGroupField(CatalystSchemaConverter.scala:200)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:116)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:97)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:94)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.org$apache$spark$sql$parquet$CatalystSchemaConverter$$convert(CatalystSchemaConverter.scala:94)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convert(CatalystSchemaConverter.scala:91)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$readSchemaFromFooter$2.apply(ParquetRelation.scala:734)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$readSchemaFromFooter$2.apply(ParquetRelation.scala:734)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.parquet.ParquetRelation$.readSchemaFromFooter(ParquetRelation.scala:734)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$28$$anonfun$apply$8.apply(ParquetRelation.scala:714)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$28$$anonfun$apply$8.apply(ParquetRelation.scala:713)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at 

[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types

2015-11-11 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001435#comment-15001435
 ] 

Apache Spark commented on SPARK-10113:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/9646

[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types

2015-11-08 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995992#comment-14995992
 ] 

Hyukjin Kwon commented on SPARK-10113:
--

I think I can work on this, but a few things need to be made clear first, if I 
understand the issue correctly.
As far as I know, Spark does not support unsigned types directly, which is why 
it threw the exception.
So, to deal with this,

1. we could convert {{UINT_16}} to {{IntegerType}}, {{UINT_32}} to 
{{LongType}} and {{UINT_64}} to {{Decimal}},
2. or emit an error with a message such as 
{{"Unsupported data type $field.dataType"}}.
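
The two options above can be sketched roughly as follows. This is only an illustration under stated assumptions: the types below are hypothetical stand-ins defined for the example, not Spark's or Parquet's own classes, and the precision chosen for the decimal case (20 digits, enough for 2^64 - 1) is my assumption rather than anything decided in this issue.

```scala
// Hypothetical stand-ins for the unsigned Parquet logical types.
sealed trait UnsignedType
case object UInt16 extends UnsignedType
case object UInt32 extends UnsignedType
case object UInt64 extends UnsignedType

// Hypothetical stand-ins for the target SQL types.
sealed trait SqlType
case object IntegerType extends SqlType
case object LongType extends SqlType
final case class DecimalType(precision: Int, scale: Int) extends SqlType

object UnsignedOptions {
  // Option 1: widen each unsigned type to a signed type that can
  // hold its whole value range.
  def widen(t: UnsignedType): SqlType = t match {
    case UInt16 => IntegerType
    case UInt32 => LongType
    // 2^64 - 1 = 18446744073709551615 has 20 decimal digits.
    case UInt64 => DecimalType(20, 0)
  }

  // Option 2: refuse the type and report it in the error message.
  def reject(t: UnsignedType): Either[String, SqlType] =
    Left(s"Unsupported data type $t")
}
```

Option 1 changes the column type that readers see, while option 2 keeps the current behaviour and only improves the message.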


[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types

2015-11-08 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996026#comment-14996026
 ] 

Hyukjin Kwon commented on SPARK-10113:
--

[~lian cheng] Just to double-check: could I ask whether this was intentionally 
omitted?

