[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types
[ https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000203#comment-15000203 ]

Cheng Lian commented on SPARK-10113:
------------------------------------

I think emitting a clear error message is more reasonable since Spark SQL only handles signed integral types and can't hold the whole value ranges of those unsigned integral types.

> Support for unsigned Parquet logical types
> ------------------------------------------
>
>                 Key: SPARK-10113
>                 URL: https://issues.apache.org/jira/browse/SPARK-10113
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jordan Thomas
>
> Add support for unsigned Parquet logical types UINT_16, UINT_32 and UINT_64.
> {code}
> org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (UINT_64);
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.illegalType$1(CatalystSchemaConverter.scala:130)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertPrimitiveField(CatalystSchemaConverter.scala:169)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:115)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:97)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:94)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.org$apache$spark$sql$parquet$CatalystSchemaConverter$$convert(CatalystSchemaConverter.scala:94)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$convertGroupField$1.apply(CatalystSchemaConverter.scala:200)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$convertGroupField$1.apply(CatalystSchemaConverter.scala:200)
>   at scala.Option.fold(Option.scala:158)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertGroupField(CatalystSchemaConverter.scala:200)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convertField(CatalystSchemaConverter.scala:116)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:97)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter$$anonfun$2.apply(CatalystSchemaConverter.scala:94)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.org$apache$spark$sql$parquet$CatalystSchemaConverter$$convert(CatalystSchemaConverter.scala:94)
>   at org.apache.spark.sql.parquet.CatalystSchemaConverter.convert(CatalystSchemaConverter.scala:91)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$readSchemaFromFooter$2.apply(ParquetRelation.scala:734)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$readSchemaFromFooter$2.apply(ParquetRelation.scala:734)
>   at scala.Option.getOrElse(Option.scala:121)
>   at org.apache.spark.sql.parquet.ParquetRelation$.readSchemaFromFooter(ParquetRelation.scala:734)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$28$$anonfun$apply$8.apply(ParquetRelation.scala:714)
>   at org.apache.spark.sql.parquet.ParquetRelation$$anonfun$28$$anonfun$apply$8.apply(ParquetRelation.scala:713)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at
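
The value-range concern in the comment above can be checked with plain arithmetic. The following is a minimal sketch in standalone Scala, not Spark code: the object name `UnsignedRanges` and the helper `fits` are hypothetical names chosen for illustration. It compares the largest value of each unsigned logical type named in the issue with the largest value of the signed JVM type of the same width and of the next wider one.

```scala
// Illustrative only: `UnsignedRanges` and `fits` are hypothetical names, not Spark APIs.
object UnsignedRanges {
  // Largest value representable by each unsigned Parquet logical type in the issue.
  val uint16Max: BigInt = (BigInt(1) << 16) - 1
  val uint32Max: BigInt = (BigInt(1) << 32) - 1
  val uint64Max: BigInt = (BigInt(1) << 64) - 1

  // True when every value in [0, unsignedMax] is at most signedMax.
  def fits(unsignedMax: BigInt, signedMax: BigInt): Boolean = unsignedMax <= signedMax

  def main(args: Array[String]): Unit = {
    // Same-width signed types cannot hold the full unsigned range.
    println(s"UINT_16 fits in Short: ${fits(uint16Max, BigInt(Short.MaxValue))}")
    println(s"UINT_32 fits in Int:   ${fits(uint32Max, BigInt(Int.MaxValue))}")
    println(s"UINT_64 fits in Long:  ${fits(uint64Max, BigInt(Long.MaxValue))}")
    // The next wider signed type is enough for UINT_16 and UINT_32.
    println(s"UINT_16 fits in Int:   ${fits(uint16Max, BigInt(Int.MaxValue))}")
    println(s"UINT_32 fits in Long:  ${fits(uint32Max, BigInt(Long.MaxValue))}")
    // UINT_64 has no wider primitive; its maximum needs this many decimal digits.
    println(s"UINT_64 max digits:    ${uint64Max.toString.length}")
  }
}
```

Under these assumptions, a same-width signed column would lose the upper half of each unsigned range, which is why the choice in this thread is between widening the type and rejecting it with a clear message.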
[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types
[ https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001435#comment-15001435 ]

Apache Spark commented on SPARK-10113:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/9646

> Support for unsigned Parquet logical types
> ------------------------------------------
>
>                 Key: SPARK-10113
>                 URL: https://issues.apache.org/jira/browse/SPARK-10113
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jordan Thomas
>
> Add support for unsigned Parquet logical types UINT_16, UINT_32 and UINT_64.
[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types
[ https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14995992#comment-14995992 ]

Hyukjin Kwon commented on SPARK-10113:
--------------------------------------

I think I can work on this, but we need to clarify a few things first, if I understand the problem correctly.

As far as I know, Spark does not support unsigned types directly, which is why it throws the exception. To deal with this, we could either:

1. convert {{UINT_16}} to {{IntegerType}}, {{UINT_32}} to {{LongType}} and {{UINT_64}} to {{Decimal}}, or
2. fail with a message such as {{"Unsupported data type $field.dataType"}}.

> Support for unsigned Parquet logical types
> ------------------------------------------
>
>                 Key: SPARK-10113
>                 URL: https://issues.apache.org/jira/browse/SPARK-10113
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jordan Thomas
>
> Add support for unsigned Parquet logical types UINT_16, UINT_32 and UINT_64.
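
The two options in the comment above can be sketched side by side. This is a rough illustration under stated assumptions, not the converter code in Spark and not the content of any pull request: `SqlType` and its cases are hypothetical stand-ins named after Spark SQL's `IntegerType`, `LongType` and `DecimalType`, the functions `widen` and `reject` are invented for this example, the third type is read as UINT_64 (the type named in the issue description), and the decimal precision of 20 is an assumption derived from the 20 decimal digits of 2^64 - 1.

```scala
// Illustrative only: everything in this object is hypothetical, not a Spark API.
object UnsignedMapping {
  // Stand-ins for Spark SQL's IntegerType, LongType and DecimalType.
  sealed trait SqlType
  case object IntegerType extends SqlType
  case object LongType extends SqlType
  final case class DecimalType(precision: Int, scale: Int) extends SqlType

  // Option 1: widen each unsigned logical type to a signed type covering its range.
  def widen(logicalType: String): Option[SqlType] = logicalType match {
    case "UINT_16" => Some(IntegerType)
    case "UINT_32" => Some(LongType)
    case "UINT_64" => Some(DecimalType(20, 0)) // assumed precision: 2^64 - 1 has 20 digits
    case _         => None
  }

  // Option 2: refuse the type and build an explicit error message.
  def reject(logicalType: String): String =
    s"Unsupported Parquet logical type: $logicalType"

  def main(args: Array[String]): Unit = {
    Seq("UINT_16", "UINT_32", "UINT_64").foreach { t =>
      println(s"$t -> widen: ${widen(t)}, reject: ${reject(t)}")
    }
  }
}
```

Widening keeps such files readable at the cost of a schema that no longer matches the writer's types, while rejecting keeps the schema honest but leaves the data unreadable; the sketch only makes that trade-off concrete.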
[jira] [Commented] (SPARK-10113) Support for unsigned Parquet logical types
[ https://issues.apache.org/jira/browse/SPARK-10113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14996026#comment-14996026 ]

Hyukjin Kwon commented on SPARK-10113:
--------------------------------------

[~lian cheng] Just to double-check: could I ask whether this was intentionally omitted?

> Support for unsigned Parquet logical types
> ------------------------------------------
>
>                 Key: SPARK-10113
>                 URL: https://issues.apache.org/jira/browse/SPARK-10113
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.0
>            Reporter: Jordan Thomas
>
> Add support for unsigned Parquet logical types UINT_16, UINT_32 and UINT_64.