Spark reported a java.lang.IllegalArgumentException with the following message and stack trace:

java.lang.IllegalArgumentException: requirement failed: Found fields with the same name.
        at scala.Predef$.require(Predef.scala:233)
        at org.apache.spark.sql.catalyst.types.StructType.<init>(dataTypes.scala:317)
        at org.apache.spark.sql.catalyst.types.StructType$.fromAttributes(dataTypes.scala:310)
        at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertToString(ParquetTypes.scala:306)
        at org.apache.spark.sql.parquet.ParquetTableScan.execute(ParquetTableOperations.scala:83)
        at org.apache.spark.sql.execution.Filter.execute(basicOperators.scala:57)
        at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:85)
        at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:433)
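For reference, a stripped-down sketch along these lines should hit the same code path. The table name (events), column name (user_id), and Parquet path are placeholders rather than my real schema, so treat it as an illustration, not the exact job:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

object DuplicateColumnRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "duplicate-column-repro")
    val sqlContext = new SQLContext(sc)

    // Load a Parquet file and expose it as a table
    // (use registerAsTable instead on Spark 1.0.x).
    val events = sqlContext.parquetFile("/tmp/events.parquet")
    events.registerTempTable("events")

    // Selecting the same column twice puts two fields with the same
    // name into the output schema of the Parquet scan.
    val result = sqlContext.sql(
      "SELECT user_id, user_id FROM events WHERE user_id IS NOT NULL")

    result.collect().foreach(println)
    sc.stop()
  }
}

Judging from the stack trace, the require() that fails is StructType's check that field names are unique, hit when ParquetTableScan converts its output attributes back into a schema via StructType.fromAttributes.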
After some trial and error, it seems the failure is caused by duplicated columns in my SELECT clause. I duplicated the column on purpose so that my code can parse the result correctly. I think we should allow users to specify duplicated columns in the returned result.

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/