Yes, I just tested it against the nightly build from 8/31, and I'm happy to see that the test added in the PR covers my issue.
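
For anyone stuck on 2.0.0 until 2.0.1 ships, the bug can be sidestepped by keeping "." out of the column names at write time and restoring the names after reading. Here is an untested sketch of that idea (the "_DOT_" token and the "squares_sanitized" path are made up, and it assumes no real column name already contains that token):

import spark.implicits._  // already in scope in spark-shell

val df = spark.sparkContext.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "squared.value")

// Write with sanitized names so 2.0.0 can resolve them against the Parquet file schema.
val sanitized = df.columns.foldLeft(df) { (d, c) =>
  d.withColumnRenamed(c, c.replace(".", "_DOT_"))
}
sanitized.write.parquet("squares_sanitized")

// Read back with the safe names, then restore the dotted names in memory;
// the rename is a plain projection, so it should not trip the resolution bug.
val readBack = spark.read.parquet("squares_sanitized")
val restored = readBack.columns.foldLeft(readBack) { (d, c) =>
  d.withColumnRenamed(c, c.replace("_DOT_", "."))
}
restored.take(2)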
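Also, regarding the SQL parsing difference I mention below: I still can't reproduce or share my actual SQL, but the rewrite that got it working was roughly of this shape, replacing WHERE-clause inner joins with explicit JOIN ... ON. The table and column names here are made up for illustration (assume tables a, b, and c exist):

// The mixed form (implicit inner join plus an explicit outer join) parsed
// fine on 1.6.2; queries of this shape were the ones that complained about
// columns not existing on 2.0.0.
spark.sql("""
  SELECT a.id, b.name
  FROM a, b
  LEFT OUTER JOIN c ON c.b_id = b.id
  WHERE a.id = b.a_id
""")

// Rewritten with the inner join made explicit, it parsed and resolved on 2.0.0.
spark.sql("""
  SELECT a.id, b.name
  FROM a
  JOIN b ON a.id = b.a_id
  LEFT OUTER JOIN c ON c.b_id = b.id
""")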
Thanks.

-Don

On Wed, Aug 31, 2016 at 6:49 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote:

> Hi Don, I guess this should be fixed in 2.0.1.
>
> Please refer to this PR: https://github.com/apache/spark/pull/14339
>
> On 1 Sep 2016 2:48 a.m., "Don Drake" <dondr...@gmail.com> wrote:
>
>> I am in the process of migrating a set of Spark 1.6.2 ETL jobs to Spark 2.0 and have encountered some interesting issues.
>>
>> First, the SQL parsing seems to have changed: I had to rewrite some SQL that mixed inner joins (written with WHERE-clause syntax rather than explicit INNER JOINs) and outer joins before it would work; it was complaining about columns not existing. I can't reproduce that one easily and can't share the SQL. Just curious if anyone else is seeing this?
>>
>> I do have a showstopper problem with Parquet datasets that have fields containing a "." in the field name. This data comes from an external provider (CSV) and we just pass the field names through. This has worked flawlessly in Spark 1.5 and 1.6, but now Spark can't seem to read these Parquet files.
>>
>> I've reproduced a trivial example below. Jira created: https://issues.apache.org/jira/browse/SPARK-17341
>>
>> Spark context available as 'sc' (master = local[*], app id = local-1472664486578).
>> Spark session available as 'spark'.
>> Welcome to
>>       ____              __
>>      / __/__  ___ _____/ /__
>>     _\ \/ _ \/ _ `/ __/ '_/
>>    /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
>>       /_/
>>
>> Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
>> Type in expressions to have them evaluated.
>> Type :help for more information.
>>
>> scala> val squaresDF = spark.sparkContext.makeRDD(1 to 5).map(i => (i, i * i)).toDF("value", "squared.value")
>> 16/08/31 12:28:44 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
>> 16/08/31 12:28:44 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
>> squaresDF: org.apache.spark.sql.DataFrame = [value: int, squared.value: int]
>>
>> scala> squaresDF.take(2)
>> res0: Array[org.apache.spark.sql.Row] = Array([1,1], [2,4])
>>
>> scala> squaresDF.write.parquet("squares")
>> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
>> SLF4J: Defaulting to no-operation (NOP) logger implementation
>> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
>> [snip: ~60 lines of repeated Parquet INFO/WARNING output elided: CodecConfig "Compression: SNAPPY"; ParquetOutputFormat block size 134217728, page and dictionary page sizes 1048576, dictionary on, validation off, writer version PARQUET_1_0; MemoryManager warnings that total allocation exceeds 95.00% (906,992,000 bytes) of heap memory, scaling row group sizes for 7-8 writers; InternalParquetRecordWriter flushes and ColumnChunkPageWriteStore "written 39B for [value] INT32" lines]
>>
>> scala> val inSquare = spark.read.parquet("squares")
>> inSquare: org.apache.spark.sql.DataFrame = [value: int, squared.value: int]
>>
>> scala> inSquare.take(2)
>> org.apache.spark.sql.AnalysisException: Unable to resolve squared.value given [value, squared.value];
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
>>   at scala.Option.getOrElse(Option.scala:121)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:133)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1.apply(LogicalPlan.scala:129)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>>   at org.apache.spark.sql.types.StructType.foreach(StructType.scala:95)
>>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>>   at org.apache.spark.sql.types.StructType.map(StructType.scala:95)
>>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:129)
>>   at org.apache.spark.sql.execution.datasources.FileSourceStrategy$.apply(FileSourceStrategy.scala:87)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:60)
>>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>>   at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:61)
>>   at org.apache.spark.sql.execution.SparkPlanner.plan(SparkPlanner.scala:47)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:51)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1$$anonfun$apply$1.applyOrElse(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:301)
>>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:300)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:298)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:321)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:179)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:319)
>>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:298)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at org.apache.spark.sql.execution.SparkPlanner$$anonfun$plan$1.apply(SparkPlanner.scala:48)
>>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:78)
>>   at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:76)
>>   at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:83)
>>   at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:83)
>>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2558)
>>   at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>>   ... 48 elided
>>
>> scala>
>>
>> --
>> Donald Drake
>> Drake Consulting
>> http://www.drakeconsulting.com/
>> https://twitter.com/dondrake
>> 800-733-2143

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143