Re: PhysicalRDD problem?
I'm hesitant to merge that PR, as it uses a brand-new configuration path that is different from the way the rest of Spark SQL / Spark is configured. I'm also suspicious that hitting max iterations is emblematic of some other issue, as resolution typically happens bottom-up, in a single pass. Can you provide more details about your schema / query?

On Tue, Dec 9, 2014 at 11:43 PM, Nitin Goyal wrote:

> I see that somebody had already raised a PR for this, but it hasn't been
> merged:
>
> https://issues.apache.org/jira/browse/SPARK-4339
>
> Can we merge this in the next 1.2 RC?
>
> Thanks
> -Nitin
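For readers following the "max iterations" discussion: below is a generic sketch of what a fixed-point rule driver such as Catalyst's RuleExecutor does. This is a simplified illustration, not the actual Spark source; the function and variable names are made up.

  // Simplified sketch of a fixed-point rule driver: apply the batch's rules
  // repeatedly until the plan stops changing, or give up at the iteration cap.
  def runToFixedPoint[Plan](rules: Seq[Plan => Plan],
                            maxIterations: Int,
                            plan: Plan): Plan = {
    var curPlan = plan
    var lastPlan = plan
    var iteration = 0
    var continue = true
    while (continue) {
      lastPlan = curPlan
      curPlan = rules.foldLeft(curPlan)((p, rule) => rule(p))
      iteration += 1
      if (curPlan == lastPlan) {
        continue = false  // converged: another pass would change nothing
      } else if (iteration >= maxIterations) {
        // This branch corresponds to the "Max iterations (100) reached for
        // batch Resolution" log line quoted later in the thread: the plan was
        // still changing when the cap was hit, so attributes may stay unresolved.
        println(s"Max iterations ($maxIterations) reached")
        continue = false
      }
    }
    curPlan
  }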
Re: PhysicalRDD problem?
I see that somebody had already raised a PR for this, but it hasn't been merged:

https://issues.apache.org/jira/browse/SPARK-4339

Can we merge this in the next 1.2 RC?

Thanks
-Nitin
Re: PhysicalRDD problem?
Hi Michael,

I think I have found the exact problem in my case. I see that we have written something like the following in Analyzer.scala:

  // TODO: pass this in as a parameter.
  val fixedPoint = FixedPoint(100)

and

  Batch("Resolution", fixedPoint,
    ResolveReferences ::
    ResolveRelations ::
    ResolveSortReferences ::
    NewRelationInstances ::
    ImplicitGenerate ::
    StarExpansion ::
    ResolveFunctions ::
    GlobalAggregates ::
    UnresolvedHavingClauseAttributes ::
    TrimGroupingAliases ::
    typeCoercionRules ++
    extendedRules : _*),

Perhaps in my case it reaches the 100-iteration limit and breaks out of the while loop in RuleExecutor.scala, and thus doesn't "resolve" all the attributes.

Exception in my logs:

14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached for batch Resolution

14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in context with path [] threw exception [Servlet execution threw an exception] with root cause

org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567, tree:

'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS DOWN_BYTESHTTPSUBCR#6567]

...

  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
  at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
  at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
  at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
  at org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
  at org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
  at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
  at org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
  at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
  at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)

I think the solution here is to make the FixedPoint constructor argument configurable/parameterized (this is also noted as a TODO in the code). Do we have a plan to do this in the 1.2 release? Or I can take this up as a task for myself if you want, since this is very crucial for our release.
Thanks
-Nitin

On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust wrote:

> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema)
>
> This line is throwing away the logical information about existingSchemaRDD, and thus Spark SQL can't know how to push down projections or predicates past this operator.
>
> Can you describe in more detail the problems you see if you don't do this reapplication of the schema?

--
Regards
Nitin Goyal
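As an illustration of the parameterization Nitin proposes, here is a minimal sketch of threading the iteration cap through as a constructor argument. The types are deliberately simplified, and the default of 100 mirrors the current hard-coded value; this is not the actual patch attached to SPARK-4339.

  // Hypothetical, simplified sketch: the cap becomes a constructor parameter
  // with the old hard-coded value as its default.
  case class FixedPoint(maxIterations: Int)

  class Analyzer(maxIterations: Int = 100) {
    // Callers that legitimately need more than 100 resolution passes could
    // construct the analyzer with a larger cap instead of patching the source.
    val fixedPoint = FixedPoint(maxIterations)
  }

  // Usage: an analyzer that allows up to 500 resolution passes.
  val deepAnalyzer = new Analyzer(maxIterations = 500)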
Re: PhysicalRDD problem?
> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema)

This line is throwing away the logical information about existingSchemaRDD, and thus Spark SQL can't know how to push down projections or predicates past this operator.

Can you describe in more detail the problems you see if you don't do this reapplication of the schema?
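For illustration, a minimal sketch of the round-trip being discussed, against the 1.2-era SchemaRDD API; it assumes an existing SQLContext named sqlContext, and the table name "events" is hypothetical.

  // applySchema takes an RDD[Row] plus a schema, so the result is a brand-new
  // leaf in the logical plan, disconnected from the plan that produced it.
  val existingSchemaRDD = sqlContext.sql("SELECT * FROM events")
  val reapplied = sqlContext.applySchema(existingSchemaRDD, existingSchemaRDD.schema)

  // Predicates or projections applied on top of `reapplied` stop at that new
  // leaf: Catalyst can no longer see into the original plan, so nothing gets
  // pushed down into the scan of "events".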