Hi Michael,
I think I have found the exact problem in my case. I see that we have
written something like following in Analyzer.scala :-
// TODO: pass this in as a parameter.
val fixedPoint = FixedPoint(100)
and
Batch("Resolution", fixedPoint,
ResolveReferences ::
ResolveRelations ::
ResolveSortReferences ::
NewRelationInstances ::
ImplicitGenerate ::
StarExpansion ::
ResolveFunctions ::
GlobalAggregates ::
UnresolvedHavingClauseAttributes ::
TrimGroupingAliases ::
typeCoercionRules ++
extendedRules : _*),
Perhaps in my case, it reaches the 100 iterations and break out of while
loop in RuleExecutor.scala and thus, doesn't "resolve" all the attributes.
Exception in my logs :-
14/12/10 04:45:28 INFO HiveContext$$anon$4: Max iterations (100) reached
for batch Resolution
14/12/10 04:45:28 ERROR [Sql]: Servlet.service() for servlet [Sql] in
context with path [] threw exception [Servlet execution threw an exception]
with root cause
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Unresolved
attributes: 'T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
DOWN_BYTESHTTPSUBCR#6567, tree:
'Project ['T1.SP AS SP#6566,'T1.DOWN_BYTESHTTPSUBCR AS
DOWN_BYTESHTTPSUBCR#6567]
...
...
...
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:80)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$$anonfun$1.applyOrElse(Analyzer.scala:78)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:144)
at
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:135)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:78)
at
org.apache.spark.sql.catalyst.analysis.Analyzer$CheckResolution$.apply(Analyzer.scala:76)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:61)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1$$anonfun$apply$2.apply(RuleExecutor.scala:59)
at
scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:51)
at
scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:60)
at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:34)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:59)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$apply$1.apply(RuleExecutor.scala:51)
at scala.collection.immutable.List.foreach(List.scala:318)
at
org.apache.spark.sql.catalyst.rules.RuleExecutor.apply(RuleExecutor.scala:51)
at
org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
at
org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
at
org.apache.spark.sql.CacheManager$$anonfun$cacheQuery$1.apply(CacheManager.scala:86)
at org.apache.spark.sql.CacheManager$class.writeLock(CacheManager.scala:67)
at org.apache.spark.sql.CacheManager$class.cacheQuery(CacheManager.scala:85)
at org.apache.spark.sql.SQLContext.cacheQuery(SQLContext.scala:50)
at org.apache.spark.sql.SchemaRDD.cache(SchemaRDD.scala:490)
I think the solution here is to have the FixedPoint constructor argument as
configurable/parameterized (also written as TODO). Do we have a plan to do
this in 1.2 release? Or I can take this up as a task for myself if you want
(since this is very crucial for our release).
Thanks
-Nitin
On Wed, Dec 10, 2014 at 1:06 AM, Michael Armbrust <[email protected]>
wrote:
> val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
>> existingSchemaRDD.schema)
>>
>
> This line is throwing away the logical information about existingSchemaRDD
> and thus Spark SQL can't know how to push down projections or predicates
> past this operator.
>
> Can you describe more the problems that you see if you don't do this
> reapplication of the schema.
>
--
Regards
Nitin Goyal