[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2023-05-31 Thread shufan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17728184#comment-17728184
 ] 

shufan commented on SPARK-21380:


[~dongjoon]
Another situation:

{code}
SELECT *
FROM (
  SELECT age, 'bob' AS NAME
  FROM person
) p
LEFT JOIN temp_person t_p ON t_p.NAME = p.NAME;
{code}

Because p.NAME is the constant 'bob', FoldablePropagation rewrites the join 
condition so it is no longer an equi-join key referencing both sides, and 
JoinSelection falls back to BroadcastNestedLoopJoinExec, which may lead to 
OOM. In fact, it did cause an OOM for us.

After setting:
{code}
set spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.FoldablePropagation
{code}
JoinSelection chooses SortMergeJoin and the query no longer OOMs. But 
excluding an optimizer rule is not something I should have to do in practice.
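
(A minimal sketch of the same workaround applied programmatically from a Scala 
session; the person and temp_person tables are from the example above and are 
assumed to exist:)

{code}
// Sketch only: exclude FoldablePropagation for this session so the
// constant join key is not folded out of the equi-join condition.
spark.conf.set("spark.sql.optimizer.excludedRules",
  "org.apache.spark.sql.catalyst.optimizer.FoldablePropagation")

// With the rule excluded, JoinSelection can pick SortMergeJoin for the
// equi-join instead of BroadcastNestedLoopJoinExec.
spark.sql("""
  SELECT *
  FROM (SELECT age, 'bob' AS NAME FROM person) p
  LEFT JOIN temp_person t_p ON t_p.NAME = p.NAME
""").explain()
{code}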

Do you have a better suggestion?

> Join with Columns thinks inner join is cross join even when aliased
> ---
>
> Key: SPARK-21380
> URL: https://issues.apache.org/jira/browse/SPARK-21380
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0, 2.1.1
>Reporter: Everett Toews
>Priority: Major
>  Labels: correctness
>
> While this seemed to work in Spark 2.0.2, it fails in 2.1.0 and 2.1.1.
> Even after aliasing both the table names and all the columns, joining 
> Datasets using criteria assembled from Columns, rather than with the 
> join(..., usingColumns) method variants, errors out complaining that the 
> join is a cross join / Cartesian product even when it isn't.
> Example:
> {noformat}
> Dataset<Row> left = spark.sql("select 'bob' as name, 23 as age");
> left = left
>     .alias("l")
>     .select(
>         left.col("name").as("l_name"),
>         left.col("age").as("l_age"));
> Dataset<Row> right = spark.sql("select 'bob' as name, 'bobco' as company");
> right = right
>     .alias("r")
>     .select(
>         right.col("name").as("r_name"),
>         right.col("company").as("r_age"));
> Dataset<Row> result = left.join(
>     right,
>     left.col("l_name").equalTo(right.col("r_name")),
>     "inner");
> result.show();
> {noformat}
> Results in
> {noformat}
> org.apache.spark.sql.AnalysisException: Detected cartesian product for INNER join between logical plans
> Project [bob AS l_name#22, 23 AS l_age#23]
> +- OneRowRelation$
> and
> Project [bob AS r_name#33, bobco AS r_age#34]
> +- OneRowRelation$
> Join condition is missing or trivial.
> Use the CROSS JOIN syntax to allow cartesian products between these relations.;
>   at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1067)
>   at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts$$anonfun$apply$21.applyOrElse(Optimizer.scala:1064)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:268)
>   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:307)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:188)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:305)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:273)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:257)
>   at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:1064)
>   at org.apache.spark.sql.catalyst.optimizer.CheckCartesianProducts.apply(Optimizer.scala:1049)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:85)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:82)
>   at scala.collection.IndexedSeqOptimized$class.foldl(IndexedSeqOptimized.scala:57)
>   at scala.collection.IndexedSeqOptimized$class.foldLeft(IndexedSeqOptimized.scala:66)
>   at scala.collection.mutable.WrappedArray.foldLeft(WrappedArray.scala:35)
>   at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:82)
>   at 
> 

[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-12 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16084274#comment-16084274
 ] 

Dongjoon Hyun commented on SPARK-21380:
---

I see. I agree with your point; that warning is misleading here.


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-12 Thread Everett Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083429#comment-16083429
 ] 

Everett Anderson commented on SPARK-21380:
--

[~dongjoon] Hey -- I don't totally follow. It sounds like you're saying that 
it's correct for a join of two single-row tables to fail because it is 
considered a Cartesian product. What if you happened to have only one row in 
each table? It seems unfortunate to error out.


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083434#comment-16083434
 ] 

Dongjoon Hyun commented on SPARK-21380:
---

One row in a real, normal table is okay.
Your example uses only constants, so `FoldablePropagation` and 
`ConstantFolding` are applied. See the optimized result.
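
(A minimal sketch, assuming a spark-shell session, to observe this: with the 
Cartesian-product check relaxed so the optimizer can finish, explain(true) 
shows the join condition folded to a literal.)

{code}
// Sketch: reproduce the constant-only inputs from the report.
// Allow the Cartesian product so optimization completes and the plan
// can be printed instead of raising the AnalysisException.
spark.conf.set("spark.sql.crossJoin.enabled", "true")

val l = spark.sql("select 'bob' as name, 23 as age")
val left = l.select(l.col("name").as("l_name"), l.col("age").as("l_age"))
val r = spark.sql("select 'bob' as name, 'bobco' as company")
val right = r.select(r.col("name").as("r_name"), r.col("company").as("r_age"))

// In the optimized plan, FoldablePropagation and ConstantFolding reduce
// (l_name = r_name) to a constant, the "trivial" condition the check flags.
left.join(right, left.col("l_name").equalTo(right.col("r_name")), "inner").explain(true)
{code}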


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Everett Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083439#comment-16083439
 ] 

Everett Anderson commented on SPARK-21380:
--

Ah, I see. Okay, that makes sense. Thanks for the explanation!

I sure wish we didn't have so many quirky 'This looks like a Cartesian product 
join' cases in Spark, though!
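
(For completeness, a sketch of the explicit alternative the error message 
points to, using the left/right Datasets from the example above; 
Dataset.crossJoin is available since Spark 2.1:)

{code}
// Sketch: declaring the join as a cross join satisfies the
// CheckCartesianProducts rule, at the cost of an explicit Cartesian product.
val product = left.crossJoin(right)
product.show()
{code}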


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083272#comment-16083272
 ] 

Dongjoon Hyun commented on SPARK-21380:
---

Your case is too simple, so it gets optimized away. The following is a normal 
case like the one you mentioned.
{code}
scala> val l = spark.sql("select name, age from values ('bob', 1), ('sam', 2) T(name,age)")
scala> val r = spark.sql("select name, company from values ('bob', 'bobco'), ('larry', 'larryco') T(name,company)")
scala> val left = l.alias("l").select(l.col("name").as("l_name"), l.col("age").as("l_age"))
scala> val right = r.alias("r").select(r.col("name").as("r_name"), r.col("company").as("r_age"))
scala> l.show()
+----+---+
|name|age|
+----+---+
| bob|  1|
| sam|  2|
+----+---+

scala> r.show()
+-----+-------+
| name|company|
+-----+-------+
|  bob|  bobco|
|larry|larryco|
+-----+-------+

scala> left.join(right, left.col("l_name").equalTo(right.col("r_name")), "inner").show
+------+-----+------+-----+
|l_name|l_age|r_name|r_age|
+------+-----+------+-----+
|   bob|    1|   bob|bobco|
+------+-----+------+-----+
{code}


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083266#comment-16083266
 ] 

Dongjoon Hyun commented on SPARK-21380:
---

Hi, [~everett].
It's the correct result of optimization. Please see the following.
{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation ===
!Join Inner, (l_name#11 = r_name#17)               Join Inner, (bob = bob)
 :- Project [bob AS l_name#11, 23 AS l_age#12]     :- Project [bob AS l_name#11, 23 AS l_age#12]
 :  +- OneRowRelation$                             :  +- OneRowRelation$
 +- Project [bob AS r_name#17, bobco AS r_age#18]  +- Project [bob AS r_name#17, bobco AS r_age#18]
    +- OneRowRelation$                                +- OneRowRelation$

=== Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantFolding ===
!Join Inner, (bob = bob)                           Join Inner, true
 :- Project [bob AS l_name#11, 23 AS l_age#12]     :- Project [bob AS l_name#11, 23 AS l_age#12]
 :  +- OneRowRelation$                             :  +- OneRowRelation$
 +- Project [bob AS r_name#17, bobco AS r_age#18]  +- Project [bob AS r_name#17, bobco AS r_age#18]
    +- OneRowRelation$                                +- OneRowRelation$
{code}
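
(For anyone who wants to reproduce this rule-by-rule trace: the `=== Applying 
Rule ... ===` plan diffs are logged by Catalyst's RuleExecutor at TRACE level, 
so a sketch along these lines, assuming Spark 2.x's bundled log4j, should 
surface them.)

{code}
// Sketch: raise the RuleExecutor logger to TRACE for the current JVM,
// then re-run the failing join; each rule application that changes the
// plan is logged as a before/after diff like the one above.
import org.apache.log4j.{Level, Logger}

Logger.getLogger("org.apache.spark.sql.catalyst.rules.RuleExecutor")
  .setLevel(Level.TRACE)
{code}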


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Everett Anderson (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083233#comment-16083233
 ] 

Everett Anderson commented on SPARK-21380:
--

[~dongjoon] Sure thing! I'll update this when I've tried it.


[jira] [Commented] (SPARK-21380) Join with Columns thinks inner join is cross join even when aliased

2017-07-11 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16083230#comment-16083230
 ] 

Dongjoon Hyun commented on SPARK-21380:
---

Hi, [~everett].
Thank you for reporting. Apache Spark 2.2.0 was released today. Could you 
check this on 2.2?
