[jira] [Commented] (SPARK-41233) High-order function: array_prepend

2022-12-06 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17644070#comment-17644070
 ] 

Navin Viswanath commented on SPARK-41233:
-

PR: [https://github.com/apache/spark/pull/38947]
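For reference, a minimal sketch of the intended semantics (mirroring the
Snowflake function linked in the description, and assuming a running
SparkSession named spark; the exact Spark API surface is whatever the PR
above settles on):
{code:java}
// Hedged sketch of the proposed behavior, not the merged API:
// array_prepend(array, elem) returns a new array with elem as the
// first element. Expected output: [0, 1, 2, 3]
spark.sql("SELECT array_prepend(array(1, 2, 3), 0)").show()
{code}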

 

 

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html






[jira] [Commented] (SPARK-41233) High-order function: array_prepend

2022-11-30 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641400#comment-17641400
 ] 

Navin Viswanath commented on SPARK-41233:
-

I can take this one up if that's ok.

> High-order function: array_prepend
> --
>
> Key: SPARK-41233
> URL: https://issues.apache.org/jira/browse/SPARK-41233
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> refer to 
> https://docs.snowflake.com/en/developer-guide/snowpark/reference/python/api/snowflake.snowpark.functions.array_prepend.html






[jira] [Commented] (SPARK-31450) Make ExpressionEncoder thread safe

2020-11-16 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233054#comment-17233054
 ] 

Navin Viswanath commented on SPARK-31450:
-

[~hvanhovell]  [~dongjoon] I was in the process of migrating some code from 
Spark 2.4 to Spark 3 and noticed that this ticket requires a change on our 
side. We use the following process to go from a Thrift type T to an 
InternalRow (reading Thrift files on HDFS into a DataFrame):
 # We construct a Spark schema by inspecting the Thrift metadata.
 # We convert a Thrift object to a GenericRow, using the Thrift metadata to 
read the columns.
 # We then construct an ExpressionEncoder[Row] and use it to create an 
InternalRow as follows:
{code:java}
val schema: StructType = ... // infer thrift schema
val encoder: ExpressionEncoder[Row] = RowEncoder(schema)
val genericRow: GenericRow = toGenericRow(thriftObject, schema)
val internalRow: InternalRow = encoder.toRow(genericRow)
{code}

The above steps are used to implement
{code:java}
protected def buildReader(
  sparkSession: SparkSession,
  dataSchema: StructType,
  partitionSchema: StructType,
  requiredSchema: StructType,
  filters: Seq[Filter],
  options: Map[String, String],
  hadoopConf: Configuration): PartitionedFile => Iterator[InternalRow]
{code}
in the trait org.apache.spark.sql.execution.datasources.FileFormat, where we 
need an Iterator[InternalRow].

With the change in this ticket, I would have to replace 
{code:java}
val internalRow: InternalRow = encoder.toRow(genericRow)  
{code}
with
{code:java}
val serializer = encoder.createSerializer()
val internalRow: InternalRow = serializer(genericRow)
{code}
Since this is marked as an internal API in the PR, I was wondering if there is 
a way to implement this so that it is compatible with both Spark 2.4 and Spark 
3. 

My goal is to avoid a code change if possible. Since I know the schema of the 
Thrift type, it seems it should be possible to construct an InternalRow 
directly, but I don't see a way to do this in the code base.
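
The closest workaround I can see is a small reflection shim that picks the 
right path at runtime; a rough, untested sketch (only the two method names 
discussed above are real, everything else is assumed):
{code:java}
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

// Sketch: resolve the encoder-to-InternalRow path reflectively so the
// same source compiles and runs on both Spark 2.4 and Spark 3.x.
// Note: the Spark 3 serializer is stateful, so in real use it should
// be created once per thread/task, not once per row.
def toInternalRow(encoder: ExpressionEncoder[Row], row: Row): InternalRow =
  try {
    // Spark 3.x: createSerializer() returns a Row => InternalRow.
    val serializer = encoder.getClass.getMethod("createSerializer")
      .invoke(encoder).asInstanceOf[Row => InternalRow]
    serializer(row)
  } catch {
    case _: NoSuchMethodException =>
      // Spark 2.4: ExpressionEncoder.toRow still exists.
      encoder.getClass.getMethod("toRow", classOf[Object])
        .invoke(encoder, row).asInstanceOf[InternalRow]
  }
{code}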

> Make ExpressionEncoder thread safe
> --
>
> Key: SPARK-31450
> URL: https://issues.apache.org/jira/browse/SPARK-31450
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
> Fix For: 3.0.0
>
>
> ExpressionEncoder is currently not thread-safe because it contains stateful 
> objects that are required for converting objects to internal rows and vice 
> versa. We have been working around this by (excessively) cloning 
> ExpressionEncoders, which is not free. I propose that we move the stateful 
> bits of the expression encoder into two helper classes that will take care of 
> the conversions.






[jira] [Comment Edited] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-20 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140898#comment-17140898
 ] 

Navin Viswanath edited comment on SPARK-30876 at 6/20/20, 6:09 AM:
---

[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this for the following query:

 
{noformat}
val x = testRelation.subquery('x)
val y = testRelation1.subquery('y)
val z = testRelation.subquery('z)
val query = Join(x,
  Join(y, z, Inner,
    Some(("x.a".attr === "y.b".attr) &&
      ("y.b".attr === "z.c".attr) &&
      ("z.c".attr === 1)), JoinHint.NONE),
  Inner, None, JoinHint.NONE)
{noformat}
 

Unoptimized:
{noformat}
'Join Inner
:- SubqueryAlias x
:  +- LocalRelation <empty>, [a#0, b#1, c#2]
+- 'Join Inner, ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
   :- SubqueryAlias y
   :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   +- SubqueryAlias z
      +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Optimized:
{noformat}
'Join Inner
:- LocalRelation <empty>, [a#0, b#1, c#2]
+- 'Join Inner, ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
   :- LocalRelation <empty>, [a#0, b#1, c#2]
   +- LocalRelation <empty>, [a#22, b#23, c#24]
{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!
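
For my own understanding, this is roughly the plan I'd expect full inference 
to produce (same test DSL as above, assuming the usual catalyst dsl imports; 
a sketch of the expectation, not actual optimizer output):
{code:java}
// Sketch only: with transitive inference from z.c = 1 through
// y.b = z.c and x.a = y.b, the equality filters could be pushed
// below both joins.
val expected = x.where("x.a".attr === 1)
  .join(y.where("y.b".attr === 1), Inner,
    Some("x.a".attr === "y.b".attr))
  .join(z.where("z.c".attr === 1), Inner,
    Some("y.b".attr === "z.c".attr))
{code}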

 


was (Author: navinvishy):
[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this for the following query:

 
{noformat}
val x = testRelation.subquery('x)
val y = testRelation1.subquery('y)
val z = testRelation.subquery('z)
val query = x.join(y).join(z)
  .where(("x.a".attr === "y.b".attr) &&
    ("y.b".attr === "z.c".attr) &&
    ("z.c".attr === 1))
{noformat}
 

Unoptimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- SubqueryAlias x
   :  :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- SubqueryAlias y
   :     +- LocalRelation <empty>, [d#3]
   +- SubqueryAlias z
      +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- LocalRelation <empty>, [d#3]
   +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct<a:int>
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct<b:int>
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], 

[jira] [Comment Edited] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-19 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140898#comment-17140898
 ] 

Navin Viswanath edited comment on SPARK-30876 at 6/20/20, 2:27 AM:
---

[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this for the following query:

 
{noformat}
val x = testRelation.subquery('x)
val y = testRelation1.subquery('y)
val z = testRelation.subquery('z)
val query = x.join(y).join(z)
  .where(("x.a".attr === "y.b".attr) &&
    ("y.b".attr === "z.c".attr) &&
    ("z.c".attr === 1))
{noformat}
 

Unoptimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- SubqueryAlias x
   :  :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- SubqueryAlias y
   :     +- LocalRelation <empty>, [d#3]
   +- SubqueryAlias z
      +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- LocalRelation <empty>, [d#3]
   +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 


was (Author: navinvishy):
[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this.

Unoptimized:

 
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- SubqueryAlias x
   :  :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- SubqueryAlias y
   :     +- LocalRelation <empty>, [d#3]
   +- SubqueryAlias z
      +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- LocalRelation <empty>, [d#3]
   +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct<a:int>
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct<b:int>
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct<c:int>
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> 

[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-19 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140898#comment-17140898
 ] 

Navin Viswanath commented on SPARK-30876:
-

[~yumwang] would this be in the logical plan optimization? I was looking into 
the logical plans and got this.

Unoptimized:

 
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- SubqueryAlias x
   :  :  +- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- SubqueryAlias y
   :     +- LocalRelation <empty>, [d#3]
   +- SubqueryAlias z
      +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Optimized:
{noformat}
'Filter ((('x.a = 'y.b) AND ('y.b = 'z.c)) AND ('z.c = 1))
+- 'Join Inner
   :- Join Inner
   :  :- LocalRelation <empty>, [a#0, b#1, c#2]
   :  +- LocalRelation <empty>, [d#3]
   +- LocalRelation <empty>, [a#0, b#1, c#2]
{noformat}
Or was this supposed to be in the physical plan? Any pointers would help. 
Thanks!

 

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct<a:int>
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct<b:int>
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct<c:int>
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>   

[jira] [Commented] (SPARK-30876) Optimizer cannot infer from inferred constraints with join

2020-06-16 Thread Navin Viswanath (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136753#comment-17136753
 ] 

Navin Viswanath commented on SPARK-30876:
-

Hi [~yumwang], I'd be happy to take a look at this if that's ok.

> Optimizer cannot infer from inferred constraints with join
> --
>
> Key: SPARK-30876
> URL: https://issues.apache.org/jira/browse/SPARK-30876
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce this issue:
> {code:sql}
> create table t1(a int, b int, c int);
> create table t2(a int, b int, c int);
> create table t3(a int, b int, c int);
> select count(*) from t1 join t2 join t3 on (t1.a = t2.b and t2.b = t3.c and 
> t3.c = 1);
> {code}
> Spark 2.3+:
> {noformat}
> == Physical Plan ==
> *(4) HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition, true, [id=#102]
>+- *(3) HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *(3) Project
>  +- *(3) BroadcastHashJoin [b#10], [c#14], Inner, BuildRight
> :- *(3) Project [b#10]
> :  +- *(3) BroadcastHashJoin [a#6], [b#10], Inner, BuildRight
> : :- *(3) Project [a#6]
> : :  +- *(3) Filter isnotnull(a#6)
> : : +- *(3) ColumnarToRow
> : :+- FileScan parquet default.t1[a#6] Batched: true, 
> DataFilters: [isnotnull(a#6)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(a)], ReadSchema: 
> struct<a:int>
> : +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#87]
> :+- *(1) Project [b#10]
> :   +- *(1) Filter (isnotnull(b#10) AND (b#10 = 1))
> :  +- *(1) ColumnarToRow
> : +- FileScan parquet default.t2[b#10] Batched: 
> true, DataFilters: [isnotnull(b#10), (b#10 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(b), EqualTo(b,1)], 
> ReadSchema: struct<b:int>
> +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, true] as bigint))), 
> [id=#96]
>+- *(2) Project [c#14]
>   +- *(2) Filter (isnotnull(c#14) AND (c#14 = 1))
>  +- *(2) ColumnarToRow
> +- FileScan parquet default.t3[c#14] Batched: true, 
> DataFilters: [isnotnull(c#14), (c#14 = 1)], Format: Parquet, Location: 
> InMemoryFileIndex[file:/Users/yumwang/Downloads/spark-3.0.0-preview2-bin-hadoop2.7/spark-warehous...,
>  PartitionFilters: [], PushedFilters: [IsNotNull(c), EqualTo(c,1)], 
> ReadSchema: struct<c:int>
> Time taken: 3.785 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2.x:
> {noformat}
> == Physical Plan ==
> *HashAggregate(keys=[], functions=[count(1)])
> +- Exchange SinglePartition
>+- *HashAggregate(keys=[], functions=[partial_count(1)])
>   +- *Project
>  +- *SortMergeJoin [b#19], [c#23], Inner
> :- *Project [b#19]
> :  +- *SortMergeJoin [a#15], [b#19], Inner
> : :- *Sort [a#15 ASC NULLS FIRST], false, 0
> : :  +- Exchange hashpartitioning(a#15, 200)
> : : +- *Filter (isnotnull(a#15) && (a#15 = 1))
> : :+- HiveTableScan [a#15], HiveTableRelation 
> `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#15, 
> b#16, c#17]
> : +- *Sort [b#19 ASC NULLS FIRST], false, 0
> :+- Exchange hashpartitioning(b#19, 200)
> :   +- *Filter (isnotnull(b#19) && (b#19 = 1))
> :  +- HiveTableScan [b#19], HiveTableRelation 
> `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#18, 
> b#19, c#20]
> +- *Sort [c#23 ASC NULLS FIRST], false, 0
>+- Exchange hashpartitioning(c#23, 200)
>   +- *Filter (isnotnull(c#23) && (c#23 = 1))
>  +- HiveTableScan [c#23], HiveTableRelation 
> `default`.`t3`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [a#21, 
> b#22, c#23]
> Time taken: 0.728 seconds, Fetched 1 row(s)
> {noformat}
> Spark 2.2 can infer {{(a#15 = 1)}}, but Spark 2.3+ can't.


