maropu commented on pull request #32511:
URL: https://github.com/apache/spark/pull/32511#issuecomment-839374823
> I think the same issue happens in df.explain(...). If I want to see the formatted result, I can't see the logical plan.
@cloud-fan Does your comment above mean a fix like the one in this PR? Or is it better to simply concatenate the strings of the `extended` mode, like this?
```
== Parsed Logical Plan ==
'Project [unresolvedalias(scalar-subquery#20 [], None), unresolvedalias(scalar-subquery#21 [], None)]
:  :- 'Aggregate ['b], [unresolvedalias('avg('a), None)]
:  :  +- 'UnresolvedRelation [t], [], false
:  +- 'Aggregate ['b], [unresolvedalias('sum('b), None)]
:     +- 'Filter ('b > 1)
:        +- 'UnresolvedRelation [t], [], false
+- OneRowRelation

== Analyzed Logical Plan ==
scalarsubquery(): double, scalarsubquery(): bigint
Project [scalar-subquery#20 [] AS scalarsubquery()#28, scalar-subquery#21 [] AS scalarsubquery()#29L]
:  :- Aggregate [b#3], [avg(a#2) AS avg(a)#23]
:  :  +- SubqueryAlias spark_catalog.default.t
:  :     +- Relation parquet default.t[a#2,b#3]
:  +- Aggregate [b#27], [sum(b#27) AS sum(b)#25L]
:     +- Filter (b#27 > 1)
:        +- SubqueryAlias spark_catalog.default.t
:           +- Relation parquet default.t[a#26,b#27]
+- OneRowRelation

== Optimized Logical Plan ==
Project [scalar-subquery#20 [] AS scalarsubquery()#28, scalar-subquery#21 [] AS scalarsubquery()#29L]
:  :- Aggregate [b#3], [avg(a#2) AS avg(a)#23]
:  :  +- Relation parquet default.t[a#2,b#3]
:  +- Aggregate [b#27], [sum(b#27) AS sum(b)#25L]
:     +- Project [b#27]
:        +- Filter (isnotnull(b#27) AND (b#27 > 1))
:           +- Relation parquet default.t[a#26,b#27]
+- OneRowRelation

== Physical Plan ==
AdaptiveSparkPlan (3)
+- Project (2)
   +- Scan OneRowRelation (1)

(1) Scan OneRowRelation
Output: []
Arguments: ParallelCollectionRDD[0] at explain at <console>:24, OneRowRelation, UnknownPartitioning(0)

(2) Project
Output [2]: [Subquery subquery#0, [id=#17] AS scalarsubquery()#10, Subquery subquery#1, [id=#32] AS scalarsubquery()#11L]
Input: []

(3) AdaptiveSparkPlan
Output [2]: [scalarsubquery()#10, scalarsubquery()#11L]
Arguments: isFinalPlan=false

===== Subqueries =====

Subquery:1 Hosting operator id = 2 Hosting Expression = Subquery subquery#0, [id=#17]
AdaptiveSparkPlan (8)
+- HashAggregate (7)
   +- Exchange (6)
      +- HashAggregate (5)
         +- Scan parquet default.t (4)

(4) Scan parquet default.t
Output [2]: [a#2, b#3]
Batched: true
Location: InMemoryFileIndex [file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/t]
ReadSchema: struct<a:int,b:int>

(5) HashAggregate
Input [2]: [a#2, b#3]
Keys [1]: [b#3]
Functions [1]: [partial_avg(a#2)]
Aggregate Attributes [2]: [sum#14, count#15L]
Results [3]: [b#3, sum#16, count#17L]

(6) Exchange
Input [3]: [b#3, sum#16, count#17L]
Arguments: hashpartitioning(b#3, 200), ENSURE_REQUIREMENTS, [id=#15]

(7) HashAggregate
Input [3]: [b#3, sum#16, count#17L]
Keys [1]: [b#3]
Functions [1]: [avg(a#2)]
Aggregate Attributes [1]: [avg(a#2)#4]
Results [1]: [avg(a#2)#4 AS avg(a)#5]

(8) AdaptiveSparkPlan
Output [1]: [avg(a)#5]
Arguments: isFinalPlan=false

Subquery:2 Hosting operator id = 2 Hosting Expression = Subquery subquery#1, [id=#32]
AdaptiveSparkPlan (14)
+- HashAggregate (13)
   +- Exchange (12)
      +- HashAggregate (11)
         +- Filter (10)
            +- Scan parquet default.t (9)

(9) Scan parquet default.t
Output [1]: [b#9]
Batched: true
Location: InMemoryFileIndex [file:/Users/maropu/Repositories/spark/spark-master/spark-warehouse/t]
PushedFilters: [IsNotNull(b), GreaterThan(b,1)]
ReadSchema: struct<b:int>

(10) Filter
Input [1]: [b#9]
Condition : (isnotnull(b#9) AND (b#9 > 1))

(11) HashAggregate
Input [1]: [b#9]
Keys [1]: [b#9]
Functions [1]: [partial_sum(b#9)]
Aggregate Attributes [1]: [sum#18L]
Results [2]: [b#9, sum#19L]

(12) Exchange
Input [2]: [b#9, sum#19L]
Arguments: hashpartitioning(b#9, 200), ENSURE_REQUIREMENTS, [id=#30]

(13) HashAggregate
Input [2]: [b#9, sum#19L]
Keys [1]: [b#9]
Functions [1]: [sum(b#9)]
Aggregate Attributes [1]: [sum(b#9)#6L]
Results [1]: [sum(b#9)#6L AS sum(b)#7L]

(14) AdaptiveSparkPlan
Output [1]: [sum(b)#7L]
Arguments: isFinalPlan=false
```
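
For reference, here is a minimal spark-shell sketch that should reproduce a plan of this shape; the table setup and the exact query text are my assumptions, inferred from the scan nodes and subqueries above:

```scala
// Assumed setup, inferred from "Relation parquet default.t[a#2,b#3]" above.
sql("CREATE TABLE t (a INT, b INT) USING parquet")

// Two scalar subqueries over t, matching the parsed plan shown above.
val df = sql("""
  SELECT (SELECT avg(a) FROM t GROUP BY b),
         (SELECT sum(b) FROM t WHERE b > 1 GROUP BY b)
""")

// Since Spark 3.0, explain() takes a mode string: "extended" prints the
// parsed/analyzed/optimized logical plans plus the physical plan, while
// "formatted" prints only the physical plan with per-operator details.
df.explain("extended")
df.explain("formatted")
```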