[jira] [Commented] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page

2023-06-09 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731109#comment-17731109
 ] 

Yian Liou commented on SPARK-42890:
---

PR open at https://github.com/apache/spark/pull/40529

> Add Identifier to the InMemoryTableScan node on the SQL page
> 
>
> Key: SPARK-42890
> URL: https://issues.apache.org/jira/browse/SPARK-42890
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
>
> On the SQL page in the Web UI, there is no distinction for which 
> InMemoryTableScan is being used at a specific point in the DAG. This Jira 
> aims to add a repeat identifier to distinguish which InMemoryTableScan is 
> being used at a certain location.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page

2023-03-21 Thread Yian Liou (Jira)
Yian Liou created SPARK-42890:
-

 Summary: Add Identifier to the InMemoryTableScan node on the SQL 
page
 Key: SPARK-42890
 URL: https://issues.apache.org/jira/browse/SPARK-42890
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.3.2
Reporter: Yian Liou


On the SQL page in the Web UI, there is no distinction for which 
InMemoryTableScan is being used at a specific point in the DAG. This Jira aims 
to add a repeat identifier to distinguish which InMemoryTableScan is being used 
at a certain location.






[jira] [Updated] (SPARK-42829) Add Identifier to the cached RDD node on the Stages page

2023-03-21 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-42829:
--
Summary: Add Identifier to the cached RDD node on the Stages page   (was: 
Added Identifier to the cached RDD operator on the Stages page )

> Add Identifier to the cached RDD node on the Stages page 
> -
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the stages page in the Web UI, there is no distinction for which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-21 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703403#comment-17703403
 ] 

Yian Liou commented on SPARK-42829:
---

Will file another Jira, related to this one, for adding a repeat identifier to 
the InMemoryTableScan operator on the SQL page.

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the stages page in the Web UI, there is no distinction for which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-20 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703038#comment-17703038
 ] 

Yian Liou commented on SPARK-42829:
---

Opened PR at [https://github.com/apache/spark/pull/40502] and included 
screenshot there. [~gurwls223] 

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the stages page in the Web UI, there is no distinction for which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Updated] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-20 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-42829:
--
Attachment: Screen Shot 2023-03-20 at 3.55.40 PM.png

> Added Identifier to the cached RDD operator on the Stages page 
> ---
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the stages page in the Web UI, there is no distinction for which cached 
> RDD is being executed in a particular stage. This Jira aims to add a repeat 
> identifier to distinguish which cached RDD is being executed.






[jira] [Created] (SPARK-42830) Link skipped stages on Spark UI

2023-03-16 Thread Yian Liou (Jira)
Yian Liou created SPARK-42830:
-

 Summary: Link skipped stages on Spark UI
 Key: SPARK-42830
 URL: https://issues.apache.org/jira/browse/SPARK-42830
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.3.2
Reporter: Yian Liou


Add a link to the skipped Spark stages so that it's easier to find the execution 
details on the UI.






[jira] [Created] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page

2023-03-16 Thread Yian Liou (Jira)
Yian Liou created SPARK-42829:
-

 Summary: Added Identifier to the cached RDD operator on the Stages 
page 
 Key: SPARK-42829
 URL: https://issues.apache.org/jira/browse/SPARK-42829
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.3.2
Reporter: Yian Liou


On the stages page in the Web UI, there is no distinction for which cached RDD 
is being executed in a particular stage. This Jira aims to add a repeat 
identifier to distinguish which cached RDD is being executed.






[jira] [Created] (SPARK-38617) Show timing for Spark rules and phases in SQL UI and Rest API

2022-03-21 Thread Yian Liou (Jira)
Yian Liou created SPARK-38617:
-

 Summary: Show timing for Spark rules and phases in SQL UI and Rest 
API
 Key: SPARK-38617
 URL: https://issues.apache.org/jira/browse/SPARK-38617
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Web UI
Affects Versions: 3.2.1
Reporter: Yian Liou


Currently, information about Spark phase and rule timing is not available in 
the SQL UI or the REST API. This Jira aims to add a clickable field to the SQL 
UI that shows the timing of Spark phases and rules, along with a separate 
endpoint that exposes the timing statistics.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-19 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-37349:
--
Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
REST API. Currently, values like
`"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"` 
are returned by REST API calls and are not easily digested in their current 
form. New processing logic for these values is introduced to organize and 
process the new metric fields in a more user-friendly manner.

  was:https://issues.apache.org/jira/browse/SPARK-31440 added improvements for 
SQL Rest API. This Jira aims to add further enhancements in regards to parsing 
the incoming data by accounting for `StageIds` and `TaskIds` fields that came 
in Spark 3.


> Improve SQL Rest API Parsing
> 
>
> Key: SPARK-37349
> URL: https://issues.apache.org/jira/browse/SPARK-37349
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
> REST API. Currently, values like
> `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"` 
> are returned by REST API calls and are not easily digested in their current 
> form. New processing logic for these values is introduced to organize and 
> process the new metric fields in a more user-friendly manner.






[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-19 Thread Yian Liou (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yian Liou updated SPARK-37349:
--
Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
REST API. Currently, values like
{code:java}
"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"{code}
are returned by REST API calls and are not easily digested in their current 
form. New processing logic for these values is introduced to organize and 
process the new metric fields in a more user-friendly manner.
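
As a rough illustration of the kind of processing described above, the following Python sketch splits one such metric string into separate fields. It is not the logic from the Spark change: the function name `parse_metric_value`, the regular expression, and the output keys are all assumptions made for this example, and the stage token (`1.0` here) is kept as an opaque string rather than interpreted.

```python
import re

# Hypothetical pattern for the second line of the metric string, e.g.
# "177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))".
METRIC_PATTERN = re.compile(
    r"^(?P<total>[^(]+?)\s*\("
    r"(?P<min>[^,]+),\s*(?P<med>[^,]+),\s*(?P<max>[^(]+?)\s*"
    r"\(stage (?P<stage>[\d.]+): task (?P<task>\d+)\)\)$"
)


def parse_metric_value(raw: str) -> dict:
    """Split a 'header\\nvalues' metric string into named fields.

    Strings that do not follow the expected layout are returned
    unchanged under the key "raw" instead of raising an error.
    """
    # The first line is the human-readable header; the second holds the values.
    _header, _, body = raw.partition("\n")
    match = METRIC_PATTERN.match(body.strip())
    if match is None:
        return {"raw": raw}
    fields = match.groupdict()
    fields["task"] = int(fields["task"])
    return fields


# The example value from the description; "\n" is a real newline once the
# JSON response has been decoded.
example = (
    "total (min, med, max (stageId: taskId))\n"
    "177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"
)
parsed = parse_metric_value(example)
print(parsed)
```

Falling back to the raw string for unrecognized layouts is a design choice of this sketch, so that metrics without the min/med/max breakdown pass through untouched.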

  was:
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for SQL 
Rest API. Currently, values like
`"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 
59.0 B (stage 1.0: task 5))"` are currently shown from Rest API calls which are 
not easily digested in its current form.New processing logic of the values is 
introduced to organize and process new metric fields in a more user friendly 
manner.


> Improve SQL Rest API Parsing
> 
>
> Key: SPARK-37349
> URL: https://issues.apache.org/jira/browse/SPARK-37349
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
> REST API. Currently, values like
> {code:java}
> "value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"{code}
> are returned by REST API calls and are not easily digested in their current 
> form. New processing logic for these values is introduced to organize and 
> process the new metric fields in a more user-friendly manner.






[jira] [Commented] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-16 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444891#comment-17444891
 ] 

Yian Liou commented on SPARK-37349:
---

Will be working on a PR.

> Improve SQL Rest API Parsing
> 
>
> Key: SPARK-37349
> URL: https://issues.apache.org/jira/browse/SPARK-37349
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
> REST API. This Jira aims to add further enhancements to the parsing of the 
> incoming data by accounting for the `StageIds` and `TaskIds` fields introduced 
> in Spark 3.






[jira] [Created] (SPARK-37349) Improve SQL Rest API Parsing

2021-11-16 Thread Yian Liou (Jira)
Yian Liou created SPARK-37349:
-

 Summary: Improve SQL Rest API Parsing
 Key: SPARK-37349
 URL: https://issues.apache.org/jira/browse/SPARK-37349
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yian Liou


https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL 
REST API. This Jira aims to add further enhancements to the parsing of the 
incoming data by accounting for the `StageIds` and `TaskIds` fields introduced 
in Spark 3.






[jira] [Commented] (SPARK-37340) Display StageIds in Operators for SQL UI

2021-11-15 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444168#comment-17444168
 ] 

Yian Liou commented on SPARK-37340:
---

Will be working on this issue and opening a pull request.

> Display StageIds in Operators for SQL UI
> 
>
> Key: SPARK-37340
> URL: https://issues.apache.org/jira/browse/SPARK-37340
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.2.0
>Reporter: Yian Liou
>Priority: Major
>
> This proposes a more generalized solution to 
> https://issues.apache.org/jira/browse/SPARK-30209, in which a stageId -> operator 
> mapping is built with the following algorithm:
>  1. Read the SparkGraph to get every node's name and its AccumulatorIDs.
>  2. Get each stage's AccumulatorIDs.
>  3. Map operators to stages by checking for a non-empty intersection of the 
> AccumulatorIDs from steps 1 and 2.
>  4. Connect SparkGraphNodes to their StageIDs for rendering in the SQL UI.
> As a result, some operators without max metrics values will also have 
> stageIds in the UI. This Jira also aims to add minor enhancements to the SQL 
> UI tab.






[jira] [Created] (SPARK-37340) Display StageIds in Operators for SQL UI

2021-11-15 Thread Yian Liou (Jira)
Yian Liou created SPARK-37340:
-

 Summary: Display StageIds in Operators for SQL UI
 Key: SPARK-37340
 URL: https://issues.apache.org/jira/browse/SPARK-37340
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.2.0
Reporter: Yian Liou


This proposes a more generalized solution to 
https://issues.apache.org/jira/browse/SPARK-30209, in which a stageId -> operator 
mapping is built with the following algorithm:

 1. Read the SparkGraph to get every node's name and its AccumulatorIDs.
 2. Get each stage's AccumulatorIDs.
 3. Map operators to stages by checking for a non-empty intersection of the 
AccumulatorIDs from steps 1 and 2.
 4. Connect SparkGraphNodes to their StageIDs for rendering in the SQL UI.

As a result, some operators without max metrics values will also have stageIds 
in the UI. This Jira also aims to add minor enhancements to the SQL UI tab.
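
The four steps above can be sketched in a few lines of Python. This is only an illustration of the set-intersection idea, not Spark's implementation: the function name `map_operators_to_stages`, the dictionary shapes, and the sample operator names and accumulator IDs are all hypothetical.

```python
from typing import Dict, List, Set


def map_operators_to_stages(
    node_accumulators: Dict[str, Set[int]],
    stage_accumulators: Dict[int, Set[int]],
) -> Dict[str, List[int]]:
    """Map each operator node to the stages it shares accumulator IDs with.

    node_accumulators:  node name -> accumulator IDs of that node (step 1)
    stage_accumulators: stage ID  -> accumulator IDs of that stage (step 2)
    """
    mapping: Dict[str, List[int]] = {}
    for node, node_ids in node_accumulators.items():
        # Step 3: a stage belongs to the operator when the two ID sets overlap.
        mapping[node] = sorted(
            stage_id
            for stage_id, stage_ids in stage_accumulators.items()
            if node_ids & stage_ids
        )
    return mapping


# Made-up sample data, for illustration only.
node_accumulators = {
    "Scan parquet": {1, 2},
    "HashAggregate": {3, 4},
    "Exchange": {5, 6},
}
stage_accumulators = {0: {1, 2, 3, 5}, 1: {4, 6}}

# Step 4 would attach these stage IDs to the graph nodes before rendering.
operator_stages = map_operators_to_stages(node_accumulators, stage_accumulators)
print(operator_stages)
```

Because the match only needs one shared accumulator ID, an operator can be linked to a stage even when it reports no max metric value, which is the effect the description mentions.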






[jira] [Comment Edited] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-15 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812
 ] 

Yian Liou edited comment on SPARK-33726 at 12/15/20, 6:27 PM:
--

The PR for this issue is at 
https://github.com/apache/spark/pull/30788


was (Author: yliou):
Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -
>
> Key: SPARK-33726
> URL: https://issues.apache.org/jira/browse/SPARK-33726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.1
>Reporter: Yian Liou
>Priority: Major
>  Labels: correctness
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause 
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
> return a fixed-length batch when it should have returned a variable-length 
> batch, leading to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select 
> id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as 
> ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>  
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
>  instead of the correct output:
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of 
> field names. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Yian Liou (Jira)
Yian Liou created SPARK-33726:
-

 Summary: Duplicate field names causes wrong answers during 
aggregation
 Key: SPARK-33726
 URL: https://issues.apache.org/jira/browse/SPARK-33726
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 2.4.4
Reporter: Yian Liou


We saw this bug at Workday.

Duplicate field names for different fields can cause 
org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
return a fixed-length batch when it should have returned a variable-length 
batch, leading to wrong results.

This example produces wrong results in the spark shell:

scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
 
|*x*|*x*|*ma*|*mb*|
|-2|2|0|null|
|-1|1|null|1|
|0|0|0|0|

instead of the correct output:
|*x*|*x*|*ma*|*mb*|
|0|0|0|0|
|-2|2|2|2|
|-1|1|1|1|

The issue can be solved by iterating over the fields themselves instead of 
field names. 
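
To show why iterating over fields rather than field names matters, here is a small Python model of the check. It is a simplified sketch and not Spark's code: the `Field` type, the name-keyed dictionary lookup, and both helper functions are assumptions made for this example. The key schema mirrors the query above, where both grouping columns are named `x` but only one of them is a string.

```python
from typing import List, NamedTuple


class Field(NamedTuple):
    name: str
    fixed_length: bool  # True for types such as bigint, False for string


def all_fixed_by_name(schema: List[Field]) -> bool:
    """Check each field by looking it up through its name.

    With duplicate names the name-keyed dictionary keeps only one field
    per name, so some fields are never actually inspected.
    """
    by_name = {field.name: field for field in schema}
    return all(by_name[field.name].fixed_length for field in schema)


def all_fixed_by_field(schema: List[Field]) -> bool:
    """Check every field directly, regardless of its name."""
    return all(field.fixed_length for field in schema)


# Grouping key from the example: U.x is a string, T.x is a bigint.
key_schema = [
    Field("x", fixed_length=False),  # U.x: cast(id as string)
    Field("x", fixed_length=True),   # T.x: -id
]

print(all_fixed_by_name(key_schema))   # name lookup misses the string column
print(all_fixed_by_field(key_schema))  # direct iteration sees it
```

In this model the name-based check reports that every key column is fixed length, which would select the fixed-length batch, while the field-based check correctly reports that one column is variable length.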






[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation

2020-12-09 Thread Yian Liou (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812
 ] 

Yian Liou commented on SPARK-33726:
---

Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -
>
> Key: SPARK-33726
> URL: https://issues.apache.org/jira/browse/SPARK-33726
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.1
>Reporter: Yian Liou
>Priority: Major
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause 
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to 
> return a fixed-length batch when it should have returned a variable-length 
> batch, leading to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select 
> id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as 
> ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>  
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
>  instead of the correct output:
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of 
> field names. 


