[jira] [Commented] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page
[ https://issues.apache.org/jira/browse/SPARK-42890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17731109#comment-17731109 ]

Yian Liou commented on SPARK-42890:
-----------------------------------

PR open at https://github.com/apache/spark/pull/40529

> Add Identifier to the InMemoryTableScan node on the SQL page
> ------------------------------------------------------------
>
>                 Key: SPARK-42890
>                 URL: https://issues.apache.org/jira/browse/SPARK-42890
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>
> On the SQL page in the Web UI, there is no distinction for which
> InMemoryTableScan is being used at a specific point in the DAG. This Jira
> aims to add a repeat identifier to distinguish which InMemoryTableScan is
> being used at a certain location.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42890) Add Identifier to the InMemoryTableScan node on the SQL page
Yian Liou created SPARK-42890:
------------------------------

             Summary: Add Identifier to the InMemoryTableScan node on the SQL page
                 Key: SPARK-42890
                 URL: https://issues.apache.org/jira/browse/SPARK-42890
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.3.2
            Reporter: Yian Liou


On the SQL page in the Web UI, there is no distinction for which InMemoryTableScan is being used at a specific point in the DAG. This Jira aims to add a repeat identifier to distinguish which InMemoryTableScan is being used at a certain location.
[jira] [Updated] (SPARK-42829) Add Identifier to the cached RDD node on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yian Liou updated SPARK-42829:
------------------------------
    Summary: Add Identifier to the cached RDD node on the Stages page  (was: Added Identifier to the cached RDD operator on the Stages page)

> Add Identifier to the cached RDD node on the Stages page
> --------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the Stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703403#comment-17703403 ]

Yian Liou commented on SPARK-42829:
-----------------------------------

Will file another Jira, related to this one, for adding a repeat identifier to the InMemoryTableScan operator on the SQL page.

> Added Identifier to the cached RDD operator on the Stages page
> --------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the Stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Commented] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17703038#comment-17703038 ]

Yian Liou commented on SPARK-42829:
-----------------------------------

Opened a PR at [https://github.com/apache/spark/pull/40502] and included a screenshot there. [~gurwls223]

> Added Identifier to the cached RDD operator on the Stages page
> --------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the Stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Updated] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
[ https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yian Liou updated SPARK-42829:
------------------------------
    Attachment: Screen Shot 2023-03-20 at 3.55.40 PM.png

> Added Identifier to the cached RDD operator on the Stages page
> --------------------------------------------------------------
>
>                 Key: SPARK-42829
>                 URL: https://issues.apache.org/jira/browse/SPARK-42829
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.3.2
>            Reporter: Yian Liou
>            Priority: Major
>         Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
> On the Stages page in the Web UI, there is no distinction for which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.
[jira] [Created] (SPARK-42830) Link skipped stages on Spark UI
Yian Liou created SPARK-42830:
------------------------------

             Summary: Link skipped stages on Spark UI
                 Key: SPARK-42830
                 URL: https://issues.apache.org/jira/browse/SPARK-42830
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.3.2
            Reporter: Yian Liou


Add a link to the skipped Spark stages so that it's easier to find the execution details on the UI.
[jira] [Created] (SPARK-42829) Added Identifier to the cached RDD operator on the Stages page
Yian Liou created SPARK-42829:
------------------------------

             Summary: Added Identifier to the cached RDD operator on the Stages page
                 Key: SPARK-42829
                 URL: https://issues.apache.org/jira/browse/SPARK-42829
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.3.2
            Reporter: Yian Liou


On the Stages page in the Web UI, there is no distinction for which cached RDD is being executed in a particular stage. This Jira aims to add a repeat identifier to distinguish which cached RDD is being executed.
[jira] [Created] (SPARK-38617) Show timing for Spark rules and phases in SQL UI and Rest API
Yian Liou created SPARK-38617:
------------------------------

             Summary: Show timing for Spark rules and phases in SQL UI and Rest API
                 Key: SPARK-38617
                 URL: https://issues.apache.org/jira/browse/SPARK-38617
             Project: Spark
          Issue Type: Improvement
          Components: SQL, Web UI
    Affects Versions: 3.2.1
            Reporter: Yian Liou


Currently, timing information for Spark phases and rules is available in neither the SQL UI nor the REST API. This Jira aims to add a clickable field to the SQL UI that shows the phase and rule timing information, along with a separate endpoint that exposes the timing statistics.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing
[ https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yian Liou updated SPARK-37349:
------------------------------
    Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, REST API calls return values like
`"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"`,
which are not easily digested in their current form. New processing logic for these values is introduced to organize and process the new metric fields in a more user-friendly manner.

  was:
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for SQL Rest API. This Jira aims to add further enhancements in regards to parsing the incoming data by accounting for `StageIds` and `TaskIds` fields that came in Spark 3.

> Improve SQL Rest API Parsing
> ----------------------------
>
>                 Key: SPARK-37349
>                 URL: https://issues.apache.org/jira/browse/SPARK-37349
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yian Liou
>            Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the
> SQL REST API. Currently, REST API calls return values like
> `"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"`,
> which are not easily digested in their current form. New processing logic
> for these values is introduced to organize and process the new metric fields
> in a more user-friendly manner.
[jira] [Updated] (SPARK-37349) Improve SQL Rest API Parsing
[ https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yian Liou updated SPARK-37349:
------------------------------
    Description: 
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. Currently, REST API calls return values like
{code:java}
"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"{code}
which are not easily digested in their current form. New processing logic for these values is introduced to organize and process the new metric fields in a more user-friendly manner.

  was:
https://issues.apache.org/jira/browse/SPARK-31440 added improvements for SQL Rest API. Currently, values like
`"value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"`
are currently shown from Rest API calls which are not easily digested in its current form. New processing logic of the values is introduced to organize and process new metric fields in a more user friendly manner.

> Improve SQL Rest API Parsing
> ----------------------------
>
>                 Key: SPARK-37349
>                 URL: https://issues.apache.org/jira/browse/SPARK-37349
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yian Liou
>            Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the
> SQL REST API. Currently, REST API calls return values like
> {code:java}
> "value" : "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"{code}
> which are not easily digested in their current form. New processing logic
> for these values is introduced to organize and process the new metric fields
> in a more user-friendly manner.
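To make the parsing problem in SPARK-37349 concrete, here is a minimal Scala sketch of how a metric value in the format quoted above could be split into separate fields. It is only an illustration under assumptions: `ParsedMetric` and `MetricValueParser` are hypothetical names that do not come from the Jira or from Spark, and the regular expression covers only the single-line shape shown in the description.

```scala
// Hypothetical container for the pieces of one metric value; not a Spark class.
final case class ParsedMetric(
    total: String,
    min: String,
    med: String,
    max: String,
    stage: String, // kept as raw text, e.g. "1.0"
    task: String)

object MetricValueParser {
  // Matches "<total> (<min>, <med>, <max> (stage <stage>: task <task>))".
  private val DataLine =
    """(.+?) \((.+?), (.+?), (.+?) \(stage (.+?): task (.+?)\)\)""".r

  /** Parses the last line of the value; the first line is only a legend. */
  def parse(value: String): Option[ParsedMetric] =
    value.split("\n").last.trim match {
      case DataLine(total, min, med, max, stage, task) =>
        Some(ParsedMetric(total, min, med, max, stage, task))
      case _ => None // unknown shape: leave the value to the caller
    }
}

object MetricValueParserDemo {
  def main(args: Array[String]): Unit = {
    val raw =
      "total (min, med, max (stageId: taskId))\n177.0 B (59.0 B, 59.0 B, 59.0 B (stage 1.0: task 5))"
    println(MetricValueParser.parse(raw))
  }
}
```

Returning `Option` keeps values that do not follow this shape untouched instead of failing the whole response.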
[jira] [Commented] (SPARK-37349) Improve SQL Rest API Parsing
[ https://issues.apache.org/jira/browse/SPARK-37349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444891#comment-17444891 ]

Yian Liou commented on SPARK-37349:
-----------------------------------

Will be working on a PR.

> Improve SQL Rest API Parsing
> ----------------------------
>
>                 Key: SPARK-37349
>                 URL: https://issues.apache.org/jira/browse/SPARK-37349
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.2.0
>            Reporter: Yian Liou
>            Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the
> SQL REST API. This Jira aims to add further enhancements to the parsing of
> incoming data by accounting for the `StageIds` and `TaskIds` fields that
> came in Spark 3.
[jira] [Created] (SPARK-37349) Improve SQL Rest API Parsing
Yian Liou created SPARK-37349:
------------------------------

             Summary: Improve SQL Rest API Parsing
                 Key: SPARK-37349
                 URL: https://issues.apache.org/jira/browse/SPARK-37349
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.2.0
            Reporter: Yian Liou


https://issues.apache.org/jira/browse/SPARK-31440 added improvements for the SQL REST API. This Jira aims to add further enhancements to the parsing of incoming data by accounting for the `StageIds` and `TaskIds` fields that came in Spark 3.
[jira] [Commented] (SPARK-37340) Display StageIds in Operators for SQL UI
[ https://issues.apache.org/jira/browse/SPARK-37340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17444168#comment-17444168 ]

Yian Liou commented on SPARK-37340:
-----------------------------------

Will be working on this issue and opening a pull request.

> Display StageIds in Operators for SQL UI
> ----------------------------------------
>
>                 Key: SPARK-37340
>                 URL: https://issues.apache.org/jira/browse/SPARK-37340
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 3.2.0
>            Reporter: Yian Liou
>            Priority: Major
>
> This proposes a more generalized solution of
> https://issues.apache.org/jira/browse/SPARK-30209, where a stageId -> operator
> mapping is done with the following algorithm:
> 1. Read SparkGraph to get every node's name and respective AccumulatorIDs.
> 2. Get each stage's AccumulatorIDs.
> 3. Map operators to stages by checking for a non-empty intersection of the
>    AccumulatorIDs from steps 1 and 2.
> 4. Connect SparkGraphNodes to their respective StageIDs for rendering in the
>    SQL UI.
> As a result, some operators without max metrics values will also have
> stageIds in the UI. This Jira also aims to add minor enhancements to the SQL
> UI tab.
[jira] [Created] (SPARK-37340) Display StageIds in Operators for SQL UI
Yian Liou created SPARK-37340:
------------------------------

             Summary: Display StageIds in Operators for SQL UI
                 Key: SPARK-37340
                 URL: https://issues.apache.org/jira/browse/SPARK-37340
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 3.2.0
            Reporter: Yian Liou


This proposes a more generalized solution of https://issues.apache.org/jira/browse/SPARK-30209, where a stageId -> operator mapping is done with the following algorithm:
1. Read SparkGraph to get every node's name and respective AccumulatorIDs.
2. Get each stage's AccumulatorIDs.
3. Map operators to stages by checking for a non-empty intersection of the AccumulatorIDs from steps 1 and 2.
4. Connect SparkGraphNodes to their respective StageIDs for rendering in the SQL UI.

As a result, some operators without max metrics values will also have stageIds in the UI. This Jira also aims to add minor enhancements to the SQL UI tab.
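The four steps in SPARK-37340 can be sketched in plain Scala as follows. This is a simplified illustration, not the code from the pull request: `GraphNode`, `StageInfo`, and `StageOperatorMapping` are hypothetical stand-ins for the SQL UI's graph nodes and stage data, and the accumulator IDs in the demo are made up.

```scala
// Hypothetical, simplified stand-ins; these are not Spark's actual classes.
final case class GraphNode(id: Long, name: String, accumulatorIds: Set[Long])
final case class StageInfo(stageId: Int, accumulatorIds: Set[Long])

object StageOperatorMapping {
  /**
   * For every node, collects the ids of all stages whose accumulator IDs
   * overlap with the node's accumulator IDs (steps 1 to 3 of the algorithm).
   * The result is what a renderer would attach to each node (step 4).
   */
  def stagesPerNode(
      nodes: Seq[GraphNode],
      stages: Seq[StageInfo]): Map[Long, Seq[Int]] =
    nodes.map { node =>
      val matching = stages
        .filter(stage => stage.accumulatorIds.exists(node.accumulatorIds.contains))
        .map(_.stageId)
        .sorted
      node.id -> matching
    }.toMap
}

object StageOperatorMappingDemo {
  def main(args: Array[String]): Unit = {
    val nodes = Seq(
      GraphNode(0L, "Scan", Set(1L, 2L)),
      GraphNode(1L, "Exchange", Set(3L, 4L)),
      GraphNode(2L, "HashAggregate", Set(5L)))
    val stages = Seq(StageInfo(0, Set(1L, 2L, 3L)), StageInfo(1, Set(4L, 5L)))
    // An operator such as the Exchange node can span more than one stage.
    println(StageOperatorMapping.stagesPerNode(nodes, stages))
  }
}
```

Because the match relies only on shared accumulator IDs, an operator is linked to a stage even when it has no max metric value, which is the behavior the description calls out.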
[jira] [Comment Edited] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
[ https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812 ]

Yian Liou edited comment on SPARK-33726 at 12/15/20, 6:27 PM:
--------------------------------------------------------------

Will create a PR for the issue, which is at https://github.com/apache/spark/pull/30788

was (Author: yliou):
Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -------------------------------------------------------------
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Priority: Major
>              Labels: correctness
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to
> return a fixed batch when it should have returned a variable batch, leading
> to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
> instead of the correct output:
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of
> field names.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
Yian Liou created SPARK-33726:
------------------------------

             Summary: Duplicate field names causes wrong answers during aggregation
                 Key: SPARK-33726
                 URL: https://issues.apache.org/jira/browse/SPARK-33726
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.0.1, 2.4.4
            Reporter: Yian Liou


We saw this bug at Workday.

Duplicate field names for different fields can cause org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to return a fixed batch when it should have returned a variable batch, leading to wrong results.

This example produces wrong results in the spark shell:

scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show

|*x*|*x*|*ma*|*mb*|
|-2|2|0|null|
|-1|1|null|1|
|0|0|0|0|

instead of the correct output:

|*x*|*x*|*ma*|*mb*|
|0|0|0|0|
|-2|2|2|2|
|-1|1|1|1|

The issue can be solved by iterating over the fields themselves instead of field names.
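To illustrate why looking fields up by name can go wrong when two fields share a name, here is a small self-contained Scala model. It is a deliberately simplified sketch and not the code of RowBasedKeyValueBatch: `Field`, `FieldType`, and `BatchChoice` are hypothetical, and the only assumption taken from the report is that the batch kind depends on whether every field is fixed-width.

```scala
// Hypothetical, simplified schema model; not Spark's StructType.
sealed trait FieldType { def isFixedWidth: Boolean }
case object LongField extends FieldType { val isFixedWidth = true }
case object StringField extends FieldType { val isFixedWidth = false }
final case class Field(name: String, dataType: FieldType)

object BatchChoice {
  // Name-based check: each name resolves to the FIRST field with that name,
  // so a later field with a duplicate name is never inspected.
  def allFixedWidthByName(schema: Seq[Field]): Boolean =
    schema.map(_.name).forall { name =>
      schema.find(_.name == name).exists(_.dataType.isFixedWidth)
    }

  // Field-based check: every field is inspected directly.
  def allFixedWidthByField(schema: Seq[Field]): Boolean =
    schema.forall(_.dataType.isFixedWidth)
}

object BatchChoiceDemo {
  def main(args: Array[String]): Unit = {
    // Two different fields that are both named "x", as in the query above.
    val schema = Seq(Field("x", LongField), Field("x", StringField))
    println(s"by name:  ${BatchChoice.allFixedWidthByName(schema)}")
    println(s"by field: ${BatchChoice.allFixedWidthByField(schema)}")
  }
}
```

In this model the name-based check reports that all fields are fixed-width even though one of them is a string, which mirrors the report of a fixed batch being chosen where a variable batch was needed.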
[jira] [Commented] (SPARK-33726) Duplicate field names causes wrong answers during aggregation
[ https://issues.apache.org/jira/browse/SPARK-33726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246812#comment-17246812 ]

Yian Liou commented on SPARK-33726:
-----------------------------------

Will create a PR for the issue.

> Duplicate field names causes wrong answers during aggregation
> -------------------------------------------------------------
>
>                 Key: SPARK-33726
>                 URL: https://issues.apache.org/jira/browse/SPARK-33726
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.4, 3.0.1
>            Reporter: Yian Liou
>            Priority: Major
>
> We saw this bug at Workday.
> Duplicate field names for different fields can cause
> org.apache.spark.sql.catalyst.expressions.RowBasedKeyValueBatch#allocate to
> return a fixed batch when it should have returned a variable batch, leading
> to wrong results.
> This example produces wrong results in the spark shell:
> scala> sql("with T as (select id as a, -id as x from range(3)), U as (select id as b, cast(id as string) as x from range(3)) select T.x, U.x, min(a) as ma, min(b) as mb from T join U on a=b group by U.x, T.x").show
>
> |*x*|*x*|*ma*|*mb*|
> |-2|2|0|null|
> |-1|1|null|1|
> |0|0|0|0|
> instead of the correct output:
> |*x*|*x*|*ma*|*mb*|
> |0|0|0|0|
> |-2|2|2|2|
> |-1|1|1|1|
> The issue can be solved by iterating over the fields themselves instead of
> field names.