[jira] [Updated] (SPARK-40067) Add table name to Spark plan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-40067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-40067: --- Description: [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced the `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node in SparkUI. However, a better suggestion was to use `Table#name()` instead. Furthermore, we can also extract other useful information from `Table`; thus, revert [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use `Table` to fetch the relevant information. > Add table name to Spark plan node in SparkUI > > > Key: SPARK-40067 > URL: https://issues.apache.org/jira/browse/SPARK-40067 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Sumeet >Priority: Major > > [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] introduced > the `Scan#name()` API to expose the name of the TableScan in the `BatchScan` node > in SparkUI. > However, a better suggestion was to use `Table#name()` instead. Furthermore, we > can also extract other useful information from `Table`; thus, revert > [SPARK-39902|https://issues.apache.org/jira/browse/SPARK-39902] and use > `Table` to fetch the relevant information. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
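The `Table#name()` idea above can be sketched with a minimal mock of the DSv2 interfaces. The `Table` and `Scan` types below are simplified stand-ins for Spark's `org.apache.spark.sql.connector` interfaces, and the `nodeName` helper and the `db.events` identifier are hypothetical, chosen only for illustration:

```java
// Minimal stand-ins for the DSv2 interfaces discussed in the ticket;
// not the real Spark classes.
interface Table {
    String name();
}

interface Scan {
    Table table();
}

public class BatchScanNodeName {
    // Hypothetical helper: derive the SparkUI node name from Table#name()
    // rather than a dedicated Scan#name() API.
    static String nodeName(Scan scan) {
        return "BatchScan " + scan.table().name();
    }

    public static void main(String[] args) {
        Table iceberg = () -> "db.events";   // hypothetical table identifier
        Scan scan = () -> iceberg;
        System.out.println(nodeName(scan));  // prints "BatchScan db.events"
    }
}
```

The point of routing through `Table` is that the same object can later supply more than the name (e.g. properties) to the UI node.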
[jira] [Created] (SPARK-40067) Add table name to Spark plan node in SparkUI
Sumeet created SPARK-40067: -- Summary: Add table name to Spark plan node in SparkUI Key: SPARK-40067 URL: https://issues.apache.org/jira/browse/SPARK-40067 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.4.0 Reporter: Sumeet -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17574029#comment-17574029 ] Sumeet commented on SPARK-39902: Hi [~dongjoon] - Noted. Thank you for updating the ticket and merging the changes. > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.4.0 >Reporter: Sumeet >Assignee: Sumeet >Priority: Major > Fix For: 3.4.0 > > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png, Screen Shot 2022-07-27 at 6.38.56 PM.png, > Screen Shot 2022-07-27 at 6.39.48 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! > > DSv2 > !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572193#comment-17572193 ] Sumeet commented on SPARK-39902: An example of this change can be seen while viewing the Iceberg scans on SparkUI. h2. Before this change: !Screen Shot 2022-07-27 at 6.39.48 PM.png|width=415,height=211! h2. After this change: !Screen Shot 2022-07-27 at 6.38.56 PM.png|width=430,height=216! > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png, Screen Shot 2022-07-27 at 6.38.56 PM.png, > Screen Shot 2022-07-27 at 6.39.48 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! > > DSv2 > !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: Screen Shot 2022-07-27 at 6.39.48 PM.png > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png, Screen Shot 2022-07-27 at 6.38.56 PM.png, > Screen Shot 2022-07-27 at 6.39.48 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! > > DSv2 > !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: Screen Shot 2022-07-27 at 6.38.56 PM.png > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png, Screen Shot 2022-07-27 at 6.38.56 PM.png, > Screen Shot 2022-07-27 at 6.39.48 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! > > DSv2 > !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Description: Hi, For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" as opposed to "Scan ". Add a method "String name()" to the Scan interface, which "BatchScanExec" can invoke to set the node name in the plan. This nodeName will eventually be used by "SparkPlanGraphNode" to display it in the header of the UI node. DSv1 !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! DSv2 !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! was: Hi, For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" as opposed to "Scan ". Add a method "String name()" to the Scan interface, which "BatchScanExec" can invoke to set the node name in the plan. This nodeName will eventually be used by "SparkPlanGraphNode" to display it in the header of the UI node. DSv1 > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, which "BatchScanExec" can > invoke to set the node name in the plan. This nodeName will eventually be used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > !Screen Shot 2022-07-27 at 6.00.27 PM.png|width=356,height=212! > > DSv2 > !Screen Shot 2022-07-27 at 6.00.50 PM.png|width=293,height=277! 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: Screen Shot 2022-07-27 at 6.00.50 PM.png > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png, Screen Shot > 2022-07-27 at 6.00.50 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: Screen Shot 2022-07-27 at 6.00.27 PM.png > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > Attachments: Screen Shot 2022-07-27 at 6.00.27 PM.png > > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: (was: Screen Shot 2022-07-27 at 6.00.27 PM.png) > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Attachment: Screen Shot 2022-07-27 at 6.00.27 PM.png > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-39902: --- Description: Hi, For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" as opposed to "Scan ". Add a method "String name()" to the Scan interface, which "BatchScanExec" can invoke to set the node name in the plan. This nodeName will eventually be used by "SparkPlanGraphNode" to display it in the header of the UI node. DSv1 was: Hi, For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" as opposed to "Scan ". Add a method "String name()" to the Scan interface, which "BatchScanExec" can invoke to set the node name in the plan. This nodeName will eventually be used by "SparkPlanGraphNode" to display it in the header of the UI node. > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, which "BatchScanExec" can > invoke to set the node name in the plan. This nodeName will eventually be used > by "SparkPlanGraphNode" to display it in the header of the UI node. > > DSv1 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
[ https://issues.apache.org/jira/browse/SPARK-39902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572186#comment-17572186 ] Sumeet commented on SPARK-39902: I'm working on it and will publish a patch soon. > Add Scan details to spark plan scan node in SparkUI > --- > > Key: SPARK-39902 > URL: https://issues.apache.org/jira/browse/SPARK-39902 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.3.1 >Reporter: Sumeet >Priority: Major > > Hi, > For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" > as opposed to "Scan ". > Add a method "String name()" to the Scan interface, that "BatchScanExec" can > invoke to set the node name the plan. This nodeName will be eventually used > by "SparkPlanGraphNode" to display it in the header of the UI node. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39902) Add Scan details to spark plan scan node in SparkUI
Sumeet created SPARK-39902: -- Summary: Add Scan details to spark plan scan node in SparkUI Key: SPARK-39902 URL: https://issues.apache.org/jira/browse/SPARK-39902 Project: Spark Issue Type: Improvement Components: SQL, Web UI Affects Versions: 3.3.1 Reporter: Sumeet Hi, For DSv2, the scan node in the spark plan on SparkUI simply shows "BatchScan" as opposed to "Scan ". Add a method "String name()" to the Scan interface, which "BatchScanExec" can invoke to set the node name in the plan. This nodeName will eventually be used by "SparkPlanGraphNode" to display it in the header of the UI node. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
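The proposed `String name()` addition can be sketched against a simplified stand-in for the DSv2 `Scan` interface (the real one is `org.apache.spark.sql.connector.read.Scan`). The `IcebergScan` class, the `uiNodeName` helper, and the `db.events` identifier are hypothetical; the key idea is that a default implementation keeps existing `Scan` implementations source-compatible:

```java
// Simplified stand-in for the DSv2 Scan interface; not Spark's actual class.
interface Scan {
    // Proposed API: a human-readable name shown in the SparkUI scan node.
    // A default keeps existing Scan implementations source-compatible.
    default String name() {
        return this.getClass().toString();
    }
}

public class ScanNameDemo {
    // Hypothetical DSv2 implementation overriding the new API.
    static class IcebergScan implements Scan {
        @Override
        public String name() {
            return "db.events";  // hypothetical table identifier
        }
    }

    // Roughly what BatchScanExec would do to build the UI node header.
    static String uiNodeName(Scan scan) {
        return "BatchScan " + scan.name();
    }

    public static void main(String[] args) {
        System.out.println(uiNodeName(new IcebergScan()));  // prints "BatchScan db.events"
    }
}
```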
[jira] [Commented] (SPARK-37175) Performance improvement to hash joins with many duplicate keys
[ https://issues.apache.org/jira/browse/SPARK-37175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436588#comment-17436588 ] Sumeet commented on SPARK-37175: I am working on this. > Performance improvement to hash joins with many duplicate keys > -- > > Key: SPARK-37175 > URL: https://issues.apache.org/jira/browse/SPARK-37175 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Bruce Robbins >Priority: Major > Attachments: hash_rel_examples.txt > > > I noticed that HashedRelations with many duplicate keys perform significantly > slower than HashedRelations with similar number of entries but few or no > duplicate keys. > A hypothesis: > * Because of the order in which rows are appended to the map, rows for a > given key are typically non-adjacent in memory, resulting in poor locality. > * The map would perform better if all rows for a given key are next to each > other in memory. > To test this hypothesis, I made a [somewhat brute force change to > HashedRelation|https://github.com/apache/spark/compare/master...bersprockets:hash_rel_play] > to reorganize the map such that all rows for a given key are adjacent in > memory. 
This yielded some performance improvements, at least in my contrived > examples: > (Run on an Intel-based MacBook Pro with 4 cores/8 hyperthreads): > Example 1: > Shuffled Hash Join, LongHashedRelation: > Stream side: 300M rows > Build side: 90M rows, but only 200K unique keys > 136G output rows > |Join strategy|Time (in seconds)|Notes| > |Shuffled hash join (No reorganization)|1092| | > |Shuffled hash join (with reorganization)|234|4.6 times faster than regular > SHJ| > |Sort merge join|164|This beats the SHJ when there are lots of duplicate > keys, I presume because of better cache locality on both sides of the join| > Example 2: > Broadcast Hash Join, LongHashedRelation: > Stream side: 350M rows > Build side 9M rows, but only 18K unique keys > 175G output rows > |Join strategy|Time (in seconds)|Notes| > |Broadcast hash join (No reorganization)|872| | > |Broadcast hash join (with reorganization)|263|3 times faster than regular > BHJ| > |Sort merge join|174|This beats the BHJ when there are lots of duplicate > keys, I presume because of better cache locality on both sides of the join| > Example 3: > Shuffled Hash Join, UnsafeHashedRelation > Stream side: 300M rows > Build side 90M rows, but only 200K unique keys > 135G output rows > |Join strategy|Time (in seconds)|Notes| > |Shuffled Hash Join (No reorganization)|3154| | > |Shuffled Hash Join (with reorganization)|533|5.9 times faster| > |Sort merge join|190|This beats the SHJ when there are lots of duplicate > keys, I presume because of better cache locality on both sides of the join| > Example 4: > Broadcast Hash Join, UnsafeHashedRelation: > Stream side: 70M rows > Build side 9M rows, but only 18K unique keys > 35G output rows > |Join strategy|Time (in seconds)|Notes| > |Broadcast hash join (No reorganization)|849| | > |Broadcast hash join (with reorganization)|130|6.5 times faster| > |Sort merge join|46|This beats the BHJ when there are lots of duplicate keys, > I presume because of better cache locality on 
both sides of the join| > The code for these examples is attached here [^hash_rel_examples.txt] > Even the brute force approach could be useful in production if > * Toggled by a feature flag > * Reorganizes only if the ratio of keys to rows drops below some threshold > * Falls back to using the original map if building the new map results in a > memory-related SparkException. > Another incidental lesson is that sort merge join seems to outperform > broadcast hash join when the build side has lots of duplicate keys. So maybe > a long term improvement would be to avoid hash joins (broadcast or shuffle) > if there are many duplicate keys. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
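The reorganization idea in SPARK-37175 (make all rows for a given join key adjacent in memory) can be illustrated with a toy sketch. This is only a conceptual illustration of the grouping step, not Spark's `HashedRelation`; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy illustration: rows for the same join key, appended in arrival order,
// end up scattered in the map's storage; rebuilding so that each key's rows
// are adjacent improves lookup locality when a probe key has many matches.
public class GroupRowsByKey {
    static Map<Long, List<String>> reorganize(List<Map.Entry<Long, String>> appendOrder) {
        // LinkedHashMap + per-key lists: all rows for a key become adjacent.
        Map<Long, List<String>> grouped = new LinkedHashMap<>();
        for (Map.Entry<Long, String> e : appendOrder) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Interleaved arrival order: key 1 and key 2 rows alternate.
        List<Map.Entry<Long, String>> rows = List.of(
                Map.entry(1L, "a"), Map.entry(2L, "b"),
                Map.entry(1L, "c"), Map.entry(2L, "d"));
        System.out.println(reorganize(rows));  // prints {1=[a, c], 2=[b, d]}
    }
}
```

The patch linked in the ticket does this at the level of raw row pages rather than Java collections, which is where the locality win actually comes from.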
[jira] [Updated] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.
[ https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-35011: --- Fix Version/s: 3.0.4 > Avoid Block Manager registrations when StopExecutor msg is in-flight. > -- > > Key: SPARK-35011 > URL: https://issues.apache.org/jira/browse/SPARK-35011 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.1, 3.2.0 >Reporter: Sumeet >Assignee: Sumeet >Priority: Major > Labels: BlockManager, core > Fix For: 3.2.0, 3.1.3, 3.0.4 > > *Note:* This is a follow-up to SPARK-34949; even after the heartbeat fix, the > driver reports dead executors as alive. > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s; however, under the "Executors" tab in SparkUI, I > could still see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues an async "StopExecutor" on > executorEndpoint > * "CoarseGrainedSchedulerBackend" removes that executor from the Driver's > internal data structures and publishes "SparkListenerExecutorRemoved" on the > "listenerBus". 
> * Executor has still not processed "StopExecutor" from the Driver > * Driver receives a heartbeat from the Executor; since it cannot find the > "executorId" in its data structures, it responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * "BlockManager" on the Executor re-registers with the "BlockManagerMaster" > and "SparkListenerBlockManagerAdded" is published on the "listenerBus" > * Executor starts processing the "StopExecutor" and exits > * "AppStatusListener" picks up the "SparkListenerBlockManagerAdded" event and > updates "AppStatusStore" > * "statusTracker.getExecutorInfos" refers to "AppStatusStore" to get the list > of executors, which returns the dead executor as alive. > > *Proposed Solution:* > Maintain a cache of recently removed executors on the Driver. During > registration in BlockManagerMasterEndpoint, if the BlockManager belongs to a > recently removed executor, return None, indicating the registration is ignored > since the executor will be shutting down soon. > On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed > executor, return true, indicating the driver knows about it, thereby > preventing re-registration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
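The proposed fix (a driver-side cache of recently removed executors, consulted on both the registration and heartbeat paths) can be sketched as follows. The class and method names are illustrative stand-ins, not Spark's actual `BlockManagerMasterEndpoint` API:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Sketch of the proposed solution: the driver remembers recently removed
// executors and uses that memory to ignore stale BlockManager traffic.
public class RemovedExecutorCache {
    private final Deque<String> order = new ArrayDeque<>(); // eviction order
    private final Set<String> removed = new HashSet<>();
    private final int capacity;

    RemovedExecutorCache(int capacity) {
        this.capacity = capacity;
    }

    // Called when CoarseGrainedSchedulerBackend removes the executor.
    void markRemoved(String executorId) {
        if (removed.add(executorId)) {
            order.addLast(executorId);
            if (order.size() > capacity) {
                removed.remove(order.removeFirst());
            }
        }
    }

    // Registration path: return false (Spark's "None") for an executor that
    // is known to be shutting down, so no stale BlockManager entry is created.
    boolean shouldRegister(String executorId) {
        return !removed.contains(executorId);
    }

    // Heartbeat path: pretend the driver already knows this BlockManager so
    // the executor does not re-register while StopExecutor is in flight.
    boolean heartbeatKnown(String executorId) {
        return removed.contains(executorId);
    }
}
```

With this in place, the race in the bullet list above resolves harmlessly: the late heartbeat gets "known", no re-registration happens, and the executor exits cleanly.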
[jira] [Commented] (SPARK-36065) date_trunc returns incorrect output
[ https://issues.apache.org/jira/browse/SPARK-36065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17380050#comment-17380050 ] Sumeet commented on SPARK-36065: cc [~maxgekk] > date_trunc returns incorrect output > --- > > Key: SPARK-36065 > URL: https://issues.apache.org/jira/browse/SPARK-36065 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Sumeet >Priority: Major > Labels: date_trunc, sql, timestamp > > Hi, > Running date_trunc on any hour of "1891-10-01" returns incorrect output for > "Europe/Bratislava" timezone. > Use the following steps in order to reproduce the issue: > * Run spark-shell using: > {code:java} > TZ="Europe/Bratislava" ./bin/spark-shell --conf > spark.driver.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.executor.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.sql.session.timeZone="Europe/Bratislava"{code} > * Generate test data: > {code:java} > ((0 until 9).map(hour => s"1891-10-01 00:0$hour:00") ++ (10 until > 24).map(hour => s"1891-10-01 > 00:$hour:00")).toDF("ts_string").createOrReplaceTempView("temp_ts") > {code} > * Run query: > {code:java} > sql("select ts_string, cast(ts_string as TIMESTAMP) as ts, date_trunc('day', > ts_string) from temp_ts").show(false) > {code} > * Output: > {code:java} > +---+---+--+ > |ts_string |ts |date_trunc(day, ts_string)| > +---+---+--+ > |1891-10-01 00:00:00|1891-10-01 00:02:16|1891-10-01 00:02:16 | > |1891-10-01 00:01:00|1891-10-01 00:03:16|1891-10-01 00:02:16 | > |1891-10-01 00:02:00|1891-10-01 00:04:16|1891-10-01 00:02:16 | > |1891-10-01 00:03:00|1891-10-01 00:03:00|1891-10-01 00:02:16 | > |1891-10-01 00:04:00|1891-10-01 00:04:00|1891-10-01 00:02:16 | > |1891-10-01 00:05:00|1891-10-01 00:05:00|1891-10-01 00:02:16 | > |1891-10-01 00:06:00|1891-10-01 00:06:00|1891-10-01 00:02:16 | > |1891-10-01 00:07:00|1891-10-01 00:07:00|1891-10-01 00:02:16 | > |1891-10-01 00:08:00|1891-10-01 00:08:00|1891-10-01 
00:02:16 | > |1891-10-01 00:10:00|1891-10-01 00:10:00|1891-10-01 00:02:16 | > |1891-10-01 00:11:00|1891-10-01 00:11:00|1891-10-01 00:02:16 | > |1891-10-01 00:12:00|1891-10-01 00:12:00|1891-10-01 00:02:16 | > |1891-10-01 00:13:00|1891-10-01 00:13:00|1891-10-01 00:02:16 | > |1891-10-01 00:14:00|1891-10-01 00:14:00|1891-10-01 00:02:16 | > |1891-10-01 00:15:00|1891-10-01 00:15:00|1891-10-01 00:02:16 | > |1891-10-01 00:16:00|1891-10-01 00:16:00|1891-10-01 00:02:16 | > |1891-10-01 00:17:00|1891-10-01 00:17:00|1891-10-01 00:02:16 | > |1891-10-01 00:18:00|1891-10-01 00:18:00|1891-10-01 00:02:16 | > |1891-10-01 00:19:00|1891-10-01 00:19:00|1891-10-01 00:02:16 | > |1891-10-01 00:20:00|1891-10-01 00:20:00|1891-10-01 00:02:16 | > +---+---+--+ > only showing top 20 rows > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36065) date_trunc returns incorrect output
[ https://issues.apache.org/jira/browse/SPARK-36065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-36065: --- Affects Version/s: 3.2.0 > date_trunc returns incorrect output > --- > > Key: SPARK-36065 > URL: https://issues.apache.org/jira/browse/SPARK-36065 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Sumeet >Priority: Major > Labels: date_trunc, sql, timestamp > > Hi, > Running date_trunc on any hour of "1891-10-01" returns incorrect output for > "Europe/Bratislava" timezone. > Use the following steps in order to reproduce the issue: > * Run spark-shell using: > {code:java} > TZ="Europe/Bratislava" ./bin/spark-shell --conf > spark.driver.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.executor.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf > spark.sql.session.timeZone="Europe/Bratislava"{code} > * Generate test data: > {code:java} > ((0 until 9).map(hour => s"1891-10-01 00:0$hour:00") ++ (10 until > 24).map(hour => s"1891-10-01 > 00:$hour:00")).toDF("ts_string").createOrReplaceTempView("temp_ts") > {code} > * Run query: > {code:java} > sql("select ts_string, cast(ts_string as TIMESTAMP) as ts, date_trunc('day', > ts_string) from temp_ts").show(false) > {code} > * Output: > {code:java} > +---+---+--+ > |ts_string |ts |date_trunc(day, ts_string)| > +---+---+--+ > |1891-10-01 00:00:00|1891-10-01 00:02:16|1891-10-01 00:02:16 | > |1891-10-01 00:01:00|1891-10-01 00:03:16|1891-10-01 00:02:16 | > |1891-10-01 00:02:00|1891-10-01 00:04:16|1891-10-01 00:02:16 | > |1891-10-01 00:03:00|1891-10-01 00:03:00|1891-10-01 00:02:16 | > |1891-10-01 00:04:00|1891-10-01 00:04:00|1891-10-01 00:02:16 | > |1891-10-01 00:05:00|1891-10-01 00:05:00|1891-10-01 00:02:16 | > |1891-10-01 00:06:00|1891-10-01 00:06:00|1891-10-01 00:02:16 | > |1891-10-01 00:07:00|1891-10-01 00:07:00|1891-10-01 00:02:16 | > |1891-10-01 00:08:00|1891-10-01 00:08:00|1891-10-01 00:02:16 | > 
|1891-10-01 00:10:00|1891-10-01 00:10:00|1891-10-01 00:02:16 | > |1891-10-01 00:11:00|1891-10-01 00:11:00|1891-10-01 00:02:16 | > |1891-10-01 00:12:00|1891-10-01 00:12:00|1891-10-01 00:02:16 | > |1891-10-01 00:13:00|1891-10-01 00:13:00|1891-10-01 00:02:16 | > |1891-10-01 00:14:00|1891-10-01 00:14:00|1891-10-01 00:02:16 | > |1891-10-01 00:15:00|1891-10-01 00:15:00|1891-10-01 00:02:16 | > |1891-10-01 00:16:00|1891-10-01 00:16:00|1891-10-01 00:02:16 | > |1891-10-01 00:17:00|1891-10-01 00:17:00|1891-10-01 00:02:16 | > |1891-10-01 00:18:00|1891-10-01 00:18:00|1891-10-01 00:02:16 | > |1891-10-01 00:19:00|1891-10-01 00:19:00|1891-10-01 00:02:16 | > |1891-10-01 00:20:00|1891-10-01 00:20:00|1891-10-01 00:02:16 | > +---+---+--+ > only showing top 20 rows > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-36065) date_trunc returns incorrect output
Sumeet created SPARK-36065: -- Summary: date_trunc returns incorrect output Key: SPARK-36065 URL: https://issues.apache.org/jira/browse/SPARK-36065 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Sumeet

Hi, Running date_trunc on any hour of "1891-10-01" returns incorrect output for "Europe/Bratislava" timezone. Use the following steps in order to reproduce the issue:

* Run spark-shell using:
{code:java}
TZ="Europe/Bratislava" ./bin/spark-shell --conf spark.driver.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf spark.executor.extraJavaOptions='-Duser.timezone=Europe/Bratislava' --conf spark.sql.session.timeZone="Europe/Bratislava"
{code}
* Generate test data:
{code:java}
((0 until 9).map(hour => s"1891-10-01 00:0$hour:00") ++ (10 until 24).map(hour => s"1891-10-01 00:$hour:00")).toDF("ts_string").createOrReplaceTempView("temp_ts")
{code}
* Run query:
{code:java}
sql("select ts_string, cast(ts_string as TIMESTAMP) as ts, date_trunc('day', ts_string) from temp_ts").show(false)
{code}
* Output:
{code:java}
+-------------------+-------------------+--------------------------+
|ts_string          |ts                 |date_trunc(day, ts_string)|
+-------------------+-------------------+--------------------------+
|1891-10-01 00:00:00|1891-10-01 00:02:16|1891-10-01 00:02:16       |
|1891-10-01 00:01:00|1891-10-01 00:03:16|1891-10-01 00:02:16       |
|1891-10-01 00:02:00|1891-10-01 00:04:16|1891-10-01 00:02:16       |
|1891-10-01 00:03:00|1891-10-01 00:03:00|1891-10-01 00:02:16       |
|1891-10-01 00:04:00|1891-10-01 00:04:00|1891-10-01 00:02:16       |
|1891-10-01 00:05:00|1891-10-01 00:05:00|1891-10-01 00:02:16       |
|1891-10-01 00:06:00|1891-10-01 00:06:00|1891-10-01 00:02:16       |
|1891-10-01 00:07:00|1891-10-01 00:07:00|1891-10-01 00:02:16       |
|1891-10-01 00:08:00|1891-10-01 00:08:00|1891-10-01 00:02:16       |
|1891-10-01 00:10:00|1891-10-01 00:10:00|1891-10-01 00:02:16       |
|1891-10-01 00:11:00|1891-10-01 00:11:00|1891-10-01 00:02:16       |
|1891-10-01 00:12:00|1891-10-01 00:12:00|1891-10-01 00:02:16       |
|1891-10-01 00:13:00|1891-10-01 00:13:00|1891-10-01 00:02:16       |
|1891-10-01 00:14:00|1891-10-01 00:14:00|1891-10-01 00:02:16       |
|1891-10-01 00:15:00|1891-10-01 00:15:00|1891-10-01 00:02:16       |
|1891-10-01 00:16:00|1891-10-01 00:16:00|1891-10-01 00:02:16       |
|1891-10-01 00:17:00|1891-10-01 00:17:00|1891-10-01 00:02:16       |
|1891-10-01 00:18:00|1891-10-01 00:18:00|1891-10-01 00:02:16       |
|1891-10-01 00:19:00|1891-10-01 00:19:00|1891-10-01 00:02:16       |
|1891-10-01 00:20:00|1891-10-01 00:20:00|1891-10-01 00:02:16       |
+-------------------+-------------------+--------------------------+
only showing top 20 rows
{code}

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
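[Editorial note, not part of the ticket: the 00:02:16 values line up with the IANA tz database, where Europe/Bratislava (an alias of Europe/Prague) switches from local mean time, UTC+00:57:44, to CET, UTC+01:00, at 1891-10-01 00:00 local time, a jump of exactly 2 minutes 16 seconds. A quick check with Python's zoneinfo, assuming the system tzdata carries this historical transition:]

```python
# Illustrative check of the tzdata transition that matches the 00:02:16
# shift in the ticket's output. This is an editorial sketch, not code
# from the ticket.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

tz = ZoneInfo("Europe/Bratislava")
before = datetime(1891, 9, 30, 12, 0, tzinfo=tz)  # day before the switch
after = datetime(1891, 10, 1, 12, 0, tzinfo=tz)   # day of the switch

print(before.utcoffset())                     # 0:57:44 (local mean time)
print(after.utcoffset())                      # 1:00:00 (CET)
print(after.utcoffset() - before.utcoffset()) # 0:02:16
```

Wall-clock times between 00:00:00 and 00:02:16 on 1891-10-01 do not exist in this zone, which is why the first few cast timestamps in the output are shifted forward.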
[jira] [Commented] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17363131#comment-17363131 ] Sumeet commented on SPARK-35429: Re-opening this Jira since Spark upgraded to Hive 2.3.9 which no longer depends on commons-httpclient. [https://github.com/apache/spark/commit/463daabd5afd9abfb8027ebcb2e608f169ad1e40] > Remove commons-httpclient due to EOL and CVEs > - > > Key: SPARK-35429 > URL: https://issues.apache.org/jira/browse/SPARK-35429 > Project: Spark > Issue Type: Task > Components: Spark Core, SQL >Affects Versions: 3.0.0, 3.1.1, 3.2.0 >Reporter: Sumeet >Priority: Major > > Spark is pulling in commons-httpclient as a dependency directly. See > dependency:tree: > {code:java} > ./build/mvn dependency:tree | grep -i "commons-httpclient" > > Using `mvn` from path: > /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn > [INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile > [INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided > {code} > commons-httpclient went EOL years ago and there are most likely CVEs not > being reported against it, thus we should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-35429: --- Affects Version/s: 3.2.0 > Remove commons-httpclient due to EOL and CVEs > - > > Key: SPARK-35429 > URL: https://issues.apache.org/jira/browse/SPARK-35429 > Project: Spark > Issue Type: Task > Components: Spark Core, SQL >Affects Versions: 3.0.0, 3.1.1, 3.2.0 >Reporter: Sumeet >Priority: Major > > Spark is pulling in commons-httpclient as a dependency directly. See > dependency:tree: > {code:java} > ./build/mvn dependency:tree | grep -i "commons-httpclient" > > Using `mvn` from path: > /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn > [INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile > [INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided > {code} > commons-httpclient went EOL years ago and there are most likely CVEs not > being reported against it, thus we should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs
[ https://issues.apache.org/jira/browse/SPARK-35429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17346464#comment-17346464 ] Sumeet commented on SPARK-35429: I am working on this. > Remove commons-httpclient due to EOL and CVEs > - > > Key: SPARK-35429 > URL: https://issues.apache.org/jira/browse/SPARK-35429 > Project: Spark > Issue Type: Task > Components: Spark Core, SQL >Affects Versions: 3.0.0, 3.1.1 >Reporter: Sumeet >Priority: Major > > Spark is pulling in commons-httpclient as a dependency directly. See > dependency:tree: > {code:java} > ./build/mvn dependency:tree | grep -i "commons-httpclient" > > Using `mvn` from path: > /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn > [INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile > [INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided > {code} > commons-httpclient went EOL years ago and there are most likely CVEs not > being reported against it, thus we should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35429) Remove commons-httpclient due to EOL and CVEs
Sumeet created SPARK-35429: -- Summary: Remove commons-httpclient due to EOL and CVEs Key: SPARK-35429 URL: https://issues.apache.org/jira/browse/SPARK-35429 Project: Spark Issue Type: Task Components: Spark Core, SQL Affects Versions: 3.1.1, 3.0.0 Reporter: Sumeet Spark is pulling in commons-httpclient as a dependency directly. See dependency:tree: {code:java} ./build/mvn dependency:tree | grep -i "commons-httpclient" Using `mvn` from path: /Users/sumeet.gajjar/cloudera/upstream-spark/build/apache-maven-3.6.3/bin/mvn [INFO] +- commons-httpclient:commons-httpclient:jar:3.1:compile [INFO] | +- commons-httpclient:commons-httpclient:jar:3.1:provided {code} commons-httpclient went EOL years ago and there are most likely CVEs not being reported against it, thus we should remove it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.
[ https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318679#comment-17318679 ] Sumeet commented on SPARK-35011: [~holden], [~dongjoon], [~attilapiros] could you please take a look at this? > Avoid Block Manager registerations when StopExecutor msg is in-flight. > -- > > Key: SPARK-35011 > URL: https://issues.apache.org/jira/browse/SPARK-35011 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 >Reporter: Sumeet >Priority: Major > Labels: BlockManager, core > > *Note:* This is a follow-up on SPARK-34949, even after the heartbeat fix, > driver reports dead executors as alive. > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s, however, under the "Executors" tab in SparkUI, I > could see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on > executorEndpoint > * "CoarseGrainedSchedulerBackend" removes that executor from Driver's > internal data structures and publishes "SparkListenerExecutorRemoved" on the > "listenerBus". 
> * Executor has still not processed "StopExecutor" from the Driver > * Driver receives heartbeat from the Executor, since it cannot find the > "executorId" in its data structures, it responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" > and "SparkListenerBlockManagerAdded" is published on the "listenerBus" > * Executor starts processing the "StopExecutor" and exits > * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and > updates "AppStatusStore" > * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list > of executors which returns the dead executor as alive. > > *Proposed Solution:* > Maintain a Cache of recently removed executors on Driver. During the > registration in BlockManagerMasterEndpoint if the BlockManager belongs to a > recently removed executor, return None indicating the registration is ignored > since the executor will be shutting down soon. > On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed > executor, return true indicating the driver knows about it, thereby > preventing reregisteration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.
[ https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-35011: --- Description: *Note:* This is a follow-up on SPARK-34949, even after the heartbeat fix, driver reports dead executors as alive. *Problem:* I was testing Dynamic Allocation on K8s with about 300 executors. While doing so, when the executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor pods being removed from K8s, however, under the "Executors" tab in SparkUI, I could see some executors listed as alive. [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] also returned a value greater than 1. *Cause:* * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on executorEndpoint * "CoarseGrainedSchedulerBackend" removes that executor from Driver's internal data structures and publishes "SparkListenerExecutorRemoved" on the "listenerBus". * Executor has still not processed "StopExecutor" from the Driver * Driver receives heartbeat from the Executor, since it cannot find the "executorId" in its data structures, it responds with "HeartbeatResponse(reregisterBlockManager = true)" * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" and "SparkListenerBlockManagerAdded" is published on the "listenerBus" * Executor starts processing the "StopExecutor" and exits * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and updates "AppStatusStore" * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list of executors which returns the dead executor as alive. *Proposed Solution:* Maintain a Cache of recently removed executors on Driver. 
During the registration in BlockManagerMasterEndpoint if the BlockManager belongs to a recently removed executor, return None indicating the registration is ignored since the executor will be shutting down soon. On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed executor, return true indicating the driver knows about it, thereby preventing reregisteration. was: *Note:* This is a follow-up on SPARK-34949, even after the heartbeat fix, driver reports dead executors as alive. *Problem:* I was testing Dynamic Allocation on K8s with about 300 executors. While doing so, when the executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor pods being removed from K8s, however, under the "Executors" tab in SparkUI, I could see some executors listed as alive. [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] also returned a value greater than 1. *Cause:* * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on executorEndpoint * "CoarseGrainedSchedulerBackend" removes that executor from Driver's internal data structures and publishes "SparkListenerExecutorRemoved" on the "listenerBus". 
* Executor has still not processed "StopExecutor" from the Driver * Driver receives heartbeat from the Executor, since it cannot find the "executorId" in its data structures, it responds with "HeartbeatResponse(reregisterBlockManager = true)" * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" and "SparkListenerBlockManagerAdded" is published on the "listenerBus" * Executor starts processing the "StopExecutor" and exits * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and updates "AppStatusStore" * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list of executors which returns the dead executor as alive. *Proposed Solution:* Maintain a Cache of recently removed executors on Driver. During the registration in BlockManagerMasterEndpoint if the BlockManager belongs to a recently removed executor, return None indicating the registration is ignored since the executor will be shutting down soon. On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed executor, return true indicating the driver knows about it, thereby preventing reregisteration. > Avoid Block Manager registerations when StopExecutor msg is in-flight. > -- > > Key: SPARK-35011 > URL: https://issues.apache.org/jira/browse/SPARK-35011 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 >Reporter: Sumeet >Priority: Major > Labels: BlockManager, core > > *Note:* This is a follow-up on SPARK-34949, even
[jira] [Commented] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.
[ https://issues.apache.org/jira/browse/SPARK-35011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17318174#comment-17318174 ] Sumeet commented on SPARK-35011: I am working on this. > Avoid Block Manager registerations when StopExecutor msg is in-flight. > -- > > Key: SPARK-35011 > URL: https://issues.apache.org/jira/browse/SPARK-35011 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 >Reporter: Sumeet >Priority: Major > Labels: BlockManager, core > > *Note:* This is a follow-up on SPARK-34949, even after the heartbeat fix, > driver reports dead executors as alive. > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s, however, under the "Executors" tab in SparkUI, I > could see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on > executorEndpoint > * "CoarseGrainedSchedulerBackend" removes that executor from Driver's > internal data structures and publishes "SparkListenerExecutorRemoved" on the > "listenerBus". 
> * Executor has still not processed "StopExecutor" from the Driver > * Driver receives heartbeat from the Executor, since it cannot find the > "executorId" in its data structures, it responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" > and "SparkListenerBlockManagerAdded" is published on the "listenerBus" > * Executor starts processing the "StopExecutor" and exits > * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and > updates "AppStatusStore" > * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list > of executors which returns the dead executor as alive. > > *Proposed Solution:* > Maintain a Cache of recently removed executors on Driver. During the > registration in BlockManagerMasterEndpoint if the BlockManager belongs to a > recently removed executor, return None indicating the registration is ignored > since the executor will be shutting down soon. > On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed > executor, return true indicating the driver knows about it, thereby > preventing reregisteration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-35011) Avoid Block Manager registrations when StopExecutor msg is in-flight.
Sumeet created SPARK-35011: -- Summary: Avoid Block Manager registerations when StopExecutor msg is in-flight. Key: SPARK-35011 URL: https://issues.apache.org/jira/browse/SPARK-35011 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.1.1, 3.2.0 Reporter: Sumeet *Note:* This is a follow-up on SPARK-34949, even after the heartbeat fix, driver reports dead executors as alive. *Problem:* I was testing Dynamic Allocation on K8s with about 300 executors. While doing so, when the executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor pods being removed from K8s, however, under the "Executors" tab in SparkUI, I could see some executors listed as alive. [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] also returned a value greater than 1. *Cause:* * "CoarseGrainedSchedulerBackend" issues async "StopExecutor" on executorEndpoint * "CoarseGrainedSchedulerBackend" removes that executor from Driver's internal data structures and publishes "SparkListenerExecutorRemoved" on the "listenerBus". * Executor has still not processed "StopExecutor" from the Driver * Driver receives heartbeat from the Executor, since it cannot find the "executorId" in its data structures, it responds with "HeartbeatResponse(reregisterBlockManager = true)" * "BlockManager" on the Executor reregisters with the "BlockManagerMaster" and "SparkListenerBlockManagerAdded" is published on the "listenerBus" * Executor starts processing the "StopExecutor" and exits * "AppStatusListener" picks the "SparkListenerBlockManagerAdded" event and updates "AppStatusStore" * "statusTracker.getExecutorInfos" refers "AppStatusStore" to get the list of executors which returns the dead executor as alive. *Proposed Solution:* Maintain a Cache of recently removed executors on Driver. 
During registration in BlockManagerMasterEndpoint, if the BlockManager belongs to a recently removed executor, return None, indicating the registration is ignored since the executor will be shutting down soon. On BlockManagerHeartbeat, if the BlockManager belongs to a recently removed executor, return true, indicating the driver knows about it, thereby preventing re-registration. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
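[Editorial note: the proposed solution can be sketched roughly as below. This is an illustrative Python sketch with invented names such as `BlockManagerMasterSketch`, not Spark's actual Scala implementation in BlockManagerMasterEndpoint.]

```python
import time

class BlockManagerMasterSketch:
    """Illustrative sketch (not Spark's code) of the proposed fix:
    remember recently removed executors and ignore their block-manager
    traffic while the StopExecutor message is still in flight."""

    def __init__(self, ttl_seconds=60.0):
        self._removed = {}      # executor_id -> removal timestamp
        self._ttl = ttl_seconds
        self._registered = set()

    def remove_executor(self, executor_id):
        self._registered.discard(executor_id)
        self._removed[executor_id] = time.monotonic()

    def _recently_removed(self, executor_id):
        ts = self._removed.get(executor_id)
        return ts is not None and time.monotonic() - ts < self._ttl

    def register_block_manager(self, executor_id):
        # Returning None signals "registration ignored": the executor is
        # about to shut down, so do not resurrect it in the UI.
        if self._recently_removed(executor_id):
            return None
        self._registered.add(executor_id)
        return executor_id

    def heartbeat(self, executor_id):
        # Returning True means "driver knows this block manager"; claiming
        # knowledge of a recently removed executor prevents re-registration.
        if self._recently_removed(executor_id):
            return True
        return executor_id in self._registered
```

For example, a block-manager registration that arrives after `remove_executor` but within the TTL is dropped, and a late heartbeat no longer triggers `reregisterBlockManager = true`.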
[jira] [Updated] (SPARK-34949) Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
[ https://issues.apache.org/jira/browse/SPARK-34949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-34949: --- Affects Version/s: 3.1.1 > Executor.reportHeartBeat reregisters blockManager even when Executor is > shutting down > - > > Key: SPARK-34949 > URL: https://issues.apache.org/jira/browse/SPARK-34949 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0, 3.1.1 > Environment: Resource Manager: K8s >Reporter: Sumeet >Priority: Major > Labels: Executor, heartbeat > > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s, however, under the "Executors" tab in SparkUI, I > could see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues RemoveExecutor on a > "executorEndpoint" and publishes "SparkListenerExecutorRemoved" on the > "listenerBus" > * "CoarseGrainedExecutorBackend" starts the executor shutdown > * "HeartbeatReceiver" picks the "SparkListenerExecutorRemoved" event and > removes the executor from "executorLastSeen" > * In the meantime, the executor reports a Heartbeat. Now "HeartbeatReceiver" > cannot find the "executorId" in "executorLastSeen" and hence responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * The Executor now calls "env.blockManager.reregister()" and reregisters > itself thus creating inconsistency > > *Proposed Solution:* > The "reportHeartBeat" method is not aware of the fact that Executor is > shutting down, it should check "executorShutdown" before reregistering. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34949) Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
[ https://issues.apache.org/jira/browse/SPARK-34949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-34949: --- Priority: Major (was: Minor) > Executor.reportHeartBeat reregisters blockManager even when Executor is > shutting down > - > > Key: SPARK-34949 > URL: https://issues.apache.org/jira/browse/SPARK-34949 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 > Environment: Resource Manager: K8s >Reporter: Sumeet >Priority: Major > Labels: Executor, heartbeat > > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s, however, under the "Executors" tab in SparkUI, I > could see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues RemoveExecutor on a > "executorEndpoint" and publishes "SparkListenerExecutorRemoved" on the > "listenerBus" > * "CoarseGrainedExecutorBackend" starts the executor shutdown > * "HeartbeatReceiver" picks the "SparkListenerExecutorRemoved" event and > removes the executor from "executorLastSeen" > * In the meantime, the executor reports a Heartbeat. Now "HeartbeatReceiver" > cannot find the "executorId" in "executorLastSeen" and hence responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * The Executor now calls "env.blockManager.reregister()" and reregisters > itself thus creating inconsistency > > *Proposed Solution:* > The "reportHeartBeat" method is not aware of the fact that Executor is > shutting down, it should check "executorShutdown" before reregistering. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-34949) Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
Sumeet created SPARK-34949: -- Summary: Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down Key: SPARK-34949 URL: https://issues.apache.org/jira/browse/SPARK-34949 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Environment: Resource Manager: K8s Reporter: Sumeet *Problem:* I was testing Dynamic Allocation on K8s with about 300 executors. While doing so, when the executors were torn down due to "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor pods being removed from K8s, however, under the "Executors" tab in SparkUI, I could see some executors listed as alive. [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] also returned a value greater than 1. *Cause:* * "CoarseGrainedSchedulerBackend" issues RemoveExecutor on a "executorEndpoint" and publishes "SparkListenerExecutorRemoved" on the "listenerBus" * "CoarseGrainedExecutorBackend" starts the executor shutdown * "HeartbeatReceiver" picks the "SparkListenerExecutorRemoved" event and removes the executor from "executorLastSeen" * In the meantime, the executor reports a Heartbeat. Now "HeartbeatReceiver" cannot find the "executorId" in "executorLastSeen" and hence responds with "HeartbeatResponse(reregisterBlockManager = true)" * The Executor now calls "env.blockManager.reregister()" and reregisters itself thus creating inconsistency *Proposed Solution:* The "reportHeartBeat" method is not aware of the fact that Executor is shutting down, it should check "executorShutdown" before reregistering. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
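[Editorial note: the proposed guard can be sketched as below. This is an illustrative Python sketch with invented names such as `ExecutorSketch`; Spark's actual fix lives in Executor.reportHeartBeat in Scala.]

```python
import threading

class ExecutorSketch:
    """Illustrative sketch (not Spark's code) of the proposed fix:
    reportHeartBeat consults a shutdown flag before honouring the
    driver's request to re-register the block manager."""

    def __init__(self):
        self.executor_shutdown = threading.Event()
        self.reregistrations = 0  # stand-in for blockManager.reregister() calls

    def shutdown(self):
        self.executor_shutdown.set()

    def report_heartbeat(self, driver_requests_reregister: bool) -> bool:
        # Fix: an executor that is shutting down must not re-register,
        # otherwise the driver's AppStatusStore keeps listing it as alive.
        if driver_requests_reregister and not self.executor_shutdown.is_set():
            self.reregistrations += 1
            return True
        return False
```

With this guard, a `HeartbeatResponse(reregisterBlockManager = true)` that races with shutdown is simply ignored instead of resurrecting the block manager.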
[jira] [Commented] (SPARK-34949) Executor.reportHeartBeat reregisters blockManager even when Executor is shutting down
[ https://issues.apache.org/jira/browse/SPARK-34949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314194#comment-17314194 ] Sumeet commented on SPARK-34949: I am working on this. > Executor.reportHeartBeat reregisters blockManager even when Executor is > shutting down > - > > Key: SPARK-34949 > URL: https://issues.apache.org/jira/browse/SPARK-34949 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.0 > Environment: Resource Manager: K8s >Reporter: Sumeet >Priority: Minor > Labels: Executor, heartbeat > > *Problem:* > I was testing Dynamic Allocation on K8s with about 300 executors. While doing > so, when the executors were torn down due to > "spark.dynamicAllocation.executorIdleTimeout", I noticed all the executor > pods being removed from K8s, however, under the "Executors" tab in SparkUI, I > could see some executors listed as alive. > [spark.sparkContext.statusTracker.getExecutorInfos.length|https://github.com/apache/spark/blob/65da9287bc5112564836a555cd2967fc6b05856f/core/src/main/scala/org/apache/spark/SparkStatusTracker.scala#L100] > also returned a value greater than 1. > > *Cause:* > * "CoarseGrainedSchedulerBackend" issues RemoveExecutor on a > "executorEndpoint" and publishes "SparkListenerExecutorRemoved" on the > "listenerBus" > * "CoarseGrainedExecutorBackend" starts the executor shutdown > * "HeartbeatReceiver" picks the "SparkListenerExecutorRemoved" event and > removes the executor from "executorLastSeen" > * In the meantime, the executor reports a Heartbeat. Now "HeartbeatReceiver" > cannot find the "executorId" in "executorLastSeen" and hence responds with > "HeartbeatResponse(reregisterBlockManager = true)" > * The Executor now calls "env.blockManager.reregister()" and reregisters > itself thus creating inconsistency > > *Proposed Solution:* > The "reportHeartBeat" method is not aware of the fact that Executor is > shutting down, it should check "executorShutdown" before reregistering. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-32611) Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect results when a timestamp is present in the predicate
[ https://issues.apache.org/jira/browse/SPARK-32611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumeet updated SPARK-32611: --- Description: *How to reproduce this behavior?* * TZ="America/Los_Angeles" ./bin/spark-shell * sql("set spark.sql.hive.convertMetastoreOrc=true") * sql("set spark.sql.orc.impl=hive") * sql("create table t_spark(col timestamp) stored as orc;") * sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return empty results, which is incorrect.* * sql("set spark.sql.orc.impl=native") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* The above query using (True, hive) returns *correct results if pushdown filters are turned off*. 
* sql("set spark.sql.orc.filterPushdown=false") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* was: *How to reproduce this behavior?* * TZ="America/Los_Angeles" ./bin/spark-shell --conf spark.sql.catalogImplementation=hive * sql("set spark.sql.hive.convertMetastoreOrc=true") * sql("set spark.sql.orc.impl=hive") * sql("create table t_spark(col timestamp) stored as orc;") * sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return empty results, which is incorrect.* * sql("set spark.sql.orc.impl=native") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* The above query using (True, hive) returns *correct results if pushdown filters are turned off*. 
* sql("set spark.sql.orc.filterPushdown=false") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* > Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect > when timestamp is present in predicate > > > Key: SPARK-32611 > URL: https://issues.apache.org/jira/browse/SPARK-32611 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sumeet >Priority: Major > > *How to reproduce this behavior?* > * TZ="America/Los_Angeles" ./bin/spark-shell > * sql("set spark.sql.hive.convertMetastoreOrc=true") > * sql("set spark.sql.orc.impl=hive") > * sql("create table t_spark(col timestamp) stored as orc;") > * sql("insert into t_spark values (cast('2100-01-01 > 01:33:33.123America/Los_Angeles' as timestamp));") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return empty results, which is incorrect.* > * sql("set spark.sql.orc.impl=native") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return 1 row, which is the expected output.* > > The above query using (True, hive) returns *correct results if pushdown > filters are turned off*. > * sql("set spark.sql.orc.filterPushdown=false") > * sql("select col, date_format(col, 'DD') from t_spark where col = > cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) > *This will return 1 row, which is the expected output.* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-32611) Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect results when a timestamp is present in the predicate
Sumeet created SPARK-32611: -- Summary: Querying ORC table in Spark3 using spark.sql.orc.impl=hive produces incorrect when timestamp is present in predicate Key: SPARK-32611 URL: https://issues.apache.org/jira/browse/SPARK-32611 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0, 3.0.1 Reporter: Sumeet *How to reproduce this behavior?* * TZ="America/Los_Angeles" ./bin/spark-shell --conf spark.sql.catalogImplementation=hive * sql("set spark.sql.hive.convertMetastoreOrc=true") * sql("set spark.sql.orc.impl=hive") * sql("create table t_spark(col timestamp) stored as orc;") * sql("insert into t_spark values (cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp));") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return empty results, which is incorrect.* * sql("set spark.sql.orc.impl=native") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* The above query using (True, hive) returns *correct results if pushdown filters are turned off*. * sql("set spark.sql.orc.filterPushdown=false") * sql("select col, date_format(col, 'DD') from t_spark where col = cast('2100-01-01 01:33:33.123America/Los_Angeles' as timestamp);").show(false) *This will return 1 row, which is the expected output.* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org