[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261384#comment-14261384 ]

Sandy Ryza commented on SPARK-4921:
-----------------------------------

Ah, makes sense. In the query, are some splits NODE_LOCAL and others NO_PREF? Or all NO_PREF?

Looking deeper into the issue, as far as I can tell, changing the return value to NO_PREF as described above should have no effect at all in any scenario.

Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
------------------------------------------------------------------------------------

                Key: SPARK-4921
                URL: https://issues.apache.org/jira/browse/SPARK-4921
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.2.0
           Reporter: Xuefu Zhang
        Attachments: NO_PREF.patch

During research for HIVE-9153, we found that TaskSetManager returns PROCESS_LOCAL for NO_PREF tasks, which may cause performance degradation. Changing the return value to NO_PREF, as demonstrated in the attached patch, seemingly improves the performance.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14261436#comment-14261436 ]

Xuefu Zhang commented on SPARK-4921:
------------------------------------

Some will be NODE_LOCAL, but others will be NO_PREF. Returning PROCESS_LOCAL seems confusing at the very least. As for the performance implications, maybe [~lirui] can confirm further.
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260322#comment-14260322 ]

Sandy Ryza commented on SPARK-4921:
-----------------------------------

Offline, [~xuefuz] and [~lirui] mentioned to me that the query they noticed this with was:

{code}
select count(*) from store_sales where ss_sold_date_sk is not null;
{code}
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14260325#comment-14260325 ]

Sandy Ryza commented on SPARK-4921:
-----------------------------------

[~xuefuz] [~lirui] was that query against data not in HDFS? If the data is in HDFS, we'd expect NODE_LOCAL, not NO_PREF, right?
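
To illustrate the expectation in the comment above: a split that reports preferred hosts (as HDFS-backed splits normally do) can be scheduled NODE_LOCAL, while a split that reports none falls into NO_PREF. The sketch below is a simplified model, not Spark's scheduler code. `Split`, `preferredHosts`, and `bestLocality` are hypothetical names chosen for illustration, and the model deliberately ignores PROCESS_LOCAL (cached data) and RACK_LOCAL.

```scala
object PreferredLocationSketch {
  // Hypothetical split descriptor: the hosts that hold a copy of the split's data.
  final case class Split(name: String, preferredHosts: Seq[String])

  // Best locality level a split can hope for in this simplified model.
  // Assumption: no cached blocks and no rack information are considered.
  def bestLocality(split: Split): String =
    if (split.preferredHosts.isEmpty) "NO_PREF" else "NODE_LOCAL"

  def main(args: Array[String]): Unit = {
    val hdfsSplit  = Split("hdfs-block", Seq("host1", "host2"))
    val otherSplit = Split("no-location-info", Seq.empty)
    println(s"${hdfsSplit.name}: ${bestLocality(hdfsSplit)}")
    println(s"${otherSplit.name}: ${bestLocality(otherSplit)}")
  }
}
```

A stage that mixes both kinds of split would therefore carry NODE_LOCAL and NO_PREF tasks in the same task set.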
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14259562#comment-14259562 ]

Apache Spark commented on SPARK-4921:
-------------------------------------

User 'sryza' has created a pull request for this issue:
https://github.com/apache/spark/pull/3816
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14257940#comment-14257940 ]

Sandy Ryza commented on SPARK-4921:
-----------------------------------

Is there a barebones Spark program that I could use to reproduce this?
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256015#comment-14256015 ]

Xuefu Zhang commented on SPARK-4921:
------------------------------------

cc: [~lirui], [~sandyr]
[jira] [Commented] (SPARK-4921) Performance issue caused by TaskSetManager returning PROCESS_LOCAL for NO_PREF tasks
[ https://issues.apache.org/jira/browse/SPARK-4921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14256425#comment-14256425 ]

Rui Li commented on SPARK-4921:
-------------------------------

I'm not sure if this is intended, but returning PROCESS_LOCAL for NO_PREF tasks may reset {{currentLocalityIndex}} to 0, which may cause more delay later. There seems to be a check to avoid this, but I doubt it's sufficient:

{code}
// Update our locality level for delay scheduling
// NO_PREF will not affect the variables related to delay scheduling
if (maxLocality != TaskLocality.NO_PREF) {
  currentLocalityIndex = getLocalityIndex(taskLocality)
  lastLaunchTime = curTime
}
{code}
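
The concern in the comment above can be sketched with a small, self-contained model. This is not Spark's TaskSetManager: `DelayState`, `recordLaunch`, and `indexAfterLaunch` are hypothetical names, the locality levels `NODE_LOCAL, NO_PREF, ANY` are an assumed configuration for a task set that mixes NODE_LOCAL and NO_PREF tasks, and only the guard quoted in the comment is reproduced. The model assumes a NO_PREF task can be launched while `maxLocality` is something other than NO_PREF (for example ANY), in which case the guard does not apply.

```scala
object DelaySchedulingSketch {
  // Simplified stand-in for the locality enumeration, ordered from most to least local.
  object TaskLocality extends Enumeration {
    val PROCESS_LOCAL, NODE_LOCAL, NO_PREF, RACK_LOCAL, ANY = Value
  }
  import TaskLocality._

  // Hypothetical, stripped-down model of the delay-scheduling state.
  // `levels` must end with ANY so that getLocalityIndex always terminates.
  final class DelayState(levels: Vector[TaskLocality.Value]) {
    var currentLocalityIndex: Int = 0
    var lastLaunchTime: Long = 0L

    // Index of the first configured level that is at least as relaxed as `locality`.
    def getLocalityIndex(locality: TaskLocality.Value): Int = {
      var index = 0
      while (locality > levels(index)) index += 1
      index
    }

    // Mirrors the guard quoted in the comment: only maxLocality is checked,
    // not the locality reported for the launched task.
    def recordLaunch(maxLocality: TaskLocality.Value,
                     taskLocality: TaskLocality.Value,
                     curTime: Long): Unit = {
      if (maxLocality != NO_PREF) {
        currentLocalityIndex = getLocalityIndex(taskLocality)
        lastLaunchTime = curTime
      }
    }
  }

  // Starts from a state that has already relaxed to ANY (index 2), launches one
  // task with the given offer level and reported locality, and returns the new index.
  def indexAfterLaunch(maxLocality: TaskLocality.Value,
                       reported: TaskLocality.Value): Int = {
    val state = new DelayState(Vector(NODE_LOCAL, NO_PREF, ANY))
    state.currentLocalityIndex = 2
    state.recordLaunch(maxLocality, reported, curTime = 1000L)
    state.currentLocalityIndex
  }

  def main(args: Array[String]): Unit = {
    println(indexAfterLaunch(ANY, PROCESS_LOCAL))     // 0: reset to the strictest level
    println(indexAfterLaunch(ANY, NO_PREF))           // 1: lands on the NO_PREF level
    println(indexAfterLaunch(NO_PREF, PROCESS_LOCAL)) // 2: guard skips the update
  }
}
```

In this model, reporting PROCESS_LOCAL for a task with no preference sends the index back to 0 whenever the offer's `maxLocality` is not NO_PREF, so later tasks would have to wait through the locality delays again. Reporting NO_PREF leaves the index at the NO_PREF level instead.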