[GitHub] [spark] AmplabJenkins removed a comment on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
AmplabJenkins removed a comment on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#issuecomment-568161229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20439/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
AmplabJenkins removed a comment on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#issuecomment-568161227 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
AmplabJenkins commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#issuecomment-568161229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20439/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
AmplabJenkins commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#issuecomment-568161227 Merged build finished. Test PASSed.
[GitHub] [spark] sev7e0 commented on issue #26951: [SPARK-30304][CORE]When the specified shufflemanager is incorrect, print the prompt.
sev7e0 commented on issue #26951: [SPARK-30304][CORE]When the specified shufflemanager is incorrect, print the prompt. URL: https://github.com/apache/spark/pull/26951#issuecomment-568161025 > Meh, maybe. This is already telling you the class you specified doesn't exist. Well, as you said, it's just that there's no obvious mention in the log of where the error occurred.
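The thread is about making a misconfigured shuffle manager easier to trace from the log. A minimal sketch of that idea, assuming the class name is read from a config entry such as `spark.shuffle.manager`: the `loadConfiguredClass` helper and its message text are hypothetical, not Spark's API and not the PR's actual change.

```scala
object ConfiguredClassLoading {
  // Hypothetical helper (not Spark's API): resolves a class name read from a
  // config entry and, when the class is missing, names that entry in the error
  // so the log points at where the bad value came from.
  def loadConfiguredClass(confKey: String, className: String): Class[_] =
    try {
      Class.forName(className)
    } catch {
      case e: ClassNotFoundException =>
        // Keep the original exception as the cause so its stack trace survives.
        throw new IllegalArgumentException(
          s"Class '$className' configured by '$confKey' was not found on the classpath; " +
            s"check the value of '$confKey'.", e)
    }

  def main(args: Array[String]): Unit = {
    println(loadConfiguredClass("spark.shuffle.manager", "java.lang.String").getName)
    try loadConfiguredClass("spark.shuffle.manager", "com.example.NoSuchShuffleManager")
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Wrapping rather than replacing the `ClassNotFoundException` keeps the low-level detail while the top-level message names the setting to fix.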
[GitHub] [spark] SparkQA commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
SparkQA commented on issue #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#issuecomment-568160958 **[Test build #115640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115640/testReport)** for PR 26924 at commit [`3d7f435`](https://github.com/apache/spark/commit/3d7f435f8452faff71b98a9163cd8e86e77c0a79).
[GitHub] [spark] wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError
wangshuo128 commented on a change in pull request #26924: [SPARK-30285][CORE] Fix deadlock between LiveListenerBus#stop and AsyncEventQueue#removeListenerOnError URL: https://github.com/apache/spark/pull/26924#discussion_r360633022 ## File path: core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala ## @@ -529,6 +529,46 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match } } + Seq(true, false).foreach { throwInterruptedException => +val suffix = if (throwInterruptedException) "throw interrupt" else "set Thread interrupted" +test(s"SPARK-30285: Fix deadlock in AsyncEventQueue.removeListenerOnError: $suffix") { + val conf = new SparkConf(false) +.set(LISTENER_BUS_EVENT_QUEUE_CAPACITY, 5) + val bus = new LiveListenerBus(conf) + val counter1 = new BasicJobCounter() + val counter2 = new BasicJobCounter() + val interruptingListener = new DelayInterruptingJobCounter(throwInterruptedException, 3) + bus.addToSharedQueue(counter1) + bus.addToSharedQueue(interruptingListener) + bus.addToEventLogQueue(counter2) + assert(bus.activeQueues() === Set(SHARED_QUEUE, EVENT_LOG_QUEUE)) + assert(bus.findListenersByClass[BasicJobCounter]().size === 2) + assert(bus.findListenersByClass[DelayInterruptingJobCounter]().size === 1) + + bus.start(mockSparkContext, mockMetricsSystem) + + (0 until 5).foreach { jobId => +bus.post(SparkListenerJobEnd(jobId, jobCompletionTime, JobSucceeded)) + } + + // Call bus.stop in a separate thread, otherwise we will block here until bus is stopped + val stoppingThread = new Thread(() => { +bus.stop() + }) + stoppingThread.start() + // Notify interrupting listener starts to work + interruptingListener.sleep = false Review comment: Maybe we could check the `stopped` status of `bus` in the listener. This would be better than using a `CountDownLatch`; however, it can't get rid of the race completely. WDYT?
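The test calls `bus.stop()` from a separate thread because `stop()` blocks until the listener threads finish. A minimal sketch of the coordination the review suggests, with the listener polling a `stopped` flag instead of waiting on a `CountDownLatch`. `ToyBus`, `stopped`, and `listenerSawStop` are hypothetical stand-ins, not `LiveListenerBus` internals. As the comment says, seeing the flag only shows that `stop()` has begun, not that it has already reached its blocking `join()`, so a small race window remains.

```scala
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical toy bus (not LiveListenerBus): stop() sets `stopped`,
// then blocks in join() until the dispatch thread returns.
final class ToyBus {
  val stopped = new AtomicBoolean(false)
  @volatile var listenerSawStop = false

  private val dispatcher = new Thread(() => {
    // Listener body: poll the bus's stopped flag instead of a CountDownLatch.
    // Observing `true` means stop() has started; it may not be inside join() yet.
    while (!stopped.get()) Thread.sleep(1)
    listenerSawStop = true
  })

  def start(): Unit = dispatcher.start()

  def stop(): Unit = {
    stopped.set(true)
    dispatcher.join() // blocks until the listener returns
  }
}

object ToyBusDemo {
  def main(args: Array[String]): Unit = {
    val bus = new ToyBus
    bus.start()
    // Mirror the test: call stop() from another thread so the caller is not blocked.
    val stopper = new Thread(() => bus.stop())
    stopper.start()
    stopper.join(5000)
    println(s"stopper finished: ${!stopper.isAlive}, listener saw stop: ${bus.listenerSawStop}")
  }
}
```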
[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568157233 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568157234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115639/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568157234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115639/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568157233 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
SparkQA removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150044 **[Test build #115639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115639/testReport)** for PR 26910 at commit [`d257dce`](https://github.com/apache/spark/commit/d257dce703986e4ed325af0bae07c88a467a684e).
[GitHub] [spark] SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568157008 **[Test build #115639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115639/testReport)** for PR 26910 at commit [`d257dce`](https://github.com/apache/spark/commit/d257dce703986e4ed325af0bae07c88a467a684e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r360629642 ## File path: sql/core/src/test/resources/sql-tests/inputs/group-by-filter.sql ## @@ -0,0 +1,156 @@ +-- Test filter clause for aggregate expression. + +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +CREATE OR REPLACE TEMPORARY VIEW EMP AS SELECT * FROM VALUES + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (200, "emp 2", date "2003-01-01", 200.00D, 10), + (300, "emp 3", date "2002-01-01", 300.00D, 20), + (400, "emp 4", date "2005-01-01", 400.00D, 30), + (500, "emp 5", date "2001-01-01", 400.00D, NULL), + (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100), + (700, "emp 7", date "2010-01-01", 400.00D, 100), + (800, "emp 8", date "2016-01-01", 150.00D, 70) +AS EMP(id, emp_name, hiredate, salary, dept_id); + +CREATE OR REPLACE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES + (10, "dept 1", "CA"), + (20, "dept 2", "NY"), + (30, "dept 3", "TX"), + (40, "dept 4 - unassigned", "OR"), + (50, "dept 5 - unassigned", "NJ"), + (70, "dept 7", "FL") +AS DEPT(dept_id, dept_name, state); + +-- Aggregate with filter and empty GroupBy expressions. 
+SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData; +SELECT COUNT(a) FILTER (WHERE a = 1), COUNT(b) FILTER (WHERE a > 1) FROM testData; +SELECT COUNT(id) FILTER (WHERE hiredate = date "2001-01-01") FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_date('2001-01-01 00:00:00')) FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_timestamp("2001-01-01 00:00:00")) FROM emp; +SELECT COUNT(id) FILTER (WHERE date_format(hiredate, "-MM-dd") = "2001-01-01") FROM emp; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT COUNT(DISTINCT id) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") = "2001-01-01 00:00:00") FROM emp; + +-- Aggregate with filter and non-empty GroupBy expressions. +SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData GROUP BY a; +SELECT a, COUNT(b) FILTER (WHERE a != 2) FROM testData GROUP BY b; +SELECT COUNT(a) FILTER (WHERE a >= 0), COUNT(b) FILTER (WHERE a >= 3) FROM testData GROUP BY a; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > date "2003-01-01") FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_date("2003-01-01")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_timestamp("2003-01-01 00:00:00")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE date_format(hiredate, "-MM-dd") > "2003-01-01") FROM emp GROUP BY dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT dept_id, SUM(DISTINCT salary) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") > "2001-01-01 00:00:00") FROM emp GROUP BY dept_id; + +-- Aggregate with filter and grouped by literals. 
+SELECT 'foo', COUNT(a) FILTER (WHERE b <= 2) FROM testData GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= date "2003-01-01") FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_date("2003-01-01")) FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_timestamp("2003-01-01")) FROM emp GROUP BY 1; + +-- Aggregate with filter, more than one aggregate function goes with distinct. +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary), sum(salary) filter (where id > 200) from emp group by dept_id; +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct emp_name), sum(salary) from emp group by dept_id; +-- select dept_id,
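The `FILTER (WHERE ...)` clause restricts only the rows fed into that one aggregate; it does not change which groups are produced. A minimal Scala sketch of that semantics on the `EMP` test data, emulating `SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > date "2003-01-01") FROM emp GROUP BY dept_id`. The `Emp` case class and `sumFiltered` helper are hypothetical and say nothing about how Spark plans or executes the clause.

```scala
import java.time.LocalDate

// Hypothetical row type mirroring the EMP test view: (id, emp_name, hiredate, salary, dept_id).
final case class Emp(id: Int, empName: String, hiredate: LocalDate, salary: Double, deptId: Option[Int])

object FilterClauseSketch {
  private def d(s: String): LocalDate = LocalDate.parse(s)

  // Same rows as the EMP view in group-by-filter.sql; a NULL dept_id becomes None.
  val emp: Seq[Emp] = Seq(
    Emp(100, "emp 1", d("2005-01-01"), 100.00, Some(10)),
    Emp(100, "emp 1", d("2005-01-01"), 100.00, Some(10)),
    Emp(200, "emp 2", d("2003-01-01"), 200.00, Some(10)),
    Emp(300, "emp 3", d("2002-01-01"), 300.00, Some(20)),
    Emp(400, "emp 4", d("2005-01-01"), 400.00, Some(30)),
    Emp(500, "emp 5", d("2001-01-01"), 400.00, None),
    Emp(600, "emp 6 - no dept", d("2001-01-01"), 400.00, Some(100)),
    Emp(700, "emp 7", d("2010-01-01"), 400.00, Some(100)),
    Emp(800, "emp 8", d("2016-01-01"), 150.00, Some(70))
  )

  // Emulates SUM(salary) FILTER (WHERE pred) ... GROUP BY dept_id.
  // The predicate only trims each group's aggregate input: every group is kept,
  // and a group with no qualifying rows yields None, like SQL SUM returning NULL.
  def sumFiltered(rows: Seq[Emp])(pred: Emp => Boolean): Map[Option[Int], Option[Double]] =
    rows.groupBy(_.deptId).map { case (dept, group) =>
      val kept = group.filter(pred).map(_.salary)
      dept -> (if (kept.isEmpty) None else Some(kept.sum))
    }

  def main(args: Array[String]): Unit = {
    val cutoff = d("2003-01-01")
    val result = sumFiltered(emp)(_.hiredate.isAfter(cutoff))
    result.toSeq.sortBy(_._1.getOrElse(Int.MinValue)).foreach(println)
  }
}
```

By contrast, moving the predicate into a query-level `WHERE` clause would drop dept 20 and the NULL department from the output entirely; with `FILTER` they remain, with NULL sums.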
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r360629597 ## File path: sql/core/src/test/resources/sql-tests/inputs/group-by-filter.sql ## @@ -0,0 +1,156 @@ +-- Test filter clause for aggregate expression. + +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +CREATE OR REPLACE TEMPORARY VIEW EMP AS SELECT * FROM VALUES + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (200, "emp 2", date "2003-01-01", 200.00D, 10), + (300, "emp 3", date "2002-01-01", 300.00D, 20), + (400, "emp 4", date "2005-01-01", 400.00D, 30), + (500, "emp 5", date "2001-01-01", 400.00D, NULL), + (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100), + (700, "emp 7", date "2010-01-01", 400.00D, 100), + (800, "emp 8", date "2016-01-01", 150.00D, 70) +AS EMP(id, emp_name, hiredate, salary, dept_id); + +CREATE OR REPLACE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES + (10, "dept 1", "CA"), + (20, "dept 2", "NY"), + (30, "dept 3", "TX"), + (40, "dept 4 - unassigned", "OR"), + (50, "dept 5 - unassigned", "NJ"), + (70, "dept 7", "FL") +AS DEPT(dept_id, dept_name, state); + +-- Aggregate with filter and empty GroupBy expressions. 
+SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData; +SELECT COUNT(a) FILTER (WHERE a = 1), COUNT(b) FILTER (WHERE a > 1) FROM testData; +SELECT COUNT(id) FILTER (WHERE hiredate = date "2001-01-01") FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_date('2001-01-01 00:00:00')) FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_timestamp("2001-01-01 00:00:00")) FROM emp; +SELECT COUNT(id) FILTER (WHERE date_format(hiredate, "-MM-dd") = "2001-01-01") FROM emp; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT COUNT(DISTINCT id) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") = "2001-01-01 00:00:00") FROM emp; + +-- Aggregate with filter and non-empty GroupBy expressions. +SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData GROUP BY a; +SELECT a, COUNT(b) FILTER (WHERE a != 2) FROM testData GROUP BY b; +SELECT COUNT(a) FILTER (WHERE a >= 0), COUNT(b) FILTER (WHERE a >= 3) FROM testData GROUP BY a; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > date "2003-01-01") FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_date("2003-01-01")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_timestamp("2003-01-01 00:00:00")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE date_format(hiredate, "-MM-dd") > "2003-01-01") FROM emp GROUP BY dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT dept_id, SUM(DISTINCT salary) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") > "2001-01-01 00:00:00") FROM emp GROUP BY dept_id; + +-- Aggregate with filter and grouped by literals. 
+SELECT 'foo', COUNT(a) FILTER (WHERE b <= 2) FROM testData GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= date "2003-01-01") FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_date("2003-01-01")) FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_timestamp("2003-01-01")) FROM emp GROUP BY 1; + +-- Aggregate with filter, more than one aggregate function goes with distinct. +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary), sum(salary) filter (where id > 200) from emp group by dept_id; +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct emp_name), sum(salary) from emp group by dept_id; +-- select dept_id,
[GitHub] [spark] beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on a change in pull request #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#discussion_r360629486 ## File path: sql/core/src/test/resources/sql-tests/inputs/group-by-filter.sql ## @@ -0,0 +1,156 @@ +-- Test filter clause for aggregate expression. + +-- Test data. +CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES +(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, null) +AS testData(a, b); + +CREATE OR REPLACE TEMPORARY VIEW EMP AS SELECT * FROM VALUES + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (100, "emp 1", date "2005-01-01", 100.00D, 10), + (200, "emp 2", date "2003-01-01", 200.00D, 10), + (300, "emp 3", date "2002-01-01", 300.00D, 20), + (400, "emp 4", date "2005-01-01", 400.00D, 30), + (500, "emp 5", date "2001-01-01", 400.00D, NULL), + (600, "emp 6 - no dept", date "2001-01-01", 400.00D, 100), + (700, "emp 7", date "2010-01-01", 400.00D, 100), + (800, "emp 8", date "2016-01-01", 150.00D, 70) +AS EMP(id, emp_name, hiredate, salary, dept_id); + +CREATE OR REPLACE TEMPORARY VIEW DEPT AS SELECT * FROM VALUES + (10, "dept 1", "CA"), + (20, "dept 2", "NY"), + (30, "dept 3", "TX"), + (40, "dept 4 - unassigned", "OR"), + (50, "dept 5 - unassigned", "NJ"), + (70, "dept 7", "FL") +AS DEPT(dept_id, dept_name, state); + +-- Aggregate with filter and empty GroupBy expressions. 
+SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData; +SELECT COUNT(a) FILTER (WHERE a = 1), COUNT(b) FILTER (WHERE a > 1) FROM testData; +SELECT COUNT(id) FILTER (WHERE hiredate = date "2001-01-01") FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_date('2001-01-01 00:00:00')) FROM emp; +SELECT COUNT(id) FILTER (WHERE hiredate = to_timestamp("2001-01-01 00:00:00")) FROM emp; +SELECT COUNT(id) FILTER (WHERE date_format(hiredate, "-MM-dd") = "2001-01-01") FROM emp; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT COUNT(DISTINCT id) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") = "2001-01-01 00:00:00") FROM emp; + +-- Aggregate with filter and non-empty GroupBy expressions. +SELECT a, COUNT(b) FILTER (WHERE a >= 2) FROM testData GROUP BY a; +SELECT a, COUNT(b) FILTER (WHERE a != 2) FROM testData GROUP BY b; +SELECT COUNT(a) FILTER (WHERE a >= 0), COUNT(b) FILTER (WHERE a >= 3) FROM testData GROUP BY a; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > date "2003-01-01") FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_date("2003-01-01")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE hiredate > to_timestamp("2003-01-01 00:00:00")) FROM emp GROUP BY dept_id; +SELECT dept_id, SUM(salary) FILTER (WHERE date_format(hiredate, "-MM-dd") > "2003-01-01") FROM emp GROUP BY dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- SELECT dept_id, SUM(DISTINCT salary) FILTER (WHERE date_format(hiredate, "-MM-dd HH:mm:ss") > "2001-01-01 00:00:00") FROM emp GROUP BY dept_id; + +-- Aggregate with filter and grouped by literals. 
+SELECT 'foo', COUNT(a) FILTER (WHERE b <= 2) FROM testData GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= date "2003-01-01") FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_date("2003-01-01")) FROM emp GROUP BY 1; +SELECT 'foo', SUM(salary) FILTER (WHERE hiredate >= to_timestamp("2003-01-01")) FROM emp GROUP BY 1; + +-- Aggregate with filter, more than one aggregate function goes with distinct. +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary), sum(salary) filter (where id > 200) from emp group by dept_id; +select dept_id, count(distinct emp_name), count(distinct hiredate), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- [SPARK-30276] Support Filter expression allows simultaneous use of DISTINCT +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct hiredate) filter (where hiredate > date "2003-01-01"), sum(salary) filter (where salary < 400.00D), sum(salary) filter (where id > 200) from emp group by dept_id; +-- select dept_id, count(distinct emp_name) filter (where id > 200), count(distinct emp_name), sum(salary) from emp group by dept_id; +-- select dept_id,
[GitHub] [spark] viirya commented on a change in pull request #26809: [SPARK-30185][SQL] Implement Dataset.tail API
viirya commented on a change in pull request #26809: [SPARK-30185][SQL] Implement Dataset.tail API URL: https://github.com/apache/spark/pull/26809#discussion_r360628682 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -309,20 +309,41 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * UnsafeRow is highly compressible (at least 8 bytes for any column), the byte array is also * compressed. */ - private def getByteArrayRdd(n: Int = -1): RDD[(Long, Array[Byte])] = { + private def getByteArrayRdd(n: Int = -1, reverse: Boolean = false): RDD[(Long, Array[Byte])] = { execute().mapPartitionsInternal { iter => var count = 0 val buffer = new Array[Byte](4 << 10) // 4K val codec = CompressionCodec.createCodec(SparkEnv.get.conf) val bos = new ByteArrayOutputStream() val out = new DataOutputStream(codec.compressedOutputStream(bos)) - // `iter.hasNext` may produce one row and buffer it, we should only call it when the limit is - // not hit. - while ((n < 0 || count < n) && iter.hasNext) { -val row = iter.next().asInstanceOf[UnsafeRow] -out.writeInt(row.getSizeInBytes) -row.writeToStream(out, buffer) -count += 1 + + if (reverse) { +// To collect n from the last, we should anyway read everything with keeping the n. +// Otherwise, we don't know where is the last from the iterator. +var last: Seq[UnsafeRow] = Seq.empty[UnsafeRow] +if (n > 0) { + // NOTE: when reverse, -1 is not supported. There is no difference with collecting + // all and reversing. + val slidingIter = iter.map(_.copy()).sliding(n) + while (slidingIter.hasNext) { last = slidingIter.next().asInstanceOf[Seq[UnsafeRow]] } Review comment: Do we need to copy rows if it is not the last sliding window?
[GitHub] [spark] viirya commented on a change in pull request #26809: [SPARK-30185][SQL] Implement Dataset.tail API
viirya commented on a change in pull request #26809: [SPARK-30185][SQL] Implement Dataset.tail API URL: https://github.com/apache/spark/pull/26809#discussion_r360628682 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkPlan.scala ## @@ -309,20 +309,41 @@ abstract class SparkPlan extends QueryPlan[SparkPlan] with Logging with Serializ * UnsafeRow is highly compressible (at least 8 bytes for any column), the byte array is also * compressed. */ - private def getByteArrayRdd(n: Int = -1): RDD[(Long, Array[Byte])] = { + private def getByteArrayRdd(n: Int = -1, reverse: Boolean = false): RDD[(Long, Array[Byte])] = { execute().mapPartitionsInternal { iter => var count = 0 val buffer = new Array[Byte](4 << 10) // 4K val codec = CompressionCodec.createCodec(SparkEnv.get.conf) val bos = new ByteArrayOutputStream() val out = new DataOutputStream(codec.compressedOutputStream(bos)) - // `iter.hasNext` may produce one row and buffer it, we should only call it when the limit is - // not hit. - while ((n < 0 || count < n) && iter.hasNext) { -val row = iter.next().asInstanceOf[UnsafeRow] -out.writeInt(row.getSizeInBytes) -row.writeToStream(out, buffer) -count += 1 + + if (reverse) { +// To collect n from the last, we should anyway read everything with keeping the n. +// Otherwise, we don't know where is the last from the iterator. +var last: Seq[UnsafeRow] = Seq.empty[UnsafeRow] +if (n > 0) { + // NOTE: when reverse, -1 is not supported. There is no difference with collecting + // all and reversing. + val slidingIter = iter.map(_.copy()).sliding(n) + while (slidingIter.hasNext) { last = slidingIter.next().asInstanceOf[Seq[UnsafeRow]] } Review comment: do we need to copy rows if it is not last sliding window? I think we only care about last n rows? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
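The reviewer's question can be made concrete with a small sketch. `TailSketch` and `lastN` are hypothetical names. A mutable `Array[Int]` stands in for an `UnsafeRow` that the upstream iterator may reuse between `next()` calls. A bounded queue holds at most `n` elements, whereas `sliding(n)` materializes a new window for every input element. One caveat applies: if the source iterator does reuse its row object, each element still has to be copied when it enters the buffer, because at that point it is not yet known whether it will end up among the last `n`.

```scala
import scala.collection.mutable

object TailSketch {
  /** Returns the last `n` elements of `iter` in their original order (hypothetical helper).
    * `copy` runs once per element as it enters the buffer, standing in for `UnsafeRow.copy()`.
    */
  def lastN[T](iter: Iterator[T], n: Int)(copy: T => T): List[T] = {
    require(n > 0, "n must be positive")
    val window = mutable.Queue.empty[T]
    while (iter.hasNext) {
      window.enqueue(copy(iter.next()))
      // Evict the oldest element so the buffer never holds more than n elements.
      if (window.size > n) window.dequeue()
    }
    window.toList
  }
}
```

Take a reused `Array(0)` as the element and the inputs 1, 2, 3 with `n = 2`. Passing `_.clone()` keeps the values 2 and 3. Passing `a => a` leaves two references to the same array, and both read 3. That second outcome is the failure mode the `copy()` guards against.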
[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20438/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150253 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins removed a comment on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150257 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20438/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
AmplabJenkins commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150253 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays
SparkQA commented on issue #26910: [SPARK-30154][ML] PySpark UDF to convert MLlib vectors to dense arrays URL: https://github.com/apache/spark/pull/26910#issuecomment-568150044 **[Test build #115639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115639/testReport)** for PR 26910 at commit [`d257dce`](https://github.com/apache/spark/commit/d257dce703986e4ed325af0bae07c88a467a684e).
[GitHub] [spark] tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
URL: https://github.com/apache/spark/pull/26955#discussion_r360627627
## File path: core/src/test/scala/org/apache/spark/util/SparkUncaughtExceptionHandlerSuite.scala ##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.io.File
+
+import org.apache.spark.SparkFunSuite
+
+class SparkUncaughtExceptionSuite extends SparkFunSuite {
+
+  private val sparkHome =
+    sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
+
+  // creates a spark-class process that invokes the exception thrower
+  // the testcases will detect the process's exit code
+  def getThrowerProcess(exceptionThrower: Any, exitOnUncaughtException: Boolean): Process = {
+    Utils.executeCommand(
+      Seq(s"$sparkHome/bin/spark-class",
+        exceptionThrower.getClass.getCanonicalName.dropRight(1), // drops the "$" at the end
+        if (exitOnUncaughtException) "true" else "false"),
+      new File(sparkHome),
+      Map("SPARK_TESTING" -> "1", "SPARK_HOME" -> sparkHome))
+  }
+
+  test("SPARK-30310: Test uncaught RuntimeException, exitOnUncaughtException = true") {
+    val process = getThrowerProcess(RuntimeExceptionThrower, exitOnUncaughtException = true)
+    assert(process.waitFor == SparkExitCode.UNCAUGHT_EXCEPTION)
+  }
+
+  test("SPARK-30310: Test uncaught RuntimeException, exitOnUncaughtException = false") {
+    val process = getThrowerProcess(RuntimeExceptionThrower, exitOnUncaughtException = false)
+    assert(process.waitFor == 0)
+  }
+
+  test("SPARK-30310: Test uncaught OutOfMemoryError, exitOnUncaughtException = true") {
+    val process = getThrowerProcess(OutOfMemoryErrorThrower, exitOnUncaughtException = true)
+    assert(process.waitFor == SparkExitCode.OOM)
+  }
+
+  test("SPARK-30310: Test uncaught OutOfMemoryError, exitOnUncaughtException = false") {
+    val process = getThrowerProcess(OutOfMemoryErrorThrower, exitOnUncaughtException = false)
+    assert(process.waitFor == SparkExitCode.OOM)
+  }
+
+  test("SPARK-30310: Test uncaught SparkFatalException, exitOnUncaughtException = true") {
+    val process = getThrowerProcess(SparkFatalExceptionThrower, exitOnUncaughtException = true)
+    assert(process.waitFor == SparkExitCode.UNCAUGHT_EXCEPTION)
+  }
+
+  test("SPARK-30310: Test uncaught SparkFatalException, exitOnUncaughtException = false") {
+    val process = getThrowerProcess(SparkFatalExceptionThrower, exitOnUncaughtException = false)
+    assert(process.waitFor == 0)
+  }
+
+  test("SPARK-30310: Test uncaught SparkFatalException (OOM), exitOnUncaughtException = true") {
+    val process = getThrowerProcess(SparkFatalExceptionWithOOMThrower,
+      exitOnUncaughtException = true)
+    assert(process.waitFor == SparkExitCode.OOM)
+  }
+
+  test("SPARK-30310: Test uncaught SparkFatalException (OOM), exitOnUncaughtException = false") {
+    val process = getThrowerProcess(SparkFatalExceptionWithOOMThrower,
+      exitOnUncaughtException = false)
+    assert(process.waitFor == SparkExitCode.OOM)
+  }
+
+}
+
+// a thread that uses SparkUncaughtExceptionHandler, then throws the throwable
+class ThrowableThrowerThread(t: Throwable,
+    exitOnUncaughtException: Boolean) extends Thread {
+  override def run() {
+    Thread.setDefaultUncaughtExceptionHandler(
+      new SparkUncaughtExceptionHandler(exitOnUncaughtException))
+    throw t
+  }
+}
+
+// Objects to be invoked by spark-class for different Throwable types
+// that SparkUncaughtExceptionHandler handles. spark-class will exit with
+// exit code dictated by either SparkUncaughtExceptionHandler (SparkExitCode)
+// or main() (0)
+
+object RuntimeExceptionThrower {
+  def main(args: Array[String]): Unit = {
+    val t = new ThrowableThrowerThread(new RuntimeException, if (args(0) == "true") true else false)
+    t.start()
+    t.join()
+    System.exit(0)
+  }
+}
+
+object OutOfMemoryErrorThrower {
+  def main(args: Array[String]): Unit = {
+    val t = new ThrowableThrowerThread(new OutOfMemoryError, if (args(0) == "true")
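The suite above launches each thrower in a separate JVM through `bin/spark-class` and asserts on `process.waitFor`. The likely reason is that calling `System.exit` inside the test JVM would end the test run itself. Below is a minimal sketch of the same exit-code check. It assumes a POSIX `sh` on the `PATH` in place of `spark-class`, and `ExitCodeSketch` is a hypothetical name.

```scala
object ExitCodeSketch {
  /** Starts `command` as a child process, waits for it, and returns its exit status.
    * Hypothetical helper; the suite itself uses Utils.executeCommand with bin/spark-class.
    */
  def exitCodeOf(command: Seq[String]): Int = {
    val builder = new ProcessBuilder(command: _*)
    // Share the parent's stdout/stderr so a chatty child cannot block on a full pipe.
    builder.inheritIO()
    val process = builder.start()
    process.waitFor()
  }
}
```

Running `sh -c "exit 52"` this way yields 52, the same kind of value the suite compares against the `SparkExitCode` constants.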
[GitHub] [spark] tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
URL: https://github.com/apache/spark/pull/26955#discussion_r360627600
## File path: core/src/test/scala/org/apache/spark/util/SparkUncaughtExceptionHandlerSuite.scala ##
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util
+
+import java.io.File
+
+import org.apache.spark.SparkFunSuite
+
+class SparkUncaughtExceptionSuite extends SparkFunSuite {
+
+  private val sparkHome =
+    sys.props.getOrElse("spark.test.home", fail("spark.test.home is not set!"))
+
+  // creates a spark-class process that invokes the exception thrower
+  // the testcases will detect the process's exit code
+  def getThrowerProcess(exceptionThrower: Any, exitOnUncaughtException: Boolean): Process = {
+    Utils.executeCommand(
+      Seq(s"$sparkHome/bin/spark-class",
+        exceptionThrower.getClass.getCanonicalName.dropRight(1), // drops the "$" at the end
+        if (exitOnUncaughtException) "true" else "false"),
Review comment:
Thanks for the suggestion! Modified accordingly. I also changed it to use `.toBoolean` for converting the string to a Boolean.
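The reviewer's exact suggestion is not shown in this message, so the following only sketches the two conversions mentioned. The first renders the flag with `Boolean.toString` instead of `if (flag) "true" else "false"`. The second parses it with `.toBoolean` instead of `args(0) == "true"`. `FlagSketch` and its methods are hypothetical names.

```scala
object FlagSketch {
  /** Before: any argument other than the exact string "true" silently becomes false. */
  def parseLenient(arg: String): Boolean = if (arg == "true") true else false

  /** After: StringOps.toBoolean accepts "true"/"false" ignoring case and throws
    * IllegalArgumentException for anything else.
    */
  def parseStrict(arg: String): Boolean = arg.toBoolean

  /** Rendering side: Boolean.toString yields "true" or "false". */
  def render(flag: Boolean): String = flag.toString
}
```

The two parsers behave differently on bad input. The old comparison maps any unexpected argument to `false` without complaint. `.toBoolean` accepts `true`/`false` in any case and throws `IllegalArgumentException` for anything else, so a malformed argument to a thrower's `main` fails loudly.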
[GitHub] [spark] tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests
URL: https://github.com/apache/spark/pull/26955#discussion_r360627566
## File path: core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala ##
@@ -48,11 +48,17 @@ private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException:
             System.exit(SparkExitCode.OOM)
           case _ if exitOnUncaughtException =>
             System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
+          case _ =>
+            // SPARK-30310: Don't System.exit() when exitOnUncaughtException is false
         }
       }
     } catch {
-      case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)
-      case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)
+      case oom: OutOfMemoryError =>
+        logError("Uncaught OutOfMemoryError in thread " + thread + ", process halted.", oom)
Review comment:
Thanks for the suggestion! Modified accordingly.
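The missing `case _` matters because a Scala `match` that no case accepts throws `scala.MatchError`. In the pre-fix handler, this happens for a non-OOM throwable when `exitOnUncaughtException` is false. The resulting `MatchError` lands in the outer `catch { case t: Throwable => ... }` shown on the `-` lines, which halts the process with `UNCAUGHT_EXCEPTION_TWICE`. The sketch below reproduces only the shape of the match, under these assumptions:
- `MatchSketch` is hypothetical.
- It returns codes instead of calling `System.exit`.
- Its constants are assumed to mirror `SparkExitCode.OOM` (52) and `SparkExitCode.UNCAUGHT_EXCEPTION` (50).

```scala
object MatchSketch {
  // Assumed to mirror SparkExitCode.OOM and SparkExitCode.UNCAUGHT_EXCEPTION.
  val Oom = 52
  val UncaughtException = 50

  /** Shape of the match before the fix: no default case, so it can throw MatchError. */
  def beforeFix(t: Throwable, exitOnUncaught: Boolean): Int = t match {
    case _: OutOfMemoryError => Oom
    case _ if exitOnUncaught => UncaughtException
  }

  /** After the fix: a trailing `case _` makes "do not exit" an explicit outcome. */
  def afterFix(t: Throwable, exitOnUncaught: Boolean): Option[Int] = t match {
    case _: OutOfMemoryError => Some(Oom)
    case _ if exitOnUncaught => Some(UncaughtException)
    case _ => None
  }
}
```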
[GitHub] [spark] seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
URL: https://github.com/apache/spark/pull/26938#discussion_r360625700
## File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ##
@@ -199,6 +200,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
+        sc.schedulerBackend match {
+          case backend: CoarseGrainedSchedulerBackend =>
+            backend.synchronized {
+              // Mark executor pending to remove if executor heartbeat expired
+              // to avoid reschedule task on this executor again
+              if (!backend.executorsPendingToRemove.contains(executorId)) {
+                backend.executorsPendingToRemove(executorId) = false
Review comment:
> `sc.killAndReplaceExecutor` already try to mark it as "pending to remove"

That is right, but by then a task may already have been rescheduled on this executor again. To avoid that, the executor must first be removed from the ExecutorBackend. For example, `CoarseGrainedSchedulerBackend` calls `disableExecutor` when the driver loses its connection to an executor; `disableExecutor` marks the executor as dead, and then the tasks from the disconnected executor are rescheduled.
[GitHub] [spark] seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
URL: https://github.com/apache/spark/pull/26938#discussion_r360625775
## File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ##
@@ -199,6 +200,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
+        sc.schedulerBackend match {
+          case backend: CoarseGrainedSchedulerBackend =>
+            backend.synchronized {
+              // Mark executor pending to remove if executor heartbeat expired
+              // to avoid reschedule task on this executor again
+              if (!backend.executorsPendingToRemove.contains(executorId)) {
+                backend.executorsPendingToRemove(executorId) = false
Review comment:
The task can be rescheduled on this executor before it is marked as "pending to remove".
[GitHub] [spark] seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
seayoun commented on a change in pull request #26938: [SPARK-30297][CORE] Fix executor lost in net cause app hung upp
URL: https://github.com/apache/spark/pull/26938#discussion_r360625700
## File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ##
@@ -199,6 +200,18 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
+        sc.schedulerBackend match {
+          case backend: CoarseGrainedSchedulerBackend =>
+            backend.synchronized {
+              // Mark executor pending to remove if executor heartbeat expired
+              // to avoid reschedule task on this executor again
+              if (!backend.executorsPendingToRemove.contains(executorId)) {
+                backend.executorsPendingToRemove(executorId) = false
Review comment:
> `sc.killAndReplaceExecutor` already try to mark it as "pending to remove"

That is right, but by then a task may already have been rescheduled on this executor again. To avoid that, the executor must first be removed from the ExecutorBackend. For example, `CoarseGrainedSchedulerBackend` calls `disableExecutor` when the driver loses its connection to an executor.
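The ordering argument in this thread can be sketched as a check-then-mark under a single lock. Once heartbeats expire, the executor is recorded as pending removal before anything else happens. Any scheduling decision taken under the same lock then skips it. `ExecutorTrackerSketch` is a hypothetical, heavily simplified stand-in, not Spark's `CoarseGrainedSchedulerBackend`. The `Boolean` value mirrors the `false` written in the diff. Assumption: in the backend, that value records whether the driver explicitly killed the executor.

```scala
import scala.collection.mutable

/** Hypothetical stand-in for a scheduler backend's executor bookkeeping. */
class ExecutorTrackerSketch {
  private val registered = mutable.LinkedHashSet.empty[String]
  // executor id -> killed explicitly by the driver? (false: lost, e.g. heartbeat expiry)
  private val pendingToRemove = mutable.HashMap.empty[String, Boolean]

  def register(id: String): Unit = synchronized { registered += id }

  /** Heartbeat expiry path: mark first, under the lock, if not already marked. */
  def markHeartbeatExpired(id: String): Unit = synchronized {
    if (!pendingToRemove.contains(id)) pendingToRemove(id) = false
  }

  /** Scheduling path: only offer executors that are not pending removal. */
  def schedulableExecutors: List[String] = synchronized {
    registered.filterNot(pendingToRemove.contains).toList
  }
}
```

Both paths synchronize on the same object, so a resource offer can never observe an executor as schedulable after `markHeartbeatExpired` has returned for it.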
[GitHub] [spark] dongjoon-hyun commented on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector
dongjoon-hyun commented on issue #25135: [SPARK-28367][SS] Use new KafkaConsumer.poll API in Kafka connector URL: https://github.com/apache/spark/pull/25135#issuecomment-568146271 Thank you for the updates, @HeartSaVioR.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568145373 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568145375 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115636/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568145373 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568145375 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115636/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
SparkQA removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568116396 **[Test build #115636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115636/testReport)** for PR 26973 at commit [`f24e873`](https://github.com/apache/spark/commit/f24e8734698c65cc17cbd33efc27962f33fc1ca0).
[GitHub] [spark] SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568145211 **[Test build #115636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115636/testReport)** for PR 26973 at commit [`f24e873`](https://github.com/apache/spark/commit/f24e8734698c65cc17cbd33efc27962f33fc1ca0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568144946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115637/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568144944 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568144946 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115637/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568144944 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
SparkQA removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568139220 **[Test build #115637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115637/testReport)** for PR 26913 at commit [`746e0d1`](https://github.com/apache/spark/commit/746e0d1d3a11af21604dc8acc9d098fc721a0bb7).
[GitHub] [spark] SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568144864 **[Test build #115637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115637/testReport)** for PR 26913 at commit [`746e0d1`](https://github.com/apache/spark/commit/746e0d1d3a11af21604dc8acc9d098fc721a0bb7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cfmcgrady commented on issue #26925: [MINOR][CORE] Quiet request executor remove message.
cfmcgrady commented on issue #26925: [MINOR][CORE] Quiet request executor remove message. URL: https://github.com/apache/spark/pull/26925#issuecomment-568143502 retest this please.
[GitHub] [spark] cfmcgrady commented on issue #26925: [MINOR][CORE] Quiet request executor remove message.
cfmcgrady commented on issue #26925: [MINOR][CORE] Quiet request executor remove message. URL: https://github.com/apache/spark/pull/26925#issuecomment-568143471 It seems this unit test failure wasn't caused by this commit.
[GitHub] [spark] maropu commented on a change in pull request #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command
maropu commented on a change in pull request #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command
URL: https://github.com/apache/spark/pull/26970#discussion_r360622819
## File path: docs/sql-ref-syntax-qry-explain.md ##
@@ -19,4 +19,157 @@ license: |
   limitations under the License.
 ---
 
-**This page is under construction**
+### Description
+
+The `EXPLAIN` statement provides the execution plan for the statement.
+By default, `EXPLAIN` provides information about the physical plan.
+`EXPLAIN` does not support 'DESCRIBE TABLE' statement.
+
+
+### Syntax
+{% highlight sql %}
+EXPLAIN [EXTENDED | CODEGEN] statement
Review comment:
We have more modes for explains: https://github.com/apache/spark/blob/c72f88b0ba20727e831ba9755d9628d0347ee3cb/sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala#L134
[GitHub] [spark] AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568143068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115638/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568143068 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115638/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
SparkQA commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568143059 **[Test build #115638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115638/testReport)** for PR 25024 at commit [`b6d5280`](https://github.com/apache/spark/commit/b6d5280ff06ede4869d225ab3f12277fb1502102). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
SparkQA removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568142516 **[Test build #115638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115638/testReport)** for PR 25024 at commit [`b6d5280`](https://github.com/apache/spark/commit/b6d5280ff06ede4869d225ab3f12277fb1502102).
[GitHub] [spark] AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568143065 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568143065 Merged build finished. Test FAILed.
[GitHub] [spark] maropu edited a comment on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command
maropu edited a comment on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command URL: https://github.com/apache/spark/pull/26970#issuecomment-568142931 Thanks for the work, @PavithraRamachandran. btw, I updated the PR template and plz don't change it next time.
[GitHub] [spark] maropu edited a comment on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command
maropu edited a comment on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command URL: https://github.com/apache/spark/pull/26970#issuecomment-568142931 Thanks for the work, @PavithraRamachandran. btw, I updated the PR template. Plz don't change it next time.
[GitHub] [spark] maropu commented on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command
maropu commented on issue #26970: [SPARK-28825][SQL][DOC] Documentation for Explain Command URL: https://github.com/apache/spark/pull/26970#issuecomment-568142931 I updated the PR template. Plz don't change it next time.
[GitHub] [spark] SparkQA commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
SparkQA commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568142516 **[Test build #115638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115638/testReport)** for PR 25024 at commit [`b6d5280`](https://github.com/apache/spark/commit/b6d5280ff06ede4869d225ab3f12277fb1502102).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568141078 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568141078 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins removed a comment on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568141080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20437/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row
AmplabJenkins commented on issue #25024: [SPARK-27296][SQL] User Defined Aggregators that do not ser/de on each input row URL: https://github.com/apache/spark/pull/25024#issuecomment-568141080 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20437/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568139220 **[Test build #115637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115637/testReport)** for PR 26913 at commit [`746e0d1`](https://github.com/apache/spark/commit/746e0d1d3a11af21604dc8acc9d098fc721a0bb7).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26958: [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in
HyukjinKwon commented on a change in pull request #26958: [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC URL: https://github.com/apache/spark/pull/26958#discussion_r360619514 ## File path: python/pyspark/sql/readwriter.py ## @@ -520,20 +537,24 @@ def func(iterator): raise TypeError("path can be only string, list or RDD") @since(1.5) -def orc(self, path, mergeSchema=None, recursiveFileLookup=None): +def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=None): """Loads ORC files, returning the result as a :class:`DataFrame`. :param mergeSchema: sets whether we should merge schemas collected from all ORC part-files. This will override ``spark.sql.orc.mergeSchema``. The default value is specified in ``spark.sql.orc.mergeSchema``. +:param pathGlobFilter: an optional glob pattern to only include files with paths matching + the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`. + It does not change the behavior of `partition discovery`_. :param recursiveFileLookup: recursively scan a directory for files. Using this option -disables `partition discovery`_. +disables `partition discovery`_. Review comment: So... if you don't mind, I would like to run that separately :-)
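The `pathGlobFilter` docstring under review says the option only restricts which files are read and does not change partition discovery. The Python sketch below models that behavior under stated assumptions:

- `fnmatch` stands in for `org.apache.hadoop.fs.GlobFilter`, and it lacks Hadoop's `{a,b}` alternation.
- The pattern is assumed to be matched against the file name, as Hadoop's `GlobFilter` does.
- `list_input_files` and `partition_values` are hypothetical helpers, not PySpark APIs.

```python
import fnmatch
import posixpath


def list_input_files(paths, path_glob_filter=None):
    """Hypothetical helper: keep only files whose *file name* matches the glob.

    fnmatch is an approximation of Hadoop's GlobFilter syntax.
    """
    if path_glob_filter is None:
        return list(paths)
    return [p for p in paths
            if fnmatch.fnmatchcase(posixpath.basename(p), path_glob_filter)]


def partition_values(path):
    """Hypothetical, simplified partition discovery from k=v directory names."""
    dirs = posixpath.dirname(path).split("/")
    return dict(d.split("=", 1) for d in dirs if "=" in d)


files = [
    "/data/year=2019/part-0.orc",
    "/data/year=2019/_SUCCESS",
    "/data/year=2020/part-1.orc",
]
# The glob drops the marker file; directory-based partitions are still derived.
orc_files = list_input_files(files, "*.orc")
partitions = [partition_values(p) for p in orc_files]
```

The filter acts on leaf file names only, so the `year=...` directories still yield partition values. That is the "does not change the behavior of partition discovery" property the docstring describes.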
[GitHub] [spark] HyukjinKwon commented on a change in pull request #26958: [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in
HyukjinKwon commented on a change in pull request #26958: [SPARK-30128][DOCS][PYTHON][SQL] Document/promote 'recursiveFileLookup' and 'pathGlobFilter' in file sources 'mergeSchema' in ORC URL: https://github.com/apache/spark/pull/26958#discussion_r360619476 ## File path: python/pyspark/sql/readwriter.py ## @@ -520,20 +537,24 @@ def func(iterator): raise TypeError("path can be only string, list or RDD") @since(1.5) -def orc(self, path, mergeSchema=None, recursiveFileLookup=None): +def orc(self, path, mergeSchema=None, pathGlobFilter=None, recursiveFileLookup=None): """Loads ORC files, returning the result as a :class:`DataFrame`. :param mergeSchema: sets whether we should merge schemas collected from all ORC part-files. This will override ``spark.sql.orc.mergeSchema``. The default value is specified in ``spark.sql.orc.mergeSchema``. +:param pathGlobFilter: an optional glob pattern to only include files with paths matching + the pattern. The syntax follows `org.apache.hadoop.fs.GlobFilter`. + It does not change the behavior of `partition discovery`_. :param recursiveFileLookup: recursively scan a directory for files. Using this option -disables `partition discovery`_. +disables `partition discovery`_. Review comment: Since we're moving ahead to Spark 3, we likely won't backport many things that cause conflicts, so I was thinking it's a feasible option. Also, I think we might have to first document that vertical alignment isn't preferred; I think vertical alignment is still valid per PEP 8 and PEP 257. One downside of doing it bit by bit is the confusion caused by mixed styles. Considering that we likely won't add many new docstrings, the mixed style would persist in the long term.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568136983 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20436/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568136980 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568136983 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20436/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568136980 Merged build finished. Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore
HeartSaVioR commented on issue #26935: [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore URL: https://github.com/apache/spark/pull/26935#issuecomment-568134313 cc. @tdas @zsxwing @gaborgsomogyi
[GitHub] [spark] jiangxb1987 commented on a change in pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
jiangxb1987 commented on a change in pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend URL: https://github.com/apache/spark/pull/24350#discussion_r360614606 ## File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ## @@ -205,6 +207,13 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock) // Note: we want to get an executor back after expiring this one, // so do not simply call `sc.killExecutor` here (SPARK-8119) sc.killAndReplaceExecutor(executorId) +// In case of the executors which are not gracefully shut down, we should remove +// lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348) +sc.schedulerBackend match { + case backend: CoarseGrainedSchedulerBackend => +backend.driverEndpoint.send(RemoveExecutor(executorId, ExecutorKilled)) Review comment: IIUC if executors are killed they would enter this heartbeat lost logic anyway.
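The control flow this diff touches can be modeled with a toy Python sketch: an executor misses heartbeats, is killed and replaced, and is then removed from the scheduler backend's bookkeeping. Every name here (`HeartbeatTracker`, `FakeBackend`, `on_lost`) is hypothetical, and the timeout and timestamps are arbitrary illustration values. The sketch shows only the order of the steps, not Spark's actual `HeartbeatReceiver` or `CoarseGrainedSchedulerBackend` code.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    """Toy model of heartbeat-based executor expiry (hypothetical names)."""
    timeout_ms: int
    last_seen: dict = field(default_factory=dict)  # executor id -> last heartbeat (ms)

    def heartbeat(self, executor_id, now_ms):
        self.last_seen[executor_id] = now_ms

    def expire_dead_hosts(self, now_ms, on_lost):
        """Call on_lost(executor_id) for each executor silent for longer than timeout_ms."""
        expired = [e for e, t in self.last_seen.items() if now_ms - t > self.timeout_ms]
        for executor_id in expired:
            del self.last_seen[executor_id]
            on_lost(executor_id)
        return expired


class FakeBackend:
    """Stand-in for the driver-side scheduler backend's executor registry."""

    def __init__(self, executors):
        self.registered = set(executors)
        self.replacements_requested = 0

    def kill_and_replace(self, executor_id):
        # Ask for a new executor rather than just killing (the SPARK-8119 note).
        self.replacements_requested += 1

    def remove_executor(self, executor_id):
        # Forget the lost executor in the driver's bookkeeping (the SPARK-27348 change).
        self.registered.discard(executor_id)


tracker = HeartbeatTracker(timeout_ms=120_000)
backend = FakeBackend({"1", "2"})
tracker.heartbeat("1", now_ms=0)
tracker.heartbeat("2", now_ms=100_000)


def on_lost(executor_id):
    backend.kill_and_replace(executor_id)
    backend.remove_executor(executor_id)


expired = tracker.expire_dead_hosts(now_ms=150_000, on_lost=on_lost)
```

The review thread asks where the second step belongs: inline in `HeartbeatReceiver`, or in `TaskSchedulerImpl.executorLost()`. The sketch deliberately leaves that as the `on_lost` callback rather than taking a side.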
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568129570 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115635/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568129565 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568129570 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115635/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
AmplabJenkins removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568129565 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
SparkQA removed a comment on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568081299 **[Test build #115635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115635/testReport)** for PR 26913 at commit [`33ae658`](https://github.com/apache/spark/commit/33ae65854370c38a24531a2519f17913fc5f43be).
[GitHub] [spark] SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider
SparkQA commented on issue #26913: [SPARK-29219][SQL] Introduce SupportsCatalogOptions for TableProvider URL: https://github.com/apache/spark/pull/26913#issuecomment-568129034 **[Test build #115635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115635/testReport)** for PR 26913 at commit [`33ae658`](https://github.com/apache/spark/commit/33ae65854370c38a24531a2519f17913fc5f43be). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] JkSelf commented on issue #26968: [SQL][minor] update the final plan in UI for AQE
JkSelf commented on issue #26968: [SQL][minor] update the final plan in UI for AQE URL: https://github.com/apache/spark/pull/26968#issuecomment-568128085 LGTM. Thanks
[GitHub] [spark] AmplabJenkins removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
AmplabJenkins removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568125502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115634/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
AmplabJenkins removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568125497 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
AmplabJenkins commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568125502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115634/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
AmplabJenkins commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568125497 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
SparkQA removed a comment on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568062081 **[Test build #115634 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115634/testReport)** for PR 26974 at commit [`5b6c906`](https://github.com/apache/spark/commit/5b6c9069a9ac1c46b7961d2da00f2cd5aa0dfb1a).
[GitHub] [spark] jiangxb1987 commented on a change in pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
jiangxb1987 commented on a change in pull request #24350: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend URL: https://github.com/apache/spark/pull/24350#discussion_r360607491
## File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala ##
@@ -205,6 +207,13 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
 // Note: we want to get an executor back after expiring this one,
 // so do not simply call `sc.killExecutor` here (SPARK-8119)
 sc.killAndReplaceExecutor(executorId)
+// In case of the executors which are not gracefully shut down, we should remove
+// lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+sc.schedulerBackend match {
Review comment: Why not put this into `TaskSchedulerImpl.executorLost()`?
[GitHub] [spark] SparkQA commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation
SparkQA commented on issue #26974: [SPARK-30324][SQL] Simplify JSON field access through dot notation URL: https://github.com/apache/spark/pull/26974#issuecomment-568124920 **[Test build #115634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115634/testReport)** for PR 26974 at commit [`5b6c906`](https://github.com/apache/spark/commit/5b6c9069a9ac1c46b7961d2da00f2cd5aa0dfb1a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] maropu edited a comment on issue #26943: [SPARK-30298][SQL] Bucket join should work for self-join with views
maropu edited a comment on issue #26943: [SPARK-30298][SQL] Bucket join should work for self-join with views URL: https://github.com/apache/spark/pull/26943#issuecomment-568124034 yea, of course not! You can feel free to take them over.
[GitHub] [spark] maropu commented on issue #26943: [SPARK-30298][SQL] Bucket join should work for self-join with views
maropu commented on issue #26943: [SPARK-30298][SQL] Bucket join should work for self-join with views URL: https://github.com/apache/spark/pull/26943#issuecomment-568124034 yea, sure! You can feel free to take them over.
[GitHub] [spark] wypoon commented on issue #26895: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table
wypoon commented on issue #26895: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table URL: https://github.com/apache/spark/pull/26895#issuecomment-568123485 Thanks @vanzin !
[GitHub] [spark] SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
SparkQA commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568116396 **[Test build #115636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115636/testReport)** for PR 26973 at commit [`f24e873`](https://github.com/apache/spark/commit/f24e8734698c65cc17cbd33efc27962f33fc1ca0).
[GitHub] [spark] AmplabJenkins commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
AmplabJenkins commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568110547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115632/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
AmplabJenkins commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568110543 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
AmplabJenkins removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568110547 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115632/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
AmplabJenkins removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568110543 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
SparkQA removed a comment on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568035901 **[Test build #115632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115632/testReport)** for PR 26847 at commit [`0b89d3a`](https://github.com/apache/spark/commit/0b89d3acadebe5452302c5edd794e49252b52982).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568109077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20435/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568109068 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax
SparkQA commented on issue #26847: [SPARK-30214][SQL] Support COMMENT ON syntax URL: https://github.com/apache/spark/pull/26847#issuecomment-568109376 **[Test build #115632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115632/testReport)** for PR 26847 at commit [`0b89d3a`](https://github.com/apache/spark/commit/0b89d3acadebe5452302c5edd794e49252b52982). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568109068 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568109077 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20435/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568088179 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115633/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins removed a comment on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568088164 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568088164 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource
AmplabJenkins commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource URL: https://github.com/apache/spark/pull/26973#issuecomment-568088179 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115633/ Test FAILed.