[GitHub] AmplabJenkins commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
AmplabJenkins commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#issuecomment-447019560 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] AmplabJenkins commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
AmplabJenkins commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#issuecomment-447019572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6086/ Test PASSed.
[GitHub] SparkQA commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
SparkQA commented on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#issuecomment-447019731 **[Test build #100100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100100/testReport)** for PR 22305 at commit [`0408c26`](https://github.com/apache/spark/commit/0408c269b6541bee22762b6780227f0e00770567).
[GitHub] AmplabJenkins removed a comment on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
AmplabJenkins removed a comment on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#issuecomment-447019560 Merged build finished. Test PASSed.
[GitHub] srowen commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
srowen commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447019462 Honestly I think we can remove this. It's been bad practice for years, and keeping the support means it stays in Spark for years. This mode doesn't really work.
[GitHub] AmplabJenkins removed a comment on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
AmplabJenkins removed a comment on issue #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#issuecomment-447019572 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6086/ Test PASSed.
[GitHub] HyukjinKwon commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
HyukjinKwon commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447021531 Yea, I actually wanted to remove this but made it deprecated in case some people have a different view. +1 for just removing it. Let me update it tomorrow if there's no comment against just removing it.
[GitHub] chakravarthiT opened a new pull request #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
chakravarthiT opened a new pull request #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312

## What changes were proposed in this pull request?

User-specified filters are not applied to the SQL tab in YARN mode because they are overridden by the YARN AmIp filter. The fix appends the user-provided filters (`spark.ui.filters`) to the YARN filter.

## How was this patch tested?

Test steps:

1) Launch spark-sql with an authentication filter: `spark-sql --master yarn --conf spark.ui.filters=org.apache.hadoop.security.authentication.server.AuthenticationFilter --conf spark.org.apache.hadoop.security.authentication.server.AuthenticationFilter.params="type=simple"`
2) Go to the YARN application list UI link.
3) Launch the application master for the Spark-SQL app ID and access each tab by appending the tab name.
4) An error is displayed for all tabs, including the SQL tab. (Before the fix, the SQL tab was accessible because the authentication filter was not applied to it.)
5) The info logs also confirm that the authentication filter is applied to the SQL tab. (Before the fix, it was not applied.)

The behaviour is attached below in the following order: 1) command used, 2) before fix (logs and UI), 3) after fix (logs and UI).

**1) COMMAND USED:** launching spark-sql with the authentication filter.
![image](https://user-images.githubusercontent.com/45845595/49947295-e7e97400-ff16-11e8-8c9a-10659487ddee.png)

**2) BEFORE FIX:**
**UI result:** able to access the SQL tab.
![image](https://user-images.githubusercontent.com/45845595/49948398-62b38e80-ff19-11e8-95dc-e74f9e3c2ba7.png)
**Logs:** authentication filter not applied to the SQL tab.
![image](https://user-images.githubusercontent.com/45845595/49947343-ff286180-ff16-11e8-9de0-3f8db140bc32.png)

**3) AFTER FIX:**
**UI result:** not able to access the SQL tab.
![image](https://user-images.githubusercontent.com/45845595/49947360-0d767d80-ff17-11e8-9e9e-a95311949164.png)
**Logs:** both the YARN filter and the authentication filter are applied to the SQL tab.
![image](https://user-images.githubusercontent.com/45845595/49947377-1a936c80-ff17-11e8-9f44-700eb3dc0ded.png)
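The pull request above describes appending the user-provided `spark.ui.filters` value to the YARN filter instead of letting one override the other. The following is a minimal Python sketch of that merge idea only; it is not the code from the patch. The helper name `merge_ui_filters` is hypothetical, and the ordering (user filters first, YARN filter last) is an assumption, since the description does not say which comes first.

```python
# Class name of the filter YARN installs for the application master proxy.
# Used here purely as an illustrative constant.
YARN_AM_FILTER = "org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter"


def merge_ui_filters(user_filters, yarn_filter=YARN_AM_FILTER):
    """Hypothetical helper: combine user filters with the YARN filter.

    `user_filters` is a comma-separated string such as the value of
    spark.ui.filters, or None when the user configured nothing.
    The result keeps every user filter and adds the YARN filter once.
    """
    names = [f.strip() for f in (user_filters or "").split(",") if f.strip()]
    if yarn_filter not in names:
        names.append(yarn_filter)  # assumed order: user filters first
    return ",".join(names)


auth = "org.apache.hadoop.security.authentication.server.AuthenticationFilter"
merged = merge_ui_filters(auth)
print(merged)
```

With this shape, a user-configured authentication filter survives alongside the YARN filter, which matches the "after fix" logs described in the pull request.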
[GitHub] chakravarthiT commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
chakravarthiT commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312#issuecomment-447024583 @vanzin @andrewor14 please review
[GitHub] AmplabJenkins removed a comment on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
AmplabJenkins removed a comment on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312#issuecomment-447024203 Can one of the admins verify this patch?
[GitHub] SparkQA removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script
SparkQA removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script URL: https://github.com/apache/spark/pull/19652#issuecomment-446948281 **[Test build #100089 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100089/testReport)** for PR 19652 at commit [`0d706ff`](https://github.com/apache/spark/commit/0d706fffb133c1d685e4aaa0b62758d0826bad62).
[GitHub] AmplabJenkins commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookup for task table
AmplabJenkins commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookup for task table URL: https://github.com/apache/spark/pull/23310#issuecomment-446997933 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6081/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookup for task table
AmplabJenkins commented on issue #23310: [SPARK-26363][WebUI] Avoid duplicated KV store lookup for task table URL: https://github.com/apache/spark/pull/23310#issuecomment-446997924 Merged build finished. Test PASSed.
[GitHub] dongjoon-hyun commented on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update
dongjoon-hyun commented on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23300#issuecomment-447012741 Retest this please.
[GitHub] dongjoon-hyun commented on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update
dongjoon-hyun commented on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23299#issuecomment-447012839 Retest this please.
[GitHub] dongjoon-hyun commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
dongjoon-hyun commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447012984 Retest this please
[GitHub] AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447015467 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447015486 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6084/ Test PASSed.
[GitHub] SparkQA commented on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update
SparkQA commented on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23300#issuecomment-447015177 **[Test build #100097 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100097/testReport)** for PR 23300 at commit [`7239aac`](https://github.com/apache/spark/commit/7239aac61371afc7b518f660f4068f73dc523642).
[GitHub] AmplabJenkins removed a comment on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update
AmplabJenkins removed a comment on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23300#issuecomment-447014789 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update
AmplabJenkins removed a comment on issue #23300: [SPARK-26327][SQL][BACKPORT-2.2] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23300#issuecomment-447014798 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6082/ Test PASSed.
[GitHub] SparkQA commented on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update
SparkQA commented on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23299#issuecomment-447015171 **[Test build #100098 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100098/testReport)** for PR 23299 at commit [`63e50f8`](https://github.com/apache/spark/commit/63e50f8655cf61cbff105857aeec84a2139fb729).
[GitHub] AmplabJenkins removed a comment on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update
AmplabJenkins removed a comment on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23299#issuecomment-447014502 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
SparkQA commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447015215 **[Test build #100099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100099/testReport)** for PR 23291 at commit [`d9b46f3`](https://github.com/apache/spark/commit/d9b46f33b56d0ce46942c7cf09249ea266632ce1).
[GitHub] SparkQA commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
SparkQA commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447015175 **[Test build #100096 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100096/testReport)** for PR 23311 at commit [`c107b4e`](https://github.com/apache/spark/commit/c107b4ea6688976054f85fabff985d295438cb8b).
[GitHub] AmplabJenkins removed a comment on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update
AmplabJenkins removed a comment on issue #23299: [SPARK-26327][SQL][BACKPORT-2.3] Bug fix for `FileSourceScanExec` metrics update URL: https://github.com/apache/spark/pull/23299#issuecomment-447014507 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6083/ Test PASSed.
[GitHub] icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#discussion_r241452692

## File path: python/pyspark/sql/tests/test_pandas_udf_window.py

@@ -47,6 +47,15 @@ def pandas_scalar_time_two(self):
         from pyspark.sql.functions import pandas_udf
         return pandas_udf(lambda v: v * 2, 'double')
 
+    @property
+    def pandas_agg_count_udf(self):
+        from pyspark.sql.functions import pandas_udf, PandasUDFType

Review comment: SGTM. Opened https://jira.apache.org/jira/browse/SPARK-26364
[GitHub] AmplabJenkins commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
AmplabJenkins commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447017423 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
AmplabJenkins commented on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447017438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6085/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
AmplabJenkins removed a comment on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447017423 Merged build finished. Test PASSed.
[GitHub] icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#discussion_r241453704

## File path: python/pyspark/worker.py

@@ -145,7 +145,18 @@ def wrapped(*series):
     return lambda *a: (wrapped(*a), arrow_return_type)
 
 
-def wrap_window_agg_pandas_udf(f, return_type):
+def wrap_window_agg_pandas_udf(f, return_type, runner_conf, udf_index):
+    window_bound_types_str = runner_conf.get('pandas_window_bound_types')
+    window_bound_type = [t.strip() for t in window_bound_types_str.split(',')][udf_index]

Review comment: Sounds good!
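The diff hunk above reads a comma-separated string from `runner_conf` and picks the entry at `udf_index`. The sketch below isolates that lookup so it can be run on its own. The function name `window_bound_type_for` is hypothetical, a plain dict stands in for the runner configuration, and the values `bounded` and `unbounded` are assumed for illustration; the hunk does not show which values the key actually holds.

```python
def window_bound_type_for(runner_conf, udf_index):
    """Hypothetical helper mirroring the lookup in the diff hunk above.

    `runner_conf` is any mapping with a 'pandas_window_bound_types' entry
    holding one comma-separated value per UDF.
    """
    types_str = runner_conf.get('pandas_window_bound_types')
    if types_str is None:
        # The hunk does not show how a missing key is handled; failing
        # loudly here is an assumption of this sketch.
        raise KeyError('pandas_window_bound_types')
    types = [t.strip() for t in types_str.split(',')]
    return types[udf_index]


# Assumed example configuration: two UDFs, one per window bound type.
conf = {'pandas_window_bound_types': 'bounded, unbounded'}
first = window_bound_type_for(conf, 0)
second = window_bound_type_for(conf, 1)
print(first, second)
```

Stripping each entry makes the lookup tolerant of spaces after the commas, which is what the list comprehension in the hunk does as well.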
[GitHub] icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window) URL: https://github.com/apache/spark/pull/22305#discussion_r241453632

## File path: python/pyspark/sql/functions.py

@@ -2993,20 +2992,25 @@ def pandas_udf(f=None, returnType=None, functionType=None):
        >>> @pandas_udf("double", PandasUDFType.GROUPED_AGG)  # doctest: +SKIP
        ... def mean_udf(v):
        ...     return v.mean()
-       >>> w = Window \\
-       ...     .partitionBy('id') \\
-       ...     .rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
+       >>> w = (Window.partitionBy('id')
+       ...     .orderBy('v')
+       ...     .rowsBetween(-1, 0))
        >>> df.withColumn('mean_v', mean_udf(df['v']).over(w)).show()  # doctest: +SKIP
        +---+----+------+
        | id|   v|mean_v|
        +---+----+------+
-       |  1| 1.0|   1.5|
+       |  1| 1.0|   1.0|
        |  1| 2.0|   1.5|
-       |  2| 3.0|   6.0|
-       |  2| 5.0|   6.0|
-       |  2|10.0|   6.0|
+       |  2| 3.0|   3.0|
+       |  2| 5.0|   4.0|
+       |  2|10.0|   7.5|
        +---+----+------+
 
+       .. note:: For performance reasons, the input series to window functions are not copied.
+            Therefore, changing the value of the input series is not allowed and will
+            result incorrect results. For the same reason, users should also not rely

Review comment: Sounds good!
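The docstring change above switches the example from an unbounded window to a bounded one, `rowsBetween(-1, 0)` ordered by `v`, so each row's mean covers the previous row and the current row within its `id` partition. The sketch below recomputes the `mean_v` column in plain Python to show where the new numbers come from. It does not use Spark or pandas, and the function name `bounded_window_mean` is hypothetical.

```python
from collections import defaultdict


def bounded_window_mean(rows, lower=-1, upper=0):
    """Mean of `v` over a row-based frame [lower, upper] around each row.

    `rows` is a list of (id, v) pairs. Rows are partitioned by id and
    ordered by v inside each partition, as in the docstring example.
    """
    partitions = defaultdict(list)
    for key, v in rows:
        partitions[key].append(v)

    out = []
    for key in sorted(partitions):
        values = sorted(partitions[key])
        for i, v in enumerate(values):
            # Clip the frame to the partition boundaries.
            start = max(0, i + lower)
            end = min(len(values), i + upper + 1)
            frame = values[start:end]
            mean = sum(frame) / len(frame) if frame else None
            out.append((key, v, mean))
    return out


rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]
for row in bounded_window_mean(rows):
    print(row)
```

The first row of each partition has no preceding row, so its frame holds only itself, which is why `mean_v` equals `v` there in the updated table.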
[GitHub] AmplabJenkins removed a comment on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts
AmplabJenkins removed a comment on issue #23311: [SPARK-26362][CORE] Deprecate 'spark.driver.allowMultipleContexts' to discourage multiple creation of SparkContexts URL: https://github.com/apache/spark/pull/23311#issuecomment-447017438 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6085/ Test PASSed.
[GitHub] icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
icexelloss commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
URL: https://github.com/apache/spark/pull/22305#discussion_r241454125

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala ##

@@ -27,17 +27,65 @@ import org.apache.spark.api.python.{ChainedPythonFunctions, PythonEvalType}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.catalyst.plans.physical._
-import org.apache.spark.sql.execution.{GroupedIterator, SparkPlan, UnaryExecNode}
+import org.apache.spark.sql.catalyst.plans.physical.{AllTuples, ClusteredDistribution, Distribution, Partitioning}
+import org.apache.spark.sql.execution.{ExternalAppendOnlyUnsafeRowArray, SparkPlan}
 import org.apache.spark.sql.execution.arrow.ArrowUtils
-import org.apache.spark.sql.types.{DataType, StructField, StructType}
+import org.apache.spark.sql.execution.window._
+import org.apache.spark.sql.types._
 import org.apache.spark.util.Utils

+/**
+ * This class calculates and outputs windowed aggregates over the rows in a single partition.
+ *
+ * This is similar to [[WindowExec]]. The main difference is that this node doesn't not compute

Review comment:
Nice catch. Fixed.
[GitHub] AmplabJenkins commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON
AmplabJenkins commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON URL: https://github.com/apache/spark/pull/23196#issuecomment-447021279 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON
AmplabJenkins commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON URL: https://github.com/apache/spark/pull/23196#issuecomment-447021290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100091/ Test FAILed.
[GitHub] SparkQA commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON
SparkQA commented on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON URL: https://github.com/apache/spark/pull/23196#issuecomment-447020882 **[Test build #100091 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100091/testReport)** for PR 23196 at commit [`0c7b96b`](https://github.com/apache/spark/commit/0c7b96b596f19c1cbd0500a8631b90bfa6b02da7).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] AmplabJenkins removed a comment on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON
AmplabJenkins removed a comment on issue #23196: [SPARK-26243][SQL] Use java.time API for parsing timestamps and dates from JSON URL: https://github.com/apache/spark/pull/23196#issuecomment-447021290 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100091/ Test FAILed.
[GitHub] AmplabJenkins removed a comment on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
AmplabJenkins removed a comment on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312#issuecomment-447025750 Can one of the admins verify this patch?
[GitHub] AmplabJenkins removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script
AmplabJenkins removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script URL: https://github.com/apache/spark/pull/19652#issuecomment-447026082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100089/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
AmplabJenkins commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312#issuecomment-447025963 Can one of the admins verify this patch?
[GitHub] AmplabJenkins commented on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script
AmplabJenkins commented on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script URL: https://github.com/apache/spark/pull/19652#issuecomment-447026082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100089/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch
AmplabJenkins commented on issue #23312: [SPARK-26255]Custom error/exception is not thrown for the SQL tab when UI filters are added in spark-sql launch URL: https://github.com/apache/spark/pull/23312#issuecomment-447025750 Can one of the admins verify this patch?
[GitHub] AmplabJenkins removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script
AmplabJenkins removed a comment on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script URL: https://github.com/apache/spark/pull/19652#issuecomment-447026074 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script
AmplabJenkins commented on issue #19652: [SPARK-22435][SQL] Support processing array and map type using script URL: https://github.com/apache/spark/pull/19652#issuecomment-447026074 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
AmplabJenkins commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-447026628 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
AmplabJenkins removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-447026643 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6087/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
AmplabJenkins removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-447026628 Merged build finished. Test PASSed.
[GitHub] SparkQA removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
SparkQA removed a comment on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-446955202 **[Test build #100090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100090/testReport)** for PR 23208 at commit [`701000d`](https://github.com/apache/spark/commit/701000d66c2f1e2a390b746ee1a475a0f54d93fc).
[GitHub] SparkQA commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
SparkQA commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-447028402 **[Test build #100090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100090/testReport)** for PR 23208 at commit [`701000d`](https://github.com/apache/spark/commit/701000d66c2f1e2a390b746ee1a475a0f54d93fc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] AmplabJenkins commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write)
AmplabJenkins commented on issue #23208: [SPARK-25530][SQL] data source v2 API refactor (batch write) URL: https://github.com/apache/spark/pull/23208#issuecomment-447029201 Merged build finished. Test PASSed.
[GitHub] squito commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
squito commented on issue #23260: [SPARK-26311][YARN] New feature: custom log URL for stdout/stderr
URL: https://github.com/apache/spark/pull/23260#issuecomment-447116226

Yes, I see your point about the chicken-and-egg problem. I also wonder whether this feature should be so YARN-specific; in fact, it almost seems more important on Kubernetes, as there is no long-lived NM there. But maybe the params you need end up being specific to the deployment mode (e.g. `{{ContainerId}}`), so there is no general solution.

I'm inclined to wait on this a while until we see whether there is a way to get this to work more generally, or maybe even to work with YARN while the app is running; but I don't feel so strongly that I'd block it, either.

@vanzin do you have more thoughts?
[GitHub] AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447131563 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447131366 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6096/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes
AmplabJenkins removed a comment on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes URL: https://github.com/apache/spark/pull/23303#issuecomment-447131122 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries
AmplabJenkins commented on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries URL: https://github.com/apache/spark/pull/23057#issuecomment-447131437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6097/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447131363 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries
AmplabJenkins removed a comment on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries URL: https://github.com/apache/spark/pull/23057#issuecomment-447131434 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes
AmplabJenkins removed a comment on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes URL: https://github.com/apache/spark/pull/23303#issuecomment-447131129 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6095/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules
AmplabJenkins commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules URL: https://github.com/apache/spark/pull/23316#issuecomment-447131284 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules
AmplabJenkins removed a comment on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules URL: https://github.com/apache/spark/pull/23316#issuecomment-447131284 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*
AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* URL: https://github.com/apache/spark/pull/23314#issuecomment-447131160 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447131363 Merged build finished. Test PASSed.
[GitHub] SparkQA commented on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes
SparkQA commented on issue #23303: [SPARK-26352][SQL] join reorder should not change the order of output attributes URL: https://github.com/apache/spark/pull/23303#issuecomment-447131102 **[Test build #100110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100110/testReport)** for PR 23303 at commit [`ae15739`](https://github.com/apache/spark/commit/ae157392c065a84fea05f2e6d323a7c8e3889d8e).
[GitHub] AmplabJenkins commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules
AmplabJenkins commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules URL: https://github.com/apache/spark/pull/23316#issuecomment-447131294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6092/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins removed a comment on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447074388 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100099/ Test FAILed.
[GitHub] aokolnychyi commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
aokolnychyi commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447146614 Retest this please
[GitHub] HeartSaVioR commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
HeartSaVioR commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
URL: https://github.com/apache/spark/pull/23301#discussion_r241607771

## File path: docs/structured-streaming-kafka-integration.md ##

@@ -379,7 +379,25 @@ The following configurations are optional:
   string
   spark-kafka-source
   streaming and batch
-  Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming queries
+  Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming
+  queries. If "kafka.group.id" is set, this option will be ignored.

Review comment:
Yup, I think I chose the wrong word. Many options are wrapped in backticks (\`), so I assumed we had an implicit rule about that. Please ignore this if the formatting of option names is already inconsistent.
[GitHub] vanzin commented on issue #23095: [SPARK-23886][SS] Update query status for ContinuousExecution
vanzin commented on issue #23095: [SPARK-23886][SS] Update query status for ContinuousExecution URL: https://github.com/apache/spark/pull/23095#issuecomment-447167671 Looks good. I'd have chosen a shorter test name, but no biggie. Merging to master.
[GitHub] zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
URL: https://github.com/apache/spark/pull/23301#discussion_r241603962

## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala ##
@@ -581,6 +581,33 @@ abstract class KafkaMicroBatchSourceSuiteBase extends KafkaSourceSuiteBase {
     )
   }

+  test("allow group.id override") {
+    // Tests code path KafkaSourceProvider.{sourceSchema(.), createSource(.)}
+    // as well as KafkaOffsetReader.createConsumer(.)
+    val topic = newTopic()
+    testUtils.createTopic(topic, partitions = 3)
+    testUtils.sendMessages(topic, (1 to 10).map(_.toString).toArray, Some(0))
+    testUtils.sendMessages(topic, (11 to 20).map(_.toString).toArray, Some(1))
+    testUtils.sendMessages(topic, (21 to 30).map(_.toString).toArray, Some(2))
+
+    val dsKafka = spark
+      .readStream
+      .format("kafka")
+      .option("kafka.group.id", "id-" + Random.nextInt())
+      .option("kafka.bootstrap.servers", testUtils.brokerAddress)
+      .option("subscribe", topic)
+      .option("startingOffsets", "earliest")
+      .load()
+      .selectExpr("CAST(value AS STRING)")
+      .as[String]
+      .map(_.toInt)
+
+    testStream(dsKafka)(

Review comment:
Yeah, we don't have an api to check this.
[GitHub] zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
URL: https://github.com/apache/spark/pull/23301#discussion_r241603813

## File path: docs/structured-streaming-kafka-integration.md ##
@@ -379,7 +379,25 @@ The following configurations are optional:
 string spark-kafka-source streaming and batch
- Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming queries
+ Prefix of consumer group identifiers (`group.id`) that are generated by structured streaming
+ queries. If "kafka.group.id" is set, this option will be ignored.

Review comment:
> nit: Given that the other options are wrapped in backticks, it might be better to follow the same rule for consistency.

We don't have such a rule. See the doc of `failOnDataLoss`.
[GitHub] SparkQA commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
SparkQA commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447166021 **[Test build #100111 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100111/testReport)** for PR 23108 at commit [`fef8c68`](https://github.com/apache/spark/commit/fef8c6845ccce792474a8c65ee6ebfd9624bb4b4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] SparkQA removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
SparkQA removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447131132 **[Test build #100111 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100111/testReport)** for PR 23108 at commit [`fef8c68`](https://github.com/apache/spark/commit/fef8c6845ccce792474a8c65ee6ebfd9624bb4b4).
[GitHub] AmplabJenkins removed a comment on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
AmplabJenkins removed a comment on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer URL: https://github.com/apache/spark/pull/23301#issuecomment-447166767 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6100/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
AmplabJenkins commented on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer URL: https://github.com/apache/spark/pull/23301#issuecomment-447166767 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6100/ Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*
AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* URL: https://github.com/apache/spark/pull/23314#issuecomment-447166770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6099/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
AmplabJenkins commented on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer URL: https://github.com/apache/spark/pull/23301#issuecomment-447166762 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*
AmplabJenkins removed a comment on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* URL: https://github.com/apache/spark/pull/23314#issuecomment-447166765 Merged build finished. Test PASSed.
[GitHub] BryanCutler commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
BryanCutler commented on a change in pull request #22305: [SPARK-24561][SQL][Python] User-defined window aggregation functions with Pandas UDF (bounded window)
URL: https://github.com/apache/spark/pull/22305#discussion_r241609712

## File path: python/pyspark/worker.py ##
@@ -238,7 +284,8 @@ def read_udfs(pickleSer, infile, eval_type):
     # In the special case of a single UDF this will return a single result rather
     # than a tuple of results; this is the format that the JVM side expects.
     for i in range(num_udfs):
-        arg_offsets, udf = read_single_udf(pickleSer, infile, eval_type, runner_conf)
+        arg_offsets, udf = read_single_udf(
+            pickleSer, infile, eval_type, runner_conf, udf_index=i)

Review comment:
Yeah, basically this
```
window_eval_type_str, remaining_type_str = runner_conf['pandas_window_bound_types'].split(',', 1)
runner_conf['pandas_window_bound_types'] = remaining_type_str
window_eval_type = window_eval_type_str.strip().lower()
```
I'm not crazy about changing the conf inplace, but it wouldn't rely on any particular udf indexing then. Maybe it would make more sense to check the eval type before calling `read_single_udf`, process the conf and then send the window_eval_type as an optional param to `read_single_udf`?
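One way to read the suggestion above without mutating `runner_conf` in place is sketched below in Python. It assumes, as the snippet in the comment does, that `pandas_window_bound_types` holds a comma-separated string with one entry per UDF. The helper names `pop_window_bound_type` and `window_bound_types_per_udf` are hypothetical and are not part of `pyspark/worker.py`. `str.partition` is used instead of `split(',', 1)` so that the last entry, which has no comma after it, is handled as well.

```python
def pop_window_bound_type(bound_types):
    """Split the first comma-separated entry off a bound-types string.

    Returns (first_entry_normalised, remaining_string). Hypothetical helper,
    not part of pyspark/worker.py.
    """
    first, _, remaining = bound_types.partition(',')
    return first.strip().lower(), remaining


def window_bound_types_per_udf(runner_conf, num_udfs):
    """Compute one window bound type per UDF up front.

    The conf dict is only read, never modified, so the result does not
    depend on the order in which individual UDFs are later deserialised.
    """
    remaining = runner_conf.get('pandas_window_bound_types', '')
    bound_types = []
    for _ in range(num_udfs):
        bound_type, remaining = pop_window_bound_type(remaining)
        bound_types.append(bound_type)
    return bound_types
```

With a list computed like this before the loop, each entry could be passed to `read_single_udf` as an optional parameter, which is roughly what the comment proposes; whether that parameter exists is up to the pull request, not this sketch.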
[GitHub] AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447162502 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100109/ Test FAILed.
[GitHub] SparkQA removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
SparkQA removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447131107 **[Test build #100109 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100109/testReport)** for PR 23315 at commit [`dbc3ca0`](https://github.com/apache/spark/commit/dbc3ca05cd48edbb92aab1a2b7b6f8c2d0b4583d).
[GitHub] AmplabJenkins removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
AmplabJenkins removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447162499 Merged build finished. Test FAILed.
[GitHub] AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
AmplabJenkins commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447162499 Merged build finished. Test FAILed.
[GitHub] SparkQA commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
SparkQA commented on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447162328 **[Test build #100109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/100109/testReport)** for PR 23315 at commit [`dbc3ca0`](https://github.com/apache/spark/commit/dbc3ca05cd48edbb92aab1a2b7b6f8c2d0b4583d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] BryanCutler commented on issue #23305: [SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11.
BryanCutler commented on issue #23305: [SPARK-26355][PYSPARK] Add a workaround for PyArrow 0.11. URL: https://github.com/apache/spark/pull/23305#issuecomment-447164827 late +1, thanks @ueshin !
[GitHub] asfgit closed pull request #23095: [SPARK-23886][SS] Update query status for ContinuousExecution
asfgit closed pull request #23095: [SPARK-23886][SS] Update query status for ContinuousExecution
URL: https://github.com/apache/spark/pull/23095

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
index 2cac86599ef19..f2dda0373c7ba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
@@ -146,6 +146,12 @@ class MicroBatchExecution(
     logInfo(s"Query $prettyIdString was stopped")
   }
 
+  /** Begins recording statistics about query progress for a given trigger. */
+  override protected def startTrigger(): Unit = {
+    super.startTrigger()
+    currentStatus = currentStatus.copy(isTriggerActive = true)
+  }
+
   /**
    * Repeatedly attempts to run batches as data arrives.
    */
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
index 392229bcb5f55..a5fbb56e27099 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
@@ -114,7 +114,6 @@ trait ProgressReporter extends Logging {
     logDebug("Starting Trigger Calculation")
     lastTriggerStartTimestamp = currentTriggerStartTimestamp
     currentTriggerStartTimestamp = triggerClock.getTimeMillis()
-    currentStatus = currentStatus.copy(isTriggerActive = true)
     currentTriggerStartOffsets = null
     currentTriggerEndOffsets = null
     currentDurationsMs.clear()
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
index 4a7df731da67d..adbec0b00f368 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala
@@ -117,6 +117,8 @@ class ContinuousExecution(
     // For at least once, we can just ignore those reports and risk duplicates.
     commitLog.getLatest() match {
       case Some((latestEpochId, _)) =>
+        updateStatusMessage("Starting new streaming query " +
+          s"and getting offsets from latest epoch $latestEpochId")
         val nextOffsets = offsetLog.get(latestEpochId).getOrElse {
           throw new IllegalStateException(
             s"Batch $latestEpochId was committed without end epoch offsets!")
@@ -128,6 +130,7 @@ class ContinuousExecution(
         nextOffsets
       case None =>
         // We are starting this stream for the first time. Offsets are all None.
+        updateStatusMessage("Starting new streaming query")
         logInfo(s"Starting new streaming query.")
         currentBatchId = 0
         OffsetSeq.fill(continuousSources.map(_ => null): _*)
@@ -260,6 +263,7 @@ class ContinuousExecution(
     epochUpdateThread.setDaemon(true)
     epochUpdateThread.start()
 
+    updateStatusMessage("Running")
     reportTimeTaken("runContinuous") {
       SQLExecution.withNewExecutionId(
         sparkSessionForQuery, lastExecution) {
@@ -319,6 +323,8 @@ class ContinuousExecution(
    * before this is called.
    */
   def commit(epoch: Long): Unit = {
+    updateStatusMessage(s"Committing epoch $epoch")
+
     assert(continuousSources.length == 1, "only one continuous source supported currently")
     assert(offsetLog.get(epoch).isDefined,
       s"offset for epoch $epoch not reported before commit")
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala b/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala
index a0c9bcc8929eb..ca79e0248c06b 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryStatus.scala
@@ -28,9 +28,11 @@ import org.apache.spark.annotation.InterfaceStability
  * Reports information about the instantaneous status of a streaming query.
  *
  * @param message A human readable description of what the stream is currently doing.
- * @param isDataAvailable True when there is new data to be processed.
+ * @param isDataAvailable True when there is new data to be processed. Doesn't apply
+ *                        to ContinuousExecution
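The shape of the change in the diff above can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class and field names mirror the Scala ones, but `QueryStatus`, the initial status message, and the `triggers_started` counter are stand-ins invented for illustration and do not reproduce Spark's implementation. It shows the trigger-active flag being set only by the micro-batch subclass, while the continuous subclass reports progress through status messages.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class QueryStatus:
    # Illustrative stand-in for StreamingQueryStatus.
    message: str
    is_data_available: bool
    is_trigger_active: bool


class ProgressReporter:
    def __init__(self):
        self.current_status = QueryStatus("Initializing", False, False)
        self.triggers_started = 0  # hypothetical bookkeeping

    def start_trigger(self):
        # Shared bookkeeping only; the base class no longer touches
        # is_trigger_active, matching the line removed in the diff.
        self.triggers_started += 1

    def update_status_message(self, message):
        self.current_status = replace(self.current_status, message=message)


class MicroBatchExecution(ProgressReporter):
    def start_trigger(self):
        super().start_trigger()
        # Only micro-batch execution marks the trigger as active.
        self.current_status = replace(self.current_status, is_trigger_active=True)


class ContinuousExecution(ProgressReporter):
    def commit(self, epoch):
        self.update_status_message(f"Committing epoch {epoch}")
```

Calling `start_trigger()` on a `ContinuousExecution` instance in this model leaves `is_trigger_active` false, which is the behavioural difference the override introduces.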
[GitHub] rxin commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules
rxin commented on issue #23316: [SPARK-26367] [SQL] Remove ReplaceExceptWithFilter from nonExcludableRules URL: https://github.com/apache/spark/pull/23316#issuecomment-447125517 Should these be in the optimizer, or run as a separate batch?
[GitHub] AmplabJenkins removed a comment on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins removed a comment on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447147385 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447147389 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6098/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions
AmplabJenkins commented on issue #23291: [SPARK-26203][SQL] Benchmark performance of In and InSet expressions URL: https://github.com/apache/spark/pull/23291#issuecomment-447147385 Merged build finished. Test PASSed.
[GitHub] zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
zsxwing commented on a change in pull request #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
URL: https://github.com/apache/spark/pull/23301#discussion_r241603867

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala ##
@@ -340,9 +340,19 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
     // Validate user-specified Kafka options
     if (caseInsensitiveParams.contains(s"kafka.${ConsumerConfig.GROUP_ID_CONFIG}")) {
-      throw new IllegalArgumentException(
-        s"Kafka option '${ConsumerConfig.GROUP_ID_CONFIG}' is not supported as " +
-          s"user-specified consumer groups are not used to track offsets.")
+      logWarning(
+        s"It is not recommended to set Kafka option 'kafka.${ConsumerConfig.GROUP_ID_CONFIG}'. " +

Review comment:
Good point. Updated.
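The behaviour this pull request describes (warn instead of throwing when `kafka.group.id` is set, and ignore the prefix option in that case) can be sketched in Python as follows. This is an illustration, not Spark's code: `resolve_group_id` is a hypothetical helper, the option name `groupIdPrefix` is an assumption because the quoted documentation row does not show it, and the random UUID suffix is only a placeholder for whatever unique identifier the source actually generates. The default prefix `spark-kafka-source` is the one shown in the documentation diff.

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def resolve_group_id(options, default_prefix="spark-kafka-source"):
    """Pick the consumer group id for a query (hypothetical helper).

    An explicit "kafka.group.id" wins and only triggers a warning;
    otherwise an id is generated from the prefix option.
    """
    explicit = options.get("kafka.group.id")
    if explicit is not None:
        # Previously this case raised an error; now it is allowed but discouraged.
        logger.warning("It is not recommended to set Kafka option 'kafka.group.id'.")
        return explicit
    # "groupIdPrefix" is an assumed option name; it is ignored when
    # "kafka.group.id" is set, as the updated documentation states.
    prefix = options.get("groupIdPrefix", default_prefix)
    return f"{prefix}-{uuid.uuid4()}"
```

For example, `resolve_group_id({"kafka.group.id": "my-group", "groupIdPrefix": "p"})` returns `"my-group"` and logs a warning, while an empty options dict yields an id that starts with `spark-kafka-source-`.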
[GitHub] AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447166365 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100111/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447166362 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins commented on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447166365 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/100111/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*
AmplabJenkins commented on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* URL: https://github.com/apache/spark/pull/23314#issuecomment-447166770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6099/ Test PASSed.
[GitHub] AmplabJenkins commented on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf*
AmplabJenkins commented on issue #23314: [SPARK-26364][PYTHON][TESTING] Clean up imports in test_pandas_udf* URL: https://github.com/apache/spark/pull/23314#issuecomment-447166765 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer
AmplabJenkins removed a comment on issue #23301: [SPARK-26350][SS]Allow to override group id of the Kafka consumer URL: https://github.com/apache/spark/pull/23301#issuecomment-447166762 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories
AmplabJenkins removed a comment on issue #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE EXTERNAL TABLE with subdirectories URL: https://github.com/apache/spark/pull/23108#issuecomment-447166362 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False
AmplabJenkins removed a comment on issue #23315: [SPARK-26366][SQL] ReplaceExceptWithFilter should consider NULL as False URL: https://github.com/apache/spark/pull/23315#issuecomment-447131563 Merged build finished. Test PASSed.
[GitHub] AmplabJenkins removed a comment on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries
AmplabJenkins removed a comment on issue #23057: [SPARK-26078][SQL] Dedup self-join attributes on IN subqueries URL: https://github.com/apache/spark/pull/23057#issuecomment-447131437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/6097/ Test PASSed.