[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...
Github user ScrapCodes commented on the issue: https://github.com/apache/spark/pull/14151 @gatorsmile Ping ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19620 Merged build finished. Test PASSed.
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19620 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83243/ Test PASSed.
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19620 **[Test build #83243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83243/testReport)** for PR 19620 at commit [`a762b1f`](https://github.com/apache/spark/commit/a762b1fbebcb73964e4fb2bcd910014fb9a67989). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19619 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83244/ Test PASSed.
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19619 Merged build finished. Test PASSed.
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19619 **[Test build #83244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83244/testReport)** for PR 19619 at commit [`efa16a6`](https://github.com/apache/spark/commit/efa16a636ec508c13a54a42b292233b0eed55df9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #83249 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83249/testReport)** for PR 16677 at commit [`e53648e`](https://github.com/apache/spark/commit/e53648e7f58f439bb09a702521c2f84cf2e344bd).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Merged build finished. Test FAILed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16677 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83247/ Test FAILed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #83247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83247/testReport)** for PR 16677 at commit [`7598337`](https://github.com/apache/spark/commit/759833712a9be4b3f3f65cf4722ddd33851726e8). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19557 **[Test build #83248 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83248/testReport)** for PR 19557 at commit [`1b41f73`](https://github.com/apache/spark/commit/1b41f73a2cdea5ebc7a0c3346dd37d9841cc72df).
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 ping @cloud-fan @jiangxb1987
[GitHub] spark issue #19528: [SPARK-20393][WEBU UI][1.6] Strengthen Spark to prevent ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19528 @shaneknapp - could you help check - what version of SciPy is Jenkins running with? thanks!
[GitHub] spark issue #19557: [SPARK-22281][SPARKR] Handle R method breaking signature...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19557 rebased
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16677 **[Test build #83247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83247/testReport)** for PR 16677 at commit [`7598337`](https://github.com/apache/spark/commit/759833712a9be4b3f3f65cf4722ddd33851726e8).
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16578 @mallman I will try to go through this again. Do you think this can be generalized to the data source v2 API?
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19617 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83242/ Test PASSed.
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19617 Merged build finished. Test PASSed.
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19617 **[Test build #83242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19601 **[Test build #83246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83246/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).
[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16578 thanks! ping/add @rxin @hvanhovell @gatorsmile @cloud-fan @liancheng @joseph-torres
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19601 Jenkins, retest this please
[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18906 Can one of the admins verify this patch?
[GitHub] spark issue #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19619 **[Test build #83244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83244/testReport)** for PR 19619 at commit [`efa16a6`](https://github.com/apache/spark/commit/efa16a636ec508c13a54a42b292233b0eed55df9).
[GitHub] spark issue #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for vers...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19620 **[Test build #83243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83243/testReport)** for PR 19620 at commit [`a762b1f`](https://github.com/apache/spark/commit/a762b1fbebcb73964e4fb2bcd910014fb9a67989).
[GitHub] spark issue #19618: [SPARK-5484][Followup] PeriodicRDDCheckpointer doc clean...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19618 **[Test build #83245 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83245/testReport)** for PR 19618 at commit [`2858cbb`](https://github.com/apache/spark/commit/2858cbb5c8264d7bee592835b56a415961ed1dc4).
[GitHub] spark pull request #19620: [SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check f...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/19620

[SPARK-22327][SPARKR][TEST][BACKPORT-2.1] check for version warning

## What changes were proposed in this pull request?
Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?
manually
Jenkins, AppVeyor

Author: Felix Cheung
Closes #19549 from felixcheung/rcranversioncheck.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/felixcheung/spark rcranversioncheck21

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19620.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19620

commit a762b1fbebcb73964e4fb2bcd910014fb9a67989
Author: Felix Cheung
Date: 2017-10-31T04:44:24Z

[SPARK-22327][SPARKR][TEST] check for version warning

## What changes were proposed in this pull request?
Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?
manually
Jenkins, AppVeyor

Author: Felix Cheung
Closes #19549 from felixcheung/rcranversioncheck.
[GitHub] spark pull request #19619: [SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check f...
GitHub user felixcheung opened a pull request: https://github.com/apache/spark/pull/19619

[SPARK-22327][SPARKR][TEST][BACKPORT-2.2] check for version warning

## What changes were proposed in this pull request?
Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?
manually
Jenkins, AppVeyor

Author: Felix Cheung
Closes #19549 from felixcheung/rcranversioncheck.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/felixcheung/spark rcranversioncheck22

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19619.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19619

commit efa16a636ec508c13a54a42b292233b0eed55df9
Author: Felix Cheung
Date: 2017-10-31T04:44:24Z

[SPARK-22327][SPARKR][TEST] check for version warning

## What changes were proposed in this pull request?
Will need to port this to branch-1.6, -2.0, -2.1, -2.2

## How was this patch tested?
manually
Jenkins, AppVeyor

Author: Felix Cheung
Closes #19549 from felixcheung/rcranversioncheck.
[GitHub] spark pull request #19550: [SPARK-22327][SPARKR][TEST][BACKPORT-2.0] check f...
Github user felixcheung closed the pull request at: https://github.com/apache/spark/pull/19550
[GitHub] spark pull request #19618: [SPARK-5484][Followup] PeriodicRDDCheckpointer do...
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19618

[SPARK-5484][Followup] PeriodicRDDCheckpointer doc cleanup

## What changes were proposed in this pull request?
PeriodicRDDCheckpointer was already moved out of mllib in SPARK-5484

## How was this patch tested?
existing tests

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark checkpointer_doc

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19618.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19618

commit 2858cbb5c8264d7bee592835b56a415961ed1dc4
Author: Zheng RuiFeng
Date: 2017-10-31T04:39:59Z

create pr
[GitHub] spark pull request #19549: [SPARK-22327][SPARKR][TEST] check for version war...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19549
[GitHub] spark issue #19549: [SPARK-22327][SPARKR][TEST] check for version warning
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19549 merged to master. will backport separately
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19617 **[Test build #83242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83242/testReport)** for PR 19617 at commit [`a43430b`](https://github.com/apache/spark/commit/a43430b99d0e5aab351467386fe566461b2a4b06).
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19617 retest this please.
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19617 cc @HyukjinKwon @BryanCutler
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19617 Merged build finished. Test FAILed.
[GitHub] spark issue #19617: [SPARK-22347][PySpark][DOC] Add document to notice users...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19617 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83241/ Test FAILed.
[GitHub] spark pull request #19617: [SPARK-22347][PySpark][DOC] Add document to notic...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/19617

[SPARK-22347][PySpark][DOC] Add document to notice users for using udfs with conditional expressions

## What changes were proposed in this pull request?
Under the current execution mode of Python UDFs, we don't support Python UDFs well as branch values or as the else value in a CaseWhen expression. Since fixing this might require a change that is not small, and the issue has a simpler workaround, we should just notify users about it in the documentation.

## How was this patch tested?
Only document change.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-22347-3

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19617.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #19617

commit a43430b99d0e5aab351467386fe566461b2a4b06
Author: Liang-Chi Hsieh
Date: 2017-10-31T04:28:16Z

Add document to notice users for using udfs with conditional expressions.
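The PR description does not spell out the workaround. One plausible reading, sketched below in plain Python, is to move the condition into the function itself so that it is safe to call on every row, whether or not a surrounding CASE WHEN branch would have selected that row. The names `unsafe_ratio` and `safe_ratio` are hypothetical, and registering the function as a UDF is left out so the sketch runs without Spark.

```python
def unsafe_ratio(numerator, denominator):
    """Hypothetical function that relies on the caller to filter bad rows."""
    return numerator / denominator


def safe_ratio(numerator, denominator):
    """Same computation, with the guard moved inside the function."""
    # The check that would otherwise live in a CASE WHEN branch is done here,
    # so evaluating the function on every row cannot fail.
    if denominator is None or denominator == 0:
        return None
    return numerator / denominator


# Rows that a conditional expression would have been expected to skip.
rows = [(10, 2), (3, 0), (7, None)]
results = [safe_ratio(n, d) for n, d in rows]
```

With this shape the function no longer depends on the order in which the conditional expression and the UDF are evaluated.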
[GitHub] spark issue #19592: [SPARK-22347][SQL][PySpark] Support optionally running P...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19592 After collecting the opinions so far, the consensus is to just document it. I will close this for now and submit a simple PR to document it later.
[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19592#discussion_r147892336
--- Diff: python/pyspark/worker.py ---
@@ -105,8 +105,14 @@ def read_single_udf(pickleSer, infile, eval_type):
     elif eval_type == PythonEvalType.SQL_PANDAS_GROUPED_UDF:
         # a groupby apply udf has already been wrapped under apply()
         return arg_offsets, row_func
-    else:
+    elif eval_type == PythonEvalType.SQL_BATCHED_UDF:
         return arg_offsets, wrap_udf(row_func, return_type)
+    elif eval_type == PythonEvalType.SQL_BATCHED_OPT_UDF:
--- End diff --
One possibility is to do the wrapping when creating UDFs on the Python side. Even for UDFs not used in conditional expressions, we would still add an extra boolean argument to the end of the argument list. We wouldn't need another eval_type with this fix. But currently I think documenting it seems a more acceptable fix for others.
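The wrapping idea in this comment can be illustrated with a small plain-Python sketch. It is not the code from the PR: `with_condition_flag` and `reciprocal` are hypothetical names, and the behavior of returning `None` when the flag is false is an assumption about what "optionally running" would mean.

```python
import functools


def with_condition_flag(func):
    """Wrap `func` so that it takes one extra trailing boolean argument.

    Hypothetical helper: when the flag is false, `func` is not called and
    None is returned instead.
    """
    @functools.wraps(func)
    def wrapper(*args):
        if not args:
            raise TypeError("expected a trailing boolean flag")
        *values, should_run = args
        if not should_run:
            return None
        return func(*values)
    return wrapper


@with_condition_flag
def reciprocal(x):
    return 1.0 / x


# The second call would divide by zero if the flag did not short-circuit it.
outputs = [reciprocal(4, True), reciprocal(0, False)]
```

Because every wrapped function has the same calling convention, a caller that always passes `True` gets the original behavior, which is why the comment suggests no extra eval_type would be needed.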
[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...
Github user viirya closed the pull request at: https://github.com/apache/spark/pull/19592
[GitHub] spark pull request #19592: [SPARK-22347][SQL][PySpark] Support optionally ru...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19592#discussion_r147891641
--- Diff: python/pyspark/worker.py ---
@@ -105,8 +105,14 @@ def read_single_udf(pickleSer, infile, eval_type):
     elif eval_type == PythonEvalType.SQL_PANDAS_GROUPED_UDF:
         # a groupby apply udf has already been wrapped under apply()
         return arg_offsets, row_func
-    else:
+    elif eval_type == PythonEvalType.SQL_BATCHED_UDF:
         return arg_offsets, wrap_udf(row_func, return_type)
+    elif eval_type == PythonEvalType.SQL_BATCHED_OPT_UDF:
--- End diff --
Because the Python functions are serialized and may be broadcast further, I couldn't figure out a way to do this wrapping in `BatchEvalPython` on the Scala side.
[GitHub] spark issue #19613: Fixed a typo
Github user jmchung commented on the issue: https://github.com/apache/spark/pull/19613 Hi @ganeshchand, could you also fix the typo in `JdbcUtils.scala`? Thanks! #L459 underling => underlying
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83240/ Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Merged build finished. Test FAILed.
[GitHub] spark issue #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19606 **[Test build #83239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83239/testReport)** for PR 19606 at commit [`2bcc2ea`](https://github.com/apache/spark/commit/2bcc2ea6fd0ca9f12959246bb9ee6796cb7a90a0).
[GitHub] spark issue #19606: [SPARK-22333][SQL][Backport-2.2]timeFunctionCall(CURRENT...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19606 ok to test
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19601 Jenkins, retest this please
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r147887853
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -89,19 +93,159 @@ case class AnalyzeColumnCommand(
     // The first element in the result will be the overall row count, the following elements
     // will be structs containing all column stats.
     // The layout of each struct follows the layout of the ColumnStats.
-    val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
     val expressions = Count(Literal(1)).toAggregateExpression() +:
-      attributesToAnalyze.map(ColumnStat.statExprs(_, ndvMaxErr))
+      attributesToAnalyze.map(statExprs(_, sparkSession.sessionState.conf))
     val namedExpressions = expressions.map(e => Alias(e, e.toString)())
     val statsRow = new QueryExecution(sparkSession, Aggregate(Nil, namedExpressions, relation))
       .executedPlan.executeTake(1).head
     val rowCount = statsRow.getLong(0)
-    val columnStats = attributesToAnalyze.zipWithIndex.map { case (attr, i) =>
-      // according to `ColumnStat.statExprs`, the stats struct always have 6 fields.
-      (attr.name, ColumnStat.rowToColumnStat(statsRow.getStruct(i + 1, 6), attr))
-    }.toMap
-    (rowCount, columnStats)
+    val colStats = rowToColumnStats(sparkSession, relation, attributesToAnalyze, statsRow, rowCount)
+    (rowCount, colStats)
+  }
+
+  /**
+   * Constructs an expression to compute column statistics for a given column.
+   *
+   * The expression should create a single struct column with the following schema:
+   * distinctCount: Long, min: T, max: T, nullCount: Long, avgLen: Long, maxLen: Long,
+   * percentiles: Array[T]
+   *
+   * Together with [[rowToColumnStats]], this function is used to create [[ColumnStat]] and
+   * as a result should stay in sync with it.
+   */
+  private def statExprs(col: Attribute, conf: SQLConf): CreateNamedStruct = {
+    def struct(exprs: Expression*): CreateNamedStruct = CreateStruct(exprs.map { expr =>
+      expr.transformUp { case af: AggregateFunction => af.toAggregateExpression() }
+    })
+    val one = Literal(1, LongType)
+
+    // the approximate ndv (num distinct value) should never be larger than the number of rows
+    val numNonNulls = if (col.nullable) Count(col) else Count(one)
+    val ndv = Least(Seq(HyperLogLogPlusPlus(col, conf.ndvMaxError), numNonNulls))
+    val numNulls = Subtract(Count(one), numNonNulls)
+    val defaultSize = Literal(col.dataType.defaultSize, LongType)
+    val nullArray = Literal(null, ArrayType(DoubleType))
+
+    def fixedLenTypeExprs(castType: DataType) = {
+      // For fixed width types, avg size should be the same as max size.
+      Seq(ndv, Cast(Min(col), castType), Cast(Max(col), castType), numNulls, defaultSize,
+        defaultSize)
+    }
+
+    def fixedLenTypeStruct(castType: DataType, genHistogram: Boolean) = {
+      val percentileExpr = if (genHistogram) {
+        // To generate equi-height histogram, we need to:
+        // 1. get percentiles p(1/n), p(2/n) ... p((n-1)/n),
+        // 2. use min, max, and percentiles as range values of buckets, e.g. [min, p(1/n)],
+        // [p(1/n), p(2/n)] ... [p((n-1)/n), max], and then count ndv in each bucket.
+        // Step 2 will be performed in `rowToColumnStats`.
--- End diff --
Do you mean calculating percentiles for min/max in step 1? Currently, the other percentiles are already calculated in step 1.
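The two steps described in the code comment of this diff can be sketched in plain Python. This is an illustration under stated assumptions, not Spark's implementation: `percentile` and `equi_height_histogram` are hypothetical names, the percentile is an exact nearest-rank one (the diff itself does not show how Spark computes it), and buckets after the first are treated as half-open `(lo, hi]` so that a boundary value is counted once.

```python
import math


def percentile(sorted_values, fraction):
    """Exact nearest-rank percentile over an already sorted list (assumption)."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def equi_height_histogram(values, num_buckets):
    """Return (lo, hi, ndv) triples, one per bucket."""
    ordered = sorted(values)
    # Step 1: percentiles p(1/n), p(2/n) ... p((n-1)/n).
    cuts = [percentile(ordered, i / num_buckets) for i in range(1, num_buckets)]
    # Step 2: min, the percentiles and max are the range values of the buckets.
    bounds = [ordered[0]] + cuts + [ordered[-1]]
    buckets = []
    for i, (lo, hi) in enumerate(zip(bounds, bounds[1:])):
        # The first bucket is [lo, hi]; later ones are (lo, hi] (assumption).
        distinct = {v for v in ordered if (lo <= v if i == 0 else lo < v) and v <= hi}
        buckets.append((lo, hi, len(distinct)))
    return buckets


histogram = equi_height_histogram([1, 1, 2, 3, 4, 5, 6, 7, 8, 100], 4)
```

In this sample the outlier `100` only stretches the last bucket's upper bound, while every bucket still holds roughly the same number of rows, which is the property that makes equi-height histograms suitable for skewed data.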
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83235/ Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19272 Merged build finished. Test PASSed.
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83235/testReport)** for PR 19272 at commit [`864ab7e`](https://github.com/apache/spark/commit/864ab7ec659a5071e0ed1a87d2448c507b815a79). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r147887335

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -216,65 +218,61 @@ object ColumnStat extends Logging {
         }
       }
    -  /**
    -   * Constructs an expression to compute column statistics for a given column.
    -   *
    -   * The expression should create a single struct column with the following schema:
    -   * distinctCount: Long, min: T, max: T, nullCount: Long, avgLen: Long, maxLen: Long
    -   *
    -   * Together with [[rowToColumnStat]], this function is used to create [[ColumnStat]] and
    -   * as a result should stay in sync with it.
    -   */
    -  def statExprs(col: Attribute, relativeSD: Double): CreateNamedStruct = {
    -    def struct(exprs: Expression*): CreateNamedStruct = CreateStruct(exprs.map { expr =>
    -      expr.transformUp { case af: AggregateFunction => af.toAggregateExpression() }
    -    })
    -    val one = Literal(1, LongType)
    +  private def convertToHistogram(s: String): EquiHeightHistogram = {
    +    val idx = s.indexOf(",")
    +    if (idx <= 0) {
    +      throw new AnalysisException("Failed to parse histogram.")
    +    }
    +    val height = s.substring(0, idx).toDouble
    +    val pattern = "Bucket\\(([^,]+), ([^,]+), ([^\\)]+)\\)".r
    +    val buckets = pattern.findAllMatchIn(s).map { m =>
    +      EquiHeightBucket(m.group(1).toDouble, m.group(2).toDouble, m.group(3).toLong)
    +    }.toSeq
    +    EquiHeightHistogram(height, buckets)
    +  }
    -    // the approximate ndv (num distinct value) should never be larger than the number of rows
    -    val numNonNulls = if (col.nullable) Count(col) else Count(one)
    -    val ndv = Least(Seq(HyperLogLogPlusPlus(col, relativeSD), numNonNulls))
    -    val numNulls = Subtract(Count(one), numNonNulls)
    -    val defaultSize = Literal(col.dataType.defaultSize, LongType)
    +}
    -    def fixedLenTypeStruct(castType: DataType) = {
    -      // For fixed width types, avg size should be the same as max size.
    -      struct(ndv, Cast(Min(col), castType), Cast(Max(col), castType), numNulls, defaultSize,
    -        defaultSize)
    -    }
    +/**
    + * There are a few types of histograms in state-of-the-art estimation methods. E.g. equi-width
    + * histogram, equi-height histogram, frequency histogram (value-frequency pairs) and hybrid
    + * histogram, etc.
    + * Currently in Spark, we support equi-height histogram since it is good at handling skew
    + * distribution, and also provides reasonable accuracy in other cases.
    + * We can add other histograms in the future, which will make estimation logic more complicated.
    --- End diff --

It's not a high priority; here I just want to say it's doable, but it will complicate the estimation logic.
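For readers who want to try the parsing logic outside Spark, the `convertToHistogram` helper in the diff above can be approximated in Python roughly as follows. This is only a sketch: the function name `parse_histogram`, the tuple layout, and the sample string are illustrative assumptions (the serialized format is inferred from the regular expression in the patch, not from Spark documentation), and `ValueError` stands in for `AnalysisException`.

```python
import re
from typing import List, Tuple

# Same pattern as in the patch: "Bucket(<double>, <double>, <long>)".
BUCKET_PATTERN = re.compile(r"Bucket\(([^,]+), ([^,]+), ([^\)]+)\)")

# Hypothetical stand-in for EquiHeightBucket: two doubles and a long.
Bucket = Tuple[float, float, int]


def parse_histogram(s: str) -> Tuple[float, List[Bucket]]:
    """Parse "<height>, Bucket(...), Bucket(...)" into (height, buckets)."""
    idx = s.find(",")
    # Mirrors the `idx <= 0` check: no comma at all, or nothing before it.
    if idx <= 0:
        raise ValueError("Failed to parse histogram.")
    # Everything before the first comma is the bucket height.
    height = float(s[:idx])
    buckets = [
        (float(m.group(1)), float(m.group(2)), int(m.group(3)))
        for m in BUCKET_PATTERN.finditer(s)
    ]
    return height, buckets


# Hypothetical input; the exact stored format is an assumption.
sample = "2.0, Bucket(1.0, 5.0, 3), Bucket(5.0, 9.0, 2)"
height, buckets = parse_histogram(sample)
print(height, buckets)
```

As in the Scala version, a string without a leading height (no comma, or a comma in the first position) is rejected before any bucket is parsed.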
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r147886882

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -177,13 +180,12 @@ object ColumnStat extends Logging {
           Some(ColumnStat(
             distinctCount = BigInt(map(KEY_DISTINCT_COUNT).toLong),
             // Note that flatMap(Option.apply) turns Option(null) into None.
    -        min = map.get(KEY_MIN_VALUE)
    -          .map(fromExternalString(_, field.name, field.dataType)).flatMap(Option.apply),
    -        max = map.get(KEY_MAX_VALUE)
    -          .map(fromExternalString(_, field.name, field.dataType)).flatMap(Option.apply),
    +        min = map.get(KEY_MIN_VALUE).map(fromString(_, field.name, field.dataType)),
    +        max = map.get(KEY_MAX_VALUE).map(fromString(_, field.name, field.dataType)),
    --- End diff --

Yea, but I tend to revert the change because keeping `flatMap(Option.apply)` is more robust.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #83238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83238/testReport)** for PR 19439 at commit [`e314327`](https://github.com/apache/spark/commit/e314327dd74c0092194c311a531c8a8bb90fdb86).
[GitHub] spark pull request #19479: [SPARK-17074] [SQL] Generate equi-height histogra...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19479#discussion_r147886758

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/Statistics.scala ---
    @@ -155,6 +156,8 @@ object ColumnStat extends Logging {
       private val KEY_NULL_COUNT = "nullCount"
       private val KEY_AVG_LEN = "avgLen"
       private val KEY_MAX_LEN = "maxLen"
    +  val KEY_HISTOGRAM = "histogram"
    +  val KEY_HISTOGRAM_SEPARATOR = "-"
    --- End diff --

They are used in `HiveExternalCatalog` for stats/properties conversion.
[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/19439 Jenkins retest this please
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83237/ Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Merged build finished. Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19601 Jenkins, retest this please
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83236/ Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19601 Merged build finished. Test FAILed.
[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19601 Jenkins, retest this please
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Merged build finished. Test FAILed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19459 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83233/ Test FAILed.
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83233/testReport)** for PR 19459 at commit [`cfb1c3d`](https://github.com/apache/spark/commit/cfb1c3dd48abc7073cf0f98e529afae4e1157d78). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19615 Merged build finished. Test PASSed.
[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19615 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83234/ Test PASSed.
[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19615 **[Test build #83234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83234/testReport)** for PR 19615 at commit [`46f530f`](https://github.com/apache/spark/commit/46f530fe777c921d43a2f323abc91d8bb69423d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19611 Merged build finished. Test PASSed.
[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83229/ Test PASSed.
[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19611 **[Test build #83229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83229/testReport)** for PR 19611 at commit [`d98ce9e`](https://github.com/apache/spark/commit/d98ce9e34050d0ef08a6e8802952a3c3bb6fc896). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19614 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83232/ Test PASSed.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19614 Merged build finished. Test PASSed.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19614 **[Test build #83232 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83232/testReport)** for PR 19614 at commit [`5c04540`](https://github.com/apache/spark/commit/5c045400659f3bf149e39ba8ec6d4a13f1210e72). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19616: [SPARK-22404][YARN][WIP] Provide an option to use unmana...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19616 Can one of the admins verify this patch?
[GitHub] spark pull request #19616: [SPARK-22404][YARN][WIP] Provide an option to use...
GitHub user devaraj-kavali opened a pull request: https://github.com/apache/spark/pull/19616

[SPARK-22404][YARN][WIP] Provide an option to use unmanaged AM in yarn-client mode

## What changes were proposed in this pull request?

This PR adds a new configuration, "spark.yarn.un-managed-am" (defaults to false), that enables the Unmanaged AM Application in Yarn Client mode, which launches the Application Master service as part of the Client. It reuses the existing code for communication between the Application Master <-> Task Scheduler for container requests/allocations/launch, and eliminates the following:

1. Allocating and launching the Application Master container
2. Remote Node/Process communication between Application Master <-> Task Scheduler

## How was this patch tested?

I verified this manually by running applications in yarn-client mode with "spark.yarn.un-managed-am" enabled, and also ensured that there is no impact on the existing execution flows.

I am verifying some more failure scenarios and will update the PR if anything needs to be fixed. I would like to hear others' feedback/thoughts on this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-22404

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19616.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19616

commit e51f99ef04e4fd797f4c715b1773c1d245a8a0cd
Author: Devaraj K
Date: 2017-10-31T00:06:48Z

    [SPARK-22404][YARN] Provide an option to use unmanaged AM in yarn-client mode
[GitHub] spark issue #19272: [Spark-21842][Mesos] Support Kerberos ticket renewal and...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19272 **[Test build #83235 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83235/testReport)** for PR 19272 at commit [`864ab7e`](https://github.com/apache/spark/commit/864ab7ec659a5071e0ed1a87d2448c507b815a79).
[GitHub] spark pull request #19272: [Spark-21842][Mesos] Support Kerberos ticket rene...
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19272#discussion_r147866441

    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterManager.scala ---
    @@ -17,7 +17,7 @@
     package org.apache.spark.scheduler.cluster.mesos
    -import org.apache.spark.{SparkContext, SparkException}
    +import org.apache.spark.SparkContext
    --- End diff --

`SparkException` is unused, not sure why it was there in the first place
[GitHub] spark issue #19568: SPARK-22345: Fix sort-merge joins with conditions and co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19568 @rdblue Yes, the current implementation implicitly assumes that the rule `CollapseCodegenStages` excludes all the illegal cases. How about adding an `assert` to check that the condition of `SortMergeJoinExec` does not have `CodegenFallback` expressions? Could you also write a code comment explaining that `CollapseCodegenStages` guarantees this assumption?
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19459 I think it is a bug; we should fix it first. BTW I'm fine with upgrading Arrow, just make sure we get everything we need in the Arrow version we want to upgrade to, then remove all the hacks on the Spark side. We should throw an exception if users have an old Arrow version installed.
[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19615 **[Test build #83234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83234/testReport)** for PR 19615 at commit [`46f530f`](https://github.com/apache/spark/commit/46f530fe777c921d43a2f323abc91d8bb69423d5).
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/19459

After incorporating date and timestamp types for this, I had to refactor a little to use `_create_batch` from serializers to make Arrow batches from Columns even when the user doesn't specify the schema, so that the casts for these types can be used. It doesn't seem to affect performance in the initial benchmark.

I came across an issue when using a pandas DataFrame with timestamps without Arrow. Spark will read the values as long and not datetime, so currently a test for this will fail:

```
In [1]: spark.conf.set("spark.sql.execution.arrow.enabled", "false")

In [2]: import pandas as pd
   ...: from datetime import datetime
   ...:

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+---+
| ts|
+---+
|15094116610|
+---+

In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))

In [7]: pdf
Out[7]:
                   ts
0 2017-10-31 01:01:01

In [9]: pdf.dtypes
Out[9]:
ts    datetime64[ns]
dtype: object
```

@HyukjinKwon or @ueshin could you confirm you see the same? And do you consider this a bug?
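The long value in the `df.show()` output above is consistent with pandas storing `datetime64[ns]` values as 64-bit integer nanoseconds since the Unix epoch. The sketch below recomputes that integer with only the standard library, assuming the naive timestamp is interpreted as UTC (an assumption; the thread does not say which time zone applies). The helper name `to_epoch_nanos` is made up for illustration and is not part of Spark or pandas.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_nanos(ts: datetime) -> int:
    """Return nanoseconds since the Unix epoch, treating naive values as UTC."""
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    # Integer division of timedeltas avoids floating-point rounding.
    micros = (ts - EPOCH) // timedelta(microseconds=1)
    return micros * 1000


# Same timestamp as in the session above.
nanos = to_epoch_nanos(datetime(2017, 10, 31, 1, 1, 1))
print(nanos)

# Converting the long back to a datetime is a plain division by 1000.
restored = EPOCH + timedelta(microseconds=nanos // 1000)
print(restored.replace(tzinfo=None))
```

Under these assumptions the integer is 1509411661000000000, which begins with the same digits as the value printed in the session, so the column appears to hold the raw nanosecond representation rather than a converted timestamp.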
[GitHub] spark issue #19615: [SPARK-19611][SQL][followup] set dataSchema correctly in...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19615 cc @budde @gatorsmile
[GitHub] spark pull request #19615: [SPARK-19611][SQL][followup] set dataSchema corre...
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/19615

[SPARK-19611][SQL][followup] set dataSchema correctly in HiveMetastoreCatalog.convertToLogicalRelation

## What changes were proposed in this pull request?

We made a mistake in https://github.com/apache/spark/pull/16944 . In `HiveMetastoreCatalog#inferIfNeeded` we infer the data schema, merge it with the full schema, and return the new full schema. On the caller side we treat the full schema as the data schema and set it on `HadoopFsRelation`. This doesn't cause any problem because both parquet and orc can work with a wrong data schema that has extra columns, but it's better to fix this mistake.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark infer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19615.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19615

commit 46f530fe777c921d43a2f323abc91d8bb69423d5
Author: Wenchen Fan
Date: 2017-10-30T23:05:57Z

    set dataSchema correctly in HiveMetastoreCatalog.convertToLogicalRelation
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19614 **[Test build #83232 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83232/testReport)** for PR 19614 at commit [`5c04540`](https://github.com/apache/spark/commit/5c045400659f3bf149e39ba8ec6d4a13f1210e72).
[GitHub] spark issue #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark DataFra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19459 **[Test build #83233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83233/testReport)** for PR 19459 at commit [`cfb1c3d`](https://github.com/apache/spark/commit/cfb1c3dd48abc7073cf0f98e529afae4e1157d78).
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/19614 I will fix the style shortly.
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15770 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83230/ Test PASSed.
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15770 Merged build finished. Test PASSed.
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15770 **[Test build #83230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83230/testReport)** for PR 15770 at commit [`cfa18af`](https://github.com/apache/spark/commit/cfa18af7ed27eccebc7af97be8d7e1f4227a5ffa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19614 Merged build finished. Test FAILed.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19614 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83231/ Test FAILed.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19614 **[Test build #83231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83231/testReport)** for PR 19614 at commit [`ddc97ef`](https://github.com/apache/spark/commit/ddc97efed418698b81cce70e8cd0498e46dbcd88). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19614: [SPARK-22399][ML] update the location of reference paper
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19614 **[Test build #83231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83231/testReport)** for PR 19614 at commit [`ddc97ef`](https://github.com/apache/spark/commit/ddc97efed418698b81cce70e8cd0498e46dbcd88).
[GitHub] spark pull request #19614: update the location of reference paper
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/19614

update the location of reference paper

## What changes were proposed in this pull request?

Update the URL of the reference paper.

## How was this patch tested?

The change only touches comments, so nothing was tested.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 22399

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19614.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #19614

commit ddc97efed418698b81cce70e8cd0498e46dbcd88
Author: bomeng
Date: 2017-10-30T22:31:05Z

    update the location of reference paper
[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/19611 LGTM pending tests
[GitHub] spark issue #15770: [SPARK-15784][ML]:Add Power Iteration Clustering to spar...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15770 **[Test build #83230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83230/testReport)** for PR 15770 at commit [`cfa18af`](https://github.com/apache/spark/commit/cfa18af7ed27eccebc7af97be8d7e1f4227a5ffa).
[GitHub] spark issue #19611: [SPARK-22305] Write HDFSBackedStateStoreProvider.loadMap...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19611 **[Test build #83229 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83229/testReport)** for PR 19611 at commit [`d98ce9e`](https://github.com/apache/spark/commit/d98ce9e34050d0ef08a6e8802952a3c3bb6fc896).