[GitHub] spark pull request #22125: [DOCS] Fix cloud-integration.md Typo
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22125 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22127 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94868/ Test FAILed.
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22127 Merged build finished. Test FAILed.
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22127

**[Test build #94868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94868/testReport)** for PR 22127 at commit [`8255336`](https://github.com/apache/spark/commit/825533682c98598409e537fa866dcdab915e3948).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22125 Merged to master
[GitHub] spark issue #22116: [DOCS]Update configuration.md
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22116 Merged to master. For the future, a better title and bundling these in one PR would be preferable.
[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22090#discussion_r210772634

--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 Normalized Discounted Cumulative Gain
-$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
--- End diff --

We do need to fix this, but this makes the subscripts incorrect for R_i(j). I think the expression should change to ln(j+2) in the next line; this is what the code does. For consistency I'd do the same below too.
[GitHub] spark pull request #22090: [DOCS] Fixed NDCG formula issues
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/22090#discussion_r210772686

--- Diff: docs/mllib-evaluation-metrics.md ---
@@ -461,11 +461,11 @@ $$rel_D(r) = \begin{cases}1 & \text{if $r \in D$}, \\ 0 & \text{otherwise}.\end{
 Normalized Discounted Cumulative Gain
-$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=0}^{n-1}
+$NDCG(k)=\frac{1}{M} \sum_{i=0}^{M-1} {\frac{1}{IDCG(D_i, k)}\sum_{j=1}^{n}
 \frac{rel_{D_i}(R_i(j))}{\text{ln}(j+1)}} \\
 \text{Where} \\
 \hspace{5 mm} n = \text{min}\left(\text{max}\left(|R_i|,|D_i|\right),k\right) \\
-\hspace{5 mm} IDCG(D, k) = \sum_{j=0}^{\text{min}(\left|D\right|, k) - 1} \frac{1}{\text{ln}(j+1)}$
+\hspace{5 mm} IDCG(D, k) = \sum_{j=1}^{\text{min}(\left|D\right|, k)} \frac{1}{\text{ln}(j+1)}$
 <a href="https://en.wikipedia.org/wiki/Information_retrieval#Discounted_cumulative_gain">NDCG at k</a> is a
--- End diff --

We can update the link here to https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG
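For readers following the indexing discussion on this diff, here is a minimal sketch of the per-query computation the formula describes, using 0-based positions with a 1/ln(j+2) discount as suggested in the review comment. The names `ndcg_at_k` and `mean_ndcg_at_k` are hypothetical, binary relevance is assumed, and this is plain Python written for illustration, not Spark's implementation.

```python
import math


def ndcg_at_k(predicted, relevant, k):
    """NDCG at k for a single query with binary relevance.

    predicted: ranked list of recommended document ids (R_i in the formula)
    relevant:  collection of ground-truth relevant ids (D_i in the formula)
    Positions j are 0-based, so the discount is 1 / ln(j + 2).
    """
    relevant = set(relevant)
    if not relevant or k <= 0:
        return 0.0  # assumption: an empty ground-truth set scores 0
    n = min(max(len(predicted), len(relevant)), k)
    dcg = 0.0
    idcg = 0.0
    for j in range(n):
        gain = 1.0 / math.log(j + 2)
        # Gain counts toward DCG only if position j holds a relevant document.
        if j < len(predicted) and predicted[j] in relevant:
            dcg += gain
        # The ideal ranking places all relevant documents first.
        if j < len(relevant):
            idcg += gain
    return dcg / idcg


def mean_ndcg_at_k(queries, k):
    """Average NDCG at k over (predicted, relevant) pairs (the 1/M sum)."""
    return sum(ndcg_at_k(p, r, k) for p, r in queries) / len(queries)
```

With 0-based j, ln(j+2) gives the first position a discount of ln 2; a 0-based index combined with ln(j+1) would divide by ln 1 = 0 at the first position.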
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22125

**[Test build #4279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4279/testReport)** for PR 22125 at commit [`6031f70`](https://github.com/apache/spark/commit/6031f70b8f57f9b64335db33d8e219814a7bba9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21584 Merged build finished. Test PASSed.
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21584 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94862/ Test PASSed.
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21584

**[Test build #94862 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94862/testReport)** for PR 21584 at commit [`6584029`](https://github.com/apache/spark/commit/658402919c080ae4d878d355a4b3a14a4d4d0aad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22125 OK, we should bundle these, but w/e
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22125 **[Test build #4279 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4279/testReport)** for PR 22125 at commit [`6031f70`](https://github.com/apache/spark/commit/6031f70b8f57f9b64335db33d8e219814a7bba9c).
[GitHub] spark issue #20725: [SPARK-23555][PYTHON] Add BinaryType support for Arrow i...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/20725

still having problems getting this to pass:
```
[error] (sql-kafka-0-10/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] (core/test:test) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 14786 s, completed Aug 16, 2018 4:05:09 PM
```
i rebased upstream changes from the main spark repo on my fork and launched another build. in ~5 hours we'll know how it went. :\

if this fails tonite, i'll figure out a hacky way to test this tomorrow morning.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22126 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94863/ Test PASSed.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22126 Merged build finished. Test PASSed.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22126

**[Test build #94863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94863/testReport)** for PR 22126 at commit [`c005109`](https://github.com/apache/spark/commit/c005109ac517bc8db687318f5e93a35a1ae785c3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22128 Can one of the admins verify this patch?
[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22128 Can one of the admins verify this patch?
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user bomeng commented on the issue: https://github.com/apache/spark/pull/22127 Good points. I will leave it open for any suggestions for improving the user experience.
[GitHub] spark issue #22128: Add test_slice() to streaming BasicOperations
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22128 Can one of the admins verify this patch?
[GitHub] spark pull request #22128: Add test_slice() to streaming BasicOperations
GitHub user cclauss opened a pull request: https://github.com/apache/spark/pull/22128

Add test_slice() to streaming BasicOperations

As suggested in https://github.com/apache/spark/pull/20838#pullrequestreview-139118618

## What changes were proposed in this pull request?

Add a test for slice operations on streams.

(Please fill in changes proposed in this fix)

## How was this patch tested?

It is a new test being added to the automated test suite.

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cclauss/spark patch-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22128.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22128

commit 4094422d58077aa95129a7ec9fddf75c2e3af7a7
Author: cclauss
Date: 2018-08-16T23:06:59Z

    Add test_slice() to streaming BasicOperations

    As suggested in https://github.com/apache/spark/pull/20838#pullrequestreview-139118618
[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210767018

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala ---
@@ -2223,21 +2223,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData {
 checkAnswer(jsonDF, Seq(Row("Chris", "Baird")))
 }
- test("SPARK-23723: specified encoding is not matched to actual encoding") {
-val fileName = "test-data/utf16LE.json"
-val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
-val exception = intercept[SparkException] {
- spark.read.schema(schema)
-.option("mode", "FAILFAST")
-.option("multiline", "true")
-.options(Map("encoding" -> "UTF-16BE"))
-.json(testFile(fileName))
-.count()
+def doCount(bypassParser: Boolean, multiLine: Boolean): Long = {
+ var result: Long = -1
+ withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> bypassParser.toString) {
+val fileName = "test-data/utf16LE.json"
+val schema = new StructType().add("firstName", StringType).add("lastName", StringType)
+result = spark.read.schema(schema)
+ .option("mode", "FAILFAST")
--- End diff --

This sounds good! Let us enable it only when PERMISSIVE is on. You know, our default mode is PERMISSIVE. This should benefit most users.
[GitHub] spark pull request #22108: [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelate...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22108
[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210765672

--- Diff: docs/sql-programming-guide.md ---
@@ -1894,6 +1894,7 @@ working with timestamps in `pandas_udf`s to get the best performance, see
 - In version 2.3 and earlier, CSV rows are considered as malformed if at least one column value in the row is malformed. CSV parser dropped such rows in the DROPMALFORMED mode or outputs an error in the FAILFAST mode. Since Spark 2.4, CSV row is considered as malformed only when it contains malformed column values requested from CSV datasource, other values can be ignored. As an example, CSV file contains the "id,name" header and one row "1234". In Spark 2.4, selection of the id column consists of a row with one column value 1234 but in Spark 2.3 and earlier it is empty in the DROPMALFORMED mode. To restore the previous behavior, set `spark.sql.csv.parser.columnPruning.enabled` to `false`.
 - Since Spark 2.4, File listing for compute statistics is done in parallel by default. This can be disabled by setting `spark.sql.parallelFileListingInStatsComputation.enabled` to `False`.
 - Since Spark 2.4, Metadata files (e.g. Parquet summary files) and temporary files are not counted as data files when calculating table size during Statistics computation.
+ - Since Spark 2.4, text-based datasources like CSV and JSON don't parse input lines if the required schema pushed down to the datasources is empty. The schema can be empty in the case of the count() action. For example, Spark 2.3 and earlier versions failed on JSON files with invalid encoding but Spark 2.4 returns total number of lines in the file. To restore the previous behavior when the underlying parser is always invoked even for the empty schema, set `true` to `spark.sql.legacy.bypassParserForEmptySchema`. This option will be removed in Spark 3.0.
--- End diff --

Is it right based on what you said https://github.com/apache/spark/pull/21909#discussion_r210704902?
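To make the behavior change in the added migration note concrete, here is a toy, Spark-free sketch in Python. It assumes line-delimited JSON input; `count_rows` and its `bypass_parser` flag are hypothetical names for illustration only and are not Spark APIs or configuration keys.

```python
import json


def count_rows(lines, bypass_parser):
    """Count records in line-delimited JSON text.

    bypass_parser=True:  no line is parsed, so malformed input goes
                         unnoticed and every line is counted.
    bypass_parser=False: each line is parsed before it is counted, so a
                         malformed line raises (a FAILFAST-like outcome).
    """
    if bypass_parser:
        return sum(1 for _ in lines)
    count = 0
    for line in lines:
        json.loads(line)  # raises json.JSONDecodeError (a ValueError) on bad input
        count += 1
    return count


lines = ['{"id": 1}', '{"id": 2', '{"id": 3}']  # the second line is malformed

print(count_rows(lines, bypass_parser=True))  # counts all three lines
try:
    count_rows(lines, bypass_parser=False)
except ValueError as exc:
    print("parse failed:", type(exc).__name__)
```

The sketch only shows why skipping the parser changes what a count reports on bad input; how Spark decides when the parser may be skipped is what the review thread is settling.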
[GitHub] spark issue #22108: [SPARK-25092][SQL][FOLLOWUP] Add RewriteCorrelatedScalar...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22108 Thanks! Merged to master.
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22127 @bomeng There seems to be a bit of history to this :-). Please check https://github.com/apache/spark/pull/15011 where we decided against silently switching to the "default" database.
[GitHub] spark pull request #22117: [SPARK-23654][BUILD] remove jets3t as a dependenc...
Github user steveloughran closed the pull request at: https://github.com/apache/spark/pull/22117
[GitHub] spark issue #22081: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22081 Thanks. Two fewer JARs on the CP to keep up to date; what more can anyone want?
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21950 **[Test build #94869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94869/testReport)** for PR 21950 at commit [`3a65edf`](https://github.com/apache/spark/commit/3a65edf0e07f3beb6d6dd4dcb16e76ea7210c5e9).
[GitHub] spark issue #21135: [SPARK-24060][TEST] StreamingSymmetricHashJoinHelperSuit...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21135 Can one of the admins verify this patch?
[GitHub] spark pull request #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/G...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/21561
[GitHub] spark issue #21561: [SPARK-24555][ML] logNumExamples in KMeans/BiKM/GMM/AFT/...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21561 Merged to master
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22112

Thanks for the clarification, but I guess my point is with your last statement:

> - with assumption that we will expand solution to cover all later.

If we document this and say we support unordered operations with the caveat that failures could result in different results, my assumption is we don't necessarily have to do anything else ever (this is what I am proposing). We could decide, for instance, to add an option to sort, or, if it's not a result stage, fail more tasks to try to handle the situation, but strictly speaking we wouldn't have to.

If you think we have to fix those operations that can result in unordered output, then I think it comes back to this: we just don't support unordered operations at all, and we should say that and probably force the sort on all these operations, and possibly on all operations where the user could cause the order to be different on rerun.
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94860/ Test PASSed.
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21909 Merged build finished. Test PASSed.
[GitHub] spark issue #21909: [SPARK-24959][SQL] Speed up count() for JSON and CSV
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21909

**[Test build #94860 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94860/testReport)** for PR 21909 at commit [`6b34018`](https://github.com/apache/spark/commit/6b34018fcedffa0033cb281d619af79e15d99585).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22127 **[Test build #94868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94868/testReport)** for PR 22127 at commit [`8255336`](https://github.com/apache/spark/commit/825533682c98598409e537fa866dcdab915e3948).
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22127 Merged build finished. Test PASSed.
[GitHub] spark issue #22127: [SPARK-25032][SQL] fix drop database issue
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22127 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2258/ Test PASSed.
[GitHub] spark pull request #22127: [SPARK-25032][SQL] fix drop database issue
GitHub user bomeng opened a pull request: https://github.com/apache/spark/pull/22127

[SPARK-25032][SQL] fix drop database issue

## What changes were proposed in this pull request?

When a user drops the current database (other than the default database), we should set the current database to default after the database is deleted.

## How was this patch tested?

A new test case is added to cover this scenario.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/bomeng/spark 25032

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22127.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22127

commit 825533682c98598409e537fa866dcdab915e3948
Author: Bo Meng
Date: 2018-08-16T21:58:17Z

    fix drop database issue
[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22114 **[Test build #94867 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94867/testReport)** for PR 22114 at commit [`c2f9ed1`](https://github.com/apache/spark/commit/c2f9ed10776842ffe0746fcc89b157675fa6c455).
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22112

@tgravescs I was specifically in agreement with

> Personally I don't want to talk about implementation until we decide what we want our semantics to be around the unordered operations because that affects any implementation.

and

> I would propose we fix the things that are using the round robin type partitioning (repartition) but then unordered things like zip/MapPartitions (via user code) we document or perhaps give the user the option to sort.

IMO a fix in Spark core for repartition should work for most (if not all) order-dependent closures. We might choose not to implement it for others due to time constraints, but the basic idea should be fairly similar.

Given this, I am fine with documenting the potential issue for others and fix for a core subset - with assumption that we will expand solution to cover all later.
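The concern about round-robin partitioning in this thread can be illustrated with a small, Spark-free Python sketch. It assumes records are dealt to output partitions strictly in arrival order, so the same records arriving in a different order (as can happen when an upstream task is rerun over unordered input) land in different partitions. `round_robin` is a hypothetical helper for illustration, not Spark's implementation.

```python
def round_robin(records, num_partitions):
    """Deal records to partitions in arrival order."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts


first_attempt = ["a", "b", "c", "d"]
retry = ["b", "a", "d", "c"]  # same records, different arrival order

p1 = round_robin(first_attempt, 2)
p2 = round_robin(retry, 2)

# Suppose only output partition 0 is recomputed from the retry, while
# partition 1 from the first attempt is kept. The combined result then
# duplicates some records and drops others.
mixed = p2[0] + p1[1]
print(sorted(mixed))

# Sorting before dealing makes the assignment independent of arrival order,
# which is the "option to sort" idea raised in the thread.
assert round_robin(sorted(first_attempt), 2) == round_robin(sorted(retry), 2)
```

Sorting is only one of the options raised here; it trades extra work on every run for repeatable output, which is why the thread treats the choice of semantics as the first question to settle.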
[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/22114 ok to test
[GitHub] spark issue #22114: [SPARK-24938][Core] Prevent Netty from using onheap memo...
Github user NiharS commented on the issue: https://github.com/apache/spark/pull/22114 Tried with a significantly larger input, both with and without the change. They ran in just about the same time.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Merged build finished. Test FAILed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21990 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94866/ Test FAILed.
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21990

**[Test build #94866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)** for PR 21990 at commit [`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
   * `class SparkExtensionsTest(unittest.TestCase, SQLTestUtils):`
[GitHub] spark issue #21990: [SPARK-25003][PYSPARK] Use SessionExtensions in Pyspark
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21990 **[Test build #94866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94866/testReport)** for PR 21990 at commit [`7ba70b5`](https://github.com/apache/spark/commit/7ba70b524f9779529142f6c70b04610b5b068a05).
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21950 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94853/ Test PASSed.
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21950 Build finished. Test PASSed.
[GitHub] spark issue #21950: [SPARK-24914][SQL][WIP] Add configuration to avoid OOM d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21950

**[Test build #94853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94853/testReport)** for PR 21950 at commit [`aa2a957`](https://github.com/apache/spark/commit/aa2a957751a906fe538822cace019014e763a8c3).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22124 Merged build finished. Test FAILed.
[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22124 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94861/ Test FAILed.
[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22124

**[Test build #94861 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94861/testReport)** for PR 22124 at commit [`276879c`](https://github.com/apache/spark/commit/276879ca2bd8d2966b829b7e41e140362c4e4160).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21221: [SPARK-23429][CORE] Add executor memory metrics to heart...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21221 **[Test build #94865 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94865/testReport)** for PR 21221 at commit [`2897281`](https://github.com/apache/spark/commit/2897281a384d25556609a17be21f926cb5d68dd6). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #22107: [SPARK-25117][R] Add EXCEPT ALL and INTERSECT ALL support...
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/22107 @felixcheung I have incorporated the comments.
[GitHub] spark pull request #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities...
Github user mn-mikke commented on a diff in the pull request: https://github.com/apache/spark/pull/22126#discussion_r210724650 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HigherOrderFunctionsSuite.scala --- @@ -363,9 +363,9 @@ class HigherOrderFunctionsSuite extends SparkFunSuite with ExpressionEvalHelper left: Expression, right: Expression, f: (Expression, Expression, Expression) => Expression): Expression = { - val MapType(kt, vt1, vcn1) = left.dataType.asInstanceOf[MapType] - val MapType(_, vt2, vcn2) = right.dataType.asInstanceOf[MapType] - MapZipWith(left, right, createLambda(kt, false, vt1, vcn1, vt2, vcn2, f)) + val MapType(kt, vt1, _) = left.dataType.asInstanceOf[MapType] --- End diff -- Optional suggestion: Maybe we could remove `asInstanceOf[MapType]` here?
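The suggestion rests on how Scala pattern definitions work: an extractor pattern in a `val` already performs the type test, so a cast in front of it is redundant when the static type is a supertype. The following is a minimal sketch in Scala 2 using hypothetical stand-in types (`DataType`, `IntegerType`, `StringType` and `MapType` here are simplified illustrations, not Spark's classes):

```scala
// Hypothetical stand-ins for a data type hierarchy, for illustration only.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
case class MapType(keyType: DataType, valueType: DataType, valueContainsNull: Boolean)
  extends DataType

object MapTypePatternDemo {
  // Destructure without a cast: the pattern itself checks that dt is a MapType
  // and throws scala.MatchError if it is not.
  def keyAndValueTypes(dt: DataType): (DataType, DataType) = {
    val MapType(kt, vt, _) = dt
    (kt, vt)
  }

  def main(args: Array[String]): Unit = {
    println(keyAndValueTypes(MapType(IntegerType, StringType, valueContainsNull = false)))
  }
}
```

The only observable difference in this sketch is the failure mode: with the cast, a non-map input fails with a `ClassCastException`; without it, the pattern fails with a `MatchError`.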
[GitHub] spark issue #22045: [SPARK-23940][SQL] Add transform_values SQL function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22045 **[Test build #94864 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94864/testReport)** for PR 22045 at commit [`3382e1a`](https://github.com/apache/spark/commit/3382e1a5396c8e5a94802d92a7106eacf627617c).
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/22112 @mridulm, just to clarify: are you agreeing that we need to decide what to do with zip and others, or are you agreeing that we should document these as unordered actions (so retries might produce different results) and only fix repartition? We can certainly add other options later, but I don't want to change what we say the core zip behavior is.
[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...
Github user ajacques closed the pull request at: https://github.com/apache/spark/pull/21889
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21584 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2256/
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22126 Merged build finished. Test PASSed.
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21584 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2256/ Test PASSed.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22126 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2257/ Test PASSed.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22126 LGTM
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21584 Merged build finished. Test PASSed.
[GitHub] spark pull request #22081: [SPARK-23654][BUILD] remove jets3t as a dependenc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22081
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Thanks for the response, all. @mallman If it's really your preference, I will create a PR against that branch and close this one. My intention was never to take away from your efforts, and I still consider my work here to be just minor stylistic tweaks on top of your work. I did this as a service to help bridge the divide and hopefully alleviate frustrations. But it has been a bit frustrating to be stuck between the two sides of this while merge strategies changed often, and I don't wish to continue being in between like this. As such, I will create a PR, but I hope it does not get dragged out to settle differences of opinion between maintainers and submitters. My goal is to make sure this valuable feature gets merged so many can benefit.
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21584 Let's merge at the end of the day Pacific time (~5PM-ish) on Friday, August 17, pending any additional feedback on the mailing list thread discussing the subject of including this in 2.4.
[GitHub] spark issue #22081: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22081 Merged to master
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21584 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/2256/
[GitHub] spark issue #22119: [WIP][SPARK-25129][SQL] Revert mapping com.databricks.sp...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22119 +1 for @tgravescs's comments. In terms of usability, the mapping and configuration will be easier for most customers. Regarding the following comment from @gengliangwang: technically, there are no published Databricks Avro artifacts for Spark 2.4 (master branch) as of today. I assume that @gengliangwang will release them on the same day as Apache Spark 2.4, but it would be better not to depend on that kind of assumption, which is outside the Apache community's control. > For hive tables that used Databricks spark-avro, the tables can still use the Databricks repo(since the built-in spark-avro is not loaded by default) Additionally, the 3rd party `spark-avro` will go into maintenance mode. Spark 3.0 may want to read tables generated by the old `spark-avro`.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/22112 I agree @tgravescs, I was looking at the implementation to understand what the expectations are wrt newly introduced methods/fields and whether they make sense: I did not see any details furnished. I don't think we can hack our way out of this. I would expect a solution for repartition to also be applicable to other order-dependent closures. Though we might choose to fix them later, the basic approach should ideally be transferable.
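To make the concern about order-dependent operations concrete, here is a minimal sketch using plain Scala collections rather than RDDs. It assumes, purely for illustration, that a retried computation returns the same elements in a different order; the object and method names are hypothetical.

```scala
object OrderDependenceDemo {
  // Pair up elements by position, which is what a positional zip does.
  def zipByPosition[A, B](xs: Seq[A], ys: Seq[B]): Seq[(A, B)] = xs.zip(ys)

  def main(args: Array[String]): Unit = {
    val ids = Seq(1, 2, 3)
    val firstAttempt = Seq("a", "b", "c")
    // Same elements in a different order, standing in for a recomputed
    // input whose ordering is not deterministic.
    val retry = Seq("b", "c", "a")
    println(zipByPosition(ids, firstAttempt))
    println(zipByPosition(ids, retry))
  }
}
```

Both inputs contain the same set of values, yet the pairs differ, which is why any operation that depends on position is sensitive to the ordering produced upstream.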
[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22123 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94854/ Test PASSed.
[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22123 Merged build finished. Test PASSed.
[GitHub] spark issue #22123: [SPARK-25134][SQL] Csv column pruning with checking of h...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22123 **[Test build #94854 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94854/testReport)** for PR 22123 at commit [`c4179a9`](https://github.com/apache/spark/commit/c4179a9f0a85b412178323e6cb881385fa644051). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22126 **[Test build #94863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94863/testReport)** for PR 22126 at commit [`c005109`](https://github.com/apache/spark/commit/c005109ac517bc8db687318f5e93a35a1ae785c3).
[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22126 cc @mn-mikke
[GitHub] spark pull request #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities...
GitHub user ueshin opened a pull request: https://github.com/apache/spark/pull/22126 [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of value arguments should be true. ## What changes were proposed in this pull request? This is a follow-up PR of #22017, which added the `map_zip_with` function. In the test, when creating a lambda function, we use the `valueContainsNull` values for the nullabilities of the value arguments, but we should have used `true`, as the `bind` method does, because the values might be `null` if the keys don't match. ## How was this patch tested? Added small tests and existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ueshin/apache-spark issues/SPARK-23938/fix_tests Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22126.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22126 commit c005109ac517bc8db687318f5e93a35a1ae785c3 Author: Takuya UESHIN Date: 2018-08-16T19:14:34Z Fix a test.
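The reasoning in this description (a value argument can be null even when neither input map contains nulls) can be sketched with plain Scala maps. This is an illustration only, not Spark's implementation: `zipWithKeys` is a hypothetical helper, and `None` stands in for a SQL `null`.

```scala
object MapZipNullabilityDemo {
  // Zip two maps over the union of their keys. A key that is missing on one
  // side yields None for that side, the analogue of a null value argument.
  def zipWithKeys[K, A, B](left: Map[K, A], right: Map[K, B]): Map[K, (Option[A], Option[B])] =
    (left.keySet ++ right.keySet).map(k => k -> ((left.get(k), right.get(k)))).toMap

  def main(args: Array[String]): Unit = {
    val left = Map(1 -> "a", 2 -> "b")   // contains no missing values itself
    val right = Map(2 -> "x", 3 -> "y")  // contains no missing values itself
    zipWithKeys(left, right).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Keys 1 and 3 each appear in only one map, so one side of their pair is absent even though both inputs are fully populated. That is why treating the value arguments as always nullable is the safe choice.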
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21584 **[Test build #94862 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94862/testReport)** for PR 21584 at commit [`6584029`](https://github.com/apache/spark/commit/658402919c080ae4d878d355a4b3a14a4d4d0aad).
[GitHub] spark issue #21584: [SPARK-24433][K8S] Initial R Bindings for SparkR on K8s
Github user ifilonenko commented on the issue: https://github.com/apache/spark/pull/21584 This PR has been updated to pass Jenkins by removing the `with RTestsSuite` line in `KubernetesSuite`. As such, this feature may be merged, and `with RTestsSuite` will be re-included in a separate PR once Jenkins is updated with the new Ubuntu OS.
[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > I see no point of leaving this PR open. I don't agree with you on that point, and I've expressed my view in https://github.com/apache/spark/pull/21889#issuecomment-413655304.
[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Essentially, this PR was created to take the management of #21320 out of my hands, with a view towards facilitating its incorporation into Spark 2.4. It was my suggestion, one based in frustration. In hindsight, I no longer believe this strategy is the best, or most expedient, approach towards progress. Indeed, I believe the direction of this PR has become orthogonal to its motivating goal, becoming a dispute between myself and @HyukjinKwon rather than a means to move things along. I believe I can shepherd #21320 in a way that will promote greater progress. @ajacques, I mean no disrespect, and I thank you for volunteering your time, patience and effort for the sake of all that are interested in seeing this patch become a part of Spark. And I apologize for letting you down, letting everyone down. In my conduct leading up to the creation of this PR I did not act with the greatest maturity or patience. And I did not act in the best interests of the community. No one has spent more time or more effort, taken more responsibility or exhibited more patience with this 2+ year patch-set-in-the-making than myself. I respectfully submit it is mine to present and manage, and no one else's. Insofar as I have expressed otherwise in the past, I admit my error (one made in frustration) and recant in hindsight. @ajacques, at this point I respectfully assert that managing the patch set I submitted in #21320 is not your responsibility, nor is it anyone else's but mine. I ask you to close this PR so that we can resume the review in #21320. As I stated there, you are welcome to open a PR on https://github.com/VideoAmp/spark-public/tree/spark-4502-parquet_column_pruning-foundation to submit the changes you've made for review. Thank you.
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user KraFusion commented on the issue: https://github.com/apache/spark/pull/22125 @kiszk PR created yesterday for `configurations.md`
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/22125 Thanks, would it be possible to address similar issues? For example, in `configurations.md`.
[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...
Github user vackosar commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r210708107 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -254,3 +259,10 @@ class SinkProgress protected[sql]( } } } + +private[sql] object SinkProgress { + val DEFAULT_NUM_OUTPUT_ROWS: Long = -1L --- End diff -- I will implement this for continuous streaming, and then only legacy sinks would output -1. I didn't want to change the API too often.
[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22125 Can one of the admins verify this patch?
[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21919 LGTM overall except one minor comment.
[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...
Github user arunmahadevan commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r210707152 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -254,3 +259,10 @@ class SinkProgress protected[sql]( } } } + +private[sql] object SinkProgress { + val DEFAULT_NUM_OUTPUT_ROWS: Long = -1L --- End diff -- Does it result in sink progress output with "numOutputRows = -1"? Maybe add numOutputRows to the output only if the value is not the default.
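One way to read this suggestion is sketched below. It is a hypothetical stand-in, not the actual `SinkProgress` class: the `fields` helper, its parameter names and the key/value representation are assumptions made for illustration, since the real rendering code is not shown in this thread.

```scala
// Hypothetical stand-in for a sink progress report; names are illustrative.
object SinkProgressSketch {
  // Sentinel meaning "the sink did not report a row count".
  val DefaultNumOutputRows: Long = -1L

  // Build the key/value pairs to report, adding numOutputRows only when the
  // sink supplied a real count rather than the sentinel.
  def fields(
      description: String,
      numOutputRows: Long = DefaultNumOutputRows): Seq[(String, String)] = {
    val base = Seq("description" -> description)
    if (numOutputRows == DefaultNumOutputRows) base
    else base :+ ("numOutputRows" -> numOutputRows.toString)
  }

  def main(args: Array[String]): Unit = {
    println(fields("legacy-sink"))
    println(fields("counting-sink", 42L))
  }
}
```

The trade-off in this sketch is that consumers must treat a missing field as "unknown", whereas always emitting -1 keeps the output shape fixed at the cost of exposing the sentinel.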
[GitHub] spark pull request #22125: [DOCS] Fix cloud-integration.md Typo
GitHub user KraFusion opened a pull request: https://github.com/apache/spark/pull/22125 [DOCS] Fix cloud-integration.md Typo Corrected typo; changed spark-default.conf to spark-defaults.conf You can merge this pull request into a Git repository by running: $ git pull https://github.com/KraFusion/spark-1 patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22125.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22125 commit 6031f70b8f57f9b64335db33d8e219814a7bba9c Author: Joey Krabacher Date: 2018-08-16T18:58:54Z [DOCS] Fix cloud-integration.md Typo changed spark-default.conf to spark-defaults.conf
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/22117 Test failure in `org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.SuiteSelector)`: ``` Caused by: sbt.ForkMain$ForkError: java.lang.NoClassDefFoundError: javax/jdo/JDOException at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5501) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:184) at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:73) ... 41 more Caused by: sbt.ForkMain$ForkError: java.lang.ClassNotFoundException: javax.jdo.JDOException at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.doLoadClass(IsolatedClientLoader.scala:227) at org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1.loadClass(IsolatedClientLoader.scala:216) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 44 more ``` Somehow the datanucleus JARs aren't on the CP for the hive test. I can't see how this patch is causing this. Can anyone else? But if not, why is it surfacing here?
[GitHub] spark pull request #21909: [SPARK-24959][SQL] Speed up count() for JSON and ...
Github user MaxGekk commented on a diff in the pull request: https://github.com/apache/spark/pull/21909#discussion_r210704902 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala --- @@ -2223,21 +2223,31 @@ class JsonSuite extends QueryTest with SharedSQLContext with TestJsonData { checkAnswer(jsonDF, Seq(Row("Chris", "Baird"))) } - test("SPARK-23723: specified encoding is not matched to actual encoding") { -val fileName = "test-data/utf16LE.json" -val schema = new StructType().add("firstName", StringType).add("lastName", StringType) -val exception = intercept[SparkException] { - spark.read.schema(schema) -.option("mode", "FAILFAST") -.option("multiline", "true") -.options(Map("encoding" -> "UTF-16BE")) -.json(testFile(fileName)) -.count() +def doCount(bypassParser: Boolean, multiLine: Boolean): Long = { + var result: Long = -1 + withSQLConf(SQLConf.BYPASS_PARSER_FOR_EMPTY_SCHEMA.key -> bypassParser.toString) { +val fileName = "test-data/utf16LE.json" +val schema = new StructType().add("firstName", StringType).add("lastName", StringType) +result = spark.read.schema(schema) + .option("mode", "FAILFAST") --- End diff -- > Does the mode matter? I just want an explicit error in the test instead of `0` for `count()` (`DROPMALFORMED`), or a full table of nulls or an exception (`PERMISSIVE`), since an exception is the expected result. > What happened if users use DROPMALFORMED before this PR? It depends on `multiLine`. If it is `true`, the behaviour before and after the PR is the same, since the optimization doesn't affect the `multiLine` mode. When `multiLine` is `false`, the result after the PR is `5` (the total number of lines); before the PR it was `0` in `DROPMALFORMED` mode. We can enable this optimization for the `PERMISSIVE` mode only, to exclude any deviation in outputs.
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21819 @HyukjinKwon, can you take it forward? I appreciate your effort, and thanks in advance.
[GitHub] spark issue #22124: [SPARK-25135][SQL] Insert datasource table may all null ...
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/22124 cc @gengliangwang
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94851/ Test FAILed.
[GitHub] spark issue #22117: [SPARK-23654][BUILD] remove jets3t as a dependency of sp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22117 Merged build finished. Test FAILed.