[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22071 **[Test build #4241 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4241/testReport)** for PR 22071 at commit [`b4ca224`](https://github.com/apache/spark/commit/b4ca224095cb7fda6822c431465bfb7f48a4bb2d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test FAILed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94573/ Test FAILed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #94573 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94573/testReport)** for PR 21732 at commit [`80506f4`](https://github.com/apache/spark/commit/80506f4e98184ccd66dbaac14ec52d69c358020d). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` * For example, we build an encoder for `case class Data(a: Int, b: String)` and the real type`
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22047 Merged build finished. Test FAILed.
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22047 **[Test build #94586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94586/testReport)** for PR 22047 at commit [`af4d901`](https://github.com/apache/spark/commit/af4d9011adb290e4300efce936d25b4de4ec5cd5). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22047 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94586/ Test FAILed.
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22047 **[Test build #94586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94586/testReport)** for PR 22047 at commit [`af4d901`](https://github.com/apache/spark/commit/af4d9011adb290e4300efce936d25b4de4ec5cd5).
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22047 Merged build finished. Test PASSed.
[GitHub] spark issue #22047: [SPARK-19851] Add support for EVERY and ANY (SOME) aggre...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22047 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2059/ Test PASSed.
[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21746 @c-horn it's in 2.4.0. I just fixed the ticket.
[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22071 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94572/ Test FAILed.
[GitHub] spark issue #21746: [SPARK-24699] [SS]Make watermarks work with Trigger.Once...
Github user c-horn commented on the issue: https://github.com/apache/spark/pull/21746 @tdas, will this not be included in `2.4.0`, as indicated in [SPARK-24699](https://issues.apache.org/jira/browse/SPARK-24699)?
[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22071 Merged build finished. Test FAILed.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94571/ Test FAILed.
[GitHub] spark issue #22071: [SPARK-25088][CORE][MESOS][DOCS] Update Rest Server docs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22071 **[Test build #94572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94572/testReport)** for PR 22071 at commit [`b4ca224`](https://github.com/apache/spark/commit/b4ca224095cb7fda6822c431465bfb7f48a4bb2d). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test FAILed.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #94571 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94571/testReport)** for PR 22063 at commit [`a803869`](https://github.com/apache/spark/commit/a803869775250f366ef357ccf06fed397a3f1cfd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21819 Merged build finished. Test PASSed.
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94582/ Test PASSed.
[GitHub] spark issue #22074: [SPARK-25089][R] removing lintr checks for 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22074 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2058/ Test PASSed.
[GitHub] spark issue #22074: [SPARK-25089][R] removing lintr checks for 2.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22074 Merged build finished. Test PASSed.
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21819 **[Test build #94582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94582/testReport)** for PR 21819 at commit [`7129c3f`](https://github.com/apache/spark/commit/7129c3fbcd73d99ad80111b55625b9b5d46333a0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user squito commented on the issue: https://github.com/apache/spark/pull/21698 yeah, you'd have to sort the entire record. I think originally it didn't seem like that would work, because you don't know that `T` is sortable for `RDD[T]`. But after a sort, you've got bytes, so you can at least sort the serialized bytes of `T`. Note that you don't want to do the entire repartitioning based on that, since you won't get a good distribution for skewed data. But it can give you a deterministic order as an *additional* sort, just within one partition.
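A minimal Python sketch of the idea in this comment, under stated assumptions: before round-robin assignment, the records of one input partition get an additional sort by their serialized bytes, so the output order no longer depends on the order in which shuffle blocks were fetched. `serialize` and `deterministic_round_robin` are hypothetical helpers; JSON stands in for Spark's serializer, and assignment simply starts at partition 0.

```python
import json
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def serialize(record: Record) -> bytes:
    # Hypothetical stand-in for Spark's serializer: any encoding works as long
    # as equal records always produce equal bytes.
    return json.dumps(list(record)).encode("utf-8")


def deterministic_round_robin(
    partition: Sequence[Record], num_partitions: int, start: int = 0
) -> List[Tuple[Record, int]]:
    """Assign records round-robin after an extra sort by serialized bytes.

    The sort only fixes the order within this one input partition; the
    round-robin step still decides the distribution across outputs.
    """
    ordered = sorted(partition, key=serialize)
    return [(rec, (start + i) % num_partitions) for i, rec in enumerate(ordered)]


a = ("a11", "a12", "a13")
b = ("a21", "a22", "a23")

# The same records arriving in two different fetch orders...
first_try = deterministic_round_robin([a, b, a], num_partitions=2)
retry = deterministic_round_robin([b, a, a], num_partitions=2)

# ...are now assigned identically.
print(first_try == retry)  # True
```

As the comment points out, this is only an additional within-partition sort; the round-robin step is kept because sorting alone would not give a good distribution for skewed data.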
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22017 **[Test build #94585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94585/testReport)** for PR 22017 at commit [`595161f`](https://github.com/apache/spark/commit/595161fefbf55711b76530a9e53aff73491febd6).
[GitHub] spark issue #22074: [R] removing lintr checks for 2.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22074 **[Test build #94584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94584/consoleFull)** for PR 22074 at commit [`b204a88`](https://github.com/apache/spark/commit/b204a88f9d0116a384682422f82f1a55be32443b).
[GitHub] spark pull request #22074: removing lintr checks for 2.0
GitHub user shaneknapp opened a pull request: https://github.com/apache/spark/pull/22074 removing lintr checks for 2.0 ## What changes were proposed in this pull request? since 2.0 will be EOLed some time in the not too distant future, and we'll be moving the builds from centos to ubuntu, i think it's fine to disable R linting rather than going down the rabbit hole of trying to fix this stuff. ## How was this patch tested? the build system will test this You can merge this pull request into a Git repository by running: $ git pull https://github.com/shaneknapp/spark removing-lintr-2.0 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22074.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22074 commit b204a88f9d0116a384682422f82f1a55be32443b Author: shane knapp Date: 2018-08-10T20:30:25Z removing lintr checks for 2.0
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/22017 retest this please
[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/21698 Thinking about the sorting thing again, I don't think it works unless you sort both the keys and the values themselves. For instance, let's say we have a group by key which generates: B = k1, { (a11, a12, a13), (a21, a22, a23), (a11, a12, a13) } Then we do a distinct on it, and you could get: C = ((a11, a12, a13), (a21, a22, a23)) But say you get fetch failures and you refetch the data. Since the reducer can fetch it in any order during the shuffle, B could be B = k1, { (a21, a22, a23), (a11, a12, a13), (a11, a12, a13) } and after the distinct you end up with C = ((a21, a22, a23), (a11, a12, a13)) If you were to just sort by key, the elements of C could still come out in a different order, and after round-robin the records in the above example of C would go to different partitions. I'm not sure of a way to get around this without using the hash partitioner, but I'm still looking into it. I need to look at the details of how the DataFrame PR solved this as well, since I'm curious how it works with the above example.
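The failure mode in this example can be reproduced with a small Python simulation. It assumes a `distinct` that keeps records in first-arrival order and a round-robin assignment that starts at partition 0; both helpers (`distinct_in_arrival_order`, `round_robin`) are hypothetical simplifications for illustration, not Spark's implementation.

```python
from typing import Dict, Hashable, Iterable, List


def distinct_in_arrival_order(records: Iterable[Hashable]) -> List[Hashable]:
    # Keep the first occurrence of each record, preserving arrival order.
    seen = set()
    out = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            out.append(rec)
    return out


def round_robin(records: List[Hashable], num_partitions: int) -> Dict[Hashable, int]:
    # Map each record to an output partition purely by its position in the input.
    return {rec: i % num_partitions for i, rec in enumerate(records)}


a1 = ("a11", "a12", "a13")
a2 = ("a21", "a22", "a23")

# Values of key k1 as fetched on the first attempt and after a fetch failure.
first_fetch = [a1, a2, a1]
refetch = [a2, a1, a1]

c_first = distinct_in_arrival_order(first_fetch)  # [a1, a2]
c_retry = distinct_in_arrival_order(refetch)      # [a2, a1]

# Same set of records, but a1 lands in a different output partition on retry.
print(round_robin(c_first, 2)[a1], round_robin(c_retry, 2)[a1])  # 0 1
```

Because the assignment depends only on position, the retried task sends the same record to a different output partition, which is why the order has to be made deterministic over entire records (not just keys) before the round-robin step.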
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Merged build finished. Test FAILed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22009 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94574/ Test FAILed.
[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22009 **[Test build #94574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94574/testReport)** for PR 22009 at commit [`f4f85a8`](https://github.com/apache/spark/commit/f4f85a833ef319a6860134e12655574aca081ed6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21980 Thanks @zsxwing
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209374683 --- Diff: python/pyspark/worker.py --- @@ -259,6 +260,26 @@ def main(infile, outfile): "PYSPARK_DRIVER_PYTHON are correctly set.") % ("%d.%d" % sys.version_info[:2], version)) +# set up memory limits +memory_limit_mb = int(os.environ.get('PYSPARK_EXECUTOR_MEMORY_MB', "-1")) +total_memory = resource.RLIMIT_AS +try: +(total_memory_limit, max_total_memory) = resource.getrlimit(total_memory) +msg = "Current mem: {0} of max {1}\n".format(total_memory_limit, max_total_memory) +print(msg, file=sys.stderr) + +if memory_limit_mb > 0 and total_memory_limit == resource.RLIM_INFINITY: +# convert to bytes +total_memory_limit = memory_limit_mb * 1024 * 1024 + +msg = "Setting mem to {0} of max {1}\n".format(total_memory_limit, max_total_memory) +print(msg, file=sys.stderr) +resource.setrlimit(total_memory, (total_memory_limit, total_memory_limit)) --- End diff -- Is the hard limit here intended to be `total_memory_limit`, or should it be `max_total_memory`?
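For reference on this question: per the Linux `setrlimit(2)` man page, an unprivileged process may set its soft limit anywhere up to its hard limit, but lowering the hard limit is irreversible. A minimal Python sketch that sets only the soft limit of `RLIMIT_AS` and passes the existing hard limit through unchanged might look like the following. `plan_memory_limit` and `apply_memory_limit` are hypothetical helpers, not part of the patch, and the `resource` module is Unix-only.

```python
import resource
from typing import Optional, Tuple


def plan_memory_limit(
    memory_limit_mb: int, soft: int, hard: int
) -> Optional[Tuple[int, int]]:
    """Return the (soft, hard) pair to pass to setrlimit, or None to leave limits alone.

    Mirrors the condition in the diff: act only when a positive limit was
    requested and no soft limit is currently set.
    """
    if memory_limit_mb <= 0 or soft != resource.RLIM_INFINITY:
        return None
    new_soft = memory_limit_mb * 1024 * 1024  # MiB -> bytes
    # soft <= hard always holds, so an unlimited soft limit implies an unlimited
    # hard limit; keeping `hard` lets the soft limit be raised again later.
    return (new_soft, hard)


def apply_memory_limit(memory_limit_mb: int) -> Optional[Tuple[int, int]]:
    # Reads the current RLIMIT_AS and lowers only the soft limit, if needed.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    plan = plan_memory_limit(memory_limit_mb, soft, hard)
    if plan is not None:
        resource.setrlimit(resource.RLIMIT_AS, plan)
    return plan
```

For example, `plan_memory_limit(512, resource.RLIM_INFINITY, resource.RLIM_INFINITY)` returns `(536870912, resource.RLIM_INFINITY)`. Setting the hard limit equal to the soft limit, as the diff does, is also valid; it just prevents the worker from ever raising the limit again.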
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22017 Merged build finished. Test FAILed.
[GitHub] spark issue #22073: [R] removing lintr checks for 2.1
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22073 **[Test build #94583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94583/testReport)** for PR 22073 at commit [`f2974fb`](https://github.com/apache/spark/commit/f2974fbaf518f9e5350324ea0bf32c2fcea6f9b3).
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22017 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94570/ Test FAILed.
[GitHub] spark issue #22073: [R] removing lintr checks for 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22073 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2057/ Test PASSed.
[GitHub] spark issue #22073: [R] removing lintr checks for 2.1
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22073 Merged build finished. Test PASSed.
[GitHub] spark issue #22017: [SPARK-23938][SQL] Add map_zip_with function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22017 **[Test build #94570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94570/testReport)** for PR 22017 at commit [`595161f`](https://github.com/apache/spark/commit/595161fefbf55711b76530a9e53aff73491febd6). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22073: removing lintr checks for 2.1
GitHub user shaneknapp opened a pull request: https://github.com/apache/spark/pull/22073 removing lintr checks for 2.1 ## What changes were proposed in this pull request? since 2.1 will be EOLed some time in the not too distant future, and we'll be moving the builds from centos to ubuntu, i think it's fine to disable R linting rather than going down the rabbit hole of trying to fix this stuff. ## How was this patch tested? the build system will test this You can merge this pull request into a Git repository by running: $ git pull https://github.com/shaneknapp/spark removing-lintr Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22073.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22073 commit f2974fbaf518f9e5350324ea0bf32c2fcea6f9b3 Author: shane knapp Date: 2018-08-10T20:12:35Z removing lintr checks for 2.1
[GitHub] spark pull request #22062: [SPARK-25081][Core]Nested spill in ShuffleExterna...
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/22062#discussion_r209372979 --- Diff: core/src/test/scala/org/apache/spark/shuffle/sort/ShuffleExternalSorterSuite.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.shuffle.sort + +import java.lang.{Long => JLong} + +import org.mockito.Mockito.when +import org.scalatest.mockito.MockitoSugar + +import org.apache.spark._ +import org.apache.spark.executor.{ShuffleWriteMetrics, TaskMetrics} +import org.apache.spark.memory._ +import org.apache.spark.unsafe.Platform + +class ShuffleExternalSorterSuite extends SparkFunSuite with LocalSparkContext with MockitoSugar { + + test("nested spill should be no-op") { +val conf = new SparkConf() + .setMaster("local[1]") + .setAppName("ShuffleExternalSorterSuite") + .set("spark.testing", "true") + .set("spark.testing.memory", "1600") + .set("spark.memory.fraction", "1") +sc = new SparkContext(conf) + +val memoryManager = UnifiedMemoryManager(conf, 1) + +var shouldAllocate = false + +// Mock `TaskMemoryManager` to allocate free memory when `shouldAllocate` is true. 
+// This will trigger a nested spill and expose issues if we don't handle this case properly. +val taskMemoryManager = new TaskMemoryManager(memoryManager, 0) { + override def acquireExecutionMemory(required: Long, consumer: MemoryConsumer): Long = { +// ExecutionMemoryPool.acquireMemory will wait until there are 400 bytes for a task to use. +// So we leave 400 bytes for the task. +if (shouldAllocate && + memoryManager.maxHeapMemory - memoryManager.executionMemoryUsed > 400) { + val acquireExecutionMemoryMethod = +memoryManager.getClass.getMethods.filter(_.getName == "acquireExecutionMemory").head + acquireExecutionMemoryMethod.invoke( +memoryManager, +JLong.valueOf( + memoryManager.maxHeapMemory - memoryManager.executionMemoryUsed - 400), +JLong.valueOf(1L), // taskAttemptId +MemoryMode.ON_HEAP + ).asInstanceOf[java.lang.Long] +} +super.acquireExecutionMemory(required, consumer) + } +} +val taskContext = mock[TaskContext] --- End diff -- lol
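The situation this test targets, a memory consumer being asked to spill again while it is already spilling, can be illustrated with a toy Python model. Everything here (`ToyMemoryManager`, `ToySorter`, the re-entrancy flag) is hypothetical and only sketches one way to make a nested spill a no-op; it is not the change made in `ShuffleExternalSorter`.

```python
from typing import List, Optional


class ToyMemoryManager:
    """Hypothetical manager that forces its consumer to spill while under pressure."""

    def __init__(self) -> None:
        self.consumer: Optional["ToySorter"] = None
        self.under_pressure = False

    def acquire(self, nbytes: int) -> int:
        if self.under_pressure and self.consumer is not None:
            self.consumer.spill()  # may be called while the consumer is mid-spill
        return nbytes


class ToySorter:
    """Hypothetical consumer that buffers records and spills them as sorted runs."""

    def __init__(self, manager: ToyMemoryManager) -> None:
        self.manager = manager
        manager.consumer = self
        self.buffer: List[int] = []
        self.runs: List[List[int]] = []
        self.nested_spills = 0
        self._spilling = False

    def insert(self, record: int) -> None:
        self.manager.acquire(8)  # allocation may trigger a spill first
        self.buffer.append(record)

    def spill(self) -> int:
        if self._spilling:
            # Re-entrant call: do nothing instead of touching state being spilled.
            self.nested_spills += 1
            return 0
        if not self.buffer:
            return 0
        self._spilling = True
        try:
            records = self.buffer
            self.manager.acquire(len(records))  # writing the run needs memory too
            self.runs.append(sorted(records))
            self.buffer = []
            return len(records)
        finally:
            self._spilling = False


manager = ToyMemoryManager()
sorter = ToySorter(manager)
sorter.insert(3)
sorter.insert(1)
manager.under_pressure = True
sorter.insert(2)  # triggers a spill whose own allocation triggers a nested spill
print(sorter.runs, sorter.buffer, sorter.nested_spills)  # [[1, 3]] [2] 1
```

Without the `_spilling` check, the nested call would spill the same buffer again while the outer spill was still using it, which is the kind of hazard a "nested spill should be no-op" test is meant to catch.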
[GitHub] spark issue #21980: [SPARK-25010][SQL] Rand/Randn should produce different v...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/21980 LGTM2
[GitHub] spark issue #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaConsumer ...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/18143 @ScrapCodes sorry for the delay. I think @tdas has fixed the issue. Please close the PR.
[GitHub] spark pull request #21634: [SPARK-24648][SQL] SqlMetrics should be threadsaf...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/21634#discussion_r209371636 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala --- @@ -504,4 +504,38 @@ class SQLMetricsSuite extends SparkFunSuite with SQLMetricsTestUtils with Shared test("writing data out metrics with dynamic partition: parquet") { testMetricsDynamicPartition("parquet", "parquet", "t1") } + + test("writing metrics from single thread") { +val nAdds = 10 +val acc = new SQLMetric("test", -10) +assert(acc.isZero()) +acc.set(0) +for (i <- 1 to nAdds) acc.add(1) +assert(!acc.isZero()) +assert(nAdds === acc.value) +acc.reset() +assert(acc.isZero()) + } + + test("writing metrics from multiple threads") { --- End diff -- > Do you mean it's a one-writer, multi-reader scene? Yes.
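To make the semantics exercised by these tests concrete, here is a hypothetical Python analogue of a long-valued metric whose "zero" is its initial value (as the single-thread test implies: `new SQLMetric("test", -10)` is zero right after construction). `ToyMetric` is not Spark's `SQLMetric`. Writes take a lock so concurrent `add` calls are not lost, while reads of the current value stay unsynchronized, which in CPython is a single attribute load and fits the one-writer, multi-reader case.

```python
import threading


class ToyMetric:
    """Hypothetical metric: a counter whose zero value is its initial value."""

    def __init__(self, init_value: int = 0) -> None:
        self._zero = init_value
        self._value = init_value
        self._lock = threading.Lock()

    def is_zero(self) -> bool:
        return self._value == self._zero

    @property
    def value(self) -> int:
        return self._value  # unsynchronized read of a single attribute

    def set(self, v: int) -> None:
        with self._lock:
            self._value = v

    def add(self, v: int) -> None:
        with self._lock:  # the read-modify-write must not interleave
            self._value += v

    def reset(self) -> None:
        with self._lock:
            self._value = self._zero


def add_many(metric: ToyMetric, n: int) -> None:
    for _ in range(n):
        metric.add(1)


# Single-threaded sequence mirroring the first test in the diff.
acc = ToyMetric(-10)
assert acc.is_zero()
acc.set(0)
add_many(acc, 10)
assert not acc.is_zero() and acc.value == 10
acc.reset()
assert acc.is_zero()

# Several writers: with the lock, no increments are lost.
acc.set(0)
threads = [threading.Thread(target=add_many, args=(acc, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(acc.value)  # 8000
```

If only one thread ever writes, the lock in `add` is not strictly needed for correctness; it is kept here so the multi-writer part of the sketch is also safe.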
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21819 @HeartSaVioR @HyukjinKwon @jose-torres @tdas would you mind taking a look?
[GitHub] spark issue #21819: [SPARK-24863][SS] Report Kafka offset lag as a custom me...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21819 **[Test build #94582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94582/testReport)** for PR 21819 at commit [`7129c3f`](https://github.com/apache/spark/commit/7129c3fbcd73d99ad80111b55625b9b5d46333a0).
[GitHub] spark issue #21977: SPARK-25004: Add spark.executor.pyspark.memory limit.
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/21977 Why not use `resource.RLIMIT_RSS`?
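Context for this question: the Linux `getrlimit(2)` man page documents `RLIMIT_RSS` as having effect only in Linux 2.4.x with x < 30 (and even there only on `madvise(2)` calls specifying `MADV_WILLNEED`), so current kernels do not enforce it, whereas `RLIMIT_AS` caps the process's virtual address space. A small Python check of both limits in the current process, assuming a Unix platform (`current_limits` and `fmt` are hypothetical helpers):

```python
import resource
from typing import Dict, Iterable, Tuple


def current_limits(
    names: Iterable[str] = ("RLIMIT_AS", "RLIMIT_RSS"),
) -> Dict[str, Tuple[int, int]]:
    """Return {name: (soft, hard)} for each limit this platform's resource module defines."""
    limits = {}
    for name in names:
        res = getattr(resource, name, None)
        if res is not None:
            limits[name] = resource.getrlimit(res)
    return limits


def fmt(v: int) -> str:
    return "unlimited" if v == resource.RLIM_INFINITY else str(v)


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={fmt(soft)} hard={fmt(hard)}")
```

Reading a limit says nothing about whether the kernel enforces it, so the choice between `RLIMIT_AS` and `RLIMIT_RSS` is settled by the kernel documentation rather than by the Python API.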
[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21864 thanks @srowen
[GitHub] spark issue #21870: Branch 2.3
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21870 Close this @lovezeropython
[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22070 **[Test build #4240 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4240/testReport)** for PR 22070 at commit [`9e95df2`](https://github.com/apache/spark/commit/9e95df24206bbcc51ae09bd488d72a2bcf84ee7b).
[GitHub] spark issue #21864: [SPARK-24908][R][style] removing spaces to make lintr ha...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/21864 Also backported to 2.3
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user rezasafi commented on the issue: https://github.com/apache/spark/pull/22072 LGTM
[GitHub] spark pull request #21977: SPARK-25004: Add spark.executor.pyspark.memory li...
Github user rezasafi commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r209364194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -137,13 +135,12 @@ case class AggregateInPandasExec( val columnarBatchIter = new ArrowPythonRunner( pyFuncs, -bufferSize, -reuseWorker, PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF, argOffsets, aggInputSchema, sessionLocalTimeZone, -pythonRunnerConf).compute(projectedRowIter, context.partitionId(), context) +pythonRunnerConf, +sparkContext.conf).compute(projectedRowIter, context.partitionId(), context) --- End diff -- The conf is accessible through `org.apache.spark.SparkEnv.get.conf` on the executors, AFAIK.
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 green! https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/8/
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21537 Thanks @mgaido91
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 alright, builds are now passing. it failed the last one on the junit publish, and since we're not running java/scala unittests, i have since removed that block. should be green in ~20.
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21537 LGTM apart from this [comment](https://github.com/apache/spark/pull/21537#discussion_r209353035)
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Merged build finished. Test FAILed.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22067 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94568/ Test FAILed.
[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22067 **[Test build #94568 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94568/testReport)** for PR 22067 at commit [`b799e92`](https://github.com/apache/spark/commit/b799e925cbd1b859204491eace7e64142b75727e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94565/ Test FAILed.
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22011 Merged build finished. Test FAILed.
[GitHub] spark issue #22011: [SPARK-24822][PySpark] Python support for barrier execut...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22011 **[Test build #94565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94565/testReport)** for PR 22011 at commit [`ea2330b`](https://github.com/apache/spark/commit/ea2330baa61e427665ba824c3c42d1e4ec1a7934). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22072 **[Test build #94581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94581/testReport)** for PR 22072 at commit [`1a6452e`](https://github.com/apache/spark/commit/1a6452ef0939c09c09801cff78b0214d7979bf6d).
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2056/ Test PASSed.
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22072 Merged build finished. Test PASSed.
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22037 Merged build finished. Test PASSed.
[GitHub] spark issue #22072: [SPARK-25081][Core]Nested spill in ShuffleExternalSorter...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22072 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2055/ Test PASSed.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94566/ Test FAILed.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Merged build finished. Test FAILed.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22001 **[Test build #94566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94566/testReport)** for PR 22001 at commit [`8de1a4b`](https://github.com/apache/spark/commit/8de1a4b0523bc459f66973cd92b7648e2609a002). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22072: [SPARK-25081][Core]Nested spill in ShuffleExterna...
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/22072 [SPARK-25081][Core]Nested spill in ShuffleExternalSorter should not access released memory page (branch-2.2) ## What changes were proposed in this pull request? Backport https://github.com/apache/spark/pull/22062 to branch-2.2. ## How was this patch tested? Jenkins You can merge this pull request into a Git repository by running: $ git pull https://github.com/zsxwing/spark SPARK-25081-2.2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/22072.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #22072 commit 1a6452ef0939c09c09801cff78b0214d7979bf6d Author: Shixiong Zhu Date: 2018-08-10T17:53:44Z Nested spill in ShuffleExternalSorter should not access released memory page This issue is pretty similar to [SPARK-21907](https://issues.apache.org/jira/browse/SPARK-21907). "allocateArray" in [ShuffleInMemorySorter.reset](https://github.com/apache/spark/blob/9b8521e53e56a53b44c02366a99f8a8ee1307bbf/core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java#L99) may trigger a spill and cause ShuffleInMemorySorter to access the released `array`. Another task may get the same memory page from the pool, which causes two tasks to access the same memory page. When a task reads memory written by another task, many types of failures may happen. Here are some examples I have seen: - JVM crash. (This is easy to reproduce in a unit test, as we fill newly allocated and deallocated memory with 0xa5 and 0x5a bytes, which usually point to an invalid memory address.) - java.lang.IllegalArgumentException: Comparison method violates its general contract!
- java.lang.NullPointerException at org.apache.spark.memory.TaskMemoryManager.getPage(TaskMemoryManager.java:384) - java.lang.UnsupportedOperationException: Cannot grow BufferHolder by size -536870912 because the size after growing exceeds size limitation 2147483632 This PR resets the state in `ShuffleInMemorySorter.reset` before calling `allocateArray` to fix the issue. The new unit test makes the JVM crash without the fix. Closes #22062 from zsxwing/SPARK-25081. Authored-by: Shixiong Zhu Signed-off-by: Shixiong Zhu
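The ordering problem described in this commit message can be illustrated with a small, single-threaded Python simulation. This is a sketch under stated assumptions, not Spark's code: `PagePool`, `InMemorySorter`, `reset_buggy`, `reset_fixed`, and the `before_allocate` hook are hypothetical names; pages are plain lists that the pool hands out again after they are freed; and allocating a new array is assumed to trigger a nested spill of the same sorter, as `allocateArray` can inside `ShuffleInMemorySorter.reset`. The fill values mirror the 0xa5/0x5a bytes mentioned above.

```python
from typing import Callable, List

CLEAN, FREED = 0xA5, 0x5A  # fill values for newly allocated / released pages


class PagePool:
    """Hypothetical pool shared by all tasks; a released page is handed out again."""

    def __init__(self) -> None:
        self.free_pages: List[List[int]] = []

    def allocate(self, size: int) -> List[int]:
        if self.free_pages:
            page = self.free_pages.pop()
            page[:] = [CLEAN] * size  # reuse the very same page object
            return page
        return [CLEAN] * size

    def free(self, page: List[int]) -> None:
        page[:] = [FREED] * len(page)
        self.free_pages.append(page)


class InMemorySorter:
    """Hypothetical stand-in for the sorter: a page of records plus a write position."""

    def __init__(self, pool: PagePool, size: int,
                 before_allocate: Callable[[], None]) -> None:
        self.pool = pool
        self.size = size
        self.array = pool.allocate(size)
        self.pos = 0
        self.before_allocate = before_allocate  # lets "another task" run first
        self.spilled: List[List[int]] = []      # what each nested spill read

    def insert(self, record: int) -> None:
        self.array[self.pos] = record
        self.pos += 1

    def spill(self) -> None:
        # Reads whatever the sorter currently believes it holds.
        self.spilled.append(list(self.array[:self.pos]) if self.pos else [])

    def _allocate_array(self) -> List[int]:
        self.before_allocate()
        self.spill()  # assumption: allocating under memory pressure spills this sorter
        return self.pool.allocate(self.size)

    def reset_buggy(self) -> None:
        self.pool.free(self.array)
        self.array = self._allocate_array()  # nested spill still sees the old pos
        self.pos = 0

    def reset_fixed(self) -> None:
        self.pool.free(self.array)
        self.pos = 0                         # reset state *before* allocating
        self.array = self._allocate_array()


def run(fixed: bool) -> List[List[int]]:
    pool = PagePool()

    def other_task() -> None:
        # Another task grabs the page that was just released and writes its own data.
        page = pool.allocate(4)
        page[:] = [99] * 4

    sorter = InMemorySorter(pool, 4, before_allocate=other_task)
    for record in (1, 2, 3):
        sorter.insert(record)
    (sorter.reset_fixed if fixed else sorter.reset_buggy)()
    return sorter.spilled


print(run(fixed=False))  # [[99, 99, 99]]  -> read another task's data
print(run(fixed=True))   # [[]]            -> released page never touched
```

In the buggy order, the nested spill still sees `pos == 3` and reads the page that the other task now owns; resetting the state first means the spill sees an empty sorter and never touches the released page.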
[GitHub] spark issue #22037: [SPARK-24774][SQL] Avro: Support logical decimal type
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22037 **[Test build #94579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94579/testReport)** for PR 22037 at commit [`0384a1a`](https://github.com/apache/spark/commit/0384a1af69573af317f9e644bcf04e12bf38f1f3).
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22007 If you find this is unrelated, you could trigger another test here
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22007 **[Test build #94580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94580/testReport)** for PR 22007 at commit [`316b9ad`](https://github.com/apache/spark/commit/316b9adc3be3e2d12ce5c092421901929d5455d4).
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22007 retest this please
[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22068 Yeah, if more instances are found, we'd better fix them together while we're here.
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 `./build/mvn -DskipTests -Phadoop2.6 -Pyarn -Phive -Phive-thriftserver clean package` FTW. (i do know that the -Phadoop2.6 is superfluous, but at this point...)
[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22069 Yeah, if more instances are found, we'd better fix them together.
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 ok think i got it... :)
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21537#discussion_r209353035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala --- @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String private[this] def castToIntervalCode(from: DataType): CastFunction = from match { case StringType => (c, evPrim, evNull) => -s"""$evPrim = CalendarInterval.fromString($c.toString()); +code"""$evPrim = CalendarInterval.fromString($c.toString()); if(${evPrim} == null) { ${evNull} = true; } """.stripMargin } - private[this] def decimalToTimestampCode(d: String): String = -s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()" - private[this] def longToTimeStampCode(l: String): String = s"$l * 100L" - private[this] def timestampToIntegerCode(ts: String): String = -s"java.lang.Math.floor((double) $ts / 100L)" - private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0" + private[this] def decimalToTimestampCode(d: ExprValue): Block = { +val block = code"new java.math.BigDecimal(100L)" --- End diff -- maybe a `JavaCode.expression` then for this?
[GitHub] spark pull request #22010: [SPARK-21436][CORE] Take advantage of known parti...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/22010#discussion_r209351478 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -396,7 +396,16 @@ abstract class RDD[T: ClassTag]( * Return a new RDD containing the distinct elements in this RDD. */ def distinct(numPartitions: Int)(implicit ord: Ordering[T] = null): RDD[T] = withScope { -map(x => (x, null)).reduceByKey((x, y) => x, numPartitions).map(_._1) +// If the data is already approriately partitioned with a known partitioner we can work locally. +def removeDuplicatesInPartition(itr: Iterator[T]): Iterator[T] = { + val set = new mutable.HashSet[T]() + itr.filter(set.add(_)) --- End diff -- Not a big deal, but although this is really compact and elegant, it also adds to the set the elements that are already there, which is not needed. We could check whether the key is already there and add it only in that case; that is probably a bit faster.
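The local de-duplication in this diff, and the alternative suggested in the review, can be sketched in plain Python. This assumes, as the diff's comment does, that the data is already partitioned so that equal elements always land in the same partition; here that is modeled with a hypothetical `hash_partition` helper over small ints, and partitions are plain lists rather than RDD partitions. `dedup_via_add`, `dedup_check_first`, and `distinct_local` are illustrative names, not Spark APIs.

```python
from typing import Callable, Hashable, Iterable, Iterator, List, TypeVar

T = TypeVar("T", bound=Hashable)


def dedup_via_add(items: Iterable[T]) -> Iterator[T]:
    """Like `itr.filter(set.add(_))`: every element goes through add()."""
    seen = set()
    for x in items:
        size_before = len(seen)
        seen.add(x)  # Python's set.add returns None, so detect insertion by size
        if len(seen) > size_before:
            yield x


def dedup_check_first(items: Iterable[T]) -> Iterator[T]:
    """The review's suggestion: only add elements that are not already present."""
    seen = set()
    for x in items:
        if x not in seen:
            seen.add(x)
            yield x


def hash_partition(data: Iterable[T], num_partitions: int) -> List[List[T]]:
    """Equal elements hash equally, so they always land in the same partition."""
    parts: List[List[T]] = [[] for _ in range(num_partitions)]
    for x in data:
        parts[hash(x) % num_partitions].append(x)
    return parts


def distinct_local(
    partitions: List[List[T]],
    dedup: Callable[[Iterable[T]], Iterator[T]] = dedup_check_first,
) -> List[List[T]]:
    """Deduplicate each partition independently; no data moves between partitions."""
    return [list(dedup(p)) for p in partitions]


parts = hash_partition([3, 1, 3, 2, 1, 4], 2)
print(parts)                  # [[2, 4], [3, 1, 3, 1]]
print(distinct_local(parts))  # [[2, 4], [3, 1]]
```

Both variants return the same elements; the review's point is only about skipping the insert attempt for elements that are already present. Because equal elements are already co-located, each partition can be processed on its own, which is why the diff can skip the `map(...).reduceByKey(...)` round trip.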
[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22069 **[Test build #4239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4239/testReport)** for PR 22069 at commit [`8520df8`](https://github.com/apache/spark/commit/8520df899a3364f2bb41d4155d2bed9e68772a07). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 making build config changes... forgot to add `-Pyarn` to the build target, as well as making sure the correct python env is selected before running python tests.
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22066 > @viirya is that effort going on? I can help with the work if you want. Thanks. @mgaido91 Yeah, I'm still working on it. One of the PRs, #21537, is still waiting for review.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Merged build finished. Test FAILed.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22001 **[Test build #94578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94578/testReport)** for PR 22001 at commit [`458c78f`](https://github.com/apache/spark/commit/458c78fb076f642f5eee24a7a0911f3822254084). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94578/ Test FAILed.
[GitHub] spark issue #16677: [SPARK-19355][SQL] Use map output statistics to improve ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16677 Thank you! @hvanhovell
[GitHub] spark issue #21939: [SPARK-23874][SQL][PYTHON] Upgrade Apache Arrow to 0.10....
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/21939 there's a bunch of stuff in the unittest logs that i could use some extra eyes on: https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/3/artifact/target/unit-tests.log https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.6-python-3.5-arrow-0.10.0-ubuntu-testing/3/artifact/work/app-20180810110830-/0/target/unit-tests.log
[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}
Github user Fokko commented on the issue: https://github.com/apache/spark/pull/22007 I don't really understand the error; can a Spark expert explain what's going on here?
[GitHub] spark issue #22019: [WIP][SPARK-25040][SQL] Empty string for double and floa...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22019 SGTM too.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22001 **[Test build #94578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94578/testReport)** for PR 22001 at commit [`458c78f`](https://github.com/apache/spark/commit/458c78fb076f642f5eee24a7a0911f3822254084).
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Merged build finished. Test PASSed.
[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2054/ Test PASSed.
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22066 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94564/ Test FAILed.
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22066 Merged build finished. Test FAILed.
[GitHub] spark issue #22066: [SPARK-25084][SQL] "distribute by" on multiple columns (...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22066 **[Test build #94564 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94564/testReport)** for PR 22066 at commit [`5da6d4d`](https://github.com/apache/spark/commit/5da6d4d437c6eb2bdf7a64c031b7c9281a5a8b83). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22010: [SPARK-21436][CORE] Take advantage of known partitioner ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22010 **[Test build #94577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94577/testReport)** for PR 22010 at commit [`5fd3659`](https://github.com/apache/spark/commit/5fd36592a26b07fdb58e79e4efbb6b70daea54df).