[GitHub] spark pull request: [SPARK-4959] [SQL] Attributes are case sensiti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3796#issuecomment-68092701 [Test build #24812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24812/consoleFull) for PR 3796 at commit [`62a7a10`](https://github.com/apache/spark/commit/62a7a10b445aaf4963692fb9df3e885aaebe6051). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4959] [SQL] Attributes are case sensiti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3796#issuecomment-68092702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24812/
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68092844 @JoshRosen Sorry, I forgot to describe this patch. I have created a JIRA for it; can you take a look?
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68093197 Hi @koeninger, a few quick questions: 1. How are RDD partitions mapped to Kafka partitions? Is each Kafka partition one RDD partition? 2. How is receiver injection rate control handled; in other words, how does the current task decide at which offset it should read? 3. Have you given any consideration to fault tolerance? In general this is quite similar to a Kafka InputFormat I wrote a while ago (https://github.com/jerryshao/kafka-input-format), which can be loaded by HadoopRDD. I'm not sure whether this is the streaming way of achieving exactly-once semantics.
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68093199 I can take a look at this later this week. It would probably be a good idea for someone more familiar with the YARN code to take a look, too, since they might also be able to suggest how/whether tests could have prevented the underlying bug.
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
GitHub user XuTingjun opened a pull request: https://github.com/apache/spark/pull/3799 [SPARK-1507][YARN] Specify the number of cores for the AM. I added the configurations below: spark.yarn.am.cores / SPARK_MASTER_CORES / SPARK_DRIVER_CORES for yarn-client mode; spark.driver.cores for yarn-cluster mode. You can merge this pull request into a Git repository by running: $ git pull https://github.com/XuTingjun/spark SPARK1507 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3799.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3799 commit c35a515982fb3235ea648806ff392b13759322fc Author: wangfei wangf...@huawei.com Date: 2014-12-25T08:09:33Z specify AM core in yarn-client and yarn-cluster mode
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-68093593 Hi all, I accidentally deleted my repository, so I created a new patch, #3799, for it.
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3799#issuecomment-68093631 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/3800 [SPARK-4967] [SQL] File name with comma will cause exception for SQLContext.parquetFile. This is a workaround to support `,` in Parquet file names; however, we still need to update the interface so that the API `SQLContext.parquetFile` can take multiple Parquet files as input. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark spark_4967 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3800.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3800 commit 4c024a26b91c8ba39ce60b5486cf3d210b7b69bb Author: Cheng Hao hao.ch...@intel.com Date: 2014-12-25T08:04:30Z Support comma in the parquet file path
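The failure mode described above can be illustrated with a plain-Scala sketch (names and paths here are made up, not the patch itself): Hadoop's input-path convention treats a path string as a comma-separated list, so a file name that itself contains a comma gets split into bogus paths.

```scala
// Illustration only: why a comma inside a file name breaks APIs that accept
// comma-separated input-path strings, as SQLContext.parquetFile does here.
val fileName = "part-0001,v2.parquet"

// Treating the string as a comma-separated path list splits one file into
// two bogus paths:
val naiveSplit = fileName.split(",").toSeq
println(naiveSplit)   // List(part-0001, v2.parquet)

// Passing the name through as a single opaque element sidesteps the parsing:
val asSingle = Seq(fileName)
println(asSingle)     // List(part-0001,v2.parquet)
```

This is why the PR frames the fix as a workaround: the real solution is an interface that accepts a collection of paths rather than one string to be parsed.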
[GitHub] spark pull request: [SAPRK-4967] [SQL] File name with comma will c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68094002 [Test build #24814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24814/consoleFull) for PR 3800 at commit [`4c024a2`](https://github.com/apache/spark/commit/4c024a26b91c8ba39ce60b5486cf3d210b7b69bb). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] Remove many uses of Thread.sleep() from ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3687#issuecomment-68094079 Just realized that my last comment was a bit confusing, since SPARK-1600 is not related to the FileInputStream ManualClock fix. I'll file a new improvement JIRA to cover replacing our uses of SystemClock in tests.
[GitHub] spark pull request: [WIP] Remove many uses of Thread.sleep() from ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3687#discussion_r22269847

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,34 +278,45 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
+      var ssc = new StreamingContext(conf, batchDuration)
--- End diff --

This happens to be in a `try-finally` block, so I think that proper cleanup happens, but I'll replace it with `withStreamingContext` just to be consistent.
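The `withStreamingContext` helper mentioned in this thread is a loan-pattern fixture. A minimal sketch (not Spark's actual helper; the `Stoppable`/`withResource` names are invented for illustration) shows why it subsumes a hand-written `try-finally`:

```scala
// Minimal loan-pattern sketch: the fixture owns cleanup, so a test cannot
// forget to stop the context even when its body throws.
trait Stoppable { def stop(): Unit }

def withResource[C <: Stoppable, R](resource: C)(body: C => R): R =
  try body(resource)
  finally resource.stop()   // runs on both normal exit and exception

// Usage: the fake context records that stop() ran.
class FakeContext extends Stoppable {
  var stopped = false
  def stop(): Unit = stopped = true
}

val ctx = new FakeContext
withResource(ctx) { c => 42 }  // the body is the "test"
println(ctx.stopped)           // true
```

The design advantage over scattering `try-finally` through each test is consistency: cleanup logic lives in exactly one place.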
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095076 +1 LGTM. I remember this came up at least once, so it's good to guard against it directly.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095313 [Test build #24813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24813/consoleFull) for PR 3795 at commit [`09d837f`](https://github.com/apache/spark/commit/09d837f0f933a43cd7e2e1b8d2befec0f6516e6b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24813/
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68095357 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24814/
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68095355 [Test build #24814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24814/consoleFull) for PR 3800 at commit [`4c024a2`](https://github.com/apache/spark/commit/4c024a26b91c8ba39ce60b5486cf3d210b7b69bb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68095691 [Test build #24815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24815/consoleFull) for PR 3784 at commit [`00aaa63`](https://github.com/apache/spark/commit/00aaa63b0f05a3fab4211b5037714729128fcc6c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68095891 [Test build #24816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24816/consoleFull) for PR 3784 at commit [`f4d9b8f`](https://github.com/apache/spark/commit/f4d9b8fa141e8f571e32f3a660026fe1ff907971). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68096064 [Test build #24818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24818/consoleFull) for PR 3467 at commit [`f9ef854`](https://github.com/apache/spark/commit/f9ef8547228d4c56afb9fc6f43431a458a2325ca). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68096063 [Test build #24817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24817/consoleFull) for PR 3784 at commit [`601d5f6`](https://github.com/apache/spark/commit/601d5f62f00905b971f18bd6c79630a3c604b354). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user tsudukim commented on a diff in the pull request: https://github.com/apache/spark/pull/3467#discussion_r22270614

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -180,14 +176,15 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
             appListener.startTime.getOrElse(-1L),
             appListener.endTime.getOrElse(-1L),
             getModificationTime(dir),
-            appListener.sparkUser.getOrElse(NOT_STARTED)))
+            appListener.sparkUser.getOrElse(NOT_STARTED),
+            !fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE
       } catch {
         case e: Exception =>
           logInfo(s"Failed to load application log data from $dir.", e)
           None
       }
     }
-    .sortBy { info => -info.endTime }
+    .sortBy { info => (-info.endTime, -info.startTime) }
--- End diff --

Completed applications are sorted by endTime, since each has a proper (almost unique) endTime. Incomplete applications all share the same invalid endTime (-1), so they are sorted by startTime: where the first sort key is not effective, the second one takes over.
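The tiebreaker behavior described here follows from Scala's lexicographic ordering of tuples in `sortBy`, and can be reproduced with plain data (the `AppInfo` case class below is a stand-in for the history provider's application info, not Spark's actual type):

```scala
// Demonstrates the (-endTime, -startTime) sort key: completed apps have
// distinct endTimes and sort by them descending; incomplete apps all carry
// endTime == -1, so the tie falls through to startTime descending.
case class AppInfo(name: String, startTime: Long, endTime: Long)

val apps = Seq(
  AppInfo("done-early",  100L, 200L),
  AppInfo("done-late",   150L, 300L),
  AppInfo("running-new", 400L,  -1L),
  AppInfo("running-old", 250L,  -1L)
)

// Tuple keys compare lexicographically: first by -endTime, then -startTime.
val sorted = apps.sortBy(info => (-info.endTime, -info.startTime))
println(sorted.map(_.name))
// List(done-late, done-early, running-new, running-old)
```

Note that negating `-1` makes incomplete apps sort after all completed ones, with the most recently started incomplete app first.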
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68096991 [Test build #24819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24819/consoleFull) for PR 2405 at commit [`a2057e7`](https://github.com/apache/spark/commit/a2057e713d8d4fb3cb2821262bc712d6c58b4024). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/3801 [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls and SystemClock usage

This PR refactors Spark Streaming's FileInputStream tests to remove uses of Thread.sleep() and SystemClock, which should hopefully resolve some longstanding flakiness in these tests (see SPARK-1600). Key changes:

- Modify FileInputDStream to use the scheduler's Clock instead of System.currentTimeMillis(); this allows it to be tested using ManualClock.
- Fix a synchronization issue in ManualClock's `currentTime` method.
- Add a StreamingTestWaiter class which allows callers to block until a certain number of batches have finished.
- Change the FileInputStream tests so that files' modification times are manually set based off of ManualClock; this eliminates many Thread.sleep calls.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-1600 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3801.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3801

commit a95ddc41f2b10b57fa18e75c865d7ef4507cd771 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-16T08:50:01Z Modify FileInputDStream to use Clock class.
commit 3c3efc3f75521020f482d56b41465a6373448cf5 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-17T01:40:35Z Synchronize `currentTime` in ManualClock
commit dda1403f3eaabe9125b87ac45ac3e7b0d667e9de Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T09:03:00Z Add StreamingTestWaiter class.
commit d4f2d87729b20f1060d456a6074f2da6a4e79cb3 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T09:03:54Z Refactor file input stream tests to not rely on SystemClock.
commit c8f06b10431c555dab3be461622d5d96aa807685 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T10:14:06Z Remove Thread.sleep calls in FileInputStream CheckpointSuite test. Hopefully this will fix SPARK-1600.
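The "Synchronize `currentTime` in ManualClock" change can be sketched as follows. This is a rough illustration, not Spark's actual ManualClock (the class and method names here are invented): without synchronization, a time advanced by the test thread may not be visible to the scheduler thread.

```scala
// Sketch of a manually advanced clock with a synchronized read. advance()
// is driven by the test; waitUntil() lets another thread block until the
// clock reaches a target, replacing Thread.sleep-based waiting.
class SketchManualClock(private var time: Long = 0L) {
  def currentTime(): Long = synchronized { time }

  def advance(millis: Long): Long = synchronized {
    time += millis
    notifyAll()           // wake threads blocked in waitUntil
    time
  }

  def waitUntil(target: Long): Unit = synchronized {
    while (time < target) wait()
  }
}

val clock = new SketchManualClock()
clock.advance(2000L)          // one "batch" of 2 seconds
println(clock.currentTime())  // 2000
```

A StreamingTestWaiter in the same spirit would block on a condition ("n batches completed") rather than a timestamp, which is what removes the fixed sleeps from the tests.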
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68097807 These changes are split off from #3687, a larger PR of mine which tried to remove all uses of Thread.sleep() in the streaming tests. It may look like there are a lot of changes here, but most of that is due to indentation changes when I modified tests to use the `withStreamingContext` fixture. /cc @tdas for review.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271205

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
-      ssc.checkpoint(checkpointDir)
-      val fileStream = ssc.textFileStream(testDir.toString)
-      // Making value 3 take large time to process, to ensure that the master
-      // shuts down in the middle of processing the 3rd batch
-      val mappedStream = fileStream.map(s => {
-        val i = s.toInt
-        if (i == 3) Thread.sleep(2000)
-        i
-      })
-
-      // Reducing over a large window to ensure that recovery from master failure
-      // requires reprocessing of all the files seen before the failure
-      val reducedStream = mappedStream.reduceByWindow(_ + _, Seconds(30), Seconds(1))
-      val outputBuffer = new ArrayBuffer[Seq[Int]]
-      var outputStream = new TestOutputStream(reducedStream, outputBuffer)
-      outputStream.register()
-      ssc.start()
-
-      // Create files and advance manual clock to process them
-      // var clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
-      Thread.sleep(1000)
-      for (i <- Seq(1, 2, 3)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        // wait to make sure that the file is written such that it gets shown in the file listings
-        Thread.sleep(1000)
+      val outputBuffer = new ArrayBuffer[Seq[Int]] with SynchronizedBuffer[Seq[Int]]
+
+      def writeFile(i: Int, clock: ManualClock): Unit = {
--- End diff --

I factored some of the common code out here.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68097905 [Test build #24820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24820/consoleFull) for PR 3801 at commit [`c8f06b1`](https://github.com/apache/spark/commit/c8f06b10431c555dab3be461622d5d96aa807685). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271345

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
-      ssc.checkpoint(checkpointDir)
-      val fileStream = ssc.textFileStream(testDir.toString)
-      // Making value 3 take large time to process, to ensure that the master
-      // shuts down in the middle of processing the 3rd batch
-      val mappedStream = fileStream.map(s => {
-        val i = s.toInt
-        if (i == 3) Thread.sleep(2000)
-        i
-      })
-
-      // Reducing over a large window to ensure that recovery from master failure
-      // requires reprocessing of all the files seen before the failure
-      val reducedStream = mappedStream.reduceByWindow(_ + _, Seconds(30), Seconds(1))
-      val outputBuffer = new ArrayBuffer[Seq[Int]]
-      var outputStream = new TestOutputStream(reducedStream, outputBuffer)
-      outputStream.register()
-      ssc.start()
-
-      // Create files and advance manual clock to process them
-      // var clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
-      Thread.sleep(1000)
-      for (i <- Seq(1, 2, 3)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        // wait to make sure that the file is written such that it gets shown in the file listings
-        Thread.sleep(1000)
+      val outputBuffer = new ArrayBuffer[Seq[Int]] with SynchronizedBuffer[Seq[Int]]
+
+      def writeFile(i: Int, clock: ManualClock): Unit = {
+        val file = new File(testDir, i.toString)
+        Files.write(i + "\n", file, Charsets.UTF_8)
+        assert(file.setLastModified(clock.currentTime()))
+        // Check that the file's modification date is actually the value we wrote, since rounding or
+        // truncation will break the test:
+        assert(file.lastModified() === clock.currentTime())
       }
-      logInfo("Output = " + outputStream.output.mkString(","))
-      assert(outputStream.output.size > 0, "No files processed before restart")
-      ssc.stop()
-      // Verify whether files created have been recorded correctly or not
-      var fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-      def recordedFiles = fileInputDStream.batchTimeToSelectedFiles.values.flatten
-      assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
-
-      // Create files while the master is down
-      for (i <- Seq(4, 5, 6)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        Thread.sleep(1000)
+      def recordedFiles(ssc: StreamingContext): Seq[Int] = {
+        val fileInputDStream =
+          ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
+        val filenames = fileInputDStream.batchTimeToSelectedFiles.values.flatten
+        filenames.map(_.split(File.separator).last.toInt).toSeq.sorted
       }
-      // Recover context from checkpoint file and verify whether the files that were
-      // recorded before failure were saved and successfully recovered
-      logInfo("*** RESTARTING ")
-      ssc = new StreamingContext(checkpointDir)
-      fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-      assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
+      try {
+        // This is a var because it's re-assigned when we restart from a checkpoint:
+        var clock: ManualClock = null
+        withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc =>
+          ssc.checkpoint(checkpointDir)
+          clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
+          val waiter = new StreamingTestWaiter(ssc)
+          val fileStream = ssc.textFileStream(testDir.toString)
+          // Making value 3 take a large time to process, to ensure that the driver
+          // shuts down in the middle of processing the 3rd batch
+          val mappedStream = fileStream.map(s => {
+            val i = s.toInt
+            if (i == 3) Thread.sleep(4000)
+            i
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271355 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala --- @@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
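The `writeFile` helper quoted in the diff above guards against filesystem timestamp truncation: it sets a file's modification time from the manual clock and then asserts the value round-trips. A standalone Java sketch of the same check (class and method names are illustrative, not from the patch):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LastModifiedCheck {
    // Mirrors the test's guard: set a file's mtime to a known value and
    // verify it reads back unchanged, since some filesystems truncate
    // timestamps to 1-second (or coarser) resolution, which would silently
    // break time-based batch selection logic.
    public static boolean roundTrips(File file, long timeMs) {
        if (!file.setLastModified(timeMs)) {
            return false; // the OS refused to set the timestamp
        }
        return file.lastModified() == timeMs;
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("mtime", ".txt").toFile();
        f.deleteOnExit();
        // A whole-second timestamp survives even 1-second truncation:
        long wholeSecond = (System.currentTimeMillis() / 1000) * 1000;
        System.out.println(roundTrips(f, wholeSecond));
    }
}
```

Using whole-second timestamps sidesteps the sub-second truncation the comment warns about, which is also why the patch bumps the batch duration to 2 seconds.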
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24815/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098292 [Test build #24815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24815/consoleFull) for PR 3784 at commit [`00aaa63`](https://github.com/apache/spark/commit/00aaa63b0f05a3fab4211b5037714729128fcc6c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68098394 [Test build #24818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24818/consoleFull) for PR 3467 at commit [`f9ef854`](https://github.com/apache/spark/commit/f9ef8547228d4c56afb9fc6f43431a458a2325ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68098398 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24818/ Test FAILed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098518 [Test build #24816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24816/consoleFull) for PR 3784 at commit [`f4d9b8f`](https://github.com/apache/spark/commit/f4d9b8fa141e8f571e32f3a660026fe1ff907971). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24816/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098625 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24817/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098622 [Test build #24817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24817/consoleFull) for PR 3784 at commit [`601d5f6`](https://github.com/apache/spark/commit/601d5f62f00905b971f18bd6c79630a3c604b354). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68099780 [Test build #24819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24819/consoleFull) for PR 2405 at commit [`a2057e7`](https://github.com/apache/spark/commit/a2057e713d8d4fb3cb2821262bc712d6c58b4024). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression ` * `case class StructGetField(child: Expression, field: StructField, ordinal: Int) extends GetField ` * `case class ArrayGetField(child: Expression, field: StructField, ordinal: Int, containsNull: Boolean)` * `trait GetField extends UnaryExpression `
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68099783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24819/ Test PASSed.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68100834 [Test build #24820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24820/consoleFull) for PR 3801 at commit [`c8f06b1`](https://github.com/apache/spark/commit/c8f06b10431c555dab3be461622d5d96aa807685). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68100836 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24820/ Test PASSed.
[GitHub] spark pull request: [SPARK-4813][Streaming] Fix the issue that Con...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3661#discussion_r22272234 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ContextWaiter.scala --- @@ -17,30 +17,63 @@ package org.apache.spark.streaming +import java.util.concurrent.TimeUnit +import java.util.concurrent.locks.ReentrantLock + private[streaming] class ContextWaiter { + + private val lock = new ReentrantLock() + private val condition = lock.newCondition() + + // Guarded by lock private var error: Throwable = null - private var stopped: Boolean = false - def notifyError(e: Throwable) = synchronized { -error = e -notifyAll() - } + // Guarded by lock + private var stopped: Boolean = false - def notifyStop() = synchronized { -stopped = true -notifyAll() + def notifyError(e: Throwable): Unit = { +lock.lock() +try { + error = e + condition.signalAll() +} finally { + lock.unlock() +} } - def waitForStopOrError(timeout: Long = -1) = synchronized { -// If already had error, then throw it -if (error != null) { - throw error + def notifyStop(): Unit = { +lock.lock() +try { + stopped = true + condition.signalAll() +} finally { + lock.unlock() } + } -// If not already stopped, then wait -if (!stopped) { - if (timeout < 0) wait() else wait(timeout) + /** + * Return `true` if it's stopped; or throw the reported error if `notifyError` has been called; or + * `false` if the waiting time detectably elapsed before return from the method. + */ + def waitForStopOrError(timeout: Long = -1): Boolean = { --- End diff -- How about making `awaitTermination` throw a TimeoutException on timeout? It looks like a better API.
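The diff above replaces `synchronized`/`wait`/`notifyAll` with an explicit `ReentrantLock` and `Condition`. A minimal Java sketch of the same wait-with-timeout semantics (class and field names are illustrative, not Spark's): it returns `false` when the timeout elapses, `true` once stopped, and rethrows any reported error.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class Waiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();
    private Throwable error = null;   // guarded by lock
    private boolean stopped = false;  // guarded by lock

    public void notifyStop() {
        lock.lock();
        try {
            stopped = true;
            condition.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** true if stopped, false if the timeout elapsed; rethrows a reported error. */
    public boolean waitForStopOrError(long timeoutMs) throws Throwable {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            // Loop to guard against spurious wakeups.
            while (error == null && !stopped) {
                if (timeoutMs < 0) {
                    condition.await();       // negative timeout = wait forever
                } else if (nanos <= 0) {
                    return false;            // timed out
                } else {
                    nanos = condition.awaitNanos(nanos); // remaining budget
                }
            }
            if (error != null) throw error;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Throwable {
        Waiter w = new Waiter();
        System.out.println(w.waitForStopOrError(50)); // nothing signals: times out
        w.notifyStop();
        System.out.println(w.waitForStopOrError(50)); // already stopped
    }
}
```

The `awaitNanos` return value tracks the remaining time budget across spurious wakeups, which plain `wait(timeout)` cannot do; that is the main reason to prefer an explicit `Condition` here.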
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3464#issuecomment-68101164 ping @tdas
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68103677 @XuTingjun yes, I agree with you. We should call parseArgs before using the amMemory and executorMemory config values, because parseArgs can change these values from args.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68105414 Hi @liancheng, I admit my PR is more complicated, but this one only covers three cases. I think we'd better add a separate rule that optimizes And/Or in SQL for as many cases as possible, rather than mixing several cases into BooleanSimplification. So I am refactoring my PR to make it more readable and clean.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68105479 To note, my suggested solution would look more like this in HashPartitioner: ```scala def getPartition(key: Any): Int = key match { case null => 0 case enum: Enum[_] => Utils.nonNegativeMod(enum.ordinal(), numPartitions) case _ => Utils.nonNegativeMod(key.hashCode, numPartitions) } ``` This does not require reflection, and Java is fast at doing instanceof checks.
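The suggestion above can be sketched directly in Java. This is an illustration, not Spark's implementation: `nonNegativeMod` mirrors the behavior of Spark's `Utils.nonNegativeMod`, and the enum type is made up for the example. Enums hash by `ordinal()` because their default identity `hashCode` is not stable across JVMs, which would scatter equal keys across partitions.

```java
public class EnumAwarePartitioner {
    // Java's % can return negative values for negative inputs,
    // so shift into [0, mod) before using the result as a partition index.
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    enum Color { RED, GREEN, BLUE } // hypothetical key type for the demo

    static int getPartition(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        } else if (key instanceof Enum) {
            // ordinal() is stable across JVMs; identity hashCode is not
            return nonNegativeMod(((Enum<?>) key).ordinal(), numPartitions);
        } else {
            return nonNegativeMod(key.hashCode(), numPartitions);
        }
    }

    public static void main(String[] args) {
        System.out.println(getPartition(null, 4));        // nulls go to partition 0
        System.out.println(getPartition(Color.BLUE, 4));  // ordinal 2 mod 4 = 2
        System.out.println(getPartition(-7, 4));          // hashCode -7, -7 % 4 = -3, +4 = 1
    }
}
```

As the comment notes, an `instanceof` check like this is cheap, so the null/enum special cases add essentially no overhead to the common path.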
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-68105490 @davies can you add a unit test that fails in the old code but works with your code? It would be helpful to more clearly document the exact bug.
[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3787#issuecomment-68105645 This looks good - thanks @sarutak and @srowen!
[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3787
[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3786#issuecomment-68106101 I agree with Mark on this. We should always try to identify root causes as a first step.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68106662 [Test build #24821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24821/consoleFull) for PR 3784 at commit [`caca560`](https://github.com/apache/spark/commit/caca56024026d8211cf55eba2da95279a6b000bd). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22274369 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] { } } +object ConditionSimplification extends Rule[LogicalPlan] { + import BinaryComparison.LiteralComparison + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsDown { + case origin: CombinePredicate => +origin.toOptimized +} + } + + type SplitFragments = Map[Expression, Option[Expression]] + + implicit class CombinePredicateExtension(source: CombinePredicate) { +def find(goal: Expression): Boolean = { + def delegate(child: Expression): Boolean = (child, goal) match { +case (combine: CombinePredicate, _) => + isSameCombinePredicate(source, combine) && combine.find(goal) + + // if left child is a literal, + // LiteralComparison's unapply method changes the literal and attribute position +case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) => + isSame(childComparison, goalComparison) + +case other => + isSame(child, goal) + } + + // using functions to avoid evaluating the right side if the left side is true + val leftResult = () => delegate(source.left) + val rightResult = () => delegate(source.right) + leftResult() || rightResult() +} + +@inline +def isOrPredicate: Boolean = { + source.isInstanceOf[Or] +} + +// create a new combine predicate that has the same combine operator as this +@inline +def build(left: Expression, right: Expression): CombinePredicate = { + CombinePredicate(left, right, isOrPredicate) +} + +// swap left child and right child +@inline +def swap: CombinePredicate = { + source.build(source.right, source.left) +} + +def toOptimized: Expression = source match { + // one CombinePredicate, left equals right, drop right, keep left + // examples: a && a = a, a || a = a + case CombinePredicate(left, right) if left.fastEquals(right) => +left + + // one CombinePredicate whose left and right are both binary comparisons + // examples: a > 2 && a < 2 = false + case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) => +// left or right may change its child position, so rebuild one +val changed = origin.build(left, right) +val optimized = changed.mergeComparison +if (isSame(changed, optimized)) { + origin +} else { + optimized +} + + case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right) +if isNotCombinePredicate(ll, lr, right) => +val leftOptimized = left.toOptimized +if (isSame(left, leftOptimized)) { + if (isSame(ll, right) || isSame(lr, right)) { +if (isSameCombinePredicate(origin, left)) leftOptimized else right + } else { +val llRight = origin.build(ll, right) +val lrRight = origin.build(lr, right) +val llRightOptimized = llRight.toOptimized +val lrRightOptimized = lrRight.toOptimized +if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) { + origin +} else if ((isNotCombinePredicate(llRightOptimized, lrRightOptimized)) + || isSameCombinePredicate(origin, left)) { + left.build(llRightOptimized, lrRightOptimized).toOptimized +} else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) { + left.build(llRightOptimized, lrRightOptimized) +} else { + origin +} + } +} else if (isNotCombinePredicate(leftOptimized)) { + origin.build(leftOptimized, right).toOptimized +} else { + origin +} + + case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2)) +if isNotCombinePredicate(left, left2, right2) => +val changed = origin.swap +val optimized = changed.toOptimized +if (isSame(changed, optimized)) { + origin +} else { + optimized +} + + // do optimize like: (a || b || c) && a = a, here a, b, c are conditions + case origin @ CombinePredicate(left:
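The last rewrite quoted in the diff, simplifying `(a || b || c) && a` down to `a`, is an instance of the boolean absorption law. A small Java check (purely illustrative, unrelated to the Catalyst code) verifies the identity over all truth assignments:

```java
public class AbsorptionCheck {
    public static void main(String[] args) {
        // Absorption: (a || b) && a == a, so the extra disjuncts can be dropped.
        // Exhaustively check every combination of truth values.
        boolean holds = true;
        for (boolean a : new boolean[]{false, true}) {
            for (boolean b : new boolean[]{false, true}) {
                for (boolean c : new boolean[]{false, true}) {
                    holds &= (((a || b || c) && a) == a);
                }
            }
        }
        System.out.println(holds);
    }
}
```

Because the law holds for every assignment, the optimizer can apply the rewrite unconditionally without changing query results.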
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68108140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24821/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68108135 [Test build #24821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24821/consoleFull) for PR 3784 at commit [`caca560`](https://github.com/apache/spark/commit/caca56024026d8211cf55eba2da95279a6b000bd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Update README.md
GitHub user dennyglee opened a pull request: https://github.com/apache/spark/pull/3802 Update README.md Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html) You can merge this pull request into a Git repository by running: $ git pull https://github.com/dennyglee/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3802.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3802 commit 15f601a5ac754c5429153a87f0b89743f6ad48d5 Author: Denny Lee denny.g@gmail.com Date: 2014-12-25T17:22:43Z Update README.md Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html)
[GitHub] spark pull request: Update README.md
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3802#issuecomment-68108716 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
GitHub user freeman-lab opened a pull request: https://github.com/apache/spark/pull/3803 [SPARK-4969] [STREAMING] [PYTHON] Add binaryRecords to streaming

In Spark 1.2 we added a `binaryRecords` input method for loading flat binary data. This format is useful for numerical array data, e.g. in scientific computing applications. This PR adds support for the same format in Streaming applications, where it is similarly useful, especially for streaming time series or sensor data.

Summary of additions:
- adding `binaryRecordsStream` to Spark Streaming
- exposing `binaryRecordsStream` in the new PySpark Streaming
- new unit tests in Scala and Python

This required adding an optional Hadoop configuration param to `fileStream` and `FileInputStream`, but was otherwise straightforward. @tdas @davies

You can merge this pull request into a Git repository by running: $ git pull https://github.com/freeman-lab/spark streaming-binary-records Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3803.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3803

commit 8550c2619aba22b40dc109171b395522ccfaaf08 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:31:31Z Expose additional argument combination
commit ecef0eb8d4bf30627e5b35c40c2f4204e1670390 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:34:49Z Add binaryRecordsStream to python
commit fe4e803f8810c19aac02e7c8927af1d08b2f0a94 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:12Z Add binaryRecordStream to Java API
commit 36cb0fd576abb20b9c3210774ec9ff0471e2cf48 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:41Z Add binaryRecordsStream to scala
commit 23dd69f318aedbf12cab10380a50d94ce8c3ca92 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:52Z Tests for binaryRecordsStream
commit 9398bcb615c6cbf033b796c0837c99aba83303b4 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:40:06Z Expose optional hadoop configuration
commit 28bff9bab7be7c2f614a011f6b68e2103234c1df Author: freeman the.freeman@gmail.com Date: 2014-12-25T10:02:42Z Fix missing arg
commit 8b70fbcf785074c7cde873cf10e8d5f0ea9e3979 Author: freeman the.freeman@gmail.com Date: 2014-12-25T10:03:01Z Reorganization
commit 2843e9de60f23bbce3ac185c09b8575a7513fe0d Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:43:20Z Add params to docstring
commit 94d90d0fbc576c4e475bb0a053e6c35d53152cf4 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:44:09Z Spelling
commit 1c739aa67a006a62a6ee8f294ff60568f9031476 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:48:04Z Simpler default arg handling
commit 029d49c143c7bed603db3ca43b44d212de516df8 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:50:42Z Formatting
commit a4324a38f8155f6b3e776326925af61f16a2fdfb Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:56:45Z Line length
commit d3e75b2bad2ba5048b36300cfd61b7cb5c39414b Author: freeman the.freeman@gmail.com Date: 2014-12-25T19:29:06Z Add tests in python
commit becb34474fd165ee8aae9d207532869bce3ef743 Author: freeman the.freeman@gmail.com Date: 2014-12-25T19:31:07Z Formatting
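The flat binary record format this PR streams is simple: a byte buffer that splits evenly into fixed-length records. A minimal plain-Python sketch of that idea (not the Spark implementation; the record layout here is an illustrative assumption):

```python
import struct

def split_binary_records(data: bytes, record_length: int):
    """Split a flat byte buffer into fixed-length records, in the spirit
    of a binaryRecords-style reader (illustrative sketch only)."""
    if len(data) % record_length != 0:
        raise ValueError("buffer is not a whole number of records")
    return [data[i:i + record_length]
            for i in range(0, len(data), record_length)]

# Example: three records, each a pair of little-endian float64s (16 bytes),
# the kind of numerical array data the PR description mentions.
payload = b"".join(struct.pack("<dd", x, x * 2.0) for x in (1.0, 2.0, 3.0))
records = [struct.unpack("<dd", r) for r in split_binary_records(payload, 16)]
# records == [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
```

Because every record has the same length, a stream of such files can be split and deserialized without any framing or delimiter logic, which is why the format suits sensor and time-series data.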
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111500 [Test build #24822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24822/consoleFull) for PR 3803 at commit [`becb344`](https://github.com/apache/spark/commit/becb34474fd165ee8aae9d207532869bce3ef743). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111519 [Test build #24822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24822/consoleFull) for PR 3803 at commit [`becb344`](https://github.com/apache/spark/commit/becb34474fd165ee8aae9d207532869bce3ef743). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FileInputDStream[K, V, F <: NewInputFormat[K,V]](`
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24822/ Test FAILed.
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111695 [Test build #24823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24823/consoleFull) for PR 3803 at commit [`fcb915c`](https://github.com/apache/spark/commit/fcb915c2fbba80b9d7b765425e203b0e3796c59d). * This patch merges cleanly.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68111992 cc @shivaram
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/3804 [EC2] Update mesos/spark-ec2 branch to branch-1.3 You can merge this pull request into a Git repository by running: $ git pull https://github.com/nchammas/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3804 commit cd2c0d4cec89422240c4e417d7291948ccf43a0a Author: Nicholas Chammas nicholas.cham...@gmail.com Date: 2014-12-25T20:14:16Z [EC2] Update mesos/spark-ec2 branch to branch-1.3
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68112020 [Test build #24824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24824/consoleFull) for PR 3804 at commit [`cd2c0d4`](https://github.com/apache/spark/commit/cd2c0d4cec89422240c4e417d7291948ccf43a0a). * This patch merges cleanly.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113162 LGTM
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68113297 [Test build #24823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24823/consoleFull) for PR 3803 at commit [`fcb915c`](https://github.com/apache/spark/commit/fcb915c2fbba80b9d7b765425e203b0e3796c59d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FileInputDStream[K, V, F <: NewInputFormat[K,V]](`
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68113298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24823/ Test PASSed.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113654 [Test build #24824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24824/consoleFull) for PR 3804 at commit [`cd2c0d4`](https://github.com/apache/spark/commit/cd2c0d4cec89422240c4e417d7291948ccf43a0a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24824/ Test PASSed.
[GitHub] spark pull request: Update README.md
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3802#issuecomment-68114038 Good catch; I'm going to merge this into `master` (1.3.0) and `branch-1.2` (1.2.1). Thanks!
[GitHub] spark pull request: Update README.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3802
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3793#discussion_r22275672

--- Diff: ec2/spark_ec2.py ---
@@ -255,6 +255,7 @@ def get_spark_shark_version(opts):
     "1.0.1": "1.0.1",
     "1.0.2": "1.0.2",
     "1.1.0": "1.1.0",
+    "1.2.0": "1.2.0",
--- End diff --

Probably also want to have 1.1.1 in here since it's also in `branch-1.2`; I'll just add this myself on merge, though.
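The hunk above edits a plain dict mapping the user-requested Spark version to a spark-ec2 version string. A hedged sketch of that lookup pattern, with the map trimmed and the function name hypothetical rather than taken from `spark_ec2.py`:

```python
# Illustrative version map in the style of spark_ec2.py; the entries shown
# and the fallback behavior are assumptions, not the actual script.
SPARK_EC2_VERSION_MAP = {
    "1.0.1": "1.0.1",
    "1.0.2": "1.0.2",
    "1.1.0": "1.1.0",
    "1.1.1": "1.1.1",  # the entry JoshRosen notes is also in branch-1.2
    "1.2.0": "1.2.0",
}

def resolve_ec2_version(requested: str) -> str:
    """Map a requested Spark version to its spark-ec2 version, failing
    loudly on an unknown version rather than guessing a default."""
    try:
        return SPARK_EC2_VERSION_MAP[requested]
    except KeyError:
        raise SystemExit(f"Unknown Spark version: {requested}")
```

Failing on an unmapped version is what makes a missing entry like "1.1.1" user-visible, which is why each release needs its own line in the map.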
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3793#issuecomment-68114155 LGTM, so I'll merge this into `master` (1.3.0).
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3793
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68114214 LGTM, too, so I'm going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3804
[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3651#issuecomment-68114368 The only issue with using append = true is that multiple test invocations will just keep appending to the file, potentially making debugging a little more confusing. Not a big deal, I think. Maybe adding a task in the root pom that runs before surefire/scalatest and just deletes that file? I suppose we still have this issue, right? It might not be a big deal if people run `mvn clean` between builds, but I could imagine it being annoying if you're doing incremental re-builds during an iterative debugging session. Is this hard to fix? I don't think it's the end of the world if we don't get to this now, though. Also, we should probably backport this change into the maintenance branches so that we can detect whether their Java tests are broken. We should verify that `SPARK_HOME` is obsolete in yarn/repl for those versions, too.
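The append-mode concern above, where successive test runs keep growing the same log file unless something truncates it first, comes down to file open modes. A minimal Python sketch (the file name and helper are hypothetical, for illustration only):

```python
import os

LOG_PATH = "unit-tests.log"  # hypothetical log file name

def log_test_run(message: str, fresh: bool = False) -> None:
    """Append a line to the shared log; with fresh=True, truncate first,
    which is the 'delete before surefire/scalatest' step suggested above."""
    mode = "w" if fresh else "a"
    with open(LOG_PATH, mode) as f:
        f.write(message + "\n")

log_test_run("run 1", fresh=True)
log_test_run("run 2")              # appends: file now holds both runs
log_test_run("run 3", fresh=True)  # truncates: only the latest run remains
with open(LOG_PATH) as f:
    contents = f.read()
os.remove(LOG_PATH)
# contents == "run 3\n"
```

Without the truncation step, every incremental re-build mixes old and new output in one file, which is exactly the debugging confusion the comment describes.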
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275938

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -35,6 +35,15 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {
   })
--- End diff --

nit: The existing version of registerGauge could have used the new version. Not a big deal, very small amount of duplicate code.
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275955

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

   // Gauge for last completed batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastCompletedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastCompletedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last completed batch's delay information.
+  registerGaugeWithOption("lastCompletedBatch_processingDelay",
+    _.lastCompletedBatch.flatMap(_.processingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_schedulingDelay",
+    _.lastCompletedBatch.flatMap(_.schedulingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_totalDelay",
+    _.lastCompletedBatch.flatMap(_.totalDelay), -1L)

   // Gauge for last received batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastReceivedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastReceivedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last received batch records and total received batch records.
+  registerGauge("lastReceivedBatchRecords", _.lastReceivedBatchRecords.values.sum, 0L)
--- End diff --

Isn't it more consistent to name this `lastReceivedBatch_records`?
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275969

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

   // Gauge for last completed batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastCompletedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastCompletedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last completed batch's delay information.
+  registerGaugeWithOption("lastCompletedBatch_processingDelay",
+    _.lastCompletedBatch.flatMap(_.processingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_schedulingDelay",
+    _.lastCompletedBatch.flatMap(_.schedulingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_totalDelay",
+    _.lastCompletedBatch.flatMap(_.totalDelay), -1L)

   // Gauge for last received batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastReceivedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastReceivedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last received batch records and total received batch records.
+  registerGauge("lastReceivedBatchRecords", _.lastReceivedBatchRecords.values.sum, 0L)
+  registerGauge("totalReceivedBatchRecords", _.numTotalReceivedBatchRecords, 0L)
--- End diff --

Since this is more related to the global streaming metrics like `totalCompletedBatches`, it might be more consistent to put these near them and name it `totalReceivedRecords` (please update the corresponding field in the listener as well if you change this).
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275976

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

(same hunk as above, highlighting the line)
+  registerGauge("totalReceivedBatchRecords", _.numTotalReceivedBatchRecords, 0L)
--- End diff --

And if it's not too much work, could you add `totalProcessedRecords`? That seems useful. If it is too complicated then don't worry about it for this PR.
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3466#issuecomment-68115893 Just a couple more comments about making the names more consistent with the existing ones. Otherwise I approve of how `registerGauge` works now.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22275984

--- Diff: docs/streaming-programming-guide.md ---
@@ -66,7 +66,6 @@ main entry point for all streaming functionality. We create a local StreamingCon
 {% highlight scala %}
 import org.apache.spark._
 import org.apache.spark.streaming._
-import org.apache.spark.streaming.StreamingContext._
--- End diff --

Looks good.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22276020

--- Diff: streaming/src/test/scala/org/apache/spark/streamingtest/ImplicitSuite.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streamingtest
+
+/**
+ * A test suite to make sure all `implicit` functions work correctly.
+ *
+ * As `implicit` is a compiler feature, we don't need to run this class.
+ * What we need to do is making the compiler happy.
+ */
+class ImplicitSuite {
+
+  // We only want to test if `implicit` works well with the compiler, so we don't need a real DStream.
+  def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null
+
+  def testToPairDStreamFunctions(): Unit = {
+    val rdd: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
--- End diff --

I think this is a copy-paste error ;) It should be named `dstream` instead of `rdd`.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3464#issuecomment-68116003 LGTM, except one comment.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-68116052 @harishreedharan Since this feature involves a public API, it requires a design doc and some discussion. Could you make one, so that a few of us can take a look and discuss the naming scheme and other API stuff?
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3726#discussion_r22276050

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala ---
@@ -182,16 +182,34 @@ class WriteAheadLogSuite extends FunSuite with BeforeAndAfter {
   }

   test("WriteAheadLogManager - cleanup old logs") {
+    logCleanUpTest(waitForCompletion = false)
+  }
+
+  test("WriteAheadLogManager - cleanup old logs synchronously") {
+    logCleanUpTest(waitForCompletion = true)
+  }
+
+  private def logCleanUpTest(waitForCompletion: Boolean): Unit = {
     // Write data with manager, recover with new manager and verify
     val manualClock = new ManualClock
     val dataToWrite = generateRandomData()
     manager = writeDataUsingManager(testDir, dataToWrite, manualClock, stopManager = false)
     val logFiles = getLogFilesInDirectory(testDir)
     assert(logFiles.size > 1)
-    manager.cleanupOldLogs(manualClock.currentTime() / 2)
-    eventually(timeout(1 second), interval(10 milliseconds)) {
+
+    // To avoid code repeat
+    def cleanUpAndVerify(): Unit = {
+      manager.cleanupOldLogs(manualClock.currentTime() / 2, waitForCompletion)
--- End diff --

This call should be made only once (as in real use), instead of being called multiple times from within an `eventually` block.
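The pattern being requested can be sketched like this: the one-shot cleanup call happens outside the retry loop, and only the verification is polled. The `eventually` below is a tiny stand-in for ScalaTest's `Eventually`, and the asynchronous "cleanup" is a hypothetical illustration, not the WriteAheadLogManager API.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Minimal eventually: retry the block until it stops throwing or we time out.
def eventually[T](timeoutMs: Long = 2000, intervalMs: Long = 10)(block: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (true) {
    try return block catch {
      case _: Throwable if System.currentTimeMillis() < deadline => Thread.sleep(intervalMs)
    }
  }
  sys.error("unreachable")
}

val remainingLogs = new AtomicInteger(3)

// Hypothetical asynchronous cleanup: old logs disappear shortly after the call.
def cleanupOldLogs(): Unit = new Thread(new Runnable {
  def run(): Unit =
    while (remainingLogs.get() > 0) { Thread.sleep(5); remainingLogs.decrementAndGet() }
}).start()

cleanupOldLogs()                                   // issued exactly once, as in real use
eventually() { assert(remainingLogs.get() == 0) }  // only the check is retried
```

Re-issuing the cleanup inside the `eventually` block would hide bugs where a single call fails to complete the cleanup on its own.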
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3726#discussion_r22276054

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala ---
@@ -178,7 +178,7 @@ private[streaming] class WriteAheadLogBasedBlockHandler(
   }

   def cleanupOldBlock(threshTime: Long) {
--- End diff --

Actually, mind renaming this method to `cleanupOldBlocks`? Realized that it was inconsistent with `cleanupOldBatches` and `cleanupOldLogs`. I know it's not your code :)
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3726#issuecomment-68116279 Looking good, except one (and one optional) comment in the testsuite.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68116658 @aarondav Good idea, I'll make that change. Note that we can't do a similar fix for arrays: many PairRDDFunctions methods rely on being able to use keys to index into hashmaps, and that will involve the arrays' Object.hashCode. Therefore, we should probably strengthen the warnings for arrays into errors, since there's a high likelihood that users will get incorrect results.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117017 [Test build #24825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24825/consoleFull) for PR 3795 at commit [`1cd87e0`](https://github.com/apache/spark/commit/1cd87e051c72ff7c17ee7c3aa8b9fd507167cdad). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117045 Alright, I've updated this to support Enums as @aarondav has described and have strengthened the array error-checking to prohibit most uses of arrays as keys in PairRDDFunctions, even when using a custom partitioner. In order to properly handle those cases, we would need to make sure that the hashmaps that we use for aggregation will perform special-case hashing of arrays.
[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117310 I just realized that some of this error-checking for arrays might not work for Java API users due to type erasure / fake class manifests. If that's the case, we might want to just move the check to runtime in HashPartitioner and throw an exception.
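The underlying JVM behavior this thread is working around can be demonstrated directly: arrays use identity-based `equals`/`hashCode`, so two arrays with the same contents never match as hash-map keys. That is why key-type checks for arrays matter in PairRDDFunctions, whether done at compile time or at runtime.

```scala
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

assert(a.sameElements(b))  // same contents...
assert(a != b)             // ...but reference-unequal, so unequal as keys

val m = Map(a -> "value")
assert(m.contains(a))      // found only via the identical reference
assert(!m.contains(b))     // an equal-content array silently misses
```

The same identity-hashing issue applies to `java.lang.Enum.hashCode`, which is not stable across JVMs; that is the portability problem SPARK-3847 addresses for enum keys.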
[GitHub] spark pull request: SPARK-4970 Fix an implicit bug in SparkSubmitSu...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/3805 SPARK-4970 Fix an implicit bug in SparkSubmitSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SparkSubmitBugFix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3805.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3805

commit 41ede0ee67f77e09f2abe96c981167ed671e0504
Author: Takeshi Yamamuro linguin@gmail.com
Date: 2014-12-25T13:57:49Z

    Fix an implicit bug in SparkSubmitSuite
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68117570 The test 'includes jars passed in through --jars' in SparkSubmitSuite fails when spark.executor.memory is set above 512 MiB in conf/spark-defaults.conf. An exception is thrown as follows:

    Exception in thread "main" org.apache.spark.SparkException: Asked to launch cluster with 512 MB RAM / worker but requested 1024 MB/worker
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1889)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
        at org.apache.spark.deploy.JarCreationTest$.main(SparkSubmitSuite.scala:458)
        at org.apache.spark.deploy.JarCreationTest.main(SparkSubmitSuite.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
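The failure mode described here is a host-level setting leaking into the test: judging by the exception, the suite's JarCreationTest launches a local cluster whose workers get 512 MB, so any larger executor request from the user's own configuration breaks it. As an illustration (the exact file contents are an assumption, not quoted from the report), a line like the following in the user's `conf/spark-defaults.conf` would trigger the exception:

```
# conf/spark-defaults.conf on the developer's machine (hypothetical)
spark.executor.memory   1g    # exceeds the 512 MB workers the suite launches
```

The fix direction implied by the PR is to make the test independent of such host configuration rather than to require users to edit their config before running the suite.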
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68117596 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68117927 Hi @liancheng, my PR was originally also not limited to Filter; I used `transformExpressionsDown` from my first version. The title of my first version was just not accurate :)
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22276984

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
   }
 }

+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
+    def find(goal: Expression): Boolean = {
+      def delegate(child: Expression): Boolean = (child, goal) match {
+        case (combine: CombinePredicate, _) =>
+          isSameCombinePredicate(source, combine) && combine.find(goal)
+
+        // if left child is a literal
+        // LiteralComparison's unapply method change the literal and attribute position
+        case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) =>
+          isSame(childComparison, goalComparison)
+
+        case other =>
+          isSame(child, goal)
+      }
+
+      // using method to avoid right side compute if left side is true
+      val leftResult = () => delegate(source.left)
+      val rightResult = () => delegate(source.right)
+      leftResult() || rightResult()
+    }
+
+    @inline
+    def isOrPredicate: Boolean = {
+      source.isInstanceOf[Or]
+    }
+
+    // create a new combine predicate that has the same combine operator as this
+    @inline
+    def build(left: Expression, right: Expression): CombinePredicate = {
+      CombinePredicate(left, right, isOrPredicate)
+    }
+
+    // swap left child and right child
+    @inline
+    def swap: CombinePredicate = {
+      source.build(source.right, source.left)
+    }
+
+    def toOptimized: Expression = source match {
+      // one CombinePredicate, left equals right, drop right, keep left
+      // examples: a && a = a, a || a = a
+      case CombinePredicate(left, right) if left.fastEquals(right) =>
+        left
+
+      // one CombinePredicate and left and right are both binary comparisons
+      // examples: a > 2 && a < 2 = false
+      case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) =>
+        // left or right maybe change its child position, so rebuild one
+        val changed = origin.build(left, right)
+        val optimized = changed.mergeComparison
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right)
+          if isNotCombinePredicate(ll, lr, right) =>
+        val leftOptimized = left.toOptimized
+        if (isSame(left, leftOptimized)) {
+          if (isSame(ll, right) || isSame(lr, right)) {
+            if (isSameCombinePredicate(origin, left)) leftOptimized else right
+          } else {
+            val llRight = origin.build(ll, right)
+            val lrRight = origin.build(lr, right)
+            val llRightOptimized = llRight.toOptimized
+            val lrRightOptimized = lrRight.toOptimized
+            if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) {
+              origin
+            } else if ((isNotCombinePredicate(llRightOptimized, lrRightOptimized))
+                || isSameCombinePredicate(origin, left)) {
+              left.build(llRightOptimized, lrRightOptimized).toOptimized
+            } else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) {
+              left.build(llRightOptimized, lrRightOptimized)
+            } else {
+              origin
+            }
+          }
+        } else if (isNotCombinePredicate(leftOptimized)) {
+          origin.build(leftOptimized, right).toOptimized
+        } else {
+          origin
+        }
+
+      case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2))
+          if isNotCombinePredicate(left, left2, right2) =>
+        val changed = origin.swap
+        val optimized = changed.toOptimized
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      // do optimize like: (a || b || c) && a = a, here a, b, c is a condition
+      case origin @ CombinePredicate(left: CombinePredicate,
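The kinds of rewrites this rule performs can be shown with a toy model: `a && a` collapses to `a`, `a || a` to `a`, and contradictory comparisons fold to false. `Expr` and friends below are hypothetical stand-ins for Catalyst's expression tree, not the real API.

```scala
sealed trait Expr
case class Attr(name: String) extends Expr
case class Gt(attr: Attr, v: Int) extends Expr   // attr > v
case class Lt(attr: Attr, v: Int) extends Expr   // attr < v
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case object FalseLit extends Expr

def simplify(e: Expr): Expr = e match {
  case And(l, r) if simplify(l) == simplify(r) => simplify(l)   // a && a => a
  case Or(l, r)  if simplify(l) == simplify(r) => simplify(l)   // a || a => a
  // a > x && a < y is unsatisfiable whenever x >= y
  case And(Gt(a, x), Lt(b, y)) if a == b && x >= y => FalseLit
  case And(l, r) => And(simplify(l), simplify(r))
  case Or(l, r)  => Or(simplify(l), simplify(r))
  case leaf => leaf
}
```

The real rule applies such rewrites to `CombinePredicate` nodes via `transformExpressionsDown`, as shown in the diff, so the simplification runs top-down over every expression in the plan.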
[GitHub] spark pull request: [SPARK-4950] Delete obsolete mapReduceTripelet...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/3782#issuecomment-68118211 Understood. I went back to the old Pregel API. And also, I'll check #1217 later :))
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22277047

--- Diff: streaming/src/test/scala/org/apache/spark/streamingtest/ImplicitSuite.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streamingtest
+
+/**
+ * A test suite to make sure all `implicit` functions work correctly.
+ *
+ * As `implicit` is a compiler feature, we don't need to run this class.
+ * What we need to do is making the compiler happy.
+ */
+class ImplicitSuite {
+
+  // We only want to test if `implicit` works well with the compiler, so we don't need a real DStream.
+  def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null
+
+  def testToPairDStreamFunctions(): Unit = {
+    val rdd: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
--- End diff --

Good catch. Done.