[GitHub] spark pull request: [SPARK-4959] [SQL] Attributes are case sensiti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3796#issuecomment-68092701 [Test build #24812 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24812/consoleFull) for PR 3796 at commit [`62a7a10`](https://github.com/apache/spark/commit/62a7a10b445aaf4963692fb9df3e885aaebe6051). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4959] [SQL] Attributes are case sensiti...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3796#issuecomment-68092702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24812/
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68092844 @JoshRosen Sorry, I forgot to describe this patch. I have created a JIRA for it; can you take a look?
[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68093197 Hi @koeninger, a few quick questions: 1. How are RDD partitions mapped to Kafka partitions? Is each Kafka partition one RDD partition? 2. How is receiver injection rate control handled; in other words, how does the current task decide at which offset it should read? 3. Have you given any consideration to fault tolerance? In general this is quite similar to a Kafka InputFormat I wrote a while ago (https://github.com/jerryshao/kafka-input-format), which can be loaded by HadoopRDD. I'm not sure whether this is the streaming way of achieving exactly-once semantics.
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68093199 I can take a look at this later this week. It would probably be a good idea for someone more familiar with the YARN code to take a look, too, since they might also be able to suggest how/whether tests could have prevented the underlying bug.
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
GitHub user XuTingjun opened a pull request: https://github.com/apache/spark/pull/3799 [SPARK-1507][YARN] Specify the number of cores for the AM. I added the configurations below: spark.yarn.am.cores / SPARK_MASTER_CORES / SPARK_DRIVER_CORES for yarn-client mode; spark.driver.cores for yarn-cluster mode. You can merge this pull request into a Git repository by running: $ git pull https://github.com/XuTingjun/spark SPARK1507 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3799.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3799 commit c35a515982fb3235ea648806ff392b13759322fc Author: wangfei wangf...@huawei.com Date: 2014-12-25T08:09:33Z specify AM core in yarn-client and yarn-cluster mode
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
Github user XuTingjun commented on the pull request: https://github.com/apache/spark/pull/3686#issuecomment-68093593 Hi all, I accidentally deleted my repository, so I created a new patch, #3799, for it.
[GitHub] spark pull request: [SPARK-1507][YARN]specify num of cores for AM
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3799#issuecomment-68093631 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/3800 [SPARK-4967] [SQL] File name with comma will cause exception for SQLContext.parquetFile. This is a workaround to support `,` in Parquet file names; however, we still need to update the interface so that the API `SQLContext.parquetFile` can take multiple Parquet files as input. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark spark_4967 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3800.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3800 commit 4c024a26b91c8ba39ce60b5486cf3d210b7b69bb Author: Cheng Hao hao.ch...@intel.com Date: 2014-12-25T08:04:30Z Support comma in the parquet file path
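The failure mode described above can be illustrated with a plain-Scala sketch (names and paths here are made up, not the patch itself): Hadoop's input-path convention treats a path string as a comma-separated list, so a file name that itself contains a comma gets split into bogus paths.

```scala
// Illustration only: why a comma inside a file name breaks APIs that accept
// comma-separated input-path strings, as SQLContext.parquetFile does here.
val fileName = "part-0001,v2.parquet"

// Treating the string as a comma-separated path list splits one file into
// two bogus paths:
val naiveSplit = fileName.split(",").toSeq
println(naiveSplit)   // List(part-0001, v2.parquet)

// Passing the name through as a single opaque element sidesteps the parsing:
val asSingle = Seq(fileName)
println(asSingle)     // List(part-0001,v2.parquet)
```

This is why the PR frames the fix as a workaround: the real solution is an interface that accepts a collection of paths rather than one string to be parsed.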
[GitHub] spark pull request: [SAPRK-4967] [SQL] File name with comma will c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68094002 [Test build #24814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24814/consoleFull) for PR 3800 at commit [`4c024a2`](https://github.com/apache/spark/commit/4c024a26b91c8ba39ce60b5486cf3d210b7b69bb). * This patch merges cleanly.
[GitHub] spark pull request: [WIP] Remove many uses of Thread.sleep() from ...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3687#issuecomment-68094079 Just realized that my last comment was a bit confusing, since SPARK-1600 is not related to the FileInputStream ManualClock fix. I'll file a new improvement JIRA to cover replacing our uses of SystemClock in tests.
[GitHub] spark pull request: [WIP] Remove many uses of Thread.sleep() from ...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3687#discussion_r22269847

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,34 +278,45 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
+      var ssc = new StreamingContext(conf, batchDuration)
--- End diff --

This happens to be in a `try-finally` block, so I think that proper cleanup happens, but I'll replace it with `withStreamingContext` just to be consistent.
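The `withStreamingContext` helper mentioned in this thread is a loan-pattern fixture. A minimal sketch (not Spark's actual helper; the `Stoppable`/`withResource` names are invented for illustration) shows why it subsumes a hand-written `try-finally`:

```scala
// Minimal loan-pattern sketch: the fixture owns cleanup, so a test cannot
// forget to stop the context even when its body throws.
trait Stoppable { def stop(): Unit }

def withResource[C <: Stoppable, R](resource: C)(body: C => R): R =
  try body(resource)
  finally resource.stop()   // runs on both normal exit and exception

// Usage: the fake context records that stop() ran.
class FakeContext extends Stoppable {
  var stopped = false
  def stop(): Unit = stopped = true
}

val ctx = new FakeContext
withResource(ctx) { c => 42 }  // the body is the "test"
println(ctx.stopped)           // true
```

The design advantage over scattering `try-finally` through each test is consistency: cleanup logic lives in exactly one place.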
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095076 +1 LGTM. I remember this came up at least once, so it's good to guard against it directly.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095313 [Test build #24813 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24813/consoleFull) for PR 3795 at commit [`09d837f`](https://github.com/apache/spark/commit/09d837f0f933a43cd7e2e1b8d2befec0f6516e6b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68095317 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24813/
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68095357 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24814/
[GitHub] spark pull request: [SPARK-4967] [SQL] File name with comma will c...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3800#issuecomment-68095355 [Test build #24814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24814/consoleFull) for PR 3800 at commit [`4c024a2`](https://github.com/apache/spark/commit/4c024a26b91c8ba39ce60b5486cf3d210b7b69bb). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68095691 [Test build #24815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24815/consoleFull) for PR 3784 at commit [`00aaa63`](https://github.com/apache/spark/commit/00aaa63b0f05a3fab4211b5037714729128fcc6c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68095891 [Test build #24816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24816/consoleFull) for PR 3784 at commit [`f4d9b8f`](https://github.com/apache/spark/commit/f4d9b8fa141e8f571e32f3a660026fe1ff907971). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68096064 [Test build #24818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24818/consoleFull) for PR 3467 at commit [`f9ef854`](https://github.com/apache/spark/commit/f9ef8547228d4c56afb9fc6f43431a458a2325ca). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68096063 [Test build #24817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24817/consoleFull) for PR 3784 at commit [`601d5f6`](https://github.com/apache/spark/commit/601d5f62f00905b971f18bd6c79630a3c604b354). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user tsudukim commented on a diff in the pull request: https://github.com/apache/spark/pull/3467#discussion_r22270614

--- Diff: core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -180,14 +176,15 @@ private[history] class FsHistoryProvider(conf: SparkConf) extends ApplicationHis
             appListener.startTime.getOrElse(-1L),
             appListener.endTime.getOrElse(-1L),
             getModificationTime(dir),
-            appListener.sparkUser.getOrElse(NOT_STARTED)))
+            appListener.sparkUser.getOrElse(NOT_STARTED),
+            !fs.isFile(new Path(dir.getPath(), EventLoggingListener.APPLICATION_COMPLETE
       } catch {
         case e: Exception =>
           logInfo(s"Failed to load application log data from $dir.", e)
           None
       }
     }
-    .sortBy { info => -info.endTime }
+    .sortBy { info => (-info.endTime, -info.startTime) }
--- End diff --

Completed applications are sorted by endTime, since each has a proper (almost unique) endTime. Incomplete applications all share the same invalid endTime (-1), so they are sorted by startTime: where the first sort key is not effective, the second one takes over.
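The tiebreaker behavior described here follows from Scala's lexicographic ordering of tuples in `sortBy`, and can be reproduced with plain data (the `AppInfo` case class below is a stand-in for the history provider's application info, not Spark's actual type):

```scala
// Demonstrates the (-endTime, -startTime) sort key: completed apps have
// distinct endTimes and sort by them descending; incomplete apps all carry
// endTime == -1, so the tie falls through to startTime descending.
case class AppInfo(name: String, startTime: Long, endTime: Long)

val apps = Seq(
  AppInfo("done-early",  100L, 200L),
  AppInfo("done-late",   150L, 300L),
  AppInfo("running-new", 400L,  -1L),
  AppInfo("running-old", 250L,  -1L)
)

// Tuple keys compare lexicographically: first by -endTime, then -startTime.
val sorted = apps.sortBy(info => (-info.endTime, -info.startTime))
println(sorted.map(_.name))
// List(done-late, done-early, running-new, running-old)
```

Note that negating `-1` makes incomplete apps sort after all completed ones, with the most recently started incomplete app first.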
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68096991 [Test build #24819 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24819/consoleFull) for PR 2405 at commit [`a2057e7`](https://github.com/apache/spark/commit/a2057e713d8d4fb3cb2821262bc712d6c58b4024). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/3801 [SPARK-1600] Refactor FileInputStream tests to remove Thread.sleep() calls and SystemClock usage

This PR refactors Spark Streaming's FileInputStream tests to remove uses of Thread.sleep() and SystemClock, which should hopefully resolve some longstanding flakiness in these tests (see SPARK-1600). Key changes:

- Modify FileInputDStream to use the scheduler's Clock instead of System.currentTimeMillis(); this allows it to be tested using ManualClock.
- Fix a synchronization issue in ManualClock's `currentTime` method.
- Add a StreamingTestWaiter class which allows callers to block until a certain number of batches have finished.
- Change the FileInputStream tests so that files' modification times are manually set based off of ManualClock; this eliminates many Thread.sleep calls.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark SPARK-1600 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3801.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3801

commit a95ddc41f2b10b57fa18e75c865d7ef4507cd771 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-16T08:50:01Z Modify FileInputDStream to use Clock class.
commit 3c3efc3f75521020f482d56b41465a6373448cf5 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-17T01:40:35Z Synchronize `currentTime` in ManualClock
commit dda1403f3eaabe9125b87ac45ac3e7b0d667e9de Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T09:03:00Z Add StreamingTestWaiter class.
commit d4f2d87729b20f1060d456a6074f2da6a4e79cb3 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T09:03:54Z Refactor file input stream tests to not rely on SystemClock.
commit c8f06b10431c555dab3be461622d5d96aa807685 Author: Josh Rosen joshro...@databricks.com Date: 2014-12-25T10:14:06Z Remove Thread.sleep calls in FileInputStream CheckpointSuite test. Hopefully this will fix SPARK-1600.
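The "Synchronize `currentTime` in ManualClock" change can be sketched as follows. This is a rough illustration, not Spark's actual ManualClock (the class and method names here are invented): without synchronization, a time advanced by the test thread may not be visible to the scheduler thread.

```scala
// Sketch of a manually advanced clock with a synchronized read. advance()
// is driven by the test; waitUntil() lets another thread block until the
// clock reaches a target, replacing Thread.sleep-based waiting.
class SketchManualClock(private var time: Long = 0L) {
  def currentTime(): Long = synchronized { time }

  def advance(millis: Long): Long = synchronized {
    time += millis
    notifyAll()           // wake threads blocked in waitUntil
    time
  }

  def waitUntil(target: Long): Unit = synchronized {
    while (time < target) wait()
  }
}

val clock = new SketchManualClock()
clock.advance(2000L)          // one "batch" of 2 seconds
println(clock.currentTime())  // 2000
```

A StreamingTestWaiter in the same spirit would block on a condition ("n batches completed") rather than a timestamp, which is what removes the fixed sleeps from the tests.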
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68097807 These changes are split off from #3687, a larger PR of mine which tried to remove all uses of Thread.sleep() in the streaming tests. It may look like there are a lot of changes here, but most of that is due to indentation changes when I modified tests to use the `withStreamingContext` fixture. /cc @tdas for review.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271205

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
-      ssc.checkpoint(checkpointDir)
-      val fileStream = ssc.textFileStream(testDir.toString)
-      // Making value 3 take large time to process, to ensure that the master
-      // shuts down in the middle of processing the 3rd batch
-      val mappedStream = fileStream.map(s => {
-        val i = s.toInt
-        if (i == 3) Thread.sleep(2000)
-        i
-      })
-
-      // Reducing over a large window to ensure that recovery from master failure
-      // requires reprocessing of all the files seen before the failure
-      val reducedStream = mappedStream.reduceByWindow(_ + _, Seconds(30), Seconds(1))
-      val outputBuffer = new ArrayBuffer[Seq[Int]]
-      var outputStream = new TestOutputStream(reducedStream, outputBuffer)
-      outputStream.register()
-      ssc.start()
-
-      // Create files and advance manual clock to process them
-      // var clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
-      Thread.sleep(1000)
-      for (i <- Seq(1, 2, 3)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        // wait to make sure that the file is written such that it gets shown in the file listings
-        Thread.sleep(1000)
+      val outputBuffer = new ArrayBuffer[Seq[Int]] with SynchronizedBuffer[Seq[Int]]
+
+      def writeFile(i: Int, clock: ManualClock): Unit = {
--- End diff --

I factored some of the common code out here.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68097905 [Test build #24820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24820/consoleFull) for PR 3801 at commit [`c8f06b1`](https://github.com/apache/spark/commit/c8f06b10431c555dab3be461622d5d96aa807685). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271345

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala ---
@@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
     // failure, are re-processed or not.
     test("recovery with file input stream") {
       // Set up the streaming context and input streams
+      val batchDuration = Seconds(2)  // Due to 1-second resolution of setLastModified() on some OS's.
       val testDir = Utils.createTempDir()
-      var ssc = new StreamingContext(master, framework, Seconds(1))
-      ssc.checkpoint(checkpointDir)
-      val fileStream = ssc.textFileStream(testDir.toString)
-      // Making value 3 take large time to process, to ensure that the master
-      // shuts down in the middle of processing the 3rd batch
-      val mappedStream = fileStream.map(s => {
-        val i = s.toInt
-        if (i == 3) Thread.sleep(2000)
-        i
-      })
-
-      // Reducing over a large window to ensure that recovery from master failure
-      // requires reprocessing of all the files seen before the failure
-      val reducedStream = mappedStream.reduceByWindow(_ + _, Seconds(30), Seconds(1))
-      val outputBuffer = new ArrayBuffer[Seq[Int]]
-      var outputStream = new TestOutputStream(reducedStream, outputBuffer)
-      outputStream.register()
-      ssc.start()
-
-      // Create files and advance manual clock to process them
-      // var clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
-      Thread.sleep(1000)
-      for (i <- Seq(1, 2, 3)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        // wait to make sure that the file is written such that it gets shown in the file listings
-        Thread.sleep(1000)
+      val outputBuffer = new ArrayBuffer[Seq[Int]] with SynchronizedBuffer[Seq[Int]]
+
+      def writeFile(i: Int, clock: ManualClock): Unit = {
+        val file = new File(testDir, i.toString)
+        Files.write(i + "\n", file, Charsets.UTF_8)
+        assert(file.setLastModified(clock.currentTime()))
+        // Check that the file's modification date is actually the value we wrote, since rounding or
+        // truncation will break the test:
+        assert(file.lastModified() === clock.currentTime())
       }
-      logInfo("Output = " + outputStream.output.mkString(","))
-      assert(outputStream.output.size > 0, "No files processed before restart")
-      ssc.stop()
-      // Verify whether files created have been recorded correctly or not
-      var fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-      def recordedFiles = fileInputDStream.batchTimeToSelectedFiles.values.flatten
-      assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
-
-      // Create files while the master is down
-      for (i <- Seq(4, 5, 6)) {
-        Files.write(i + "\n", new File(testDir, i.toString), Charset.forName("UTF-8"))
-        Thread.sleep(1000)
+      def recordedFiles(ssc: StreamingContext): Seq[Int] = {
+        val fileInputDStream =
+          ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
+        val filenames = fileInputDStream.batchTimeToSelectedFiles.values.flatten
+        filenames.map(_.split(File.separator).last.toInt).toSeq.sorted
       }
-      // Recover context from checkpoint file and verify whether the files that were
-      // recorded before failure were saved and successfully recovered
-      logInfo("*** RESTARTING ")
-      ssc = new StreamingContext(checkpointDir)
-      fileInputDStream = ssc.graph.getInputStreams().head.asInstanceOf[FileInputDStream[_, _, _]]
-      assert(!recordedFiles.filter(_.endsWith("1")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("2")).isEmpty)
-      assert(!recordedFiles.filter(_.endsWith("3")).isEmpty)
+      try {
+        // This is a var because it's re-assigned when we restart from a checkpoint:
+        var clock: ManualClock = null
+        withStreamingContext(new StreamingContext(conf, batchDuration)) { ssc =>
+          ssc.checkpoint(checkpointDir)
+          clock = ssc.scheduler.clock.asInstanceOf[ManualClock]
+          val waiter = new StreamingTestWaiter(ssc)
+          val fileStream = ssc.textFileStream(testDir.toString)
+          // Making value 3 take a large time to process, to ensure that the driver
+          // shuts down in the middle of processing the 3rd batch
+          val mappedStream = fileStream.map(s => {
+            val i = s.toInt
+            if (i == 3) Thread.sleep(4000)
+            i
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3801#discussion_r22271355 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/CheckpointSuite.scala --- @@ -281,102 +279,130 @@ class CheckpointSuite extends TestSuiteBase {
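The `writeFile` helper quoted in the diff above guards against filesystem timestamp truncation: it sets a file's modification time from the manual clock and then asserts the value round-trips. A standalone Java sketch of the same check (class and method names are illustrative, not from the patch):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class LastModifiedCheck {
    // Mirrors the test's guard: set a file's mtime to a known value and
    // verify it reads back unchanged, since some filesystems truncate
    // timestamps to 1-second (or coarser) resolution, which would silently
    // break time-based batch selection logic.
    public static boolean roundTrips(File file, long timeMs) {
        if (!file.setLastModified(timeMs)) {
            return false; // the OS refused to set the timestamp
        }
        return file.lastModified() == timeMs;
    }

    public static void main(String[] args) throws IOException {
        File f = Files.createTempFile("mtime", ".txt").toFile();
        f.deleteOnExit();
        // A whole-second timestamp survives even 1-second truncation:
        long wholeSecond = (System.currentTimeMillis() / 1000) * 1000;
        System.out.println(roundTrips(f, wholeSecond));
    }
}
```

Using whole-second timestamps sidesteps the sub-second truncation the comment warns about, which is also why the patch bumps the batch duration to 2 seconds.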
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24815/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098292 [Test build #24815 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24815/consoleFull) for PR 3784 at commit [`00aaa63`](https://github.com/apache/spark/commit/00aaa63b0f05a3fab4211b5037714729128fcc6c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68098394 [Test build #24818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24818/consoleFull) for PR 3467 at commit [`f9ef854`](https://github.com/apache/spark/commit/f9ef8547228d4c56afb9fc6f43431a458a2325ca). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2458] Make failed application log visib...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3467#issuecomment-68098398 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24818/ Test FAILed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098518 [Test build #24816 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24816/consoleFull) for PR 3784 at commit [`f4d9b8f`](https://github.com/apache/spark/commit/f4d9b8fa141e8f571e32f3a660026fe1ff907971). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24816/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098625 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24817/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68098622 [Test build #24817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24817/consoleFull) for PR 3784 at commit [`601d5f6`](https://github.com/apache/spark/commit/601d5f62f00905b971f18bd6c79630a3c604b354). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68099780 [Test build #24819 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24819/consoleFull) for PR 2405 at commit [`a2057e7`](https://github.com/apache/spark/commit/a2057e713d8d4fb3cb2821262bc712d6c58b4024). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class UnresolvedGetField(child: Expression, fieldName: String) extends UnaryExpression ` * `case class StructGetField(child: Expression, field: StructField, ordinal: Int) extends GetField ` * `case class ArrayGetField(child: Expression, field: StructField, ordinal: Int, containsNull: Boolean)` * `trait GetField extends UnaryExpression `
[GitHub] spark pull request: [SPARK-2096][SQL] support dot notation on arra...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2405#issuecomment-68099783 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24819/ Test PASSed.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68100834 [Test build #24820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24820/consoleFull) for PR 3801 at commit [`c8f06b1`](https://github.com/apache/spark/commit/c8f06b10431c555dab3be461622d5d96aa807685). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1600] Refactor FileInputStream tests to...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3801#issuecomment-68100836 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24820/ Test PASSed.
[GitHub] spark pull request: [SPARK-4813][Streaming] Fix the issue that Con...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3661#discussion_r22272234 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/ContextWaiter.scala --- @@ -17,30 +17,63 @@ package org.apache.spark.streaming +import java.util.concurrent.TimeUnit +import java.util.concurrent.locks.ReentrantLock + private[streaming] class ContextWaiter { + + private val lock = new ReentrantLock() + private val condition = lock.newCondition() + + // Guarded by lock private var error: Throwable = null - private var stopped: Boolean = false - def notifyError(e: Throwable) = synchronized { -error = e -notifyAll() - } + // Guarded by lock + private var stopped: Boolean = false - def notifyStop() = synchronized { -stopped = true -notifyAll() + def notifyError(e: Throwable): Unit = { +lock.lock() +try { + error = e + condition.signalAll() +} finally { + lock.unlock() +} } - def waitForStopOrError(timeout: Long = -1) = synchronized { -// If already had error, then throw it -if (error != null) { - throw error + def notifyStop(): Unit = { +lock.lock() +try { + stopped = true + condition.signalAll() +} finally { + lock.unlock() } + } -// If not already stopped, then wait -if (!stopped) { - if (timeout < 0) wait() else wait(timeout) + /** + * Return `true` if it's stopped; or throw the reported error if `notifyError` has been called; or + * `false` if the waiting time detectably elapsed before return from the method. + */ + def waitForStopOrError(timeout: Long = -1): Boolean = { --- End diff -- How about making `awaitTermination` throw a TimeoutException on timeout? It looks like a better API.
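The diff above replaces `synchronized`/`wait`/`notifyAll` with an explicit `ReentrantLock` and `Condition`. A minimal Java sketch of the same wait-with-timeout semantics (class and field names are illustrative, not Spark's): it returns `false` when the timeout elapses, `true` once stopped, and rethrows any reported error.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class Waiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition condition = lock.newCondition();
    private Throwable error = null;   // guarded by lock
    private boolean stopped = false;  // guarded by lock

    public void notifyStop() {
        lock.lock();
        try {
            stopped = true;
            condition.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** true if stopped, false if the timeout elapsed; rethrows a reported error. */
    public boolean waitForStopOrError(long timeoutMs) throws Throwable {
        lock.lock();
        try {
            long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            // Loop to guard against spurious wakeups.
            while (error == null && !stopped) {
                if (timeoutMs < 0) {
                    condition.await();       // negative timeout = wait forever
                } else if (nanos <= 0) {
                    return false;            // timed out
                } else {
                    nanos = condition.awaitNanos(nanos); // remaining budget
                }
            }
            if (error != null) throw error;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Throwable {
        Waiter w = new Waiter();
        System.out.println(w.waitForStopOrError(50)); // nothing signals: times out
        w.notifyStop();
        System.out.println(w.waitForStopOrError(50)); // already stopped
    }
}
```

The `awaitNanos` return value tracks the remaining time budget across spurious wakeups, which plain `wait(timeout)` cannot do; that is the main reason to prefer an explicit `Condition` here.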
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/3464#issuecomment-68101164 ping @tdas
[GitHub] spark pull request: [SPARK-4966][YARN]The MemoryOverhead value is ...
Github user lianhuiwang commented on the pull request: https://github.com/apache/spark/pull/3797#issuecomment-68103677 @XuTingjun yes, I agree with you. We should call parseArgs before using the amMemory and executorMemory config values, because parseArgs can change these values from args.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68105414 Hi @liancheng, I admit my PR is more complicated, but this one only covers three cases. I think we'd better add a separate rule that optimizes And/Or in SQL for as many cases as possible, rather than mixing several cases into BooleanSimplification. So I am refactoring my PR to make it more readable and clean.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68105479 To note, my suggested solution would look more like this in HashPartitioner: ```scala def getPartition(key: Any): Int = key match { case null => 0 case enum: Enum[_] => Utils.nonNegativeMod(enum.ordinal(), numPartitions) case _ => Utils.nonNegativeMod(key.hashCode, numPartitions) } ``` This does not require reflection, and Java is fast at doing instanceof checks.
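The suggestion above can be sketched directly in Java. This is an illustration, not Spark's implementation: `nonNegativeMod` mirrors the behavior of Spark's `Utils.nonNegativeMod`, and the enum type is made up for the example. Enums hash by `ordinal()` because their default identity `hashCode` is not stable across JVMs, which would scatter equal keys across partitions.

```java
public class EnumAwarePartitioner {
    // Java's % can return negative values for negative inputs,
    // so shift into [0, mod) before using the result as a partition index.
    static int nonNegativeMod(int x, int mod) {
        int rawMod = x % mod;
        return rawMod + (rawMod < 0 ? mod : 0);
    }

    enum Color { RED, GREEN, BLUE } // hypothetical key type for the demo

    static int getPartition(Object key, int numPartitions) {
        if (key == null) {
            return 0;
        } else if (key instanceof Enum) {
            // ordinal() is stable across JVMs; identity hashCode is not
            return nonNegativeMod(((Enum<?>) key).ordinal(), numPartitions);
        } else {
            return nonNegativeMod(key.hashCode(), numPartitions);
        }
    }

    public static void main(String[] args) {
        System.out.println(getPartition(null, 4));        // nulls go to partition 0
        System.out.println(getPartition(Color.BLUE, 4));  // ordinal 2 mod 4 = 2
        System.out.println(getPartition(-7, 4));          // hashCode -7, -7 % 4 = -3, +4 = 1
    }
}
```

As the comment notes, an `instanceof` check like this is cheap, so the null/enum special cases add essentially no overhead to the common path.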
[GitHub] spark pull request: [SPARK-4939] move to next locality when no pen...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3779#issuecomment-68105490 @davies can you add a unit test that fails in the old code but works with your code? It would be helpful to more clearly document the exact bug.
[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3787#issuecomment-68105645 This looks good - thanks @sarutak and @srowen!
[GitHub] spark pull request: [SPARK-4953][Doc] Fix the description of build...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3787
[GitHub] spark pull request: [SPARK-4723] [CORE] To abort the stages which ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3786#issuecomment-68106101 I agree with Mark on this. We should always try to identify root causes as a first step.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68106662 [Test build #24821 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24821/consoleFull) for PR 3784 at commit [`caca560`](https://github.com/apache/spark/commit/caca56024026d8211cf55eba2da95279a6b000bd). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22274369 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] { } } +object ConditionSimplification extends Rule[LogicalPlan] { + import BinaryComparison.LiteralComparison + + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case q: LogicalPlan => q transformExpressionsDown { + case origin: CombinePredicate => +origin.toOptimized +} + } + + type SplitFragments = Map[Expression, Option[Expression]] + + implicit class CombinePredicateExtension(source: CombinePredicate) { +def find(goal: Expression): Boolean = { + def delegate(child: Expression): Boolean = (child, goal) match { +case (combine: CombinePredicate, _) => + isSameCombinePredicate(source, combine) && combine.find(goal) + + // if left child is a literal, + // LiteralComparison's unapply method changes the literal and attribute position +case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) => + isSame(childComparison, goalComparison) + +case other => + isSame(child, goal) + } + + // using functions to avoid evaluating the right side if the left side is true + val leftResult = () => delegate(source.left) + val rightResult = () => delegate(source.right) + leftResult() || rightResult() +} + +@inline +def isOrPredicate: Boolean = { + source.isInstanceOf[Or] +} + +// create a new combine predicate that has the same combine operator as this +@inline +def build(left: Expression, right: Expression): CombinePredicate = { + CombinePredicate(left, right, isOrPredicate) +} + +// swap left child and right child +@inline +def swap: CombinePredicate = { + source.build(source.right, source.left) +} + +def toOptimized: Expression = source match { + // one CombinePredicate, left equals right, drop right, keep left + // examples: a && a = a, a || a = a + case CombinePredicate(left, right) if left.fastEquals(right) => +left + + // one CombinePredicate whose left and right are both binary comparisons + // examples: a > 2 && a < 2 = false + case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) => +// left or right may change its child position, so rebuild one +val changed = origin.build(left, right) +val optimized = changed.mergeComparison +if (isSame(changed, optimized)) { + origin +} else { + optimized +} + + case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right) +if isNotCombinePredicate(ll, lr, right) => +val leftOptimized = left.toOptimized +if (isSame(left, leftOptimized)) { + if (isSame(ll, right) || isSame(lr, right)) { +if (isSameCombinePredicate(origin, left)) leftOptimized else right + } else { +val llRight = origin.build(ll, right) +val lrRight = origin.build(lr, right) +val llRightOptimized = llRight.toOptimized +val lrRightOptimized = lrRight.toOptimized +if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) { + origin +} else if ((isNotCombinePredicate(llRightOptimized, lrRightOptimized)) + || isSameCombinePredicate(origin, left)) { + left.build(llRightOptimized, lrRightOptimized).toOptimized +} else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) { + left.build(llRightOptimized, lrRightOptimized) +} else { + origin +} + } +} else if (isNotCombinePredicate(leftOptimized)) { + origin.build(leftOptimized, right).toOptimized +} else { + origin +} + + case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2)) +if isNotCombinePredicate(left, left2, right2) => +val changed = origin.swap +val optimized = changed.toOptimized +if (isSame(changed, optimized)) { + origin +} else { + optimized +} + + // do optimize like: (a || b || c) && a = a, here a, b, c are conditions + case origin @ CombinePredicate(left:
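The last rewrite quoted in the diff, simplifying `(a || b || c) && a` down to `a`, is an instance of the boolean absorption law. A small Java check (purely illustrative, unrelated to the Catalyst code) verifies the identity over all truth assignments:

```java
public class AbsorptionCheck {
    public static void main(String[] args) {
        // Absorption: (a || b) && a == a, so the extra disjuncts can be dropped.
        // Exhaustively check every combination of truth values.
        boolean holds = true;
        for (boolean a : new boolean[]{false, true}) {
            for (boolean b : new boolean[]{false, true}) {
                for (boolean c : new boolean[]{false, true}) {
                    holds &= (((a || b || c) && a) == a);
                }
            }
        }
        System.out.println(holds);
    }
}
```

Because the law holds for every assignment, the optimizer can apply the rewrite unconditionally without changing query results.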
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68108140 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24821/ Test PASSed.
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68108135 [Test build #24821 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24821/consoleFull) for PR 3784 at commit [`caca560`](https://github.com/apache/spark/commit/caca56024026d8211cf55eba2da95279a6b000bd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Update README.md
GitHub user dennyglee opened a pull request: https://github.com/apache/spark/pull/3802 Update README.md Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html) You can merge this pull request into a Git repository by running: $ git pull https://github.com/dennyglee/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3802.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3802 commit 15f601a5ac754c5429153a87f0b89743f6ad48d5 Author: Denny Lee denny.g@gmail.com Date: 2014-12-25T17:22:43Z Update README.md Corrected link to the Building Spark with Maven page from its original (http://spark.apache.org/docs/latest/building-with-maven.html) to the current page (http://spark.apache.org/docs/latest/building-spark.html)
[GitHub] spark pull request: Update README.md
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3802#issuecomment-68108716 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
GitHub user freeman-lab opened a pull request: https://github.com/apache/spark/pull/3803 [SPARK-4969] [STREAMING] [PYTHON] Add binaryRecords to streaming

In Spark 1.2 we added a `binaryRecords` input method for loading flat binary data. This format is useful for numerical array data, e.g. in scientific computing applications. This PR adds support for the same format in Streaming applications, where it is similarly useful, especially for streaming time series or sensor data.

Summary of additions:
- adding `binaryRecordsStream` to Spark Streaming
- exposing `binaryRecordsStream` in the new PySpark Streaming
- new unit tests in Scala and Python

This required adding an optional Hadoop configuration param to `fileStream` and `FileInputStream`, but was otherwise straightforward. @tdas @davies

You can merge this pull request into a Git repository by running: $ git pull https://github.com/freeman-lab/spark streaming-binary-records Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3803.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3803

commit 8550c2619aba22b40dc109171b395522ccfaaf08 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:31:31Z Expose additional argument combination
commit ecef0eb8d4bf30627e5b35c40c2f4204e1670390 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:34:49Z Add binaryRecordsStream to python
commit fe4e803f8810c19aac02e7c8927af1d08b2f0a94 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:12Z Add binaryRecordStream to Java API
commit 36cb0fd576abb20b9c3210774ec9ff0471e2cf48 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:41Z Add binaryRecordsStream to scala
commit 23dd69f318aedbf12cab10380a50d94ce8c3ca92 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:35:52Z Tests for binaryRecordsStream
commit 9398bcb615c6cbf033b796c0837c99aba83303b4 Author: freeman the.freeman@gmail.com Date: 2014-12-25T09:40:06Z Expose optional hadoop configuration
commit 28bff9bab7be7c2f614a011f6b68e2103234c1df Author: freeman the.freeman@gmail.com Date: 2014-12-25T10:02:42Z Fix missing arg
commit 8b70fbcf785074c7cde873cf10e8d5f0ea9e3979 Author: freeman the.freeman@gmail.com Date: 2014-12-25T10:03:01Z Reorganization
commit 2843e9de60f23bbce3ac185c09b8575a7513fe0d Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:43:20Z Add params to docstring
commit 94d90d0fbc576c4e475bb0a053e6c35d53152cf4 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:44:09Z Spelling
commit 1c739aa67a006a62a6ee8f294ff60568f9031476 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:48:04Z Simpler default arg handling
commit 029d49c143c7bed603db3ca43b44d212de516df8 Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:50:42Z Formatting
commit a4324a38f8155f6b3e776326925af61f16a2fdfb Author: freeman the.freeman@gmail.com Date: 2014-12-25T17:56:45Z Line length
commit d3e75b2bad2ba5048b36300cfd61b7cb5c39414b Author: freeman the.freeman@gmail.com Date: 2014-12-25T19:29:06Z Add tests in python
commit becb34474fd165ee8aae9d207532869bce3ef743 Author: freeman the.freeman@gmail.com Date: 2014-12-25T19:31:07Z Formatting
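The flat binary record format this PR streams is simple: a byte buffer that splits evenly into fixed-length records. A minimal plain-Python sketch of that idea (not the Spark implementation; the record layout here is an illustrative assumption):

```python
import struct

def split_binary_records(data: bytes, record_length: int):
    """Split a flat byte buffer into fixed-length records, in the spirit
    of a binaryRecords-style reader (illustrative sketch only)."""
    if len(data) % record_length != 0:
        raise ValueError("buffer is not a whole number of records")
    return [data[i:i + record_length]
            for i in range(0, len(data), record_length)]

# Example: three records, each a pair of little-endian float64s (16 bytes),
# the kind of numerical array data the PR description mentions.
payload = b"".join(struct.pack("<dd", x, x * 2.0) for x in (1.0, 2.0, 3.0))
records = [struct.unpack("<dd", r) for r in split_binary_records(payload, 16)]
# records == [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
```

Because every record has the same length, a stream of such files can be split and deserialized without any framing or delimiter logic, which is why the format suits sensor and time-series data.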
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111500 [Test build #24822 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24822/consoleFull) for PR 3803 at commit [`becb344`](https://github.com/apache/spark/commit/becb34474fd165ee8aae9d207532869bce3ef743). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111519 [Test build #24822 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24822/consoleFull) for PR 3803 at commit [`becb344`](https://github.com/apache/spark/commit/becb34474fd165ee8aae9d207532869bce3ef743). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FileInputDStream[K, V, F <: NewInputFormat[K,V]](`
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111521 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24822/ Test FAILed.
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68111695 [Test build #24823 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24823/consoleFull) for PR 3803 at commit [`fcb915c`](https://github.com/apache/spark/commit/fcb915c2fbba80b9d7b765425e203b0e3796c59d). * This patch merges cleanly.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68111992 cc @shivaram
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
GitHub user nchammas opened a pull request: https://github.com/apache/spark/pull/3804 [EC2] Update mesos/spark-ec2 branch to branch-1.3 You can merge this pull request into a Git repository by running: $ git pull https://github.com/nchammas/spark patch-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3804.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3804 commit cd2c0d4cec89422240c4e417d7291948ccf43a0a Author: Nicholas Chammas nicholas.cham...@gmail.com Date: 2014-12-25T20:14:16Z [EC2] Update mesos/spark-ec2 branch to branch-1.3
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68112020 [Test build #24824 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24824/consoleFull) for PR 3804 at commit [`cd2c0d4`](https://github.com/apache/spark/commit/cd2c0d4cec89422240c4e417d7291948ccf43a0a). * This patch merges cleanly.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113162 LGTM
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68113297 [Test build #24823 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24823/consoleFull) for PR 3803 at commit [`fcb915c`](https://github.com/apache/spark/commit/fcb915c2fbba80b9d7b765425e203b0e3796c59d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class FileInputDStream[K, V, F <: NewInputFormat[K,V]](`
[GitHub] spark pull request: [SPARK-4969] [STREAMING] [PYTHON] Add binaryRe...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3803#issuecomment-68113298 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24823/ Test PASSed.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113654 [Test build #24824 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24824/consoleFull) for PR 3804 at commit [`cd2c0d4`](https://github.com/apache/spark/commit/cd2c0d4cec89422240c4e417d7291948ccf43a0a). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68113656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24824/ Test PASSed.
[GitHub] spark pull request: Update README.md
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3802#issuecomment-68114038 Good catch; I'm going to merge this into `master` (1.3.0) and `branch-1.2` (1.2.1). Thanks!
[GitHub] spark pull request: Update README.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3802
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3793#discussion_r22275672

--- Diff: ec2/spark_ec2.py ---
@@ -255,6 +255,7 @@ def get_spark_shark_version(opts):
     "1.0.1": "1.0.1",
     "1.0.2": "1.0.2",
     "1.1.0": "1.1.0",
+    "1.2.0": "1.2.0",
--- End diff --

Probably also want to have 1.1.1 in here since it's also in `branch-1.2`; I'll just add this myself on merge, though.
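The hunk above edits a plain dict mapping the user-requested Spark version to a spark-ec2 version string. A hedged sketch of that lookup pattern, with the map trimmed and the function name hypothetical rather than taken from `spark_ec2.py`:

```python
# Illustrative version map in the style of spark_ec2.py; the entries shown
# and the fallback behavior are assumptions, not the actual script.
SPARK_EC2_VERSION_MAP = {
    "1.0.1": "1.0.1",
    "1.0.2": "1.0.2",
    "1.1.0": "1.1.0",
    "1.1.1": "1.1.1",  # the entry JoshRosen notes is also in branch-1.2
    "1.2.0": "1.2.0",
}

def resolve_ec2_version(requested: str) -> str:
    """Map a requested Spark version to its spark-ec2 version, failing
    loudly on an unknown version rather than guessing a default."""
    try:
        return SPARK_EC2_VERSION_MAP[requested]
    except KeyError:
        raise SystemExit(f"Unknown Spark version: {requested}")
```

Failing on an unmapped version is what makes a missing entry like "1.1.1" user-visible, which is why each release needs its own line in the map.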
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3793#issuecomment-68114155 LGTM, so I'll merge this into `master` (1.3.0).
[GitHub] spark pull request: [EC2] Update default Spark version to 1.2.0
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3793
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3804#issuecomment-68114214 LGTM, too, so I'm going to merge this into `master` (1.3.0). Thanks!
[GitHub] spark pull request: [EC2] Update mesos/spark-ec2 branch to branch-...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3804
[GitHub] spark pull request: SPARK-4159 [CORE] Maven build doesn't run JUni...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3651#issuecomment-68114368 The only issue with using append = true is that multiple test invocations will just keep appending to the file, potentially making debugging a little more confusing. Not a big deal, I think. Maybe adding a task in the root pom that runs before surefire/scalatest and just deletes that file? I suppose we still have this issue, right? It might not be a big deal if people run `mvn clean` between builds, but I could imagine it being annoying if you're doing incremental re-builds during an iterative debugging session. Is this hard to fix? I don't think it's the end of the world if we don't get to this now, though. Also, we should probably backport this change into the maintenance branches so that we can detect whether their Java tests are broken. We should verify that `SPARK_HOME` is obsolete in yarn/repl for those versions, too.
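The append-mode concern above, where successive test runs keep growing the same log file unless something truncates it first, comes down to file open modes. A minimal Python sketch (the file name and helper are hypothetical, for illustration only):

```python
import os

LOG_PATH = "unit-tests.log"  # hypothetical log file name

def log_test_run(message: str, fresh: bool = False) -> None:
    """Append a line to the shared log; with fresh=True, truncate first,
    which is the 'delete before surefire/scalatest' step suggested above."""
    mode = "w" if fresh else "a"
    with open(LOG_PATH, mode) as f:
        f.write(message + "\n")

log_test_run("run 1", fresh=True)
log_test_run("run 2")              # appends: file now holds both runs
log_test_run("run 3", fresh=True)  # truncates: only the latest run remains
with open(LOG_PATH) as f:
    contents = f.read()
os.remove(LOG_PATH)
# contents == "run 3\n"
```

Without the truncation step, every incremental re-build mixes old and new output in one file, which is exactly the debugging confusion the comment describes.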
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275938

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -35,6 +35,15 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {
   })
--- End diff --

nit: The existing version of registerGauge could have used the new version. Not a big deal, very small amount of duplicate code.
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275955

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

   // Gauge for last completed batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastCompletedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastCompletedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last completed batch's delay information.
+  registerGaugeWithOption("lastCompletedBatch_processingDelay",
+    _.lastCompletedBatch.flatMap(_.processingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_schedulingDelay",
+    _.lastCompletedBatch.flatMap(_.schedulingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_totalDelay",
+    _.lastCompletedBatch.flatMap(_.totalDelay), -1L)

   // Gauge for last received batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastReceivedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastReceivedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last received batch records and total received batch records.
+  registerGauge("lastReceivedBatchRecords", _.lastReceivedBatchRecords.values.sum, 0L)
--- End diff --

Isn't it more consistent to name this `lastReceivedBatch_records`?
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275969

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

   // Gauge for last completed batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastCompletedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastCompletedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastCompletedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastCompletedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last completed batch's delay information.
+  registerGaugeWithOption("lastCompletedBatch_processingDelay",
+    _.lastCompletedBatch.flatMap(_.processingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_schedulingDelay",
+    _.lastCompletedBatch.flatMap(_.schedulingDelay), -1L)
+  registerGaugeWithOption("lastCompletedBatch_totalDelay",
+    _.lastCompletedBatch.flatMap(_.totalDelay), -1L)

   // Gauge for last received batch, useful for monitoring the streaming job's running status,
   // displayed data -1 for any abnormal condition.
-  registerGauge("lastReceivedBatch_submissionTime",
-    _.lastCompletedBatch.map(_.submissionTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processStartTime",
-    _.lastCompletedBatch.flatMap(_.processingStartTime).getOrElse(-1L), -1L)
-  registerGauge("lastReceivedBatch_processEndTime",
-    _.lastCompletedBatch.flatMap(_.processingEndTime).getOrElse(-1L), -1L)
+  registerGaugeWithOption("lastReceivedBatch_submissionTime",
+    _.lastCompletedBatch.map(_.submissionTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingStartTime",
+    _.lastCompletedBatch.flatMap(_.processingStartTime), -1L)
+  registerGaugeWithOption("lastReceivedBatch_processingEndTime",
+    _.lastCompletedBatch.flatMap(_.processingEndTime), -1L)
+
+  // Gauge for last received batch records and total received batch records.
+  registerGauge("lastReceivedBatchRecords", _.lastReceivedBatchRecords.values.sum, 0L)
+  registerGauge("totalReceivedBatchRecords", _.numTotalReceivedBatchRecords, 0L)
--- End diff --

Since this is more related to the global streaming metrics like `totalCompletedBatches`, it might be more consistent to put these near them and name it `totalReceivedRecords` (please update the corresponding field in the listener as well if you change this).
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3466#discussion_r22275976

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/StreamingSource.scala ---
@@ -55,19 +64,31 @@ private[streaming] class StreamingSource(ssc: StreamingContext) extends Source {

(same hunk as above, highlighting the line)
+  registerGauge("totalReceivedBatchRecords", _.numTotalReceivedBatchRecords, 0L)
--- End diff --

And if it's not too much work, could you add `totalProcessedRecords`? That seems useful. If it is too complicated then don't worry about it for this PR.
[GitHub] spark pull request: [SPARK-4537][Streaming] Expand StreamingSource...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3466#issuecomment-68115893 Just a couple more comments about making the names more consistent with the existing ones. Otherwise I approve of how `registerGauge` works now.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22275984

--- Diff: docs/streaming-programming-guide.md ---
@@ -66,7 +66,6 @@ main entry point for all streaming functionality. We create a local StreamingCon
 {% highlight scala %}
 import org.apache.spark._
 import org.apache.spark.streaming._
-import org.apache.spark.streaming.StreamingContext._
--- End diff --

Looks good.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22276020

--- Diff: streaming/src/test/scala/org/apache/spark/streamingtest/ImplicitSuite.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streamingtest
+
+/**
+ * A test suite to make sure all `implicit` functions work correctly.
+ *
+ * As `implicit` is a compiler feature, we don't need to run this class.
+ * What we need to do is making the compiler happy.
+ */
+class ImplicitSuite {
+
+  // We only want to test if `implicit` works well with the compiler, so we don't need a real DStream.
+  def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null
+
+  def testToPairDStreamFunctions(): Unit = {
+    val rdd: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
--- End diff --

I think this is a copy-paste error ;) It should be named `dstream` instead of `rdd`.
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3464#issuecomment-68116003 LGTM, except one comment.
[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-68116052 @harishreedharan Since this feature involves a public API, it requires a design doc and some discussion. Could you make one, so that a few of us can take a look and discuss the naming scheme and other API stuff?
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3726#discussion_r22276050

--- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala ---
@@ -182,16 +182,34 @@ class WriteAheadLogSuite extends FunSuite with BeforeAndAfter {
   }

   test("WriteAheadLogManager - cleanup old logs") {
+    logCleanUpTest(waitForCompletion = false)
+  }
+
+  test("WriteAheadLogManager - cleanup old logs synchronously") {
+    logCleanUpTest(waitForCompletion = true)
+  }
+
+  private def logCleanUpTest(waitForCompletion: Boolean): Unit = {
     // Write data with manager, recover with new manager and verify
     val manualClock = new ManualClock
     val dataToWrite = generateRandomData()
     manager = writeDataUsingManager(testDir, dataToWrite, manualClock, stopManager = false)
     val logFiles = getLogFilesInDirectory(testDir)
     assert(logFiles.size > 1)
-    manager.cleanupOldLogs(manualClock.currentTime() / 2)
-    eventually(timeout(1 second), interval(10 milliseconds)) {
+
+    // To avoid code repeat
+    def cleanUpAndVerify(): Unit = {
+      manager.cleanupOldLogs(manualClock.currentTime() / 2, waitForCompletion)
--- End diff --

This call should be made only once (as in real use), instead of being called multiple times from within an `eventually` block.
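The pattern being requested can be sketched like this: the one-shot cleanup call happens outside the retry loop, and only the verification is polled. The `eventually` below is a tiny stand-in for ScalaTest's `Eventually`, and the asynchronous "cleanup" is a hypothetical illustration, not the WriteAheadLogManager API.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Minimal eventually: retry the block until it stops throwing or we time out.
def eventually[T](timeoutMs: Long = 2000, intervalMs: Long = 10)(block: => T): T = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (true) {
    try return block catch {
      case _: Throwable if System.currentTimeMillis() < deadline => Thread.sleep(intervalMs)
    }
  }
  sys.error("unreachable")
}

val remainingLogs = new AtomicInteger(3)

// Hypothetical asynchronous cleanup: old logs disappear shortly after the call.
def cleanupOldLogs(): Unit = new Thread(new Runnable {
  def run(): Unit =
    while (remainingLogs.get() > 0) { Thread.sleep(5); remainingLogs.decrementAndGet() }
}).start()

cleanupOldLogs()                                   // issued exactly once, as in real use
eventually() { assert(remainingLogs.get() == 0) }  // only the check is retried
```

Re-issuing the cleanup inside the `eventually` block would hide bugs where a single call fails to complete the cleanup on its own.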
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/3726#discussion_r22276054

--- Diff: streaming/src/main/scala/org/apache/spark/streaming/receiver/ReceivedBlockHandler.scala ---
@@ -178,7 +178,7 @@ private[streaming] class WriteAheadLogBasedBlockHandler(
   }

   def cleanupOldBlock(threshTime: Long) {
--- End diff --

Actually, mind renaming this method to `cleanupOldBlocks`? Realized that it was inconsistent with `cleanupOldBatches` and `cleanupOldLogs`. I know it's not your code :)
[GitHub] spark pull request: [SPARK-4790][STREAMING] Fix ReceivedBlockTrack...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/3726#issuecomment-68116279 Looking good, except one (and one optional) comment in the testsuite.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68116658 @aarondav Good idea, I'll make that change. Note that we can't do a similar fix for arrays: many PairRDDFunctions methods rely on being able to use keys to index into hashmaps, and that will involve the arrays' Object.hashCode. Therefore, we should probably strengthen the warnings for arrays into errors, since there's a high likelihood that users will get incorrect results.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117017 [Test build #24825 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24825/consoleFull) for PR 3795 at commit [`1cd87e0`](https://github.com/apache/spark/commit/1cd87e051c72ff7c17ee7c3aa8b9fd507167cdad). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3847] Raise exception when hashing Java...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117045 Alright, I've updated this to support Enums as @aarondav has described and have strengthened the array error-checking to prohibit most uses of arrays as keys in PairRDDFunctions, even when using a custom partitioner. In order to properly handle those cases, we would need to make sure that the hashmaps that we use for aggregation will perform special-case hashing of arrays.
[GitHub] spark pull request: [SPARK-3847] Use portable hashcode for Java en...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3795#issuecomment-68117310 I just realized that some of this error-checking for arrays might not work for Java API users due to type erasure / fake class manifests. If that's the case, we might want to just move the check to runtime in HashPartitioner and throw an exception.
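The underlying JVM behavior this thread is working around can be demonstrated directly: arrays use identity-based `equals`/`hashCode`, so two arrays with the same contents never match as hash-map keys. That is why key-type checks for arrays matter in PairRDDFunctions, whether done at compile time or at runtime.

```scala
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)

assert(a.sameElements(b))  // same contents...
assert(a != b)             // ...but reference-unequal, so unequal as keys

val m = Map(a -> "value")
assert(m.contains(a))      // found only via the identical reference
assert(!m.contains(b))     // an equal-content array silently misses
```

The same identity-hashing issue applies to `java.lang.Enum.hashCode`, which is not stable across JVMs; that is the portability problem SPARK-3847 addresses for enum keys.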
[GitHub] spark pull request: SPARK-4970 Fix an implicit bug in SparkSubmitSu...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/3805 SPARK-4970 Fix an implicit bug in SparkSubmitSuite

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SparkSubmitBugFix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3805.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #3805

commit 41ede0ee67f77e09f2abe96c981167ed671e0504
Author: Takeshi Yamamuro linguin@gmail.com
Date: 2014-12-25T13:57:49Z

    Fix an implicit bug in SparkSubmitSuite
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68117570 The test 'includes jars passed in through --jars' in SparkSubmitSuite fails when spark.executor.memory is set above 512 MiB in conf/spark-defaults.conf. An exception is thrown as follows:

    Exception in thread "main" org.apache.spark.SparkException: Asked to launch cluster with 512 MB RAM / worker but requested 1024 MB/worker
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1889)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
        at org.apache.spark.deploy.JarCreationTest$.main(SparkSubmitSuite.scala:458)
        at org.apache.spark.deploy.JarCreationTest.main(SparkSubmitSuite.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:367)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
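The failure mode described here is a host-level setting leaking into the test: judging by the exception, the suite's JarCreationTest launches a local cluster whose workers get 512 MB, so any larger executor request from the user's own configuration breaks it. As an illustration (the exact file contents are an assumption, not quoted from the report), a line like the following in the user's `conf/spark-defaults.conf` would trigger the exception:

```
# conf/spark-defaults.conf on the developer's machine (hypothetical)
spark.executor.memory   1g    # exceeds the 512 MB workers the suite launches
```

The fix direction implied by the PR is to make the test independent of such host configuration rather than to require users to edit their config before running the suite.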
[GitHub] spark pull request: [SPARK-4970] Fix an implicit bug in SparkSubmi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3805#issuecomment-68117596 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4937][SQL] Normalizes conjunctions and ...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/3784#issuecomment-68117927 Hi @liancheng, my PR was originally also not limited to Filter; I used `transformExpressionsDown` from my first version. The title of my first version was just not accurate :)
[GitHub] spark pull request: [WIP][SPARK-4937][SQL] Adding optimization to ...
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/3778#discussion_r22276984

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -293,6 +295,380 @@ object OptimizeIn extends Rule[LogicalPlan] {
   }
 }

+object ConditionSimplification extends Rule[LogicalPlan] {
+  import BinaryComparison.LiteralComparison
+
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case q: LogicalPlan => q transformExpressionsDown {
+      case origin: CombinePredicate =>
+        origin.toOptimized
+    }
+  }
+
+  type SplitFragments = Map[Expression, Option[Expression]]
+
+  implicit class CombinePredicateExtension(source: CombinePredicate) {
+    def find(goal: Expression): Boolean = {
+      def delegate(child: Expression): Boolean = (child, goal) match {
+        case (combine: CombinePredicate, _) =>
+          isSameCombinePredicate(source, combine) && combine.find(goal)
+
+        // if left child is a literal
+        // LiteralComparison's unapply method change the literal and attribute position
+        case (LiteralComparison(childComparison), LiteralComparison(goalComparison)) =>
+          isSame(childComparison, goalComparison)
+
+        case other =>
+          isSame(child, goal)
+      }
+
+      // using method to avoid right side compute if left side is true
+      val leftResult = () => delegate(source.left)
+      val rightResult = () => delegate(source.right)
+      leftResult() || rightResult()
+    }
+
+    @inline
+    def isOrPredicate: Boolean = {
+      source.isInstanceOf[Or]
+    }
+
+    // create a new combine predicate that has the same combine operator as this
+    @inline
+    def build(left: Expression, right: Expression): CombinePredicate = {
+      CombinePredicate(left, right, isOrPredicate)
+    }
+
+    // swap left child and right child
+    @inline
+    def swap: CombinePredicate = {
+      source.build(source.right, source.left)
+    }
+
+    def toOptimized: Expression = source match {
+      // one CombinePredicate, left equals right, drop right, keep left
+      // examples: a && a = a, a || a = a
+      case CombinePredicate(left, right) if left.fastEquals(right) =>
+        left
+
+      // one CombinePredicate and left and right are both binary comparisons
+      // examples: a > 2 && a < 2 = false
+      case origin @ CombinePredicate(LiteralComparison(left), LiteralComparison(right)) =>
+        // left or right maybe change its child position, so rebuild one
+        val changed = origin.build(left, right)
+        val optimized = changed.mergeComparison
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      case origin @ CombinePredicate(left @ CombinePredicate(ll, lr), right)
+          if isNotCombinePredicate(ll, lr, right) =>
+        val leftOptimized = left.toOptimized
+        if (isSame(left, leftOptimized)) {
+          if (isSame(ll, right) || isSame(lr, right)) {
+            if (isSameCombinePredicate(origin, left)) leftOptimized else right
+          } else {
+            val llRight = origin.build(ll, right)
+            val lrRight = origin.build(lr, right)
+            val llRightOptimized = llRight.toOptimized
+            val lrRightOptimized = lrRight.toOptimized
+            if (isSame(llRight, llRightOptimized) && isSame(lrRight, lrRightOptimized)) {
+              origin
+            } else if ((isNotCombinePredicate(llRightOptimized, lrRightOptimized))
+                || isSameCombinePredicate(origin, left)) {
+              left.build(llRightOptimized, lrRightOptimized).toOptimized
+            } else if (llRightOptimized.isLiteral || lrRightOptimized.isLiteral) {
+              left.build(llRightOptimized, lrRightOptimized)
+            } else {
+              origin
+            }
+          }
+        } else if (isNotCombinePredicate(leftOptimized)) {
+          origin.build(leftOptimized, right).toOptimized
+        } else {
+          origin
+        }
+
+      case origin @ CombinePredicate(left, right @ CombinePredicate(left2, right2))
+          if isNotCombinePredicate(left, left2, right2) =>
+        val changed = origin.swap
+        val optimized = changed.toOptimized
+        if (isSame(changed, optimized)) {
+          origin
+        } else {
+          optimized
+        }
+
+      // do optimize like: (a || b || c) && a = a, here a, b, c is a condition
+      case origin @ CombinePredicate(left: CombinePredicate,
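The kinds of rewrites this rule performs can be shown with a toy model: `a && a` collapses to `a`, `a || a` to `a`, and contradictory comparisons fold to false. `Expr` and friends below are hypothetical stand-ins for Catalyst's expression tree, not the real API.

```scala
sealed trait Expr
case class Attr(name: String) extends Expr
case class Gt(attr: Attr, v: Int) extends Expr   // attr > v
case class Lt(attr: Attr, v: Int) extends Expr   // attr < v
case class And(l: Expr, r: Expr) extends Expr
case class Or(l: Expr, r: Expr) extends Expr
case object FalseLit extends Expr

def simplify(e: Expr): Expr = e match {
  case And(l, r) if simplify(l) == simplify(r) => simplify(l)   // a && a => a
  case Or(l, r)  if simplify(l) == simplify(r) => simplify(l)   // a || a => a
  // a > x && a < y is unsatisfiable whenever x >= y
  case And(Gt(a, x), Lt(b, y)) if a == b && x >= y => FalseLit
  case And(l, r) => And(simplify(l), simplify(r))
  case Or(l, r)  => Or(simplify(l), simplify(r))
  case leaf => leaf
}
```

The real rule applies such rewrites to `CombinePredicate` nodes via `transformExpressionsDown`, as shown in the diff, so the simplification runs top-down over every expression in the plan.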
[GitHub] spark pull request: [SPARK-4950] Delete obsolete mapReduceTripelet...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/3782#issuecomment-68118211 Understood. I went back to the old Pregel API. And also, I'll check #1217 later :))
[GitHub] spark pull request: [SPARK-4608][Streaming] Reorganize StreamingCo...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/3464#discussion_r22277047

--- Diff: streaming/src/test/scala/org/apache/spark/streamingtest/ImplicitSuite.scala ---
@@ -0,0 +1,35 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.streamingtest
+
+/**
+ * A test suite to make sure all `implicit` functions work correctly.
+ *
+ * As `implicit` is a compiler feature, we don't need to run this class.
+ * What we need to do is making the compiler happy.
+ */
+class ImplicitSuite {
+
+  // We only want to test if `implicit` works well with the compiler, so we don't need a real DStream.
+  def mockDStream[T]: org.apache.spark.streaming.dstream.DStream[T] = null
+
+  def testToPairDStreamFunctions(): Unit = {
+    val rdd: org.apache.spark.streaming.dstream.DStream[(Int, Int)] = mockDStream
--- End diff --

Good catch. Done.