[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-04 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/10089


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-04 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-192531351
  
LGTM. Merging to master. Thanks @JasonMWhite and @koeninger 





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191290624
  
LGTM  

Thanks for following up on this.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191159802
  
All comments addressed, builds cleanly, all tests passing. LGTM?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191159038
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191159043
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52304/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191158412
  
**[Test build #52304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52304/consoleFull)** for PR 10089 at commit [`a7a0877`](https://github.com/apache/spark/commit/a7a08771eb9e7fa9d15ff739c0da89fd193f8ed3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191127051
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191127059
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52299/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191126457
  
**[Test build #52299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52299/consoleFull)** for PR 10089 at commit [`0a78e8d`](https://github.com/apache/spark/commit/0a78e8d48a55df4a51e492451f73e8d95094bdba).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-01 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191113140
  
`DirectKafkaStreamSuite` passes all tests for me locally, and the test 
failure above appears to have been on an outdated SHA.

Also addressed @zsxwing's comments. Waiting for the test build to complete.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191107974
  
**[Test build #52304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52304/consoleFull)** for PR 10089 at commit [`a7a0877`](https://github.com/apache/spark/commit/a7a08771eb9e7fa9d15ff739c0da89fd193f8ed3).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-191081614
  
**[Test build #52299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52299/consoleFull)** for PR 10089 at commit [`0a78e8d`](https://github.com/apache/spark/commit/0a78e8d48a55df4a51e492451f73e8d95094bdba).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-29 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-190304130
  
@JasonMWhite looks like this failed the "offset recovery" test in 
DirectKafkaStreamSuite.  Are you able to reproduce that test failure locally?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186113150
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186113151
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51525/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186112872
  
**[Test build #51525 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51525/consoleFull)** for PR 10089 at commit [`f19f746`](https://github.com/apache/spark/commit/f19f7465c959505d7af56a375400b202e01865ef).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186084779
  
**[Test build #51525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51525/consoleFull)** for PR 10089 at commit [`f19f746`](https://github.com/apache/spark/commit/f19f7465c959505d7af56a375400b202e01865ef).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186083123
  
@JasonMWhite thanks, this looks great except for some nits.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53426791
  
--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala ---
@@ -430,6 +457,27 @@ class DirectKafkaStreamSuite
   rdd.asInstanceOf[KafkaRDD[K, V, _, _, (K, V)]].offsetRanges
 }.toSeq.sortBy { _._1 }
   }
+
+  private def getDirectKafkaStream(topic: String, mockRateController: Option[RateController]) = {
+val batchIntervalMilliseconds = 100
+
+val sparkConf = new SparkConf()
+  // Safe, even with streaming, because we're using the direct API.
--- End diff --

Please remove this comment; it's a bit misleading. Since the tests won't 
start the StreamingContext, the number of cores is irrelevant.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53426422
  
--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaTestUtils.scala ---
@@ -152,12 +152,15 @@ private[kafka] class KafkaTestUtils extends Logging {
   }
 
   /** Create a Kafka topic and wait until it is propagated to the whole cluster */
-  def createTopic(topic: String): Unit = {
-AdminUtils.createTopic(zkClient, topic, 1, 1)
+  def createTopic(topic: String, partitions: Int = 1): Unit = {
--- End diff --

Could you remove the default value `1` now that you've added the overloaded version?
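
To illustrate why the default becomes redundant once a single-argument overload exists, here is a minimal sketch. `TopicUtils` is a hypothetical stand-in, not the real `KafkaTestUtils`, and the return strings exist only to show which alternative was picked. When both forms are present, Scala's overload resolution prefers the alternative that needs no default arguments, so a one-argument call should never reach the default.

```scala
object OverloadWithDefaultDemo {
  // Hypothetical stand-in for a test utility that has both forms present.
  class TopicUtils {
    // Two-argument version that still carries a default value.
    def createTopic(topic: String, partitions: Int = 1): String =
      s"two-arg($topic, $partitions)"

    // Explicit single-argument overload.
    def createTopic(topic: String): String =
      s"one-arg($topic)"
  }

  def main(args: Array[String]): Unit = {
    val utils = new TopicUtils
    // A one-argument call resolves to the explicit overload,
    // so the default on the two-argument version is never used.
    println(utils.createTopic("t"))
    println(utils.createTopic("t", 3))
  }
}
```

Dropping the default therefore should not change behaviour for Scala callers, and it leaves a single obvious place where the value `1` is chosen.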





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186080968
  
retest this please





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-18 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-186037636
  
It looks like the PySpark unit test failure above predates my commit adding 
a backwards-compatible single-argument version, as suggested. Could 
someone kick off the tests again?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185545377
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51457/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185545374
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185545031
  
**[Test build #51457 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51457/consoleFull)** for PR 10089 at commit [`7a5dad3`](https://github.com/apache/spark/commit/7a5dad38e705c79a13200b23749e347cd782b02f).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185526582
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51463/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185526580
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185524841
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51459/





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185524838
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185521431
  
A single-argument version is easy enough, and good to support 
backwards-compatibility anyway. I haven't been able to get PySpark tests 
running locally yet, so I'm afraid that's a blind fix.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185519914
  
**[Test build #51457 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51457/consoleFull)** for PR 10089 at commit [`7a5dad3`](https://github.com/apache/spark/commit/7a5dad38e705c79a13200b23749e347cd782b02f).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185518134
  
```
py4j.Py4JException: Method createTopic([class java.lang.String]) does not exist
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:335)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:344)
at py4j.Gateway.invoke(Gateway.java:279)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
```

From a Java client's point of view, the signature of `createTopic()` in 
KafkaTestUtils has changed, so you'll need to either pass the extra argument 
explicitly from the Python caller or add back a single-argument overload for 
backwards compatibility.
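
A minimal, self-contained sketch of why the lookup fails, assuming (as the stack trace suggests) that py4j resolves methods by reflecting over the compiled JVM signatures. The classes `WithDefault` and `WithOverload` and the helper `hasSingleArgCreateTopic` are hypothetical stand-ins, not the real `KafkaTestUtils`. The point is that a Scala default argument does not produce a one-parameter JVM method, whereas an explicit overload does.

```scala
object DefaultArgReflectionDemo {
  // Hypothetical stand-in: the second parameter has a Scala default value.
  class WithDefault {
    def createTopic(topic: String, partitions: Int = 1): String =
      s"$topic:$partitions"
  }

  // Hypothetical stand-in: an explicit single-argument overload that
  // delegates to the two-argument version.
  class WithOverload {
    def createTopic(topic: String, partitions: Int): String =
      s"$topic:$partitions"

    def createTopic(topic: String): String = createTopic(topic, 1)
  }

  // True if the class exposes a public JVM method createTopic(String),
  // which is the signature a reflective caller would look for.
  def hasSingleArgCreateTopic(cls: Class[_]): Boolean =
    cls.getMethods.exists { m =>
      val params = m.getParameterTypes
      m.getName == "createTopic" && params.length == 1 && params(0) == classOf[String]
    }

  def main(args: Array[String]): Unit = {
    println(hasSingleArgCreateTopic(classOf[WithDefault]))
    println(hasSingleArgCreateTopic(classOf[WithOverload]))
  }
}
```

In this sketch only `WithOverload` exposes `createTopic(String)` to reflection, which is why adding the overload (rather than relying on the default) keeps non-Scala callers working.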





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185517414
  
Comments addressed, but it looks like the change is causing PySpark unit 
tests to fail. Investigating...





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53265282
  
--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala ---
@@ -353,10 +353,52 @@ class DirectKafkaStreamSuite
 ssc.stop()
   }
 
+  test("maxMessagesPerPartition with backpressure disabled") {
+val topic = "maxMessagesPerPartition"
+def getRateController(
--- End diff --

That is so much simpler.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53265248
  
--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala ---
@@ -89,23 +89,32 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+
+  protected[streaming] def maxMessagesPerPartition(
+offsets: Map[TopicAndPartition, Long]): Option[Map[TopicAndPartition, Long]] = {
--- End diff --

Thanks! I haven't quite got the hang of Scala alignment.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185511634
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51451/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185511633
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185511524
  
**[Test build #51451 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51451/consoleFull)**
 for PR 10089 at commit 
[`73e9ae3`](https://github.com/apache/spark/commit/73e9ae3ce7cc14c8ca50d8e385ad2604cfbf76f0).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53261182
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -430,6 +471,32 @@ class DirectKafkaStreamSuite
   rdd.asInstanceOf[KafkaRDD[K, V, _, _, (K, V)]].offsetRanges
 }.toSeq.sortBy { _._1 }
   }
+
+  private def getDirectKafkaStream(
+  topic: String,
+  getRateController: DirectKafkaInputDStream[String, String, StringDecoder,
+ StringDecoder, (String, String)]
+=> Option[RateController]) = {
+val batchIntervalMilliseconds = 100
+
+val sparkConf = new SparkConf()
+  // Safe, even with streaming, because we're using the direct API.
+  // Using 1 core is useful to make the test more predictable.
+  .setMaster("local[1]")
+  .setAppName(this.getClass.getSimpleName)
+  .set("spark.streaming.kafka.maxRatePerPartition", "100")
+
+// Setup the streaming context
+ssc = new StreamingContext(sparkConf, Milliseconds(batchIntervalMilliseconds))
+
+val earliestOffsets = Map(TopicAndPartition(topic, 0) -> 0L, TopicAndPartition(topic, 1) -> 0L)
+val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
+new DirectKafkaInputDStream[String, String, StringDecoder,
+StringDecoder, (String, String)](
+  ssc, Map[String, String](), earliestOffsets, messageHandler) {
--- End diff --

nit: 4 spaces





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53261176
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -430,6 +471,32 @@ class DirectKafkaStreamSuite
   rdd.asInstanceOf[KafkaRDD[K, V, _, _, (K, V)]].offsetRanges
 }.toSeq.sortBy { _._1 }
   }
+
+  private def getDirectKafkaStream(
+  topic: String,
+  getRateController: DirectKafkaInputDStream[String, String, StringDecoder,
+ StringDecoder, (String, String)]
+=> Option[RateController]) = {
+val batchIntervalMilliseconds = 100
+
+val sparkConf = new SparkConf()
+  // Safe, even with streaming, because we're using the direct API.
+  // Using 1 core is useful to make the test more predictable.
+  .setMaster("local[1]")
+  .setAppName(this.getClass.getSimpleName)
+  .set("spark.streaming.kafka.maxRatePerPartition", "100")
+
+// Setup the streaming context
+ssc = new StreamingContext(sparkConf, Milliseconds(batchIntervalMilliseconds))
+
+val earliestOffsets = Map(TopicAndPartition(topic, 0) -> 0L, TopicAndPartition(topic, 1) -> 0L)
+val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
+new DirectKafkaInputDStream[String, String, StringDecoder,
+StringDecoder, (String, String)](
--- End diff --

nit: this can fit on one line





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53260582
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -353,10 +353,52 @@ class DirectKafkaStreamSuite
 ssc.stop()
   }
 
+  test("maxMessagesPerPartition with backpressure disabled") {
+val topic = "maxMessagesPerPartition"
+def getRateController(
--- End diff --

`getRateController` is not necessary. You can pass a rate controller 
to `getDirectKafkaStream` directly.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53260346
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -89,23 +89,32 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+
+  protected[streaming] def maxMessagesPerPartition(
+offsets: Map[TopicAndPartition, Long]): Option[Map[TopicAndPartition, Long]] = {
--- End diff --

nit: 4 spaces
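
For readers unfamiliar with the convention behind this nit, the sketch below shows the layout it appears to ask for: wrapped parameters of a method declaration indented 4 spaces relative to `def`, with the body at 2 spaces. `IndentationExample` and the `Map[String, Long]` signature are hypothetical stand-ins, not the code under review.

```scala
object IndentationExample {
  // Hypothetical method, used only to show the layout:
  // wrapped parameters sit 4 spaces in from `def`, the body 2 spaces in.
  def maxMessagesPerPartition(
      offsets: Map[String, Long]): Option[Map[String, Long]] = {
    if (offsets.isEmpty) None else Some(offsets)
  }
}
```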





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185479065
  
**[Test build #51451 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51451/consoleFull)**
 for PR 10089 at commit 
[`73e9ae3`](https://github.com/apache/spark/commit/73e9ae3ce7cc14c8ca50d8e385ad2604cfbf76f0).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-17 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-185478351
  
I added unit tests for `maxMessagesPerPartition` to cover 3 edge cases that 
have been raised here:
- when backpressure is disabled, it simply uses `maxRatePerPartition`
- when there is no lag, it returns `None`
- when the backpressure calculation returns a rate higher than 
`maxRatePerPartition`, it respects `maxRatePerPartition`

The existing test, using the fake Kafka test data, exercises 
`maxMessagesPerPartition` in the normal case, i.e. backpressure is enabled, 
there is lag, and the backpressure rate is lower than the max rate per partition.
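
As a rough illustration of these cases, here is a minimal, self-contained sketch. It is a simplified model written for this note, not the code in the PR: `BackpressureSketch`, its parameter list, and the use of `String` partition keys in place of `TopicAndPartition` are all assumptions.

```scala
object BackpressureSketch {
  // Hypothetical stand-in for the method under test.
  // estimatedRate: rate from the rate controller, None when backpressure is disabled.
  // maxRatePerPartition: configured cap per partition, 0 meaning "no cap".
  // lagPerPartition: messages each partition is behind.
  def maxMessagesPerPartition(
      estimatedRate: Option[Int],
      maxRatePerPartition: Int,
      lagPerPartition: Map[String, Long],
      secsPerBatch: Double): Option[Map[String, Long]] = {
    val ratePerPartition: Map[String, Long] = estimatedRate.filter(_ > 0) match {
      case Some(rate) =>
        val totalLag = lagPerPartition.values.sum
        lagPerPartition.map { case (tp, lag) =>
          // Each partition gets a share of the rate proportional to its lag.
          val share =
            if (totalLag == 0L) 0L else Math.round(lag.toDouble / totalLag * rate)
          // The configured cap, when set, always wins.
          tp -> (if (maxRatePerPartition > 0) {
            Math.min(share, maxRatePerPartition.toLong)
          } else {
            share
          })
        }
      case None =>
        // Backpressure disabled: fall back to the configured cap.
        lagPerPartition.map { case (tp, _) => tp -> maxRatePerPartition.toLong }
    }
    if (ratePerPartition.values.sum > 0) {
      Some(ratePerPartition.map { case (tp, r) => tp -> (secsPerBatch * r).toLong })
    } else {
      None
    }
  }
}
```

In this model, with a one-second batch, a disabled controller and a cap of 100 give 100 messages per partition; zero lag everywhere gives `None`; and an estimated rate of 1000 split evenly over two partitions is cut back to the cap of 100 each.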

I've also moved the MimaExcludes to the 2.0 section.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184920176
  
> Sorry, did I put the MiMa exclusion in the wrong section of the file?

It should be in the 2.0 section.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184919738
  
@JasonMWhite could you add a unit test for `maxMessagesPerPartition`? Since 
it's protected, you should be able to test it using some fake data.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread zsxwing
Github user zsxwing commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r53098287
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -89,23 +89,31 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+  protected def maxMessagesPerPartition(leaderOffsets: Map[TopicAndPartition, LeaderOffset])
+: Option[Map[TopicAndPartition, Long]] = {
 val estimatedRateLimit = rateController.map(_.getLatestRate().toInt)
-val numPartitions = currentOffsets.keys.size
-
-val effectiveRateLimitPerPartition = estimatedRateLimit
-  .filter(_ > 0)
-  .map { limit =>
-if (maxRateLimitPerPartition > 0) {
-  Math.min(maxRateLimitPerPartition, (limit / numPartitions))
-} else {
-  limit / numPartitions
+
+// calculate a per-partition rate limit based on current lag
+val effectiveRateLimitPerPartition = estimatedRateLimit.filter(_ > 0) match {
+  case Some(rate) =>
+val lagPerPartition = leaderOffsets.map { case (tp, lo) =>
+  tp -> Math.max(lo.offset - currentOffsets(tp), 0)
+}
+val totalLag = lagPerPartition.values.sum.toFloat
--- End diff --

nit: If `lagPerPartition.values.sum` is 0, you can just return `None`. Then 
`Math.max(totalLag, 1)` can be changed to `totalLag`. E.g.,

```
val totalLag = lagPerPartition.values.sum
if (totalLag == 0) {
  return None
}
...
val backpressureRate = Math.round(lag.toFloat / totalLag * rate)
```
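
Spelled out as a standalone function, the suggestion might look like the sketch below. The names `LagShare` and `backpressureRates`, and the `String` partition keys, are hypothetical; the sketch also uses an `if`/`else` expression in place of the early `return`, which is a stylistic choice and not part of the suggestion.

```scala
object LagShare {
  // Hypothetical helper: splits `rate` across partitions in proportion to lag.
  // Returns None when nothing is lagging, so no division by zero can occur.
  def backpressureRates(
      lagPerPartition: Map[String, Long],
      rate: Int): Option[Map[String, Int]] = {
    val totalLag = lagPerPartition.values.sum
    if (totalLag == 0L) {
      None
    } else {
      Some(lagPerPartition.map { case (tp, lag) =>
        tp -> Math.round(lag.toFloat / totalLag * rate)
      })
    }
  }
}
```

With lags of 300 and 100 and a rate of 100, the partitions receive 75 and 25 respectively; with all lags at 0 the result is `None`.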





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184916602
  
Sorry, did I put the MiMa exclusion in the wrong section of the file?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184912435
  
**[Test build #51390 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51390/consoleFull)**
 for PR 10089 at commit 
[`f7ffd6f`](https://github.com/apache/spark/commit/f7ffd6f4b148ca56927e553970f464c907ced38a).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184912515
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51390/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184912512
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184907112
  
**[Test build #51390 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51390/consoleFull)**
 for PR 10089 at commit 
[`f7ffd6f`](https://github.com/apache/spark/commit/f7ffd6f4b148ca56927e553970f464c907ced38a).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184905769
  
ok to test





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184901629
  
I think that should be ok.

On Tue, Feb 16, 2016 at 4:29 PM, Jason White wrote:

> @koeninger I've added the two error
> messages from Jenkins to the MimaExcludes, under the v2.0 section. There
> was one from the test function, but this PR also changes the signature of
> the protected method maxMessagesPerPartition.






[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-184900516
  
@koeninger I've added the two error messages from Jenkins to the 
MimaExcludes, under the v2.0 section. There was one from the test function, but 
this PR also changes the signature of the protected method 
`maxMessagesPerPartition`.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-182008116
  
Thanks!





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-182007914
  
Sure, it's in project/MimaExcludes.scala. You should be able to match up
the problem type and class in the error message from Jenkins when adding an
appropriate ProblemFilters.exclude line.
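
For illustration only, an exclusion of this kind might look like the fragment below. It is a sketch, not the lines added in this PR: the problem type (`MissingMethodProblem` here), the fully qualified member names, and the `sketchExcludes` value are assumptions, and the real problem type and names should be copied from the Jenkins error message. The fragment needs the MiMa library on the classpath and belongs inside the relevant version section of project/MimaExcludes.scala.

```scala
import com.typesafe.tools.mima.core._

// Hypothetical entries; take the problem type and member name
// from the MiMa error reported by Jenkins.
val sketchExcludes = Seq(
  ProblemFilters.exclude[MissingMethodProblem](
    "org.apache.spark.streaming.kafka.KafkaTestUtils.createTopic"),
  ProblemFilters.exclude[MissingMethodProblem](
    "org.apache.spark.streaming.kafka.DirectKafkaInputDStream.maxMessagesPerPartition")
)
```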

On Tue, Feb 9, 2016 at 12:22 PM, Jason White wrote:

> I'm not sure how to handle the mima test failure, could you point me to
> where to add the exclude? I'll rebase and add the exception.






[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-182002607
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50983/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-182002605
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-182002520
  
**[Test build #50983 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50983/consoleFull)**
 for PR 10089 at commit 
[`b58d517`](https://github.com/apache/spark/commit/b58d51767f0370c65fa65dfb15416b75fa914d05).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-181997614
  
**[Test build #50983 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50983/consoleFull)**
 for PR 10089 at commit 
[`b58d517`](https://github.com/apache/spark/commit/b58d51767f0370c65fa65dfb15416b75fa914d05).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-181991672
  
retest this please





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-181990835
  
I'm not sure how to handle the MiMa test failure. Could you point me to 
where to add the exclude? I'll rebase and add the exception.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-181964379
  
This looked like a good patch; did it just fall through the cracks? The 
MiMa test failure was probably just due to the change in the signature of 
KafkaTestUtils.createTopic. It should be OK to add an exclude for that, since 
it's a testing class.

Jason, if you don't have time to deal with it, let me know and I can fix it.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165545678
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165545581
  
**[Test build #47938 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47938/consoleFull)**
 for PR 10089 at commit 
[`b58d517`](https://github.com/apache/spark/commit/b58d51767f0370c65fa65dfb15416b75fa914d05).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165545679
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47938/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165540920
  
**[Test build #47938 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47938/consoleFull)**
 for PR 10089 at commit 
[`b58d517`](https://github.com/apache/spark/commit/b58d51767f0370c65fa65dfb15416b75fa914d05).





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165539557
  
retest this please





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165536634
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47935/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165536632
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165531887
  
retest this please





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165528447
  
The console output seems to be available without logging in, and it looks 
like a Jenkins issue rather than an actual test failure:

GitHub pull request #10089 of commit 
b58d51767f0370c65fa65dfb15416b75fa914d05 automatically merged.
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-07 (centos spark-test) in workspace 
/home/jenkins/workspace/SparkPullRequestBuilder
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/apache/spark.git # 
timeout=10
Fetching upstream changes from https://github.com/apache/spark.git
 > git --version # timeout=10
 > git fetch --tags --progress https://github.com/apache/spark.git 
+refs/pull/10089/*:refs/remotes/origin/pr/10089/* # timeout=15
Build was aborted
Aborted by anonymous
ERROR: Step ‘Archive the artifacts’ failed: no workspace for 
SparkPullRequestBuilder #47665
ERROR: Step ‘Publish JUnit test result report’ failed: no workspace for 
SparkPullRequestBuilder #47665
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47665/
Test FAILed.
ERROR: amp-jenkins-worker-07 is offline; cannot locate JDK 7u60






[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-165518544
  
Is this a CI failure? I don't have access rights to see what happened. I'd 
like to get this ready to merge; could someone give me a hand?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-164539536
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-164539550
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47665/
Test FAILed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-14 Thread zsxwing
Github user zsxwing commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-164527592
  
Jenkins, test this please





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-11 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r47363415
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

In the original code, as I understand it, since the Kafka test setup wasn't 
cleared/reinitialized between testing rounds, only the first batch of 200 
messages, produced in the first of 3 test rounds, was ever consumed. The other 
testing rounds produced messages that were never used. My changes aside, I 
think moving the test message generation outside of the individual test rounds 
makes the most sense.

This failure scenario depended on imbalanced Kafka partitions. Would you 
prefer to see tests on both a balanced and an imbalanced scenario?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-10 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r47289494
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

Oh yeah, only for testing. This was in response to switching the test to put 
all messages on a single partition, which seemed limiting for testing code 
that changes us to handling each partition's backpressure instead of a single 
global one.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r47286143
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

Not sure if you're talking about just the tests or the main code.
For production use, the complication is that how messages are partitioned
into Kafka is configurable at the time you produce them, so the
configuration of the Spark partitioner would have to match.







[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-10 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r47278574
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

While I haven't done much work with Kafka, it seems like we could maybe 
explicitly set the producer's partitioner in the producerConfiguration to 
round robin if we wanted to (although that requires some custom code from 
what I can tell), or rotate the partition key.
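
To make the partitioning point concrete, here is a minimal sketch in plain Scala. It assumes a simplified hash-by-key scheme; `hashPartition`, `rotatingKeys`, and `distribution` are hypothetical helpers for illustration only, not Kafka's partitioner or anything provided by `kafkaTestUtils`.

```scala
object KeyPartitioning {
  // Simplified stand-in for a hash-based partitioner (an assumption for
  // illustration; this is not Kafka's actual partitioner class).
  def hashPartition(key: String, numPartitions: Int): Int =
    (key.hashCode & Int.MaxValue) % numPartitions

  // "Rotating" the key: message i is keyed by i.toString, so keys differ.
  def rotatingKeys(count: Int): Seq[String] = (1 to count).map(_.toString)

  // Count how many messages each partition would receive for the given keys.
  def distribution(keys: Seq[String], numPartitions: Int): Map[Int, Int] =
    keys.groupBy(hashPartition(_, numPartitions)).map { case (p, ks) => p -> ks.size }

  def main(args: Array[String]): Unit = {
    // A single repeated key always hashes to the same partition.
    println(distribution(Seq.fill(200)("foo"), numPartitions = 2))
    // Distinct keys spread the same number of messages over several partitions.
    println(distribution(rotatingKeys(200), numPartitions = 2))
  }
}
```

Under this assumed scheme, a fixed key such as "foo" yields a fully imbalanced topic, while rotating keys give a spread that depends only on the hash function, which may be enough for a balanced test case without a custom partitioner.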





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-10 Thread mrszg
Github user mrszg commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-163532836
  
The original code (before this patch) has a serious error: it doesn't respect 
maxRateLimitPerPartition when the backpressure rate is smaller than the number 
of partitions. In that case effectiveRateLimitPerPartition equals 0 (because 
limit / numPartitions == 0), and as a consequence all messages from the topic 
are consumed at once. According to my tests, this patch fixes the error.
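
The arithmetic behind this report can be reproduced in isolation. The sketch below is a standalone approximation, not the Spark source: `evenSplit` copies the pre-patch expression quoted in the diff, and `asLimit` encodes the assumption, taken from the description above, that a non-positive per-partition value is treated as "no limit".

```scala
object EvenSplitRateLimit {
  // Hypothetical helper reproducing the pre-patch arithmetic shown in the diff:
  // the estimated global rate is split evenly using integer division.
  def evenSplit(limit: Int, numPartitions: Int, maxRateLimitPerPartition: Int): Int =
    if (maxRateLimitPerPartition > 0) math.min(maxRateLimitPerPartition, limit / numPartitions)
    else limit / numPartitions

  // Assumption (from the comment above): a non-positive result means "no limit".
  def asLimit(perPartition: Int): Option[Int] =
    if (perPartition > 0) Some(perPartition) else None

  def main(args: Array[String]): Unit = {
    // A rate of 5 across 8 partitions truncates to 0, so the cap of 100 is lost.
    println(asLimit(evenSplit(limit = 5, numPartitions = 8, maxRateLimitPerPartition = 100)))
    // A rate of 800 across 8 partitions gives 100, which the cap reduces to 50.
    println(asLimit(evenSplit(limit = 800, numPartitions = 8, maxRateLimitPerPartition = 50)))
  }
}
```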





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-09 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-163427146
  
Could someone verify the patch please? @tdas perhaps?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-07 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-162562042
  
LGTM





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-05 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-162260106
  
`maxRatePerPartition` now respects the limit set by 
`maxRateLimitPerPartition`, if it is set. Let me know if you think any 
additional tests are needed.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-05 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-162252793
  
Sounds good. If you can add that maxRatePerPartition handling, this would be
ready to go from my point of view.






[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-05 Thread JasonMWhite
Github user JasonMWhite commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-162250749
  
This patch solved our skew problem. Below is a 15-minute snapshot of our 
lag earlier this week, showing a single partition getting slowly worse. It 
would get to about 8 million messages behind overnight.

https://monosnap.com/file/VG0rXFZn05bKIIDLFIHOxfNOWKOxy5

And the most recent 24 hour period, after this patch went live on our 
system.

https://monosnap.com/file/DMSzLP8lFKJ0DxHuiWcxZHKqVeJHWL





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-05 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46763463
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -89,23 +89,29 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = 
context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+  protected def maxMessagesPerPartition(leaderOffsets: 
Map[TopicAndPartition, LeaderOffset])
+: Option[Map[TopicAndPartition, Long]] = {
 val estimatedRateLimit = rateController.map(_.getLatestRate().toInt)
-val numPartitions = currentOffsets.keys.size
-
-val effectiveRateLimitPerPartition = estimatedRateLimit
-  .filter(_ > 0)
-  .map { limit =>
-if (maxRateLimitPerPartition > 0) {
-  Math.min(maxRateLimitPerPartition, (limit / numPartitions))
-} else {
-  limit / numPartitions
+
+// calculate a per-partition rate limit based on current lag
+val effectiveRateLimitPerPartition = estimatedRateLimit.filter(_ > 0) 
match {
+  case Some(rate) =>
+val lagPerPartition = leaderOffsets.map { case (tp, lo) =>
+  tp -> Math.max(lo.offset - currentOffsets(tp), 0)
+}
+val totalLag = lagPerPartition.values.sum.toFloat
+
+lagPerPartition.map { case (tp, lag) =>
+  tp -> Math.round(lag / Math.max(totalLag, 1) * rate)
--- End diff --

The current test puts all events on a single partition, while the other 
partition has no events, and therefore no lag. I could add a test for the 
boundary condition of no lag on any partition, and also one for where the 
latest offset is less than the current offset (can happen in unclean election 
situations).

Definitely needs to respect `maxRatePerPartition` even when backpressure is 
working, so that needs some adjustment.
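
The boundary conditions discussed here can be checked against a standalone approximation of the lag-proportional formula in the diff. This is a sketch under stated assumptions: partitions are keyed by plain `Int` rather than `TopicAndPartition`, offsets are passed in as maps, and `allocate` is a hypothetical name; the `maxRatePerPartition` cap is deliberately left out.

```scala
object LagProportionalRate {
  // Split a global rate across partitions in proportion to each partition's lag.
  // `latest` and `current` map a partition id to its latest and consumed offsets.
  def allocate(rate: Int, latest: Map[Int, Long], current: Map[Int, Long]): Map[Int, Long] = {
    // Clamp negative lag to 0 (latest offset behind the current offset).
    val lagPerPartition = latest.map { case (p, offset) =>
      p -> math.max(offset - current(p), 0L)
    }
    val totalLag = lagPerPartition.values.sum.toFloat
    lagPerPartition.map { case (p, lag) =>
      // Guard the denominator so zero total lag does not divide by zero.
      p -> math.round(lag / math.max(totalLag, 1f) * rate).toLong
    }
  }

  def main(args: Array[String]): Unit = {
    // Only one partition has lag: it receives the whole rate.
    println(allocate(100, Map(0 -> 200L, 1 -> 0L), Map(0 -> 0L, 1 -> 0L)))
    // No lag anywhere: every partition is allocated 0.
    println(allocate(100, Map(0 -> 10L, 1 -> 10L), Map(0 -> 10L, 1 -> 10L)))
    // Latest offset behind the current offset: lag is clamped to 0.
    println(allocate(100, Map(0 -> 5L, 1 -> 50L), Map(0 -> 10L, 1 -> 0L)))
  }
}
```

How a zero allocation is interpreted downstream (as "fetch nothing" or as "no limit") is outside this sketch and is exactly the kind of case the proposed tests would need to pin down.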





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46449516
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -36,9 +36,10 @@ import org.scalatest.concurrent.Eventually
 
 import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite}
 import org.apache.spark.rdd.RDD
-import org.apache.spark.streaming.{Milliseconds, StreamingContext, Time}
+import org.apache.spark.streaming.{kafka, Milliseconds, StreamingContext, 
Time}
--- End diff --

yes, sorry about that, I'll fix that up





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46449571
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -89,23 +89,29 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = 
context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+  protected def maxMessagesPerPartition(leaderOffsets: 
Map[TopicAndPartition, LeaderOffset])
+: Option[Map[TopicAndPartition, Long]] = {
 val estimatedRateLimit = rateController.map(_.getLatestRate().toInt)
-val numPartitions = currentOffsets.keys.size
-
-val effectiveRateLimitPerPartition = estimatedRateLimit
-  .filter(_ > 0)
-  .map { limit =>
-if (maxRateLimitPerPartition > 0) {
-  Math.min(maxRateLimitPerPartition, (limit / numPartitions))
-} else {
-  limit / numPartitions
+
+// calculate a per-partition rate limit based on current lag
+val effectiveRateLimitPerPartition = estimatedRateLimit.filter(_ > 0) 
match {
+  case Some(rate) =>
+val lagPerPartition = leaderOffsets.map { case (tp, lo) =>
+  tp -> Math.max(lo.offset - currentOffsets(tp), 0)
+}
+val totalLag = lagPerPartition.values.sum.toFloat
+
+lagPerPartition.map { case (tp, lag) =>
+  tp -> Math.round(lag / Math.max(totalLag, 1) * rate)
--- End diff --

Good point!





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread JasonMWhite
Github user JasonMWhite commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46449467
  
--- Diff: 
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala
 ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

The Kafka messages don't actually need to be sent three times; once is 
sufficient for all tests. When I send "foo" 200 times in one batch, they all go 
to the same partition. However, when I do this 3 times (once for each of 100, 
50, 20), each batch of 200 goes to a random partition. I suspect it's something 
in how the test Kafka cluster does the partitioning.

I was usually getting 200 on 1 partition, and 400 on the other 2. I was 
explicitly changing the test case to "all the messages are on one partition" 
since I can't control the split deterministically.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46438227
  
--- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala
 ---
@@ -89,23 +89,29 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = 
context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+  protected def maxMessagesPerPartition(leaderOffsets: 
Map[TopicAndPartition, LeaderOffset])
+: Option[Map[TopicAndPartition, Long]] = {
 val estimatedRateLimit = rateController.map(_.getLatestRate().toInt)
-val numPartitions = currentOffsets.keys.size
-
-val effectiveRateLimitPerPartition = estimatedRateLimit
-  .filter(_ > 0)
-  .map { limit =>
-if (maxRateLimitPerPartition > 0) {
-  Math.min(maxRateLimitPerPartition, (limit / numPartitions))
-} else {
-  limit / numPartitions
+
+// calculate a per-partition rate limit based on current lag
+val effectiveRateLimitPerPartition = estimatedRateLimit.filter(_ > 0) 
match {
+  case Some(rate) =>
+val lagPerPartition = leaderOffsets.map { case (tp, lo) =>
+  tp -> Math.max(lo.offset - currentOffsets(tp), 0)
+}
+val totalLag = lagPerPartition.values.sum.toFloat
+
+lagPerPartition.map { case (tp, lag) =>
+  tp -> Math.round(lag / Math.max(totalLag, 1) * rate)
--- End diff --

I also think you should account for what happens when someone has set
spark.streaming.kafka.maxRatePerPartition
in addition to enabling backpressure
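
One possible way to combine the two settings is to apply the configured maximum as a ceiling on whatever rate backpressure computes for a partition. The sketch below is only an illustration of that idea; `capped` is a hypothetical helper, and treating 0 as "no cap configured" is an assumption based on the default of 0 read for `spark.streaming.kafka.maxRatePerPartition` in the diff above.

```scala
object RateCap {
  // Apply the per-partition maximum on top of a backpressure-derived rate.
  // A non-positive maxRatePerPartition is assumed to mean "no cap configured".
  def capped(backpressureRate: Long, maxRatePerPartition: Int): Long =
    if (maxRatePerPartition > 0) math.min(backpressureRate, maxRatePerPartition.toLong)
    else backpressureRate

  def main(args: Array[String]): Unit = {
    println(capped(backpressureRate = 150L, maxRatePerPartition = 100)) // cap applies
    println(capped(backpressureRate = 20L, maxRatePerPartition = 100))  // below the cap
    println(capped(backpressureRate = 150L, maxRatePerPartition = 0))   // no cap set
  }
}
```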






[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-161352747
  
This generally looks sensible to me; I would like to see whether it solves 
your issue first. Thanks for working on it.





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46431828
  
--- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala ---
@@ -89,23 +89,29 @@ class DirectKafkaInputDStream[
 
   private val maxRateLimitPerPartition: Int = context.sparkContext.getConf.getInt(
   "spark.streaming.kafka.maxRatePerPartition", 0)
-  protected def maxMessagesPerPartition: Option[Long] = {
+  protected def maxMessagesPerPartition(leaderOffsets: Map[TopicAndPartition, LeaderOffset])
+: Option[Map[TopicAndPartition, Long]] = {
 val estimatedRateLimit = rateController.map(_.getLatestRate().toInt)
-val numPartitions = currentOffsets.keys.size
-
-val effectiveRateLimitPerPartition = estimatedRateLimit
-  .filter(_ > 0)
-  .map { limit =>
-if (maxRateLimitPerPartition > 0) {
-  Math.min(maxRateLimitPerPartition, (limit / numPartitions))
-} else {
-  limit / numPartitions
+
+// calculate a per-partition rate limit based on current lag
+val effectiveRateLimitPerPartition = estimatedRateLimit.filter(_ > 0) match {
+  case Some(rate) =>
+val lagPerPartition = leaderOffsets.map { case (tp, lo) =>
+  tp -> Math.max(lo.offset - currentOffsets(tp), 0)
+}
+val totalLag = lagPerPartition.values.sum.toFloat
+
+lagPerPartition.map { case (tp, lag) =>
+  tp -> Math.round(lag / Math.max(totalLag, 1) * rate)
--- End diff --

What happens when you only have 1 partition with lag > 0?
I think this needs tests for boundary conditions.
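
The boundary cases raised here can be explored with a small standalone sketch that mirrors the arithmetic in the diff above. It is an illustration under assumptions, not the PR's test: partitions are keyed by plain Strings instead of `TopicAndPartition`, leader offsets are plain Longs, and the object and method names are hypothetical.

```scala
object LagBoundarySketch {
  // Same arithmetic as the diff: clamp lag at zero, then give each
  // partition a share of `rate` proportional to its lag. The
  // max(totalLag, 1) guard avoids dividing by zero when nothing lags.
  def limits(leader: Map[String, Long], current: Map[String, Long], rate: Int): Map[String, Long] = {
    val lagPerPartition = leader.map { case (tp, offset) =>
      tp -> math.max(offset - current(tp), 0L)
    }
    val totalLag = lagPerPartition.values.sum.toFloat
    lagPerPartition.map { case (tp, lag) =>
      tp -> math.round(lag / math.max(totalLag, 1f) * rate).toLong
    }
  }

  def main(args: Array[String]): Unit = {
    // Only one partition has lag > 0: it receives the whole rate.
    println(limits(Map("a" -> 100L, "b" -> 50L), Map("a" -> 0L, "b" -> 50L), 10))
    // No partition has lag: every limit is 0.
    println(limits(Map("a" -> 50L, "b" -> 50L), Map("a" -> 50L, "b" -> 50L), 10))
    // Leader offset behind the current offset: lag is clamped to 0.
    println(limits(Map("a" -> 5L), Map("a" -> 9L), 10))
  }
}
```

In the first case the caught-up partition gets a limit of 0, and in the second case every partition does. How a zero limit is interpreted downstream is not shown in this diff, so a test should pin that behavior down explicitly.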





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46431401
  
--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala ---
@@ -364,8 +365,8 @@ class DirectKafkaStreamSuite
 
 val batchIntervalMilliseconds = 100
 val estimator = new ConstantEstimator(100)
-val messageKeys = (1 to 200).map(_.toString)
-val messages = messageKeys.map((_, 1)).toMap
+val messages = Map("foo" -> 200)
+kafkaTestUtils.sendMessages(topic, messages)
--- End diff --

Why was sendMessages moved to here?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/10089#discussion_r46431493
  
--- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala ---
@@ -36,9 +36,10 @@ import org.scalatest.concurrent.Eventually
 
 import org.apache.spark.{Logging, SparkConf, SparkContext, SparkFunSuite}
 import org.apache.spark.rdd.RDD
-import org.apache.spark.streaming.{Milliseconds, StreamingContext, Time}
+import org.apache.spark.streaming.{kafka, Milliseconds, StreamingContext, Time}
--- End diff --

This is just an IDE unnecessarily mucking with imports, right?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/10089#issuecomment-161188280
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-01 Thread JasonMWhite
GitHub user JasonMWhite opened a pull request:

https://github.com/apache/spark/pull/10089

[SPARK-12073] [Streaming] backpressure rate controller consumes events 
preferentially from lagg…

…ing partitions

I'm pretty sure this is the reason we couldn't easily recover from an 
unbalanced Kafka partition under heavy load when using backpressure.

`maxMessagesPerPartition` calculates an appropriate limit for the message
rate from all partitions, and then divides by the number of partitions to
determine how many messages to retrieve per partition. The problem with this
approach is that when one partition is behind by millions of records (due to
random Kafka issues), but the rate estimator calculates that only 100k total
messages can be retrieved, each partition (out of, say, 32) retrieves at most
100k/32 = 3125 messages.

This PR (still needing a test) determines a per-partition desired message
count by using the current lag for each partition to weight that partition's
share of the total message limit. In this situation, if each partition
receives 1k new messages, but one partition starts 1M behind, then the total
number of messages to retrieve is (32 * 1k + 1M) = 1032000 messages, of which
the one partition needs 1001000. So, it gets (1001000 / 1032000) ≈ 97% of the
100k messages, and the other 31 partitions share the remaining 3%.
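
The arithmetic above can be reproduced with a short standalone sketch. It is not the PR's code: the object name, the `weightedLimits` helper, and the integer partition ids are hypothetical, and it uses Double where the diff uses Float.

```scala
object LagWeightedSketch {
  // Hypothetical helper: split `totalLimit` across partitions in
  // proportion to each partition's lag.
  def weightedLimits(lag: Map[Int, Long], totalLimit: Long): Map[Int, Long] = {
    val totalLag = math.max(lag.values.sum, 1L).toDouble
    lag.map { case (partition, l) =>
      partition -> math.round(l / totalLag * totalLimit)
    }
  }

  def main(args: Array[String]): Unit = {
    // 31 partitions are 1k behind; partition 0 is 1M + 1k behind.
    val lag: Map[Int, Long] =
      (1 to 31).map(p => p -> 1000L).toMap + (0 -> 1001000L)
    val limits = weightedLimits(lag, 100000L)

    println(s"even split per partition: ${100000L / 32}")   // 3125
    println(s"lagging partition:        ${limits(0)}")      // 96996
    println(s"each other partition:     ${limits(1)}")      // 97
    println(s"sum of all limits:        ${limits.values.sum}")
  }
}
```

Because each share is rounded independently, the shares in this example sum to 100003 rather than exactly 100000, so the weighted limits can overshoot the estimated total by a few messages.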

Assuming all 100k messages are retrieved and processed within the
batch window, the rate calculator will increase the number of messages to
retrieve in the next batch, until it reaches a new stable point or the backlog
has been fully processed.

We're going to try deploying this internally at Shopify to see if this 
resolves our issue.

@tdas @koeninger @holdenk 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JasonMWhite/spark rate_controller_offsets

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10089


commit 117ca3c2fbef5d5657bf038474dc5e7c1a588818
Author: Jason White 
Date:   2015-12-01T06:13:49Z

backpressure rate controller consumes events preferentially from lagging 
partitions



