[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20572#discussion_r169536036 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala --- @@ -22,12 +22,17 @@ import java.{ util =>

[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20572 @srowen Sorry to turn on the bat-signal, but would you be able to help find a committer willing to look at this? After finally finding someone willing to test in production, don't want

[GitHub] spark issue #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive of...

2018-02-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20572 @zsxwing you have time to review this? It's been a long standing issue.

[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/20572 [SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets ## What changes were proposed in this pull request? Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets
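
Since the change is exposed as a single configuration key, a minimal sketch of how it would be set follows. The key name is taken from the PR description above; the app name and batch interval are placeholders and the rest of the stream setup is omitted.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Allow the direct stream / KafkaRDD to tolerate gaps in offsets,
    // e.g. for compacted topics (SPARK-17147).
    val conf = new SparkConf()
      .setAppName("compacted-topic-stream")            // placeholder name
      .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")
    val ssc = new StreamingContext(conf, Seconds(5))   // placeholder interval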

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-24 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 Seems reasonable to me but you should probably ask zsxwing if it fits in with plans for the structured streaming kafka code. On Thu, Nov 23, 2017 at 10:23 PM, Hyukjin Kwon <notific

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 You'll need to get a committer's attention to merge it anyway On Nov 23, 2017 01:48, "Daroo" <notificati...@github.com> wrote: It seems that your "mag

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 ok to test On Wed, Nov 22, 2017 at 2:49 PM, Daroo <notificati...@github.com> wrote: > Cool. Could you please authorize it for testing? > > — >

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 Seems reasonable. On Wed, Nov 22, 2017 at 1:52 PM, Daroo <notificati...@github.com> wrote: > It fails on the current master branch and doesn't after the patch

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 What are you actually asserting in that test and/or does it reliably fail if run on the version of your code before the patch? On Wed, Nov 22, 2017 at 1:33 PM, Daroo <notific

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 Yeah, subscribepattern could definitely be an issue. As far as unit testing, have you tried anything along the lines of setting the cache size artificially low and then introducing new
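
A rough sketch of the test setup being suggested: shrink the consumer cache so eviction actually happens before asserting on reads. The capacity keys shown are the ones the 0-10 consumer cache is believed to read; treat the exact names and values as assumptions.

    import org.apache.spark.SparkConf

    // Make the executor-side CachedKafkaConsumer cache tiny so that adding
    // new partitions forces evictions during the test.
    val testConf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("consumer-cache-eviction-test")
      .set("spark.streaming.kafka.consumer.cache.initialCapacity", "1")
      .set("spark.streaming.kafka.consumer.cache.maxCapacity", "1")
    // ...then create more topic-partitions than the cache can hold and
    // assert each partition's records are still read correctly.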

[GitHub] spark issue #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsafe evic...

2017-11-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19789 My main comment is that if you have a situation where there's actually contention on the size of the cache, chances are things are going to be screwed up anyway due to consumers being recreated

[GitHub] spark pull request #19789: [SPARK-22562][Streaming] CachedKafkaConsumer unsa...

2017-11-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/19789#discussion_r152056775 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -155,11 +178,11 @@ object

[GitHub] spark issue #19274: [SPARK-22056][Streaming] Add subconcurrency for KafkaRDD...

2017-09-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19274 Search Jira and the mailing list, this idea has been brought up multiple times. I don't think breaking fundamental assumptions of Kafka (one consumer thread per group per partition) is a good

[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-11 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19134 There's already a jira about why 0.10 doesn't have python support, https://issues-test.apache.org/jira/browse/SPARK-16534

[GitHub] spark issue #19134: [SPARK-21893][BUILD][STREAMING][WIP] Put Kafka 0.8 behin...

2017-09-05 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19134 I think it makes sense to go ahead and put deprecation warnings in this PR as well

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2017-08-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 You won't get any reasonable semantics out of auto commit, because it will commit on the driver without regard to what the executors have done. On Aug 2, 2017 21:46, "Wallace
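
The underlying issue: enable.auto.commit commits from the driver's consumer on its own timer, regardless of what the executors have processed. A sketch of the usual alternative from the 0-10 integration guide, assuming kafkaParams, topics and ssc are already defined and that auto commit is disabled in kafkaParams:

    import org.apache.spark.streaming.kafka010._
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    // kafkaParams is assumed to include "enable.auto.commit" -> (false: java.lang.Boolean)
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ...write this batch's output somewhere durable first...
      // only then ask the driver to commit, tied to what was actually processed
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }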

[GitHub] spark issue #18353: [SPARK-21142][SS] spark-streaming-kafka-0-10 should depe...

2017-06-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/18353 I thought the historical reason testutils was in main rather than test was for ease of use by other tests (e.g. python). There's even still a comment in the code to that effect

[GitHub] spark issue #18234: [SPARK-19185][DSTREAM] Make Kafka consumer cache configu...

2017-06-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/18234 LGTM, thanks Mark

[GitHub] spark pull request #18234: [SPARK-19185][DSTREAM] Make Kafka consumer cache ...

2017-06-07 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18234#discussion_r120716157 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -91,7 +91,7 @@ The new Kafka consumer API will pre-fetch messages into buffers. Therefore it i

[GitHub] spark issue #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaConsumer ...

2017-06-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/18143 Sorry, noticed a couple more minor configuration related things. Otherwise LGTM.

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-06-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119636878 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -109,34 +113,42 @@ object

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-06-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119636950 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -109,34 +113,42 @@ object

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-06-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119636360 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -109,34 +113,42 @@ object

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-05-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119374298 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -147,20 +155,14 @@ object

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-05-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119371920 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -109,34 +113,38 @@ object

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-05-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119371285 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -109,34 +113,38 @@ object

[GitHub] spark issue #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaConsumer ...

2017-05-30 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/18143 Generally looks ok to me, thanks. 2 questions - Did you do any testing on workloads to see if performance stayed the same? Is there a reason not to do the same thing

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-05-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119132090 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -310,62 +308,45 @@ private[kafka010

[GitHub] spark pull request #18143: [SPARK-20919][SS] Simplificaiton of CachedKafkaCo...

2017-05-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/18143#discussion_r119131006 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -310,62 +308,45 @@ private[kafka010

[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...

2017-05-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17308 @marmbrus @zsxwing @tdas This needs attention from someone

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 LGTM pending jason's comments on tests

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 Have you read the function def clamp? Rate limit of 1 should not imply an attempt to grab 1 message even if it doesn't exist. On Apr 27, 2017 11:01, "Sebastian

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 @arzt It's entirely possible to have batch times less than a second, and I'm not sure I agree that the absolute number of messages allowable for a partition should ever be zero. So

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2017-04-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 How do you read 0.1 of a kafka message for a given partition of a given batch? Ultimately the floor for a rate limit, assuming one is set, needs to be 1 message per partition per batch
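
The arithmetic in question, sketched: a fractional per-batch allowance has to round somewhere, and the position taken here is that it should never round below one record per partition. Names and numbers are illustrative only.

    // Messages allowed for one partition in one batch, given a per-partition
    // rate limit in messages/second and the batch duration in milliseconds.
    def maxMessagesPerPartition(rateLimit: Double, batchMs: Long): Long = {
      val raw = rateLimit * batchMs / 1000.0
      math.max(1L, raw.toLong)   // floor at 1 so a partition is never starved
    }

    val n = maxMessagesPerPartition(rateLimit = 1.0, batchMs = 100)   // n == 1, not 0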

[GitHub] spark pull request #17675: [SPARK-20036][DOC] Note incompatible dependencies...

2017-04-18 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/17675 [SPARK-20036][DOC] Note incompatible dependencies on org.apache.kafka artifacts ## What changes were proposed in this pull request? Note that you shouldn't manually add

[GitHub] spark issue #17308: [SPARK-19968][SS] Use a cached instance of `KafkaProduce...

2017-04-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17308 Just to throw in my two cents, a change like this is definitely needed, as is made clear by the second sentence of the docs http://kafka.apache.org/0102/javadoc/index.html?org/apache

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2017-03-13 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 @zsxwing you or anyone else have time to look at this?

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2017-03-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r105720383 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,16 @@ private[spark

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2017-03-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 Any interest in picking this back up if you can get a committer's attention?

[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

2017-01-18 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16629 I don't think it's a problem to make disabling the cache configurable, as long as it's on by default. I don't think the additional static constructors in kafka utils are necessary
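
For reference, the toggle that later landed for this (believed to be documented in the 0-10 integration guide as spark.streaming.kafka.consumer.cache.enabled) would be used roughly as below, with the cache left on by default.

    import org.apache.spark.SparkConf

    // Disable the executor-side consumer cache only as a workaround for
    // ConcurrentModificationException issues; the default (true) should stand.
    val conf = new SparkConf()
      .set("spark.streaming.kafka.consumer.cache.enabled", "false")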

[GitHub] spark issue #16569: [SPARK-19206][DOC][DStream] Fix outdated parameter descr...

2017-01-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16569 LGTM

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91118893 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,14 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91118458 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -67,6 +67,10 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91001763 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -67,6 +67,10 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91002372 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,14 @@ private[spark

[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...

2016-11-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15820 Because the comment made by me and +1'ed by marmbrus is hidden at this point, I just want to re-iterate that this patch should not skip the rest of the partition in the case that a timeout

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-16 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r88360922 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example

2016-11-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15849 LGTM

[GitHub] spark pull request #15849: [SPARK-18410][STREAMING] Add structured kafka exa...

2016-11-14 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15849#discussion_r87795091 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87439798 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87438222 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87438788 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...

2016-11-08 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15820 Wow, looks like the new github comment interface did all kinds of weird things, apologies about that.

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129126 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129981 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129817 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129927 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87127811 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87130059 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87128373 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129204 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-08 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @zsxwing @rxin the per-partition rate limit probably won't overflow, but the overall backpressure rate limit was being cast to an int, which definitely can overflow. I changed it in this latest
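
Why the cast matters, in a small sketch: an aggregate backpressure rate can exceed Int.MaxValue once summed across partitions, and a Long-to-Int cast silently wraps. The numbers are made up.

    val overallRate = 3000000000L      // aggregate messages/sec, > Int.MaxValue

    val broken = overallRate.toInt     // -1294967296: wrapped, a meaningless limit
    val kept   = overallRate           // keep the arithmetic in Long instead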

[GitHub] spark pull request #15132: [SPARK-17510][STREAMING][KAFKA] config max rate o...

2016-11-07 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15132#discussion_r86810750 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/PerPartitionConfig.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @rxin thanks, changed to abstract class. If you think that's sufficient future proofing I otherwise think this is a worthwhile change, seems like it meets a real user need.

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 I sent a message to d...@spark.apache.org, if you're not already subscribed I'd say subscribe and follow up in case there's any discussion there rather than on the pr / jira. Someone may want

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 Ok... so the question at this point is whether it's worth making the API change, which ultimately we'll have to track down a committer to decide. As the PR stands, it shouldn't break

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 Good catch, I just mistakenly changed to AsMS in one place but not both. On the test changes, do you want tests waiting up to 2 minutes * however many kafka calls are being made? If so I

[GitHub] spark pull request #15737: [SPARK-18212][SS][KAFKA] increase executor poll t...

2016-11-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15737#discussion_r86254376 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -88,7 +88,10 @@ private[kafka010] case class

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 Ok, I'll update the default in both places to use spark.network.timeout and leave the test config at 10 seconds
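
A sketch of the two settings being reconciled here, with illustrative values; the poll key is the one the 0-10 dstream module is believed to read, in milliseconds.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // general network timeout that the executor poll timeout now falls back to
      .set("spark.network.timeout", "120s")
      // explicit override for how long an executor-side poll() may block
      .set("spark.streaming.kafka.consumer.poll.ms", "512")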

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 I will always choose "fail in an obvious way that I can start debugging" versus "start behaving poorly in non-obvious ways". Similar reason I thought it wa

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 In most cases poll should be returning prefetched data from the buffer, not waiting to talk to kafka over the network. I could see increasing it a little bit, but I don't think

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86203226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -104,85 +110,105 @@ case class

[GitHub] spark pull request #15737: [SPARK-18212][SS][KAFKA] increase executor poll t...

2016-11-02 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15737 [SPARK-18212][SS][KAFKA] increase executor poll timeout ## What changes were proposed in this pull request? Increase poll timeout to try and address flaky test ## How

[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15702 Given the concerns Ofir raised about a single far future event screwing up monotonic event time, do you want to document that problem even if there isn't an enforced filter for it?

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86067616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -104,85 +110,105 @@ case class

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark issue #15715: [SPARK-18198][Doc][Streaming] Highlight code snippets

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15715 LGTM

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I'm agnostic on the value of adding the overload, if @lw-lin thinks it's more convenient for users. There are considerably fewer overloads as it stands than the old 0.8 version of KafkaUtils, so

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I don't think there's a reason to deprecate it. ju.Map is the lowest common denominator for kafka params, it's used by the underlying consumer, and it's what the ConsumerStrategy interface

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 LGTM, thanks. If you want to open a separate PR to cleanup the private doc issues you noticed, go for it, shouldn't need another Jira imho if it isn't changing code

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-10-31 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I think this should just be another createRDD overload that takes a scala map. The minor additional maintenance overhead of that method as opposed to change the existing one isn't worth breaking
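
The shape of the suggestion, sketched: keep the existing java.util.Map signature and let a Scala-friendly overload simply convert and delegate. Purely illustrative, not the actual patch.

    import java.{util => ju}
    import scala.collection.JavaConverters._

    // A Scala Map of consumer params...
    val kafkaParams: Map[String, Object] = Map(
      "bootstrap.servers" -> "localhost:9092",
      "group.id" -> "example")

    // ...converted to the ju.Map the existing createRDD and ConsumerStrategy
    // signatures already accept; a new overload would just do this internally.
    val juParams: ju.Map[String, Object] = kafkaParams.asJava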

[GitHub] spark issue #15681: [Minor][Streaming][Kafka] Kafka010 .createRDD() scala AP...

2016-10-30 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 Public API changes even on an experimental module aren't a minor thing, open a Jira. It's also better not to lump unrelated changes together.

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 LGTM

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 Thanks for working on this, couple minor things to fix but otherwise looks good.

[GitHub] spark pull request #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] A...

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15679#discussion_r85642029 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -165,6 +240,36 @@ For data stores that support transactions, saving offsets in the same

[GitHub] spark pull request #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15626#discussion_r85641502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala --- @@ -36,8 +36,8 @@ class StreamingQueryException

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 Were these extracted from compiled example projects, or just written up?

[GitHub] spark pull request #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] A...

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15679#discussion_r85641432 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -120,15 +184,24 @@ Kafka has an offset commit API that stores offsets in a special Kafka topic

[GitHub] spark pull request #15527: [SPARK-17813][SQL][KAFKA] Maximum data per trigge...

2016-10-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15527#discussion_r85253453 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -153,11 +201,7 @@ private[kafka010] case class

[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15626 So it looks like this is the json format being used for kafka offsets: [{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2},{&q

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84378170 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84374872 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84372134 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -232,6 +232,42 @@ private[kafka010] case class

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84371466 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -317,6 +353,8 @@ private[kafka010] case class

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84370308 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed

[GitHub] spark pull request #15570: [STREAMING][KAFKA][DOC] clarify kafka settings ne...

2016-10-20 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15570 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger

[GitHub] spark pull request #15527: [SPARK-17813][SQL][KAFKA] Maximum data per trigge...

2016-10-17 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15527 [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume
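
A minimal sketch of the option on the Structured Streaming reader; the servers, topic and cap are placeholders, and spark is an existing SparkSession.

    // Cap how many offsets a single trigger consumes, spread proportionally
    // across topic-partitions by available volume.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .option("maxOffsetsPerTrigger", 10000L)
      .load()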

[GitHub] spark issue #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15407 @rxin @tdas right now, items to be committed can be added to the queue, but they will never actually be removed from the queue. poll() removes, iterator() does not. I updated the description
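
The distinction being fixed, shown on a plain ConcurrentLinkedQueue (which is what the commit queue uses, as far as can be told from the description): iterating reads entries but leaves them enqueued, while poll() actually removes them. Illustrative code only.

    import java.util.concurrent.ConcurrentLinkedQueue

    val queue = new ConcurrentLinkedQueue[String]()
    queue.add("topic-0: offsets 100-200")
    queue.add("topic-1: offsets 300-400")

    // iterator() only reads; afterwards the queue still holds both entries
    val it = queue.iterator()
    while (it.hasNext) println(s"saw ${it.next()}")

    // poll() removes from the head until empty -- this is the actual drain
    var next = queue.poll()
    while (next != null) {
      println(s"committing $next")
      next = queue.poll()
    }
    assert(queue.isEmpty)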
