[GitHub] spark pull request #13945: [SPARK-16256][SQL][STREAMING] Added Structured St...

2016-06-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/13945#discussion_r68839249 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -0,0 +1,888 @@ +--- +layout: global +displayTitle: Structured Streaming

[GitHub] spark pull request #13945: [SPARK-16256][SQL][STREAMING] Added Structured St...

2016-06-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/13945#discussion_r68838818 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -0,0 +1,888 @@ +--- +layout: global +displayTitle: Structured Streaming

[GitHub] spark pull request #13945: [SPARK-16256][SQL][STREAMING] Added Structured St...

2016-06-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/13945#discussion_r68835454 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -0,0 +1,888 @@ +--- +layout: global +displayTitle: Structured Streaming

[GitHub] spark pull request #13945: [SPARK-16256][SQL][STREAMING] Added Structured St...

2016-06-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/13945#discussion_r68835399 --- Diff: docs/structured-streaming-programming-guide.md --- @@ -0,0 +1,888 @@ +--- +layout: global +displayTitle: Structured Streaming

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 I'm already logging at error level a message about serializability / suggestion to use map first before persist, which seems like the most common case that will come up. I can do something

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 If they're using the same group id, that's at least plausible, but it's in general not a good idea to run two different applications with the same group id at the same time. On Mon

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 I was going to leave some code comments once I'm not at work & have time to make another patch, but the quick answer is that should be determined in the consumer by auto.offset.r

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Regarding your general comments 1 + 2 sure, will add 3. The reason ConsumerStrategy is last is because it's much more natural for inline one-off subclasses of it to customize

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68579465 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/ConsumerStrategy.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68578250 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/ConsumerStrategy.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68577783 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/ConsumerStrategy.scala --- @@ -0,0 +1,164 @@ +/* + * Licensed

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Moved the prefered location and consumer creation strategies to an explicit interface, let me know if that's more usable from your point of view. Scala api looks like

[GitHub] spark pull request #13908: [SPARK-16212][STREAMING][KAFKA] code cleanup from...

2016-06-25 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/13908 [SPARK-16212][STREAMING][KAFKA] code cleanup from review feedback ## What changes were proposed in this pull request? code cleanup in kafka-0-8 to match suggested changes for kafka-0-10

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-25 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Sorry for the delayed reply, I had travel plans that had to be canceled due to a family emergency (everyone's mostly ok). 1 + 2, I understand that preferred locations

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 1. When the user has no preferences, the system already does figure out preferred locations, and not in a random way as you claimed. 2. So lets talk concretely, not hypothetically

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Regarding the possible arguments to preferredHosts, I'm pretty sure you're misunderstanding what happens. There are 3 uses cases here: - I don't care where things run. I pass

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68161842 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68161572 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68161054 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68161026 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68139170 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68138575 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68136246 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68128869 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/CachedKafkaConsumer.scala --- @@ -0,0 +1,184 @@ +/* + * Licensed

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r68121385 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/CachedKafkaConsumer.scala --- @@ -0,0 +1,184 @@ +/* + * Licensed

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r67878608 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r67877379 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r67876883 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r67875507 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark pull request #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStre...

2016-06-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r67875307 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,401

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-16 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 @rxin I'm pretty confident that fix addressed the test issue. It's passed twice now, the prior failure was unrelated. --- If your project is set up for it, you can reply to this email and have

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Jenkins, retest this please On Jun 15, 2016 8:43 PM, "UCB AMPLab" <notificati...@github.com> wrote: > Merged build finished. Test FAILed. > > â€

[GitHub] spark issue #11863: [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to ...

2016-06-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/11863 Jenkins, retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...

2016-05-27 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10953#issuecomment-222164687 I think this PR is fairly out of date regarding things like needing to cache consumers on executors, getPreferredLocations, and a separate subproject for kafka-0.8

[GitHub] spark pull request: [SPARK-15085][Streaming][Kafka] Rename streami...

2016-05-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/12946#issuecomment-218355852 @JoshRosen let me know if that fix to pass mima is ok with you. I can add settings for the kafka subproject instead, but this seemed like the most likely place

[GitHub] spark pull request: [SPARK-15085][Streaming][Kafka] Rename streami...

2016-05-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/12946#issuecomment-21822 @rxin ping to make sure this doesn't get lost in the shuffle before 2.0 --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62336356 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62336181 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-15085][Streaming][Kafka] Rename streami...

2016-05-05 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/12946 [SPARK-15085][Streaming][Kafka] Rename streaming-kafka artifact to in… ## What changes were proposed in this pull request? Renaming the streaming-kafka artifact to include kafka version

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62284776 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62284518 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62283730 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62283685 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62283513 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-05-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r62283462 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60417031 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11863#issuecomment-211466703 As far as the name, I don't really care. If you name it beta, one day it will be GA. If you name it newapi, one day it will be the old api. Stuff's going

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089951 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/package.scala --- @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089859 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089788 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089646 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089303 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089221 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60089024 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60088913 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -0,0 +1,344

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60088654 --- Diff: external/kafka-beta/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,259 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-04-18 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/11863#discussion_r60088571 --- Diff: external/kafka-beta-assembly/pom.xml --- @@ -0,0 +1,186 @@ + + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://w

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200995808 So did you verify that doing the transform before the window fixed your issue in your actual code? It certainly does for the minimal example. The other

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200977714 If all you care about is counts, why wouldn't you at least just write inputStream.map(x => 1).window(Seconds(slidingWindowInterval), Seconds(intervalSeco

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200902045 KafkaRDD doesn't have a storage level. If you don't do any caching, and do multiple actions on a KafkaRDD, it will pull from kafka each time. This is the exact

[GitHub] spark pull request: [SPARK-14105][STREAMING] Deep copy each kafka ...

2016-03-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11921#issuecomment-200877678 I don't know that this PR will do much harm, since it's just one more conditional check and KafkaRDD has a storage level of none by default. But I also don't

[GitHub] spark pull request: [SPARK-12177][Streaming][Kafka] Update KafkaDS...

2016-03-21 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/11863 [SPARK-12177][Streaming][Kafka] Update KafkaDStreams to new Kafka 0.9 Consumer API for discussion, DO NOT MERGE, blocked by SPARK-13877 ## What changes were proposed in this pull

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-03-02 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-191290624 LGTM Thanks for following up on this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0

2016-03-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11143#issuecomment-190914907 I really don't think it makes sense to discuss this PR outside of the context of SPARK-12177 and the approach taken for supporting the new consumer. Merging

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-190304130 @JasonMWhite looks like this failed the "offset recovery" test in DirectKafkaStreamSuite. Are you able to reproduce that test failure locally? --- If yo

[GitHub] spark pull request: [SPARK-13252] [KAFKA] Bump up Kafka to 0.9.0.0

2016-02-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11143#issuecomment-189365359 Kafka isn't just a library dependency that a user can try out a new version of on a particular job and see if it works. It's an infrastructure component

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-16 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-184901629 I think that should be ok. On Tue, Feb 16, 2016 at 4:29 PM, Jason White <notificati...@github.com> wrote: > @koeninger <https:

[GitHub] spark pull request: Added missing utility method

2016-02-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11173#issuecomment-183330790 Kafkacluster methods are public now, you can do this yourself. I think its fine to just have two overloads, the easy mode and the flexible mode. --- If your

[GitHub] spark pull request: Added missing utility method

2016-02-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/11173#issuecomment-183332361 Existing ticket https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-13106 --- If your project is set up for it, you can reply to this email and have

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-181964379 This looked like a good patch, did it just fall through the cracks? The mima test failure was probably just due to the change in signature

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2016-02-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-182007914 Sure, it's in project/MimaExcludes.scala. You should be able to match up the problem type and class in the error message from jenkins when adding

[GitHub] spark pull request: [SPARK-10963] [Streaming] [Kafka] make KafkaCl...

2016-02-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9007#issuecomment-178008540 @tdas @srowen Can we either get this merged, or officially close off the possibility of merging it? People are still raising specific issues on a fairly

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-167808178 That yahoo benchmark has a lot of issues, they've already been contacted by myself and others as to some obvious errors they made in their spark job

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-165528447 The console output seems like it's available without logging in, and looks like a jenkins issue rather than an actual test failure: GitHub pull request

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/10089#discussion_r47286143 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -364,8 +365,8 @@ class

[GitHub] spark pull request: [SPARK-12203][STREAMING] Add KafkaDirectInputD...

2015-12-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10197#issuecomment-163302006 The reason direct stream has some latency is because it is figuring out, in advance, on the driver, which offsets are in each partition. That means that all

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-07 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-162562042 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-162252793 Sounds good. If you can add that maxRatePerPartition handling this would be ready to go from my point of view On Dec 5, 2015 3:59 PM, "Jason White" &

[GitHub] spark pull request: [SPARK-12103][Streaming][Kafka][Doc] document ...

2015-12-03 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/10132 [SPARK-12103][Streaming][Kafka][Doc] document that K means Key and V … …means Value You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/10089#discussion_r46431401 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -364,8 +365,8 @@ class

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/10089#discussion_r46431493 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -36,9 +36,10 @@ import

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/10089#issuecomment-161352747 This generally looks sensible to me, would like to see if it solves your issue first. Thanks for working on it. --- If your project is set up for it, you can reply

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/10089#discussion_r46431828 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -89,23 +89,29 @@ class

[GitHub] spark pull request: [SPARK-12073] [Streaming] backpressure rate co...

2015-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/10089#discussion_r46438227 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -89,23 +89,29 @@ class

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156822484 Are you 100% sure that all uses of the partition array only use the index associated with the individual Partition, and not its position in the array

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156457252 Well, the implementation of HasOffsetRanges.offsetRanges in KafkaRDD is just the val offsetRanges provided on creation, so you're talking about a fair amount

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156290949 https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md#hasoffsetranges The 1:1 correspondence is also mentioned in the spark docs Kafka

[GitHub] spark pull request: [SPARK-11632][Streaming] Filter out empty part...

2015-11-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/9597#issuecomment-156167582 This is going to break the invariant that offset ranges are 1:1 with spark partitions, which will definitely break some people's jobs in a non-obvious manner

[GitHub] spark pull request: [SPARK-10963] [Streaming] [Kafka] make KafkaCl...

2015-10-06 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/9007 [SPARK-10963] [Streaming] [Kafka] make KafkaCluster public You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-10963

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-03 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/3543 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-09-02 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-137253615 @andrewor14 master has diverged sufficiently from this PR that I don't think it's useful to keep it merge-able. If we think someone's willing to accept the changes

[GitHub] spark pull request: [SPARK-9786][Streaming][Kafka] fix backpressur...

2015-08-24 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/8413 [SPARK-9786][Streaming][Kafka] fix backpressure so it works with defa… …ult maxRatePerPartition setting of 0 You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [SPARK-9780][Streaming][Kafka] prevent NPE if ...

2015-08-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/8133#issuecomment-132091090 This PR has already been merged to suppress the NPE when someone messes up instantiation On Aug 18, 2015 1:13 AM, Shyam S Kumar notificati...@github.com

[GitHub] spark pull request: [SPARK-9780][Streaming][Kafka] prevent NPE if ...

2015-08-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/8133#issuecomment-131859733 KafkaRDD attempts to instantiate the value decoder right before connecting the consumer: val valueDecoder = classTag[T].runtimeClass.getConstructor

[GitHub] spark pull request: [SPARK-9780][Streaming][Kafka] prevent NPE if ...

2015-08-12 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/8133 [SPARK-9780][Streaming][Kafka] prevent NPE if KafkaRDD instantiation … …fails You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger

[GitHub] spark pull request: [Docs][Streaming] make the existing parameter ...

2015-08-06 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/7995 [Docs][Streaming] make the existing parameter docs for OffsetRange ac… …tually visible You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126342142 Added subtasks, changed the title of https://github.com/apache/spark/pull/7772 to refer to the streaming subtask jira ID. Let me know if you see anything

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126161394 Changing this jira to be streaming only and making another for thread safety issues still leaves all the inconsistent calls to new Configuration in SQL, and probably

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126166935 streaming only pr is at https://github.com/apache/spark/pull/7772 --- If your project is set up for it, you can reply to this email and have your reply appear

<    1   2   3   4   5   6   7   >