[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23148 Cool, opened with what I have so far, will keep an eye out for others for a while. --- - To unsubscribe, e-mail: reviews

[GitHub] spark pull request #23182: Config change followup to [SPARK-26177] Automated...

2018-11-29 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/23182 Config change followup to [SPARK-26177] Automated formatting for Scala code Let's keep this open for a while to see if other configuration tweaks are suggested ## What changes were

[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23148 Just pushed a tweak to allow closing parens on same line. New pr for that, or do we want to keep identifying other tweaks first? I think the args on their own line is triggered once

[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23148 I tested that example, it will reformat to ``` checkScan( query, "struct,address:string,pets:int," + &quo

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236858392 --- Diff: dev/.scalafmt.conf --- @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236824453 --- Diff: dev/.scalafmt.conf --- @@ -0,0 +1,24 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark issue #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23148 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236423672 --- Diff: .scalafmt.conf --- @@ -0,0 +1,24 @@ +# --- End diff -- Sure, moved

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23148#discussion_r236423438 --- Diff: pom.xml --- @@ -156,6 +156,10 @@ 3.2.2 2.12.7 2.12 +1.5.1 --- End diff -- I moved the scalafmt

[GitHub] spark pull request #23148: [SPARK-26177] Automated formatting for Scala code

2018-11-26 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/23148 [SPARK-26177] Automated formatting for Scala code ## What changes were proposed in this pull request? Add a maven plugin and wrapper script at ./dev/scalafmt to use scalafmt to format

[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...

2018-11-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23103 merging to master, thanks @zouzias --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-11-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22824 @tdas or @jose-torres any opinion on whether it's worth refactoring these checks as suggested by @arunmahadevan

[GitHub] spark issue #23103: [SPARK-26121] [Structured Streaming] Allow users to defi...

2018-11-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/23103 @zouzias can you add the new option to docs/structured-streaming-kafka-integration.md as part of this PR? Instructions for building docs are in docs/README.md , ping me if you need a hand

[GitHub] spark pull request #23103: [SPARK-26121] [Structured Streaming] Allow users ...

2018-11-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23103#discussion_r235480695 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -538,6 +538,17 @@ private[kafka010

[GitHub] spark pull request #23103: [SPARK-26121] [Structured Streaming] Allow users ...

2018-11-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23103#discussion_r235479872 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -538,6 +538,17 @@ private[kafka010

[GitHub] spark pull request #23103: [SPARK-26121] [Structured Streaming] Allow users ...

2018-11-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23103#discussion_r235462394 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -538,6 +538,17 @@ private[kafka010

[GitHub] spark pull request #23103: [SPARK-26121] [Structured Streaming] Allow users ...

2018-11-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/23103#discussion_r235461374 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala --- @@ -538,6 +538,17 @@ private[kafka010

[GitHub] spark issue #21038: [SPARK-22968][DStream] Throw an exception on partition r...

2018-11-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21038 @SehanRathnayake Kafka is designed for at most one consumer per partition per consumer group at any given point in time, https://kafka.apache.org/documentation/#design_consumerposition

[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22824 @arunmahadevan I think this is clear, even if it is redundant. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #22824: [SPARK-25834] [Structured Streaming]Update Mode should n...

2018-10-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22824 jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integra...

2018-10-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22703 I guess the only argument to the contrary would be if some of the known issues end up being better solved with minor API changes, leaving it marked as experimental would technically

[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/22703#discussion_r224899199 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -3,7 +3,11 @@ layout: global title: Spark Streaming + Kafka Integration Guide (Kafka broker

[GitHub] spark issue #22223: [SPARK-25233][Streaming] Give the user the option of spe...

2018-08-30 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/3 Thanks, merging to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22223: [SPARK-25233][Streaming] Give the user the option of spe...

2018-08-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/3 I think since the default behavior is still 1, it's probably ok to let someone do what they want here On Wed, Aug 29, 2018 at 3:51 PM, rezasafi wrote: > *@rezasafi* commen

[GitHub] spark pull request #22223: [SPARK-25233][Streaming] Give the user the option...

2018-08-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3#discussion_r213825892 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -154,7 +153,8 @@ private[spark

[GitHub] spark pull request #22223: SPARK-25233. Give the user the option of specifyi...

2018-08-24 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3#discussion_r212691622 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -141,10 +143,9 @@ private[spark

[GitHub] spark pull request #22223: SPARK-25233. Give the user the option of specifyi...

2018-08-24 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3#discussion_r212691455 --- Diff: docs/configuration.md --- @@ -1925,6 +1925,14 @@ showDF(properties, numRows = 200, truncate = FALSE) first batch when the backpressure

[GitHub] spark issue #22223: SPARK-25233. Give the user the option of specifying a fi...

2018-08-24 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/3 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22138 Seeking means the pre-fetched data is wasted, so it's not a light operation. It shouldn't be unavoidable, e.g. if consumers were cached keyed by topicpartition, groupid, next offset

[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22138 see e.g. https://github.com/apache/spark/pull/20767 for background Even if this patch doesn't change behavior, if it doesn't really solve the problem, it may make it harder to solve

[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-20 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22138 I thought the whole reason the caching was changed from the initial naive approach to the current approach in master was that people were running jobs that were scheduling multiple

[GitHub] spark issue #22143: [SPARK-24647][SS] Report KafkaStreamWriter's written min...

2018-08-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22143 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaData...

2018-08-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/22138 If you have multiple consumers for a given key, and those consumers are at different offsets, isn't it likely that the client code will not get the right consumer, leading to extra seeking

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-08-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 Recursively creating a Kafka RDD during creation of a Kafka RDD would need a base case, but yeah, some way to have appropriate preferred locations. On Mon, Aug 6, 2018 at 2:58 AM

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-08-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 Example report of skipped offsets in a non-compacted non-transactional situation http://mail-archives.apache.org/mod_mbox/kafka-users/201801.mbox/%3ccakwx9vxc1cdosqwwwjk3qmyy3svvtmh

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207721681 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/OffsetWithRecordScannerSuite.scala --- @@ -0,0 +1,110

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207721657 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/OffsetRange.scala --- @@ -90,21 +90,23 @@ final class OffsetRange

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207721492 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -191,6 +211,11 @@ private[kafka010

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207721482 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -183,6 +187,22 @@ private[kafka010

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207721435 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -223,17 +240,46 @@ private[spark

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-08-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 > How do you know that offset 4 isn't just lost because poll failed? By failed, you mean returned an empty collection after timing out, even though records should be available? You do

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-08-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 If the last offset in the range as calculated by the driver is 5, and on the executor all you can poll up to after a repeated attempt is 3, and the user already told you

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-08-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 Still playing devil's advocate here, I don't think stopping at 3 in your example actually tells you anything about the cause of the gaps in the sequence at 4. I'm not sure you can know

[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21955 I don't see an obvious issue. Looks like zookeeper.connection.timeout.ms isn't being set, so it's defaulting to 6 seconds... could try tweaking it upwards

[GitHub] spark pull request #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client ...

2018-08-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21955#discussion_r207664852 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala --- @@ -109,7 +109,7 @@ private[kafka010] class

[GitHub] spark issue #21983: [SPARK-24987][SS] - Fix Kafka consumer leak when no new ...

2018-08-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21983 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21917: [SPARK-24720][STREAMING-KAFKA] add option to alig...

2018-08-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21917#discussion_r207437645 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -223,17 +240,46 @@ private[spark

[GitHub] spark issue #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client version...

2018-08-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21955 jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21955: [SPARK-18057][FOLLOW-UP][SS] Update Kafka client ...

2018-08-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/21955#discussion_r207422889 --- Diff: external/kafka-0-10/pom.xml --- @@ -28,7 +28,7 @@ spark-streaming-kafka-0-10_2.11 streaming-kafka-0-10 -0.10.0.1

[GitHub] spark issue #21917: [SPARK-24720][STREAMING-KAFKA] add option to align range...

2018-07-30 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21917 jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-13 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 LGTM, merging to master. Thanks! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 What results are you seeing? On Thu, Jul 12, 2018, 6:53 AM Yuanbo Liu wrote: > @koeninger <https://github.com/koeninger> Sorry to interrupt, could you > take

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 @yuanboliu What I'm suggesting is more like this: https://github.com/apache/spark/compare/master...koeninger:SPARK-24713?expand=1

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 @yuanboliu From reading KafkaConsumer code, and from testing, I don't see where consumer.position() alone would un-pause topicpartitions. See below. Can you give a counter-example? I

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 @yuanboliu Can you clarify why repeated pause is necessary? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21690 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21651: [SPARK-18258] Sink need access to offset representation

2018-06-28 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21651 - We need agreement on whether it is worth making a change to the public Sink api (probably not any time soon, judging from the spark 3.0 vs 2.4 discussion), or whether there is a different way

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-06-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 #19431 was merged, thanks for your work. This PR should probably be closed. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-05-22 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20997 I'm fine as well. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...

2018-05-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21300 @gatorsmile this is identical to the original PR which was reviewed by @srowen and discussion on the jira to backport it had not raised any objections since April

[GitHub] spark pull request #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] All...

2018-05-12 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/21300 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-...

2018-05-11 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21300 Merging to branch-2.3 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21300: [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] All...

2018-05-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/21300 [SPARK-24067][BACKPORT-2.3][STREAMING][KAFKA] Allow non-consecutive offsets ## What changes were proposed in this pull request? Backport of the bugfix in SPARK-17147 Add

[GitHub] spark issue #19183: [SPARK-21960][Streaming] Spark Streaming Dynamic Allocat...

2018-05-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19183 @sansagara sounds reasonable to me --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #19183: [SPARK-21960][Streaming] Spark Streaming Dynamic Allocat...

2018-05-09 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19183 I don't have personal experience with streaming dynamic allocation, but this patch makes sense to me and I don't see anything obviously wrong. I agree with Holden regarding tests

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-25 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r184210716 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,359 @@ +/* + * Licensed

[GitHub] spark issue #19887: [SPARK-21168] KafkaRDD should always set kafka clientId.

2018-04-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19887 Merging to master, thanks @liu-zhaokun --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19887: [SPARK-21168] KafkaRDD should always set kafka clientId.

2018-04-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19887 Seems ok to me, long as it passes retest --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19887: [SPARK-21168] KafkaRDD should always set kafka clientId.

2018-04-23 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19887 Jenkins, retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21038: [SPARK-22968][DStream] Throw an exception on partition r...

2018-04-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21038 Seems like that should help address the confusion. Merging to master. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20997 I think if we can't come up with a pool design now that solves most of the issues, we should switch back to the one cached consumer approach that the SQL code is using. On Mon, Apr

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181507520 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181506863 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181506582 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21038 I can't think of a valid reason to create a configuration to allow it. It just fundamentally doesn't make sense to run different apps with the same group id. Trying to catch

[GitHub] spark issue #21038: [SPARK-22968][DStream] Fix Kafka partition revoked issue

2018-04-11 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/21038 The log in the jira looks like it's from a consumer rebalance, i.e. more than one driver consumer was running with the same group id. Isn't the underlying problem here that the user

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180282891 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180283864 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180284812 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180280531 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180285599 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180280210 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180283733 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381 @@ +/* + * Licensed

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-09 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20997 In general, 2 things about this make me uncomfortable: - It's basically a cut-and-paste of the SQL equivalent PR, https://github.com/apache/spark/pull/20767, but it is different from both

[GitHub] spark issue #19431: [SPARK-18580] [DSTREAM][KAFKA] Add spark.streaming.backp...

2018-03-21 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-21 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 @akonopko Thanks! Sorry, but I just noticed the title of the PR - can you adjust it to match convention, e.g. [SPARK-18580] [DSTREAM][KAFKA] Add

[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-21 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2018-03-16 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 merged to master Thanks @arzt ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-16 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 @akonopko thanks for this, if you can resolve merge conflict I think we can get this in --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 Jenkins, ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #17774: [SPARK-18371][Streaming] Spark Streaming backpressure ge...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/17774 LGTM @tdas @zsxwing absent any objections from you in the next couple of days, I'll merge this --- - To unsubscribe, e

[GitHub] spark issue #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][external/...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/19431 @tdas any concerns? If @omuravskiy doesn't express any objections (since these tests are basically taken directly from his linked PR) in the next couple of days, I'm inclined to merge

[GitHub] spark pull request #19431: [SPARK-18580] [DStreams] [external/kafka-0-10][ex...

2018-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/19431#discussion_r173641331 --- Diff: external/kafka-0-8/src/test/scala/org/apache/spark/streaming/kafka/DirectKafkaStreamSuite.scala --- @@ -456,6 +455,60 @@ class

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/20767 Can you clarify why you want to allow only 1 cached consumer per topicpartition, closing any others at task end? It seems like opening and closing consumers would be less efficient than

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-03-05 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 @omuravskiy can you comment on https://github.com/apache/spark/pull/19431 since it appears to be based on your PR

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2018-03-05 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 ok to test --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20572#discussion_r170799163 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala --- @@ -64,6 +69,41 @@ class KafkaRDDSuite extends

[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-21 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20572#discussion_r169850605 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -83,13 +81,50 @@ class

[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20572#discussion_r169538019 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -53,7 +51,7 @@ class

[GitHub] spark pull request #20572: [SPARK-17147][STREAMING][KAFKA] Allow non-consecu...

2018-02-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/20572#discussion_r169537541 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala --- @@ -83,13 +81,50 @@ class

  1   2   3   4   5   6   7   >