[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3102 Spark 4229 Create hadoop configuration in a consistent way You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-4229

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279539 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279806 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag]( logInfo("statement

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280404 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280425 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/2345 SPARK-3462 push down filters and projections into Unions You can merge this pull request into a Git repository by running: $ git pull https://github.com/mediacrossinginc/spark SPARK-3462

[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336261 @marmbrus I see what you mean. Updated to basically what you suggested, aside from building the map once. Let me know; once it's finalized I can try to test one

[GitHub] spark issue #16569: [SPARK-19206][DOC][DStream] Fix outdated parameter descr...

2017-01-14 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16569 LGTM

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81641129 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -273,8 +304,14 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81642196 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,252 @@ +/* + * Licensed to the

[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15355 I have generally been unable to reproduce these kinds of test failures on my local environment, and don't have access to the build server, so trying to fix without a repro is pretty much sho

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @jnadler Have you had a chance to try this out and see whether it addresses your issue?

[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-06 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15387 [SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice ## What changes were proposed in this pull request? Kafka consumers can't subscribe or maintain heartbeat wi

[GitHub] spark issue #15355: [SPARK-17782][STREAMING] Disable Kafka 010 pattern based...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15355 @zsxwing good eye, thanks. It's not that auto.offset.reset.earliest doesn't work, it's that there's a potential race condition where poll gets called twice slowly enough for

[GitHub] spark issue #15367: [SPARK-17346][SQL][test-maven]Add Kafka source for Struc...

2016-10-06 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15367 Does backporting reduce the likelihood of change if user feedback indicates we got it wrong? My technical concerns were largely addressed, that's my big remaining organizational co

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I'm not going to say anything is impossible, which is the point of the assert. If it does somehow happen, it will be at start, so should be obvious. The whole poll 0 / pause
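The "poll 0 / pause" idea referred to above is, roughly, the following driver-side sequence; a minimal sketch against the plain KafkaConsumer API (not Spark's actual ConsumerStrategy code), assuming a consumer has already been configured:

    import org.apache.kafka.clients.consumer.KafkaConsumer

    // Sketch of the driver-side "poll 0 / pause" idea: join the group and get an
    // assignment without the driver spending time or bandwidth consuming records.
    def ensureAssignmentWithoutConsuming(consumer: KafkaConsumer[String, String]): Unit = {
      // poll(0) forces the consumer to join the group and receive its assignment
      consumer.poll(0)
      // pausing the assigned partitions keeps later polls from returning records
      consumer.pause(consumer.assignment())
    }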

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 I set auto commit to false, and still recreated the test failure. That makes sense to me, consumer position should still be getting updated in memory even if it isn't saved to st
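A small sketch of the point being made here: with enable.auto.commit set to false nothing is saved back to Kafka, but the consumer's in-memory position still advances after a poll that returns records. Broker address, group id, and topic below are purely illustrative:

    import java.{util => ju}
    import org.apache.kafka.clients.consumer.{ConsumerConfig, KafkaConsumer}
    import org.apache.kafka.common.TopicPartition

    val props = new ju.Properties()
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "host1:9092")
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group")
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false")
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    val tp = new TopicPartition("events", 0)
    consumer.assign(ju.Arrays.asList(tp))
    consumer.poll(1000)             // may return records even with auto commit off
    println(consumer.position(tp))  // position reflects what poll consumed, committed or not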

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 You don't want poll consuming messages; it's not just about offset correctness, the driver shouldn't be spending time or bandwidth doing that. What is the substantive concern with this

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Poll also isn't going to return you just messages for a single topicpartition, so to do what you're suggesting you'd have to go through all the messages and do additional

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 How is this going to work with assign? It seems like it's just avoiding the problem, not fixing it.

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If the concern is TD's comment, "Future calls to {@link #poll(long)} will not return any records from these partitions until they have been resumed using {@link #resume(

[GitHub] spark pull request #15401: [SPARK-17782][STREAMING][KAFKA] alternative elimi...

2016-10-07 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15401 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-07 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 Let me know if you guys like that alternative PR better

[GitHub] spark pull request #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-09 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15407 [SPARK-17841][STREAMING][KAFKA] drain commitQueue ## What changes were proposed in this pull request? Actually drain commit queue rather than just iterating it ## How was

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 Look at the poll/seek implementation in the DStream's subscribe and subscribe pattern when user offsets are provided, i.e. the problem that triggered this ticket to begin with. You
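The poll/seek sequence being referred to is, roughly: poll once so the consumer has an assignment, then seek each partition to the offset the user provided. A minimal sketch against the plain KafkaConsumer API, with userOffsets standing in for caller-provided starting offsets:

    import org.apache.kafka.clients.consumer.KafkaConsumer
    import org.apache.kafka.common.TopicPartition

    def seekToUserOffsets(
        consumer: KafkaConsumer[Array[Byte], Array[Byte]],
        userOffsets: Map[TopicPartition, Long]): Unit = {
      // join the group and get an assignment; if this poll actually returns
      // records, the position can move past the user offsets, which is the
      // race condition discussed in SPARK-17782
      consumer.poll(0)
      // then seek each partition back to the offset the caller asked for
      userOffsets.foreach { case (tp, offset) => consumer.seek(tp, offset) }
    }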

[GitHub] spark issue #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race condition...

2016-10-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15387 If you're worried about it, then accept the alternative PR I linked. On Sun, Oct 9, 2016 at 11:37 PM, Shixiong Zhu wrote: > During the original implementation I had

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82857280 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15442: [SPARK-17853][STREAMING][KAFKA][DOC] make it clea...

2016-10-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15442 [SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is bad ## What changes were proposed in this pull request? Documentation fix to make it clear that reusing group

[GitHub] spark issue #15401: [SPARK-17782][STREAMING][KAFKA] alternative eliminate ra...

2016-10-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15401 @zsxwing so poll is only called in consumer strategy in situations in which starting offsets have been provided, and seek is called immediately thereafter for those offsets. What is the specific

[GitHub] spark pull request #15387: [SPARK-17782][STREAMING][KAFKA] eliminate race co...

2016-10-12 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/15387

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-12 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 My main point is that whoever implements SPARK-17812 is going to have to deal with the issue shown in SPARK-17782, which means much of this patch is going to need to be changed anyway

[GitHub] spark pull request #15397: [SPARK-17834][SQL]Fetch the earliest offsets manu...

2016-10-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15397#discussion_r83289072 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -256,8 +269,6 @@ private[kafka010] case class

[GitHub] spark pull request #15397: [SPARK-17834][SQL]Fetch the earliest offsets manu...

2016-10-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15397#discussion_r83289086 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -270,7 +281,7 @@ private[kafka010] case class

[GitHub] spark issue #15397: [SPARK-17834][SQL]Fetch the earliest offsets manually in...

2016-10-13 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15397 LGTM, thanks for talking it through

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-15 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15504 [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition
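As a rough usage sketch (assuming a SparkSession named spark; topic names, partitions, and offsets are illustrative), specific per-topicpartition starting offsets are passed as JSON, with -2 and -1 conventionally meaning earliest and latest:

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      // assign specific partitions rather than subscribing to whole topics
      .option("assign", """{"topicA":[0,1]}""")
      // per-topicpartition starting offsets; -1 = latest, -2 = earliest
      .option("startingOffsets", """{"topicA":{"0":23,"1":-1}}""")
      .load()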

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-16 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r83551653 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -232,6 +232,42 @@ private[kafka010] case class

[GitHub] spark issue #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15407 This doesn't affect correctness (only the highest offset for a given partition is used in any case), just memory leaks. I'm not sure what a good way to unit test memory leaks is

[GitHub] spark issue #15407: [SPARK-17841][STREAMING][KAFKA] drain commitQueue

2016-10-17 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15407 @rxin @tdas right now, items to be committed can be added to the queue, but they will never actually be removed from the queue. poll() removes, iterator() does not. I updated the description of
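A minimal sketch of the difference being described, assuming a ConcurrentLinkedQueue of OffsetRange like the DStream's commit queue (not the exact committed code): poll() removes the head element, so looping on it actually drains the queue, whereas iterating leaves every element enqueued.

    import java.util.concurrent.ConcurrentLinkedQueue
    import scala.collection.mutable
    import org.apache.spark.streaming.kafka010.OffsetRange

    def drainCommits(commitQueue: ConcurrentLinkedQueue[OffsetRange]): Map[(String, Int), Long] = {
      val highest = mutable.Map[(String, Int), Long]()
      var osr = commitQueue.poll()     // removes the head, or returns null when empty
      while (osr != null) {
        val key = (osr.topic, osr.partition)
        // only the highest untilOffset per partition matters for the commit
        highest(key) = highest.getOrElse(key, osr.untilOffset).max(osr.untilOffset)
        osr = commitQueue.poll()
      }
      highest.toMap
    }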

[GitHub] spark pull request #15527: [SPARK-17813][SQL][KAFKA] Maximum data per trigge...

2016-10-17 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15527 [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume of
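A rough usage sketch of the option this adds (assuming a SparkSession named spark; broker, topic, and limit values are illustrative): cap the total number of offsets consumed per trigger, with the cap split across partitions in proportion to how much data each has available.

    val limited = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("subscribe", "events")
      // read at most this many offsets per trigger, proportionally per partition
      .option("maxOffsetsPerTrigger", "10000")
      .load()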

[GitHub] spark pull request #15570: [STREAMING][KAFKA][DOC] clarify kafka settings ne...

2016-10-20 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15570 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2017-03-10 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 Any interest in picking this back up if you can get a committer's attention?

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2017-03-13 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r105720383 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,16 @@ private[spark

[GitHub] spark issue #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Use spark...

2017-03-13 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16006 @zsxwing you or anyone else have time to look at this?

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-08 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @zsxwing @rxin the per-partition rate limit probably won't overflow, but the overall backpressure rate limit was being cast to an int, which definitely can overflow. I changed it in this l
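A two-line illustration of why the cast matters: a perfectly valid Long rate above Int.MaxValue silently wraps when cast to Int, which keeping the limit as a Long avoids.

    val overallRateLimit: Long = 2L * Int.MaxValue  // 4294967294, a valid aggregate rate
    val truncated: Int = overallRateLimit.toInt     // wraps around to -2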

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87127811 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87130059 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87128373 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129204 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129981 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129817 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129927 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87129126 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,113 @@ private[kafka010] case

[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...

2016-11-08 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15820 Wow, looks like the new github comment interface did all kinds of weird things, apologies about that.

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87438222 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87438788 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r87439798 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark pull request #15849: [SPARK-18410][STREAMING] Add structured kafka exa...

2016-11-14 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15849#discussion_r87795091 --- Diff: examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark issue #15849: [SPARK-18410][STREAMING] Add structured kafka example

2016-11-15 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15849 LGTM

[GitHub] spark pull request #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false...

2016-11-16 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15820#discussion_r88360922 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala --- @@ -83,6 +86,129 @@ private[kafka010] case

[GitHub] spark issue #15820: [SPARK-18373][SS][Kafka]Make failOnDataLoss=false work w...

2016-11-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15820 Because the comment made by me and +1'ed by marmbrus is hidden at this point, I just want to reiterate that this patch should not skip the rest of the partition in the case that a ti

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84370308 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84371466 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -317,6 +353,8 @@ private[kafka010] case class

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84372134 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -232,6 +232,42 @@ private[kafka010] case class

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84374872 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the

[GitHub] spark pull request #15504: [SPARK-17812][SQL][KAFKA] Assign and specific sta...

2016-10-20 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15504#discussion_r84378170 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -0,0 +1,115 @@ +/* + * Licensed to the

[GitHub] spark issue #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-26 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15626 So it looks like this is the json format being used for kafka offsets: [{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2},{"
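For contrast, here is roughly the distinction being drawn (both payloads illustrative): the first shape falls out of serializing Scala tuples ("_1"/"_2" plus a hash field), while a stable format would name topic, partition, and offset explicitly.

    val accidental = """[{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2}]"""
    val explicit   = """{"t":{"0":2}}"""   // topic -> partition -> offset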

[GitHub] spark pull request #15527: [SPARK-17813][SQL][KAFKA] Maximum data per trigge...

2016-10-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15527#discussion_r85253453 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -153,11 +201,7 @@ private[kafka010] case class

[GitHub] spark pull request #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] A...

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15679#discussion_r85641432 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -120,15 +184,24 @@ Kafka has an offset commit API that stores offsets in a special Kafka topic

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 Were these extracted from compiled example projects, or just written up?

[GitHub] spark pull request #15626: SPARK-17829 [SQL] Stable format for offset log

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15626#discussion_r85641502 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala --- @@ -36,8 +36,8 @@ class StreamingQueryException

[GitHub] spark pull request #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] A...

2016-10-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15679#discussion_r85642029 --- Diff: docs/streaming-kafka-0-10-integration.md --- @@ -165,6 +240,36 @@ For data stores that support transactions, saving offsets in the same

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 Thanks for working on this, couple minor things to fix but otherwise looks good.

[GitHub] spark issue #15679: [SPARK-16312][Follow-up][STREAMING][KAFKA][DOC] Add java...

2016-10-29 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15679 LGTM

[GitHub] spark issue #15681: [Minor][Streaming][Kafka] Kafka010 .createRDD() scala AP...

2016-10-30 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 Public API changes even on an experimental module aren't a minor thing; open a Jira. It's also better not to lump unrelated changes together.

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-10-31 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I think this should just be another createRDD overload that takes a scala map. The minor additional maintenance overhead of that method, as opposed to changing the existing one, isn't worth bre
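A hypothetical sketch of the kind of overload being suggested (KafkaUtilsLike and the trimmed signatures are illustrative stand-ins, not Spark's actual API): accept a Scala Map and delegate to the existing java.util.Map-based method.

    import java.{util => ju}
    import scala.collection.JavaConverters._

    object KafkaUtilsLike {
      // existing style: java.util.Map, the lowest common denominator for kafka params
      def createRDD(kafkaParams: ju.Map[String, Object]): Unit = {
        // real implementation elided
      }
      // proposed convenience overload for Scala callers
      def createRDD(kafkaParams: Map[String, Object]): Unit =
        createRDD(kafkaParams.asJava)
    }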

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 LGTM, thanks. If you want to open a separate PR to clean up the private doc issues you noticed, go for it; it shouldn't need another Jira imho if it isn't changing code

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I don't think there's a reason to deprecate it. ju.Map is the lowest common denominator for kafka params, it's used by the underlying consumer, and it's what the Consum

[GitHub] spark issue #15681: [SPARK-18176][Streaming][Kafka] Kafka010 .createRDD() sc...

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15681 I'm agnostic on the value of adding the overload, if @lw-lin thinks it's more convenient for users. There are considerably fewer overloads as it stands than the old 0.8 version of Kafk

[GitHub] spark issue #15715: [SPARK-18198][Doc][Streaming] Highlight code snippets

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15715 LGTM

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066082 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86067616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -104,85 +110,105 @@ case class

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86066774 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -536,6 +535,41 @@ class Dataset[T] private[sql

[GitHub] spark issue #15702: [SPARK-18124] Observed delay based Event Time Watermarks

2016-11-01 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15702 Given the concerns Ofir raised about a single far future event screwing up monotonic event time, do you want to document that problem even if there isn't an enforced filter for it?

[GitHub] spark pull request #15737: [SPARK-18212][SS][KAFKA] increase executor poll t...

2016-11-02 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/15737 [SPARK-18212][SS][KAFKA] increase executor poll timeout ## What changes were proposed in this pull request? Increase poll timeout to try and address flaky test ## How was this

[GitHub] spark pull request #15702: [SPARK-18124] Observed delay based Event Time Wat...

2016-11-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15702#discussion_r86203226 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -104,85 +110,105 @@ case class

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 In most cases poll should be returning prefetched data from the buffer, not waiting to talk to kafka over the network. I could see increasing it a little bit, but I don't thi

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 I will always choose "fail in an obvious way that I can start debugging" versus "start behaving poorly in non-obvious ways". Similar reason I thought it was a real

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-02 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 Ok, I'll update the default in both places to use spark.network.timeout and leave the test config at 10 seconds
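A sketch of that fallback (the key "spark.custom.kafka.poll.ms" is illustrative, not Spark's actual configuration name): use an explicit poll timeout if one was set, otherwise fall back to spark.network.timeout, which itself defaults to 120 seconds.

    import org.apache.spark.SparkConf

    def pollTimeoutMs(conf: SparkConf): Long =
      conf.getTimeAsMs(
        "spark.custom.kafka.poll.ms",
        conf.get("spark.network.timeout", "120s"))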

[GitHub] spark pull request #15737: [SPARK-18212][SS][KAFKA] increase executor poll t...

2016-11-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15737#discussion_r86254376 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala --- @@ -88,7 +88,10 @@ private[kafka010] case class

[GitHub] spark issue #15737: [SPARK-18212][SS][KAFKA] increase executor poll timeout

2016-11-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15737 Good catch, I just mistakenly changed to AsMS in one place but not both. On the test changes, do you want tests waiting up to 2 minutes * however many kafka calls are being made? If so I

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-03 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 Ok... so the question at this point is whether it's worth making the API change, which ultimately we'll have to track down a committer to decide. As the PR stands, it shoul

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 I sent a message to d...@spark.apache.org, if you're not already subscribed I'd say subscribe and follow up in case there's any discussion there rather than on the pr / jira. Some

[GitHub] spark issue #15132: [SPARK-17510][STREAMING][KAFKA] config max rate on a per...

2016-11-04 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15132 @rxin thanks, changed to abstract class. If you think that's sufficient future-proofing, I think this is otherwise a worthwhile change; it seems like it meets a real user need.

[GitHub] spark pull request #15132: [SPARK-17510][STREAMING][KAFKA] config max rate o...

2016-11-07 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15132#discussion_r86810750 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/PerPartitionConfig.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed
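A sketch of the PerPartitionConfig idea discussed in this thread (mirroring the approach, not necessarily the exact committed source): an abstract class rather than a trait, so it can evolve without breaking binary compatibility, mapping each partition to its own maximum rate.

    import org.apache.kafka.common.TopicPartition

    abstract class PerPartitionConfig extends Serializable {
      // maximum rate (messages per second) at which to read from this partition
      def maxRatePerPartition(topicPartition: TopicPartition): Long
    }

    // example: throttle one hot partition harder than the rest (values illustrative)
    class HotPartitionConfig extends PerPartitionConfig {
      override def maxRatePerPartition(tp: TopicPartition): Long =
        if (tp.topic == "events" && tp.partition == 0) 100L else 1000L
    }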

[GitHub] spark issue #16629: [SPARK-19185][DStream] Add more clear hint for 'Concurre...

2017-01-18 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/16629 I don't think it's a problem to make disabling the cache configurable, as long as it's on by default. I don't think the additional static constructors in kafka utils are

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91002372 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,14 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91001763 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -67,6 +67,10 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91118458 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -67,6 +67,10 @@ private[spark

[GitHub] spark pull request #16006: [SPARK-18580] [DStreams] [external/kafka-0-10] Us...

2016-12-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/16006#discussion_r91118893 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala --- @@ -143,9 +147,14 @@ private[spark
