Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r169536036
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
---
@@ -22,12 +22,17 @@ import java.{ util =>
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/20572
@srowen Sorry to turn on the bat-signal, but would you be able to help find
a committer willing to look at this?
After finally finding someone willing to test in production, don't want
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/20572
@zsxwing you have time to review this? It's been a long-standing issue.
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/20572
[SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets
## What changes were proposed in this pull request?
Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets
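For illustration, a minimal sketch of turning the flag on, assuming it is read from SparkConf like the module's other spark.streaming.kafka.* settings (only the key itself comes from this PR):

    import org.apache.spark.SparkConf

    // sketch only: the key is from the PR description; wiring it through
    // SparkConf is an assumption about how the flag is consumed
    val conf = new SparkConf()
      .setAppName("CompactedTopicStream")
      .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")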
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Seems reasonable to me but you should probably ask zsxwing if it fits in
with plans for the structured streaming kafka code.
On Thu, Nov 23, 2017 at 10:23 PM, Hyukjin Kwon <notific
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
You'll need to get a committer's attention to merge it anyway
On Nov 23, 2017 01:48, "Daroo" <notificati...@github.com> wrote:
It seems that your "mag
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
ok to test
On Wed, Nov 22, 2017 at 2:49 PM, Daroo <notificati...@github.com> wrote:
> Cool. Could you please authorize it for testing?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Seems reasonable.
On Wed, Nov 22, 2017 at 1:52 PM, Daroo <notificati...@github.com> wrote:
> It fails on the current master branch and doesn't after the patch
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
What are you actually asserting in that test and/or does it reliably fail
if run on the version of your code before the patch?
On Wed, Nov 22, 2017 at 1:33 PM, Daroo <notific
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Yeah, subscribepattern could definitely be an issue.
As far as unit testing, have you tried anything along the lines of setting
the cache size artificially low and then introducing new
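A sketch of that testing idea, assuming the consumer-cache capacity keys documented for the kafka-0-10 module (the exact keys and values here are assumptions, not taken from this PR):

    import org.apache.spark.SparkConf

    // shrink the consumer cache to a single slot so that reading several
    // topic-partitions forces evictions and consumer re-creation
    val conf = new SparkConf()
      .set("spark.streaming.kafka.consumer.cache.initialCapacity", "1")
      .set("spark.streaming.kafka.consumer.cache.maxCapacity", "1")
    // a test would then consume from more than one partition and assert
    // that results are still correct under cache contention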
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
My main comment is that if you have a situation where there's actually
contention on the size of the cache, chances are things are going to be screwed
up anyway due to consumers being recreated
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19789#discussion_r152056775
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -155,11 +178,11 @@ object
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19274
Search Jira and the mailing list, this idea has been brought up multiple
times. I don't think breaking fundamental assumptions of Kafka (one consumer
thread per group per partition) is a good
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19134
There's already a jira about why 0.10 doesn't have python support,
https://issues-test.apache.org/jira/browse/SPARK-16534
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19134
I think it makes sense to go ahead and put deprecation warnings in this PR
as well
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/11863
You won't get any reasonable semantics out of auto commit, because it will
commit on the driver without regard to what the executors have done.
On Aug 2, 2017 21:46, "Wallace
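For contrast, a sketch of the driver-side commit pattern the kafka-0-10 integration guide recommends instead, where stream is the DStream returned by KafkaUtils.createDirectStream:

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}

    stream.foreachRDD { rdd =>
      // capture the offset ranges on the driver before processing
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process and write out rdd; commit only after output succeeds
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }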
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18353
I thought the historical reason testutils was in main rather than test was
for ease of use by other tests (e.g. python). There's even still a comment in
the code to that effect
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18234
LGTM, thanks Mark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18234#discussion_r120716157
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -91,7 +91,7 @@ The new Kafka consumer API will pre-fetch messages into
buffers. Therefore it i
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18143
Sorry, noticed a couple more minor configuration-related things. Otherwise
LGTM.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636878
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636950
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636360
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119374298
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -147,20 +155,14 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119371920
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,38 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119371285
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,38 @@ object
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18143
Generally looks ok to me, thanks.
2 questions -
Did you do any testing on workloads to see if performance stayed the same?
Is there a reason not to do the same thing
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119132090
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -310,62 +308,45 @@ private[kafka010
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119131006
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -310,62 +308,45 @@ private[kafka010
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17308
@marmbrus @zsxwing @tdas This needs attention from someone
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
LGTM pending Jason's comments on tests
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
Have you read the function def clamp?
Rate limit of 1 should not imply an attempt to grab 1 message even if it
doesn't exist.
On Apr 27, 2017 11:01, "Sebastian
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
@arzt It's entirely possible to have batch times less than a second, and
I'm not sure I agree that the absolute number of messages allowable for a
partition should ever be zero.
So
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
How do you read 0.1 of a kafka message for a given partition of a given
batch?
Ultimately the floor for a rate limit, assuming one is set, needs to be 1
message per partition per batch
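A toy sketch of the floor being argued for, with illustrative values and names (not the code under review): a sub-second batch at a low rate would otherwise truncate to zero messages.

    val ratePerSecond = 1.0          // estimated rate limit, msgs/sec
    val batchSeconds = 0.5           // batch times under a second are possible
    val currentOffset = 100L
    val latestOffset = 1000L
    // floor at one message per partition per batch
    val maxMessages = math.max(1L, (ratePerSecond * batchSeconds).toLong)
    // never plan to read past what actually exists
    val untilOffset = math.min(currentOffset + maxMessages, latestOffset)
    // here 1.0 * 0.5 truncates to 0, so the floor keeps the batch at 1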
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/17675
[SPARK-20036][DOC] Note incompatible dependencies on org.apache.kafka
artifacts
## What changes were proposed in this pull request?
Note that you shouldn't manually add
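The gist of the note, as a hypothetical build.sbt fragment (the artifact version shown is illustrative):

    // rely on the spark-streaming-kafka artifact to pull in a compatible
    // kafka-clients transitively; don't pin org.apache.kafka artifacts yourself
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.1.0"
    // avoid: libraryDependencies += "org.apache.kafka" % "kafka-clients" % "..."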
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17308
Just to throw in my two cents, a change like this is definitely needed, as
is made clear by the second sentence of the docs
http://kafka.apache.org/0102/javadoc/index.html?org/apache
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
@zsxwing you or anyone else have time to look at this?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r105720383
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,16 @@ private[spark
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
Any interest in picking this back up if you can get a committer's attention?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16629
I don't think it's a problem to make disabling the cache configurable, as
long as it's on by default. I don't think the additional static constructors in
kafka utils are necessary
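A sketch of what an on-by-default kill switch could look like to a user; the key name here is an assumption in the style of the module's other cache settings, not something from this PR:

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // cache stays enabled unless explicitly turned off
      .set("spark.streaming.kafka.consumer.cache.enabled", "false")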
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16569
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118893
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118458
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91001763
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91002372
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Because the comment made by me and +1'ed by marmbrus is hidden at this
point, I just want to reiterate that this patch should not skip the rest of
the partition in the case that a timeout
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r88360922
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15849
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15849#discussion_r87795091
--- Diff:
examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java
---
@@ -0,0 +1,96 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87439798
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438222
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438788
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Wow, looks like the new github comment interface did all kinds of weird
things, apologies about that.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129126
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129981
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129817
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129927
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87127811
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87130059
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87128373
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129204
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@zsxwing @rxin the per-partition rate limit probably won't overflow, but
the overall backpressure rate limit was being cast to an int, which definitely
can overflow. I changed it in this latest
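A toy illustration of the overflow in question:

    val overallRateLimit: Long = 3000000000L   // ~3 billion messages
    val cast = overallRateLimit.toInt          // wraps around to -1294967296
    val clamped = math.min(overallRateLimit, Int.MaxValue.toLong)  // safe alternative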
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15132#discussion_r86810750
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/PerPartitionConfig.scala
---
@@ -0,0 +1,47 @@
+/*
+ * Licensed
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@rxin thanks, changed to an abstract class. If you think that's sufficient
future-proofing, I otherwise think this is a worthwhile change; it seems like
it meets a real user need.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
I sent a message to d...@spark.apache.org, if you're not already subscribed
I'd say subscribe and follow up in case there's any discussion there rather
than on the pr / jira. Someone may want
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
Ok... so the question at this point is whether it's worth making the API
change, which ultimately we'll have to track down a committer to decide.
As the PR stands, it shouldn't break
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Good catch, I just mistakenly changed to AsMS in one place but not both.
On the test changes, do you want tests waiting up to 2 minutes * however
many kafka calls are being made? If so I
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15737#discussion_r86254376
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -88,7 +88,10 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Ok, I'll update the default in both places to use spark.network.timeout and
leave the test config at 10 seconds
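A sketch of that fallback; the surrounding option plumbing is assumed, but getTimeAsMs is the standard SparkConf accessor for such timeouts:

    import org.apache.spark.SparkConf

    val sparkConf = new SparkConf()
    // executor poll timeout falls back to spark.network.timeout (default 120s)
    val pollTimeoutMs = sparkConf.getTimeAsMs("spark.network.timeout", "120s")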
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
I will always choose "fail in an obvious way that I can start debugging"
versus "start behaving poorly in non-obvious ways". Similar reason I
thought it wa
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
In most cases poll should be returning prefetched data from the buffer, not
waiting to talk to kafka over the network.
I could see increasing it a little bit, but I don't think
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86203226
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15737
[SPARK-18212][SS][KAFKA] increase executor poll timeout
## What changes were proposed in this pull request?
Increase poll timeout to try and address flaky test
## How
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15702
Given the concerns Ofir raised about a single far future event screwing up
monotonic event time, do you want to document that problem even if there isn't
an enforced filter for it?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066774
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86067616
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066376
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15715
LGTM
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I'm agnostic on the value of adding the overload, if @lw-lin thinks it's
more convenient for users. There are considerably fewer overloads as it stands
than the old 0.8 version of KafkaUtils, so
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I don't think there's a reason to deprecate it. ju.Map is the lowest
common denominator for kafka params, it's used by the underlying consumer,
and it's what the ConsumerStrategy interface
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
LGTM, thanks.
If you want to open a separate PR to cleanup the private doc issues you
noticed, go for it, shouldn't need another Jira imho if it isn't changing code
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I think this should just be another createRDD overload that takes a scala
map. The minor additional maintenance overhead of that method as opposed to
change the existing one isn't worth breaking
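A sketch of the suggested shape: a thin overload accepting a Scala map that copies into a java.util.Map and delegates to the existing method (signature approximated from the public KafkaUtils API):

    import java.{util => ju}
    import scala.collection.JavaConverters._
    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategy, OffsetRange}

    def createRDD[K, V](
        sc: SparkContext,
        kafkaParams: collection.Map[String, Object],
        offsetRanges: Array[OffsetRange],
        locationStrategy: LocationStrategy): RDD[ConsumerRecord[K, V]] = {
      // copy to a fresh ju.Map so callers can't mutate shared state
      KafkaUtils.createRDD[K, V](
        sc, new ju.HashMap[String, Object](kafkaParams.asJava), offsetRanges, locationStrategy)
    }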
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
Public API changes even on an experimental module aren't a minor thing,
open a Jira.
It's also better not to lump unrelated changes together.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
LGTM
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Thanks for working on this, couple minor things to fix but otherwise looks
good.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85642029
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -165,6 +240,36 @@ For data stores that support transactions, saving
offsets in the same
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15626#discussion_r85641502
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
---
@@ -36,8 +36,8 @@ class StreamingQueryException
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Were these extracted from compiled example projects, or just written up?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85641432
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -120,15 +184,24 @@ Kafka has an offset commit API that stores offsets in
a special Kafka topic
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15527#discussion_r85253453
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -153,11 +201,7 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15626
So it looks like this is the json format being used for kafka offsets:
[{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2},{"
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84378170
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84374872
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84372134
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -232,6 +232,42 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84371466
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -317,6 +353,8 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84370308
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15570
[STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches
## What changes were proposed in this pull request?
Minor doc change to mention kafka configuration for larger
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15527
[SPARK-17813][SQL][KAFKA] Maximum data per trigger
## What changes were proposed in this pull request?
maxOffsetsPerTrigger option for rate limiting, proportionally based on
volume
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15407
@rxin @tdas right now, items to be committed can be added to the queue, but
they will never actually be removed from the queue. poll() removes, iterator()
does not. I updated the description
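A toy illustration of that poll()-versus-iterator() distinction, independent of the PR's code:

    import java.util.concurrent.ConcurrentLinkedQueue

    val q = new ConcurrentLinkedQueue[String]()
    q.add("topic-0:100")
    val it = q.iterator()
    while (it.hasNext) it.next()  // traverses; removes nothing
    println(q.size)               // 1 -- the element is still queued
    q.poll()                      // removes and returns the head
    println(q.size)               // 0 -- actually drained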