Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/20572#discussion_r169536036
--- Diff:
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaRDDSuite.scala
---
@@ -22,12 +22,17 @@ import java.{ util =>
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/20572
@srowen Sorry to turn on the bat-signal, but would you be able to help find
a committer willing to look at this?
After finally finding someone willing to test in production, don't
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/20572
@zsxwing you have time to review this? It's been a long standing issue.
---
-
To unsubscribe, e-mail: reviews-uns
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/20572
[SPARK-17147][STREAMING][KAFKA] Allow non-consecutive offsets
## What changes were proposed in this pull request?
Add a configuration spark.streaming.kafka.allowNonConsecutiveOffsets to
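The flag named in the PR title could be used roughly as below. This is a minimal sketch assuming the configuration key from SPARK-17147; non-consecutive offsets arise e.g. with compacted Kafka topics, where some offsets no longer exist.

```scala
// Sketch only: enabling the proposed flag on the Spark conf.
// The key name is taken from the PR title; treat it as an assumption.
val conf = new org.apache.spark.SparkConf()
  .setAppName("compacted-topic-stream")
  .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")
```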
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Seems reasonable to me but you should probably ask zsxwing if it fits in
with plans for the structured streaming kafka code.
On Thu, Nov 23, 2017 at 10:23 PM, Hyukjin Kwon
wrote
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
You'll need to get a committer's attention to merge it anyway
On Nov 23, 2017 01:48, "Daroo" wrote:
It seems that your "magic spell" didn't work
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
ok to test
On Wed, Nov 22, 2017 at 2:49 PM, Daroo wrote:
> Cool. Could you please authorize it for testing?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Seems reasonable.
On Wed, Nov 22, 2017 at 1:52 PM, Daroo wrote:
> It fails on the current master branch and doesn't after the patch
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
What are you actually asserting in that test and/or does it reliably fail
if run on the version of your code before the patch?
On Wed, Nov 22, 2017 at 1:33 PM, Daroo wrote
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
Yeah, subscribepattern could definitely be an issue.
As far as unit testing, have you tried anything along the lines of setting
the cache size artificially low and then introducing new
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19789
My main comment is that if you have a situation where there's actually
contention on the size of the cache, chances are things are going to be screwed
up anyway due to consumers being recr
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/19789#discussion_r152056775
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -155,11 +178,11 @@ object
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19274
Search Jira and the mailing list, this idea has been brought up multiple
times. I don't think breaking fundamental assumptions of Kafka (one consumer
thread per group per partition) is a
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19134
There's already a jira about why 0.10 doesn't have python support,
https://issues-test.apache.org/jira/browse/S
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/19134
I think it makes sense to go ahead and put deprecation warnings in this PR
as well
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/11863
You won't get any reasonable semantics out of auto commit, because it will
commit on the driver without regard to what the executors have done.
On Aug 2, 2017 21:46, "Wal
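The warning above is that `enable.auto.commit` fires on the driver regardless of what the executors have done. A common mitigation (sketched here; the group id is hypothetical, and the explicit commit call after successful output is omitted) is to turn auto commit off in the consumer params:

```scala
// Sketch: Kafka consumer params with auto commit disabled, so offsets
// are only committed once the application has actually processed data.
val kafkaParams = Map[String, Object](
  "bootstrap.servers"  -> "localhost:9092",          // hypothetical broker
  "group.id"           -> "example-group",            // hypothetical group
  "enable.auto.commit" -> (false: java.lang.Boolean)  // driver won't auto-commit
)
```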
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18353
I thought the historical reason testutils was in main rather than test was
for ease of use by other tests (e.g. python). There's even still a comment in
the code to that effect.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18234
LGTM, thanks Mark
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18234#discussion_r120716157
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -91,7 +91,7 @@ The new Kafka consumer API will pre-fetch messages into
buffers. Therefore it i
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18143
Sorry, noticed a couple more minor configuration related things. Otherwise
LGTM.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636878
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636950
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119636360
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,42 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119374298
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -147,20 +155,14 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119371920
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,38 @@ object
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119371285
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/CachedKafkaConsumer.scala
---
@@ -109,34 +113,38 @@ object
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/18143
Generally looks ok to me, thanks.
2 questions -
Did you do any testing on workloads to see if performance stayed the same?
Is there a reason not to do the same thing to
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119132090
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -310,62 +308,45 @@ private[kafka010
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/18143#discussion_r119131006
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -310,62 +308,45 @@ private[kafka010
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17308
@marmbrus @zsxwing @tdas This needs attention from someone
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
LGTM pending jason's comments on tests
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
Have you read the function def clamp?
Rate limit of 1 should not imply an attempt to grab 1 message even if it
doesn't exist.
On Apr 27, 2017 11:01, "Sebastian Ar
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
@arzt It's entirely possible to have batch times less than a second, and
I'm not sure I agree that the absolute number of messages allowable for a
partition should ever be zero.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17774
How do you read 0.1 of a kafka message for a given partition of a given
batch?
Ultimately the floor for a rate limit, assuming one is set, needs to be 1
message per partition per batch
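The argument above — you cannot read 0.1 of a message, so a configured rate limit must floor at one message per partition per batch — can be sketched as a simple clamp. This is an illustrative helper, not Spark's actual implementation:

```scala
// Sketch of a per-partition rate clamp: a fractional or zero message
// count is floored at 1 message per partition per batch.
def maxMessagesPerPartition(messagesPerSec: Double, batchMillis: Long): Long = {
  val secs = batchMillis / 1000.0
  math.max(1L, (secs * messagesPerSec).toLong)
}
```

With a rate of 0.1 msg/sec and a 1-second batch, the raw product is 0.1, but the clamp still requests one message rather than zero.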
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/17675
[SPARK-20036][DOC] Note incompatible dependencies on org.apache.kafka
artifacts
## What changes were proposed in this pull request?
Note that you shouldn't manuall
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/17308
Just to throw in my two cents, a change like this is definitely needed, as
is made clear by the second sentence of the docs
http://kafka.apache.org/0102/javadoc/index.html?org/apache
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
@zsxwing you or anyone else have time to look at this?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r105720383
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,16 @@ private[spark
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
Any interest in picking this back up if you can get a committer's attention?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16629
I don't think it's a problem to make disabling the cache configurable, as
long as it's on by default. I don't think the additional static constructors in
kafka utils are
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16569
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118893
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118458
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91001763
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91002372
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Because the comment made by me and +1'ed by marmbrus is hidden at this
point, I just want to re-iterate that this patch should not skip the rest of
the partition in the case that a ti
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r88360922
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15849
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15849#discussion_r87795091
--- Diff:
examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java
---
@@ -0,0 +1,96 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87439798
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438222
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438788
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Wow, looks like the new github comment interface did all kinds of weird
things, apologies about that.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129126
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129981
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129817
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129927
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87127811
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87130059
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87128373
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129204
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@zsxwing @rxin the per-partition rate limit probably won't overflow, but
the overall backpressure rate limit was being cast to an int, which definitely
can overflow. I changed it in this l
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15132#discussion_r86810750
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/PerPartitionConfig.scala
---
@@ -0,0 +1,47 @@
+/*
+ * Licensed
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@rxin thanks, changed to abstract class. If you think that's sufficient
future proofing I otherwise think this is a worthwhile change, seems like it
meets a real user need.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
I sent a message to d...@spark.apache.org, if you're not already subscribed
I'd say subscribe and follow up in case there's any discussion there rather
than on the pr / jira. Some
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
Ok... so the question at this point is whether it's worth making the API
change, which ultimately we'll have to track down a committer to decide.
As the PR stands, it shoul
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Good catch, I just mistakenly changed to AsMS in one place but not both.
On the test changes, do you want tests waiting up to 2 minutes * however
many kafka calls are being made? If so I
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15737#discussion_r86254376
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -88,7 +88,10 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Ok, I'll update the default in both places to use spark.network.timeout and
leave the test config at 10 seconds
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
I will always choose "fail in an obvious way that I can start debugging"
versus "start behaving poorly in non-obvious ways". Similar reason I
thought it was a real
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
In most cases poll should be returning prefetched data from the buffer, not
waiting to talk to kafka over the network.
I could see increasing it a little bit, but I don't thi
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86203226
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15737
[SPARK-18212][SS][KAFKA] increase executor poll timeout
## What changes were proposed in this pull request?
Increase poll timeout to try and address flaky test
## How was this
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15702
Given the concerns Ofir raised about a single far future event screwing up
monotonic event time, do you want to document that problem even if there isn't
an enforced filter for it?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066774
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86067616
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066376
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15715
LGTM
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I'm agnostic on the value of adding the overload, if @lw-lin thinks it's
more convenient for users. There are considerably fewer overloads as it stands
than the old 0.8 version of Kafk
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I don't think there's a reason to deprecate it. ju.Map is the lowest
common denominator for kafka params, it's used by the underlying consumer,
and it's what the Consum
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
LGTM, thanks.
If you want to open a separate PR to cleanup the private doc issues you
noticed, go for it, shouldn't need another Jira imho if it isn't changing code
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I think this should just be another createRDD overload that takes a scala
map. The minor additional maintenance overhead of that method as opposed to
change the existing one isn't worth bre
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
Public API changes even on an experimental module aren't a minor thing,
open a Jira.
It's also better not to lump unrelated changes together.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
LGTM
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Thanks for working on this, couple minor things to fix but otherwise looks
good.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85642029
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -165,6 +240,36 @@ For data stores that support transactions, saving
offsets in the same
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15626#discussion_r85641502
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
---
@@ -36,8 +36,8 @@ class StreamingQueryException
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Were these extracted from compiled example projects, or just written up?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85641432
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -120,15 +184,24 @@ Kafka has an offset commit API that stores offsets in
a special Kafka topic
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15527#discussion_r85253453
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -153,11 +201,7 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15626
So it looks like this is the json format being used for kafka offsets:
[{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2},{&q
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84378170
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84374872
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84372134
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -232,6 +232,42 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84371466
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -317,6 +353,8 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84370308
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15570
[STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches
## What changes were proposed in this pull request?
Minor doc change to mention kafka configuration for larger
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15527
[SPARK-17813][SQL][KAFKA] Maximum data per trigger
## What changes were proposed in this pull request?
maxOffsetsPerTrigger option for rate limiting, proportionally based on
volume of
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15407
@rxin @tdas right now, items to be committed can be added to the queue, but
they will never actually be removed from the queue. poll() removes, iterator()
does not. I updated the description of