GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/3102
SPARK-4229 Create Hadoop configuration in a consistent way
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/koeninger/spark-1 SPARK-4229
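A minimal sketch of the idea behind SPARK-4229, assuming (helper name hypothetical) that every Hadoop Configuration is built through one code path so spark.hadoop.* overrides apply uniformly:
```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkConf

// Hypothetical helper, not the patch itself: funnel all Hadoop
// Configuration creation through one method so spark.hadoop.*
// settings are applied the same way everywhere.
def newHadoopConf(sparkConf: SparkConf): Configuration = {
  val hadoopConf = new Configuration()
  sparkConf.getAll.foreach { case (key, value) =>
    if (key.startsWith("spark.hadoop.")) {
      hadoopConf.set(key.stripPrefix("spark.hadoop."), value)
    }
  }
  hadoopConf
}
```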
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17279539
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala
---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17279806
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag](
logInfo("statement
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17280404
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
}).toArray
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17280425
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
}).toArray
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/2345
SPARK-3462 push down filters and projections into Unions
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mediacrossinginc/spark SPARK-3462
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/2345#issuecomment-55336261
@marmbrus I see what you mean. Updated to basically what you suggested,
aside from building the map once. Let me know; once it's finalized I can try
to test one
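For context, a greatly simplified sketch of the optimization being discussed; the Union API shape has changed across Spark versions, and the real rule must also rewrite the condition's attribute references for each child:
```scala
import org.apache.spark.sql.catalyst.plans.logical.{Filter, LogicalPlan, Union}
import org.apache.spark.sql.catalyst.rules.Rule

// Push a Filter beneath a Union so each branch filters its own data
// before the union, instead of filtering the combined output.
object PushFilterThroughUnion extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
    case Filter(condition, Union(children)) =>
      Union(children.map(child => Filter(condition, child)))
  }
}
```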
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16569
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81641129
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -273,8 +304,14 @@ class StreamExecution
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81642196
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15355
I have generally been unable to reproduce these kinds of test failures in
my local environment, and I don't have access to the build server, so trying to
fix without a repro is pretty much sho
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@jnadler Have you had a chance to try this out and see whether it addresses
your issue?
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15387
[SPARK-17782][STREAMING][KAFKA] eliminate race condition of poll twice
## What changes were proposed in this pull request?
Kafka consumers can't subscribe or maintain heartbeat wi
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15355
@zsxwing good eye, thanks. It's not that auto.offset.reset=earliest
doesn't work; it's that there's a potential race condition where poll gets
called twice slowly enough for
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15367
Does backporting reduce the likelihood of change if user feedback indicates
we got it wrong?
My technical concerns were largely addressed; that's my big remaining
organizational co
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
I'm not going to say anything is impossible, which is the point of the
assert. If it does somehow happen, it will be at start, so it should be
obvious.
The whole poll 0 / pause
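A minimal sketch of the poll-0/pause pattern referenced above (assumed shape, not the actual patch):
```scala
import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// poll(0) on the driver forces partition assignment but may prefetch
// records; pausing every assigned partition keeps any later poll from
// silently consuming data that executors are supposed to read.
def assignmentsOnly[K, V](consumer: KafkaConsumer[K, V]): ju.Set[TopicPartition] = {
  consumer.poll(0)
  val partitions = consumer.assignment()
  consumer.pause(partitions)
  partitions
}
```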
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
I set auto commit to false and still recreated the test failure.
That makes sense to me; the consumer position should still be getting updated
in memory even if it isn't saved to st
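To illustrate the in-memory position point, a hedged sketch (broker details are placeholders):
```scala
import java.{util => ju}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// position() reflects the in-memory fetch position, which advances on
// poll() even with enable.auto.commit=false; only the committed offset
// stored in Kafka stays where it was.
val props = new ju.Properties()
props.put("bootstrap.servers", "localhost:9092") // placeholder
props.put("group.id", "example")                 // placeholder
props.put("key.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer",
  "org.apache.kafka.common.serialization.StringDeserializer")
props.put("enable.auto.commit", "false")

val consumer = new KafkaConsumer[String, String](props)
val tp = new TopicPartition("t", 0)
consumer.assign(ju.Arrays.asList(tp))
consumer.poll(1000)
println(consumer.position(tp)) // past any returned records, nothing committed
```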
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
You don't want poll consuming messages; it's not just about offset
correctness, the driver shouldn't be spending time or bandwidth doing that.
What is the substantive concern with this
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
Poll also isn't going to return just the messages for a single
TopicPartition, so to do what you're suggesting you'd have to go through
all the messages and do additional
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15397
How is this going to work with assign? It seems like it's just avoiding
the problem, not fixing it.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
If the concern is TD's comment,
"Future calls to {@link #poll(long)} will not return any records from these
partitions until they have been resumed using {@link #resume(
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15401
[SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of
poll twice
## What changes were proposed in this pull request?
Alternative approach to https://github.com/apache
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
Let me know if you guys like that alternative PR better
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15407
[SPARK-17841][STREAMING][KAFKA] drain commitQueue
## What changes were proposed in this pull request?
Actually drain commit queue rather than just iterating it
## How was
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15397
Look at the poll/seek implementation in the DStream's Subscribe and
SubscribePattern strategies when user offsets are provided, i.e. the problem
that triggered this ticket to begin with. You
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15387
If you're worried about it then accept the alternative PR I linked.
On Sun, Oct 9, 2016 at 11:37 PM, Shixiong Zhu
wrote:
> During the original implementation I had
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r82857280
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala
---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15442
[SPARK-17853][STREAMING][KAFKA][DOC] make it clear that reusing group.id is
bad
## What changes were proposed in this pull request?
Documentation fix to make it clear that reusing group
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15401
@zsxwing so poll is only called in the ConsumerStrategy in situations in which
starting offsets have been provided, and seek is called immediately thereafter
for those offsets. What is the specific
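A sketch of the poll-then-seek sequence being described (shape assumed, not the exact ConsumerStrategy code):
```scala
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

// poll(0) only to trigger assignment; the immediate seek overrides any
// position poll may have advanced, so no user-provided offset is lost.
def seekToUserOffsets[K, V](
    consumer: KafkaConsumer[K, V],
    offsets: Map[TopicPartition, Long]): Unit = {
  consumer.poll(0)
  offsets.foreach { case (tp, offset) => consumer.seek(tp, offset) }
}
```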
Github user koeninger closed the pull request at:
https://github.com/apache/spark/pull/15387
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15397
My main point is that whoever implements SPARK-17812 is going to have to
deal with the issue shown in SPARK-17782, which means much of this patch is
going to need to be changed anyway
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15397#discussion_r83289072
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -256,8 +269,6 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15397#discussion_r83289086
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -270,7 +281,7 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15397
LGTM, thanks for talking it through
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15504
[SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for
structured stream
## What changes were proposed in this pull request?
startingOffsets takes specific per-topicpartition
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r83551653
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -232,6 +232,42 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15407
This doesn't affect correctness (only the highest offset for a given
partition is used in any case), just memory leaks. I'm not sure what a good
way to unit test memory leaks is
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15407
@rxin @tdas right now, items to be committed can be added to the queue, but
they will never actually be removed from the queue. poll() removes, iterator()
does not. I updated the description of
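A minimal sketch of the drain-with-poll() fix described above (close in spirit to, but not copied from, the patch):
```scala
import java.util.concurrent.ConcurrentLinkedQueue
import org.apache.kafka.common.TopicPartition
import org.apache.spark.streaming.kafka010.OffsetRange

// poll() removes the head element; iterating leaves everything queued
// forever. Only the highest untilOffset per partition matters.
val commitQueue = new ConcurrentLinkedQueue[OffsetRange]()

def drainToCommit(): Map[TopicPartition, Long] = {
  var highest = Map.empty[TopicPartition, Long]
  var range = commitQueue.poll()
  while (range != null) {
    val tp = range.topicPartition
    highest += tp -> math.max(highest.getOrElse(tp, range.untilOffset), range.untilOffset)
    range = commitQueue.poll()
  }
  highest
}
```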
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15527
[SPARK-17813][SQL][KAFKA] Maximum data per trigger
## What changes were proposed in this pull request?
maxOffsetsPerTrigger option for rate limiting, proportionally based on
volume of
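Not the merged code, but a sketch of proportional rate limiting as described: cap the batch at maxOffsetsPerTrigger, splitting the budget across partitions by each partition's share of the available backlog:
```scala
import org.apache.kafka.common.TopicPartition

// from/until are the next offset to read and the latest available offset
// per partition; each partition gets a slice of the budget proportional
// to its share of the total backlog.
def rateLimit(
    maxOffsets: Long,
    from: Map[TopicPartition, Long],
    until: Map[TopicPartition, Long]): Map[TopicPartition, Long] = {
  val backlog = until.map { case (tp, end) => tp -> math.max(0L, end - from(tp)) }
  val total = backlog.values.sum.toDouble
  if (total <= maxOffsets) {
    until
  } else {
    until.map { case (tp, end) =>
      val share = (maxOffsets * (backlog(tp) / total)).toLong
      tp -> math.min(end, from(tp) + share)
    }
  }
}
```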
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15570
[STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches
## What changes were proposed in this pull request?
Minor doc change to mention kafka configuration for larger
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
Any interest in picking this back up if you can get a committer's attention?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r105720383
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,16 @@ private[spark
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16006
@zsxwing you or anyone else have time to look at this?
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@zsxwing @rxin the per-partition rate limit probably won't overflow, but
the overall backpressure rate limit was being cast to an int, which definitely
can overflow. I changed it in this l
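A hedged illustration of the overflow hazard mentioned above (not the patched code):
```scala
// A backpressure rate of 3 billion records/sec fits in a Long but not an
// Int: Scala's toInt on a too-large Double clamps at Int.MaxValue, so the
// effective limit is silently wrong.
val rate: Double = 3e9
val clamped: Int = rate.toInt   // 2147483647, silently wrong
val correct: Long = rate.toLong // 3000000000, as intended
```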
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87127811
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87130059
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87128373
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129204
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129981
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129817
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129927
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87129126
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,113 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Wow, looks like the new GitHub comment interface did all kinds of weird
things; apologies about that.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438222
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87438788
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r87439798
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15849#discussion_r87795091
--- Diff:
examples/src/main/java/org/apache/spark/examples/sql/streaming/JavaStructuredKafkaWordCount.java
---
@@ -0,0 +1,96 @@
+/*
+ * Licensed
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15849
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15820#discussion_r88360922
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/CachedKafkaConsumer.scala
---
@@ -83,6 +86,129 @@ private[kafka010] case
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15820
Because the comment made by me and +1'ed by marmbrus is hidden at this
point, I just want to reiterate that this patch should not skip the rest of
the partition in the case that a ti
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84370308
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84371466
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -317,6 +353,8 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84372134
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -232,6 +232,42 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84374872
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15504#discussion_r84378170
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala
---
@@ -0,0 +1,115 @@
+/*
+ * Licensed to the
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15626
So it looks like this is the JSON format being used for Kafka offsets:
[{"_1":{"hash":0,"partition":0,"topic":"t"},"_2":2},{
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15527#discussion_r85253453
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -153,11 +201,7 @@ private[kafka010] case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85641432
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -120,15 +184,24 @@ Kafka has an offset commit API that stores offsets in
a special Kafka topic
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Were these extracted from compiled example projects, or just written up?
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15626#discussion_r85641502
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryException.scala
---
@@ -36,8 +36,8 @@ class StreamingQueryException
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15679#discussion_r85642029
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -165,6 +240,36 @@ For data stores that support transactions, saving
offsets in the same
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
Thanks for working on this; a couple of minor things to fix, but otherwise it
looks good.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15679
LGTM
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
Public API changes, even on an experimental module, aren't a minor thing;
open a Jira.
It's also better not to lump unrelated changes together.
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I think this should just be another createRDD overload that takes a Scala
map. The minor additional maintenance overhead of that method, as opposed to
changing the existing one, isn't worth bre
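A sketch of the suggested shape (signatures abbreviated, names hypothetical): an overload taking a Scala Map that converts and delegates, leaving the existing ju.Map method untouched:
```scala
import java.{util => ju}
import scala.collection.JavaConverters._

// Existing API keeps its ju.Map signature unchanged.
def createRDD(kafkaParams: ju.Map[String, Object]): Unit = {
  // ... existing implementation ...
}

// New overload is just sugar: convert and delegate.
def createRDD(kafkaParams: Map[String, Object]): Unit =
  createRDD(kafkaParams.asJava)
```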
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
LGTM, thanks.
If you want to open a separate PR to clean up the private doc issues you
noticed, go for it; it shouldn't need another Jira imho if it isn't changing code
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I don't think there's a reason to deprecate it. ju.Map is the lowest
common denominator for kafka params, it's used by the underlying consumer,
and it's what the Consum
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15681
I'm agnostic on the value of adding the overload, if @lw-lin thinks it's
more convenient for users. There are considerably fewer overloads as it stands
than the old 0.8 version of Kafk
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15715
LGTM
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066082
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066376
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86067616
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86066774
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -536,6 +535,41 @@ class Dataset[T] private[sql
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15702
Given the concerns Ofir raised about a single far future event screwing up
monotonic event time, do you want to document that problem even if there isn't
an enforced filter for it?
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/15737
[SPARK-18212][SS][KAFKA] increase executor poll timeout
## What changes were proposed in this pull request?
Increase poll timeout to try and address flaky test
## How was this
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15702#discussion_r86203226
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala
---
@@ -104,85 +110,105 @@ case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
In most cases poll should be returning prefetched data from the buffer, not
waiting to talk to kafka over the network.
I could see increasing it a little bit, but I don't thi
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
I will always choose "fail in an obvious way that I can start debugging"
versus "start behaving poorly in non-obvious ways". Similar reason I
thought it was a real
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Ok, I'll update the default in both places to use spark.network.timeout and
leave the test config at 10 seconds
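A sketch of that fallback chain (the explicit option handling is assumed for illustration):
```scala
import org.apache.spark.SparkConf

// Explicit poll timeout if the user set one, else spark.network.timeout,
// else 120 seconds.
def pollTimeoutMs(conf: SparkConf, explicit: Option[Long]): Long =
  explicit.getOrElse(conf.getTimeAsSeconds("spark.network.timeout", "120s") * 1000L)
```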
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15737#discussion_r86254376
--- Diff:
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSource.scala
---
@@ -88,7 +88,10 @@ private[kafka010] case class
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15737
Good catch, I just mistakenly changed to AsMS in one place but not both.
On the test changes, do you want tests waiting up to 2 minutes * however
many kafka calls are being made? If so I
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
Ok... so the question at this point is whether it's worth making the API
change, which ultimately we'll have to track down a committer to decide.
As the PR stands, it shoul
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
I sent a message to d...@spark.apache.org; if you're not already subscribed,
I'd say subscribe and follow up in case there's any discussion there rather
than on the PR / JIRA. Some
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/15132
@rxin thanks, changed to an abstract class. If you think that's sufficient
future-proofing, I otherwise think this is a worthwhile change; it seems like
it meets a real user need.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/15132#discussion_r86810750
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/PerPartitionConfig.scala
---
@@ -0,0 +1,47 @@
+/*
+ * Licensed
Github user koeninger commented on the issue:
https://github.com/apache/spark/pull/16629
I don't think it's a problem to make disabling the cache configurable, as
long as it's on by default. I don't think the additional static constructors in
KafkaUtils are
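A sketch of gating the cache behind a flag that defaults to on (config key assumed for illustration):
```scala
import org.apache.spark.SparkConf

// Caching stays enabled unless explicitly turned off.
def useConsumerCache(conf: SparkConf): Boolean =
  conf.getBoolean("spark.streaming.kafka.consumer.cache.enabled", defaultValue = true)
```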
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91002372
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91001763
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118458
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -67,6 +67,10 @@ private[spark
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/16006#discussion_r91118893
--- Diff:
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
---
@@ -143,9 +147,14 @@ private[spark