Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/2345#issuecomment-55336261
@marmbrus I see what you mean. Updated to basically what you suggested,
aside from building the map once. Let me know; once it's finalized I can try
to test one more time
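The optimization under discussion (SPARK-3462) pushes filters and projections below a Union so each child is filtered before the union combines them. A minimal self-contained sketch with hypothetical plan classes (not Catalyst's actual LogicalPlan API):

```scala
// Hypothetical toy logical-plan classes; the real rule operates on Catalyst's LogicalPlan.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Filter(predicate: String, child: Plan) extends Plan
case class Union(children: Seq[Plan]) extends Plan

// Rewrite Filter-over-Union into Union-of-Filters, recursing into children.
def pushDown(plan: Plan): Plan = plan match {
  case Filter(p, Union(children)) => Union(children.map(c => pushDown(Filter(p, c))))
  case Filter(p, child)           => Filter(p, pushDown(child))
  case Union(children)            => Union(children.map(pushDown))
  case leaf                       => leaf
}
```

After the rewrite, each side of the union can apply the predicate (and, in the real rule, column pruning) before its rows ever reach the Union operator.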
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/1409#issuecomment-49066268
Testing that patch, it seems to have fixed the deadlock we were seeing in
production.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17279539
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala
---
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17279806
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag](
logInfo(statement fetch size
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17280404
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
}).toArray
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/1612#discussion_r17280425
--- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala ---
@@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag](
}).toArray
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/2345
SPARK-3462 push down filters and projections into Unions
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mediacrossinginc/spark SPARK-3462
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/3102
SPARK-4229 Create hadoop configuration in a consistent way
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/koeninger/spark-1 SPARK-4229
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/3543
Closes SPARK-4229 Create hadoop configuration in a consistent way
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/koeninger/spark-1 SPARK
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3102#issuecomment-65176731
Yes, the new Hadoop config documentation is just documenting the behavior
of SparkHadoopUtil.scala, lines 95-100.
Sorry about the branch situation; I was unclear
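The documented behavior, copying `spark.hadoop.*`-prefixed Spark properties into the Hadoop configuration with the prefix stripped, can be sketched with a plain map standing in for Hadoop's `Configuration` class (a simplification, not the actual SparkHadoopUtil code):

```scala
// A plain Map stands in for org.apache.hadoop.conf.Configuration here.
// Keys prefixed with "spark.hadoop." are copied over with the prefix removed;
// everything else in the Spark conf is ignored.
def newHadoopConf(sparkConf: Map[String, String]): Map[String, String] =
  sparkConf.collect {
    case (key, value) if key.startsWith("spark.hadoop.") =>
      key.stripPrefix("spark.hadoop.") -> value
  }
```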
Github user koeninger closed the pull request at:
https://github.com/apache/spark/pull/3102
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3543#discussion_r21192361
--- Diff: docs/configuration.md ---
@@ -664,6 +665,24 @@ Apart from these, the following properties are also
available, and may be useful
</td>
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3543#discussion_r21610025
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext:
SparkContext
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3543#discussion_r21638810
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
@@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext:
SparkContext
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3543#issuecomment-67337516
Jenkins is failing
org.apache.spark.scheduler.SparkListenerSuite.local metrics
org.apache.spark.streaming.flume.FlumeStreamSuite.flume input compressed
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/3798
[SPARK-4964] [Streaming] Exactly-once semantics for Kafka
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/koeninger/spark-1 kafkaRdd
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-68149432
Hi @jerryshao
I'd politely ask that anyone with questions read at least KafkaRDD.scala
and the example usage linked from the jira ticket (it's only about 50
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22330325
--- Diff: external/kafka/pom.xml ---
@@ -44,7 +44,7 @@
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-68307147
I got some good feedback from Koert Kuipers at Tresata regarding location
awareness, so I'll be doing some refactoring to add that.
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22362375
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala ---
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22364279
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala
---
@@ -0,0 +1,123
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22364358
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22366068
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala
---
@@ -0,0 +1,123
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22366140
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala ---
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22366683
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala ---
@@ -0,0 +1,305 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3849#issuecomment-68405794
Thanks for this. Most of the uses of attemptId I've seen look like they
were assuming it meant the 0-based attempt number.
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3849#issuecomment-68407388
The flip side is that it's already documented as doing the right thing:
http://spark.apache.org/docs/1.1.1/api/scala/index.html#org.apache.spark.TaskContext
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71549132
Just updated it
On Mon, Jan 26, 2015 at 4:06 PM, Hari Shreedharan notificati...@github.com
wrote:
@koeninger https://github.com/koeninger Can you
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72893647
Here's a solution for subclassing ConsumerConfig while still silencing the
warning.
My son is doing ok(ish) now, thanks for the concern.
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72780349
High-level consumers connect to ZK.
Simple consumers (which is what this PR uses) connect to brokers directly
instead. See
https://cwiki.apache.org
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72779615
Yeah, there's a weird distinction in Kafka between simple consumers and
high level consumers in that they have a lot of common configuration
parameters, but one
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72790044
The warning is for metadata.broker.list, since it's not expected by the
existing ConsumerConfig (it's used by other config classes)
Couldn't get subclassing
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72783141
Yeah, more importantly it's so defaults for things like connection timeouts
match what Kafka provides.
It's possible to assign fake zookeeper.connect
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24142354
--- Diff:
examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala
---
@@ -0,0 +1,60 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23974964
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala
---
@@ -0,0 +1,150
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23972944
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala
---
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23972610
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala
---
@@ -0,0 +1,150
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24268035
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -179,121 +182,190 @@ object KafkaUtils {
errs
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24019631
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24018460
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72691392
Regarding naming, I agree. The name has been a point of discussion for a
month; how do we get some consensus?
Regarding Java wrappers, there have already been
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24017652
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24019904
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24019256
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24017456
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,249 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24019815
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24018579
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala
---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24020676
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24021621
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala
---
@@ -0,0 +1,344 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24020938
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaClusterSuite.scala
---
@@ -0,0 +1,73 @@
+/*
+ * Licensed
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24021039
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala
---
@@ -0,0 +1,99 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24030347
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24031550
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,249 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24033509
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r24031208
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,174 @@ object KafkaUtils
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/4384#issuecomment-73269084
Thanks for adding the Java-friendly Kafka utils methods.
Your original reason for wanting Array[Leader] rather than
Map[TopicAndPartition, Broker] was for java
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24252680
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -179,121 +182,190 @@ object KafkaUtils {
errs
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24252952
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -179,121 +182,190 @@ object KafkaUtils {
errs
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24253794
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala
---
@@ -19,16 +19,35 @@ package
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4384#discussion_r24251761
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala
---
@@ -36,14 +36,12 @@ import kafka.utils.VerifiableProperties
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72758821
Besides introducing 2 classes where 1 would do, it implies that there are
(or could be) multiple implementations of the abstract class. You're not
using
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72760900
To put it another way, the type you return has to be public.
If you return a public abstract class, what are you going to do when
someone else subclasses
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72754442
Like Patrick said, I really don't see any reason not to just expose
KafkaRDD. You can still hide its constructor without making a superfluous
abstract class
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72757731
Just make the simplified createRDD return a static type of RDD[(K, V)],
that's what I'm saying.
You're already going to have to deal with those other type
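The pattern being argued for, exposing a concrete class while hiding its constructor behind a factory whose declared return type is the general one, can be sketched as follows (hypothetical names, not the actual KafkaUtils code):

```scala
// Hypothetical stand-ins for RDD[(K, V)] and KafkaRDD.
trait PairCollection[K, V] { def collectAll: Seq[(K, V)] }

// The constructor is private: only the companion factory can instantiate it,
// so no superfluous abstract class is needed to restrict construction.
final class KafkaLikeRDD[K, V] private (data: Seq[(K, V)]) extends PairCollection[K, V] {
  def collectAll: Seq[(K, V)] = data
}

object KafkaLikeRDD {
  // The declared return type is the general trait, so the concrete class's
  // extra surface never becomes part of the public API contract.
  def createRDD[K, V](data: Seq[(K, V)]): PairCollection[K, V] = new KafkaLikeRDD(data)
}
```

Callers can only go through the factory, and since its static return type is the supertype, later subclassing or internal changes don't leak into user code.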
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72773257
To be clear, I'm ok with any solution that gives me access to what I need,
which in this case are offsets.
What's coming across as me feeling strongly about
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-72775631
Hey man, I'd rather talk about the code anyway. I think there's just
something I'm missing as far as your underlying assumptions about interfaces go
:) Thanks
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23418815
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala
---
@@ -0,0 +1,123
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71122072
I need to know, perhaps even at the driver, what the ending offset is in
order to be able to commit it.
I also have several use cases where I want to end
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71120030
Yeah, it's pulled down every batch interval. That way you know exactly
what the upper and lower bounds of the offsets are.
On Thu, Jan 22, 2015 at 5:15 PM
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71131380
Point is, it's up to client code to commit, so that it can implement
exactly-once semantics if necessary. Committing automatically at the end
of compute would
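The commit discipline described, where client code processes a batch's exact offset range first and commits only afterwards so a failure replays rather than drops data, can be modeled roughly (hypothetical types; real code uses Kafka offsets and an external offset store):

```scala
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// Hypothetical in-memory offset store; real deployments commit to Kafka, ZK, or a database.
class OffsetStore {
  private var committed = Map.empty[(String, Int), Long]
  def commit(ranges: Seq[OffsetRange]): Unit =
    committed ++= ranges.map(r => (r.topic, r.partition) -> r.untilOffset)
  def committedOffset(topic: String, partition: Int): Option[Long] =
    committed.get((topic, partition))
}

// Process first, commit second: if `work` throws, nothing is committed and
// the same range is reprocessed on retry -- at-least-once by default, and
// exactly-once if `work` and the commit are made atomic or idempotent.
def processBatch(store: OffsetStore, ranges: Seq[OffsetRange])(work: OffsetRange => Unit): Unit = {
  ranges.foreach(work)
  store.commit(ranges)
}
```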
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3631#issuecomment-73728226
This will need some changes to KafkaCluster and possibly other things
related to the new api... let me know if you want a hand.
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3631#issuecomment-73959340
@helena I updated it, pr is at https://github.com/apache/spark/pull/4537
GitHub user koeninger opened a pull request:
https://github.com/apache/spark/pull/4537
[SPARK-2808][Streaming][Kafka] update kafka to 0.8.2
I don't think this should be merged until after 1.3.0 is final
You can merge this pull request into a Git repository by running:
$ git
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/4511#discussion_r24472248
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala
---
@@ -40,43 +41,70 @@ class KafkaRDDSuite extends
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/4537#issuecomment-74199642
I believe that it will not be merged into 1.3
On Feb 12, 2015 8:13 PM, zzcclp notificati...@github.com wrote:
Will this PR be merged into 1.3.0
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/4537#issuecomment-75187343
Spark 1.3.0 was already supposed to be frozen a while back, as far as I know.
My personal gut feeling is that it'd be better to wait for Kafka 0.8.2.1
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23576495
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala ---
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23726952
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,116 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23737948
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,116 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23729727
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23729934
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala
---
@@ -130,7 +130,7 @@ abstract class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23729871
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,116 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23746235
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23746262
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,116 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23746302
--- Diff:
external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala
---
@@ -130,7 +130,7 @@ abstract class
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23746360
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala ---
@@ -0,0 +1,320 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71968759
- Packaging: makes sense.
- Method name: agreed; named it createNewStream for now.
- Offset range: see my explanation of the interface above. I think
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23748806
--- Diff:
external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala ---
@@ -0,0 +1,70 @@
+/*
+ * Licensed to the Apache Software
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23889820
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala
---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23890348
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala
---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23890594
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala
---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23904918
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,249 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23904616
--- Diff:
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala
---
@@ -144,4 +150,249 @@ object KafkaUtils
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r23616474
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71667244
Most of Either's problems can be fixed with a one-line implicit conversion
to RightProjection. I've seen Scalactic before; it seems like overkill by
comparison
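The one-line conversion mentioned can look like this. A sketch, with a hypothetical `parsePort` helper: before Scala 2.12, Either had no map/flatMap of its own, so this conversion is what lets it drive a for-comprehension; from 2.12 on, Either is right-biased natively and the conversion is unnecessary.

```scala
import scala.language.implicitConversions

// The one-liner: right-bias Either so for-comprehensions work on it (pre-2.12).
implicit def rightBias[L, R](e: Either[L, R]): Either.RightProjection[L, R] = e.right

// Hypothetical helper for illustration only.
def parsePort(s: String): Either[String, Int] =
  try Right(s.toInt) catch { case _: NumberFormatException => Left(s"not a number: $s") }

// A Left from any step short-circuits the rest of the comprehension.
val total: Either[String, Int] = for {
  a <- parsePort("9092")
  b <- parsePort("2181")
} yield a + b
```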
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71564796
I'm not a big fan of Either either :)
The issue here is that KafkaCluster is potentially dealing with multiple
exceptions due to multiple brokers
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-71565526
I think as long as offsets are available for advanced users that want them,
relying on checkpointing for the happy path should be ok. Will probably be
some
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-68804547
I'm hopeful that SPARK-4014 will be finalized soon, waiting on that before
doing the refactor for preferred locations. That will involve changing the
partition
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/3798#discussion_r22498572
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag
Github user koeninger commented on the pull request:
https://github.com/apache/spark/pull/3798#issuecomment-69446001
I went ahead and implemented locality and checkpointing of generated rdds.
Couple of points
- still depends on SPARK-4014 eventually being merged