[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2345#issuecomment-55336261 @marbrus I see what you mean. Updated to basically what you suggested, aside from building the map once. Let me know, once it's finalized I can try to test one more

[GitHub] spark pull request: SPARK-1097: Do not introduce deadlock while fi...

2014-07-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/1409#issuecomment-49066268 Testing that patch, it seems to have fixed the deadlock we were seeing in production. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279539 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JdbcResultSetRDDSuite.scala --- @@ -0,0 +1,75 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17279806 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -81,8 +113,14 @@ class JdbcRDD[T: ClassTag]( logInfo(statement fetch size

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280404 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: [SPARK-2710] [SQL] Build SchemaRDD from a Jdbc...

2014-09-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/1612#discussion_r17280425 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -67,6 +73,32 @@ class JdbcRDD[T: ClassTag]( }).toArray

[GitHub] spark pull request: SPARK-3462 push down filters and projections i...

2014-09-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/2345 SPARK-3462 push down filters and projections into Unions You can merge this pull request into a Git repository by running: $ git pull https://github.com/mediacrossinginc/spark SPARK-3462

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-11-04 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3102 Spark 4229 Create hadoop configuration in a consistent way You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK-4229

[GitHub] spark pull request: Closes SPARK-4229 Create hadoop configuration ...

2014-12-01 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3543 Closes SPARK-4229 Create hadoop configuration in a consistent way You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3102#issuecomment-65176731 Yes, the new hadoop config documentation is just documenting the behavior of SparkHadoopUtil.scala lines 95-100 Sorry about the branch situation, I was unclear

[GitHub] spark pull request: Spark 4229 Create hadoop configuration in a co...

2014-12-01 Thread koeninger
Github user koeninger closed the pull request at: https://github.com/apache/spark/pull/3102 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21192361 --- Diff: docs/configuration.md --- @@ -664,6 +665,24 @@ Apart from these, the following properties are also available, and may be useful /td

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21610025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3543#discussion_r21638810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -262,7 +263,7 @@ class SQLContext(@transient val sparkContext: SparkContext

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2014-12-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-67337516 Jenkins is failing org.apache.spark.scheduler.SparkListenerSuite.local metrics org.apache.spark.streaming.flume.FlumeStreamSuite.flume input compressed

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-24 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/3798 [SPARK-4964] [Streaming] Exactly-once semantics for Kafka You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 kafkaRdd

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68149432 Hi @jerryshao I'd politely ask that anyone with questions read at least KafkaRDD.scala and the example usage linked from the jira ticket (it's only about 50

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22330325 --- Diff: external/kafka/pom.xml --- @@ -44,7 +44,7 @@ dependency groupIdorg.apache.kafka/groupId artifactIdkafka_

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68307147 I got some good feedback from Koert Kuipers at Tresata regarding location awareness, so I'll be doing some refactoring to add that. --- If your project is set up

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22362375 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala --- @@ -0,0 +1,157 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22364279 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22364358 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366068 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366140 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2014-12-30 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22366683 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,305 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68405794 Thanks for this. Most of the uses of attemptId I've seen look like they were assuming it meant the 0-based attempt number. --- If your project is set up for it, you

[GitHub] spark pull request: [SPARK-4014] Change TaskContext.attemptId to r...

2014-12-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3849#issuecomment-68407388 The flip side is that it's already documented as doing the right thing: http://spark.apache.org/docs/1.1.1/api/scala/index.html#org.apache.spark.TaskContext

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71549132 Just updated it On Mon, Jan 26, 2015 at 4:06 PM, Hari Shreedharan notificati...@github.com wrote: @koeninger https://github.com/koeninger Can you

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-04 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72893647 Here's a solution for subclassing ConsumerConfig while still silencing the warning. My son is doing ok(ish) now, thanks for the concern. --- If your project

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72780349 High level consumers connect to ZK. Simple consumers (which is what this is using) connect to brokers directly instead. See https://cwiki.apache.org

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72779615 Yeah, there's a weird distinction in Kafka between simple consumers and high level consumers in that they have a lot of common configuration parameters, but one

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72790044 The warning is for metadata.broker.list, since its not expected by the existing ConsumerConfig (its used by other config classes) Couldn't get subclassing

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72783141 Yeah, more importantly it's so defaults for things like connection timeouts match what kafka provides. It's possible to assign fake zookeeper.connect

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24142354 --- Diff: examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23974964 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,150

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23972944 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,338 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23972610 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,150

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24268035 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019631 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24018460 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72691392 Regarding naming, I agree. The name has been a point of discussion for a month, how to get some consensus? Regarding Java wrappers, there have already been

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24017652 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019904 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019256 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24017456 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24019815 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24018579 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24020676 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24021621 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -0,0 +1,344 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24020938 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaClusterSuite.scala --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24021039 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24030347 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24031550 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24033509 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r24031208 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,174 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73269084 Thanks for adding the java friendly kafka uitls methods. Your original reason for wanting Array[Leader] rather than Map[TopicAndPartition, Broker] was for java

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252680 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252952 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24253794 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -19,16 +19,35 @@ package

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24251761 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -36,14 +36,12 @@ import kafka.utils.VerifiableProperties

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72758821 Besides introducing 2 classes where 1 would do, it implies that there are (or could be) multiple implementations of the abstract class. You're not using

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72760900 To put it another way, the type you return has to be public. If you return a public abstract class, what are you going to do when someone else subclasses

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72754442 Like patrick said, I really don't see any reason not to just expose KafkaRDD. You can still hide its constructor without making a superflous abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72757731 Just make the simplified createRDD return a static type of RDD[(K, V)], that's what I'm saying. You're already going to have to deal with those other type

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72773257 To be clear, I'm ok with any solution that gives me access to what I need, which in this case are offsets. What's coming across as me feeling strongly about

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-72775631 Hey man, I'd rather talk about the code anyway. I think there's just something I'm missing as far as your underlying assumptions about interfaces go :) Thanks

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23418815 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DeterministicKafkaInputDStream.scala --- @@ -0,0 +1,123

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71122072 I need to know, perhaps even at the driver, what the ending offset is in order to be able to commit it. I also have several use cases where I want to end

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71120030 Yeah, it's pulled down every batch interval. That way you know exactly what the upper and lower bounds of the offsets are. On Thu, Jan 22, 2015 at 5:15 PM

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71131380 Point is, it's up to client code to commit, so that it can implement exactly-once semantics if necessary. Committing automatically at the end of compute would

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73728226 This will need some changes to KafkaCluster and possibly other things related to the new api... let me know if you want a hand. --- If your project is set up

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73959340 @helena I updated it, pr is at https://github.com/apache/spark/pull/4537 --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/4537 [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2 i don't think this should be merged until after 1.3.0 is final You can merge this pull request into a Git repository by running: $ git

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24472248 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -40,43 +41,70 @@ class KafkaRDDSuite extends

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-74199642 I believe that it will not be merged into 1.3 On Feb 12, 2015 8:13 PM, zzcclp notificati...@github.com wrote: Will this RP be merged into 1.3.0

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-75187343 Spark 1.3.0 was already supposed to be frozen a while back as far as I know. My personal gut feeling is that it'd be better to wait for kafka 0.8.2.1

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23576495 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaRDD.scala --- @@ -0,0 +1,199 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23726952 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23737948 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729727 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729934 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala --- @@ -130,7 +130,7 @@ abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23729871 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746235 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746262 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,116 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746302 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala --- @@ -130,7 +130,7 @@ abstract class

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23746360 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/KafkaCluster.scala --- @@ -0,0 +1,320 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71968759 packaging, makes sense method name, agreed, named it createNewStream for now offset range, see my explanation of the interface above. I think

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-28 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23748806 --- Diff: external/kafka/src/main/scala/org/apache/spark/rdd/kafka/OffsetRange.scala --- @@ -0,0 +1,70 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23889820 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890348 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-31 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23890594 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23904918 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-02-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23904616 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -144,4 +150,249 @@ object KafkaUtils

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-27 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r23616474 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-27 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71667244 Most of Either's problems can be fixed with a one-line implicit conversion to RightProjection. I've seen scalactic before, seems like overkill by comparison

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71564796 I'm not a big fan of either either :) The issue here is that KafkaCluster is potentially dealing with multiple exceptions due to multiple brokers

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-26 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-71565526 I think as long as offsets are available for advanced users that want them, relying on checkpointing for the happy path should be ok. Will probably be some

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-68804547 I'm hopeful that SPARK-4014 will be finalized soon, waiting on that before doing the refactor for preferred locations. That will involve changing the partition

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-05 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/3798#discussion_r22498572 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -788,6 +788,20 @@ abstract class RDD[T: ClassTag

[GitHub] spark pull request: [SPARK-4964] [Streaming] Exactly-once semantic...

2015-01-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3798#issuecomment-69446001 I went ahead and implemented locality and checkpointing of generated rdds. Couple of points - still depends on SPARK-4014 eventually being merged

  1   2   3   4   5   6   7   >