[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126166935 streaming only pr is at https://github.com/apache/spark/pull/7772 --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-4229][Streaming] consistent hadoop conf...

2015-07-29 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/7772 [SPARK-4229][Streaming] consistent hadoop configuration, streaming only You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-126161394 Changing this jira to be streaming only and making another for thread safety issues still leaves all the inconsistent calls to new Configuration in SQL, and probably

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-125977592 If we're talking about this issue https://issues.apache.org/jira/browse/HADOOP-11209 unless there's something arcade about hadoop's jira, it looks lik

[GitHub] spark pull request: SPARK-9059 Update Direct Kafka Word count exam...

2015-07-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7467#issuecomment-122551255 I thought the idea for this ticket was to have separate examples for accessing offset ranges. The complexity of offsets doesn't really have anything

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122076899 Except that those streaming changes call into SparkHadoopUtil, which was changed in that PR for thread safety reasons. HadoopRDD was changed so there was only 1 lock

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-07-16 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-122010820 Just to be clear, are we talking about removing just the one-line changes to SQLContext and JavaSQLContext? Everything else in the PR I think is necessary in

[GitHub] spark pull request: [SPARK-8865][STREAMING] FIX BUG: check key in ...

2015-07-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7254#issuecomment-120150880 In case it wasn't clear, LGTM I don't think it needs a test case, because it would be testing behavior that spark doesn't care about (we don&#x

[GitHub] spark pull request: [SPARK-8865][STREAMING] FIX BUG: check key in ...

2015-07-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7254#issuecomment-120150575 "zookeeper.connect" and "group.id" aren't necessary for anything in the kafka direct stream. But they're expected to be pres

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-08 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-119735193 Yes, you're understanding things correctly. Yes, scala works that way as well. On Wed, Jul 8, 2015 at 4:20 PM, amit-ramesh

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-118987704 In general, I feel like this is solving the problem of too many wrappers by adding more wrappers. I don't know that it's worth it just to get an instan

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/7185#discussion_r33977640 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -670,4 +670,17 @@ private class KafkaUtilsPythonHelper

[GitHub] spark pull request: [SPARK-8833][STREAMING][WIP] Kafka Direct API ...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7235#issuecomment-118871369 @guowei2 do you want to open a separate jira / pull request for that containsKey fix? --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-8833][STREAMING][WIP] Kafka Direct API ...

2015-07-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7235#issuecomment-118864216 The containsKey thing is a good catch. But this basic idea has been discussed before and rejected: https://issues.apache.org/jira/browse/SPARK-6249

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-02 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/7185#issuecomment-118188461 I don't have a problem with the static method, especially if it prevents yet more wrapper classes. My concern was more about the difference in sema

[GitHub] spark pull request: [SPARK-8389][Streaming][PySpark] Expose KafkaR...

2015-07-02 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/7185#discussion_r33779900 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -670,4 +670,9 @@ private class KafkaUtilsPythonHelper

[GitHub] spark pull request: [SPARK-8127][Streaming][Kafka] KafkaRDD optimi...

2015-06-24 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-114982933 Cheers :) On Wed, Jun 24, 2015 at 2:06 PM, Tathagata Das wrote: > I forgot to say, thanks Cody! :) > > — > Reply t

[GitHub] spark pull request: [SPARK-8127][Streaming][Kafka] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113690742 fixed title --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6846#issuecomment-113685902 My original pr was against master On Fri, Jun 19, 2015 at 7:27 PM, Tathagata Das wrote: > Was this committed to branch-1.4? >

[GitHub] spark pull request: [SPARK-8390][Streaming][Kafka] fix docs relate...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113663382 Title should be right now On Jun 19, 2015 4:26 PM, "Tathagata Das" wrote: > Could you update title to get the ordering right? >

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32872490 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -68,6 +68,21 @@ class KafkaRDDSuite extends

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113632591 The word count examples don't have any need of accessing offsets. Wouldn't it be better to have separate examples? I don't want someone thi

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32859841 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -399,7 +418,7 @@ object KafkaUtils { val kc

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r32859740 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -158,15 +158,30 @@ object KafkaUtils

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6632#issuecomment-113562935 @tdas is there anything else you feel needs to be done on this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6846#issuecomment-113526956 Thanks. Not sure if the python side of things is going to continue on SPARK-8389 or on SPARK-8337, but I think this takes care of the java side for now. --- If

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-06-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-113335277 As far as I know, its still an issue - by default, any checkpoint that relies on hdfs config (e.g. s3 password) won't recover On Jun 18, 2015 6:

[GitHub] spark pull request: [SPARK-8320] [Streaming] Add example in stream...

2015-06-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6862#issuecomment-113184801 I'm not a python programmer, but isn't the direct translation of that kafkaStreams = map(lambda _:KafkaUtils.createStream(...), range(0,

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-113022865 Yeah, the spacing in that document in general is a mess (mix of tabs and spaces, some 2 spaces between sentences, etc). I cleaned it up somewhat. Also further fixed

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/6863#issuecomment-112978856 I don't think a doc change caused a python test failure. On Jun 17, 2015 5:27 PM, "UCB AMPLab" wrote: > Merged build finished. Test FAI

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8390] fix docs relate...

2015-06-17 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/6863 [Streaming][Kafka][SPARK-8390] fix docs related to HasOffsetRanges You can merge this pull request into a Git repository by running: $ git pull https://github.com/koeninger/spark-1 SPARK

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-17 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6846#discussion_r32620989 --- Diff: external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaDirectKafkaStreamSuite.java --- @@ -89,6 +90,16 @@ public void testKafkaStream

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8389] Example of gett...

2015-06-16 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/6846 [Streaming][Kafka][SPARK-8389] Example of getting offset ranges out o… …f the existing java direct stream api You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: [Streaming][Kafka][SPARK-8127] KafkaRDD optimi...

2015-06-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/6632#discussion_r31868393 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -60,6 +62,49 @@ class KafkaRDD[ }.toArray

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-109378718 I sort of doubt that wait timeout was related to the merge, the only conflict was the single line of MimaExcludes --- If your project is set up for it, you can reply

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-04 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-108903935 The only thing I can think of mima complaining about is that this patch is removing a method... even though it's a method in a private class that is only use

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-06-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-108709503 Another ping on this, even if it misses 1.4 Seeing waitUntilLeaderOffset all over the place in test code I'm working on right now made me sad :( --- If

[GitHub] spark pull request: [Streaming][Kafka] Take advantage of offset ra...

2015-06-03 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/6632 [Streaming][Kafka] Take advantage of offset range info for size-relat… …ed KafkaRDD methods. Possible fix for [SPARK-7122], but probably a worthwhile optimization regardless. You can merge

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-101802768 @tdas just pinging on this to make sure it doesn't get lost in the shuffle, lmk if there's more explanation needed. --- If your project is set up for i

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-05-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-100741752 Is there anything you need to do that couldn't be accomplished by reading from / writing to ZK yourself? Is this just a question of convenient ap

[GitHub] spark pull request: [SPARK-7385][Core] Add RDD.foreachPartitionWit...

2015-05-08 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5927#issuecomment-100237401 I think if you're going to decide you really don't like withContext/withIndex etc they should be marked as deprecated, in addition to having a scaladoc re

[GitHub] spark pull request: [SPARK-7396][Streaming][Example] Update KafkaW...

2015-05-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5936#issuecomment-99482055 FWIW I ran the new version, and it works as well, I'm just concerned about how that exception was caused. --- If your project is set up for it, you can reply to

[GitHub] spark pull request: [SPARK-7396][Streaming][Example] Update KafkaW...

2015-05-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5936#issuecomment-99478975 Not that there's necessarily anything wrong with updating the producer api being used... .. but I don't see how that exception has anything to do with t

[GitHub] spark pull request: [SPARK-7385][Core] Add RDD.foreachPartitionWit...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5927#issuecomment-99271021 @tdas yeah, Kafka transactional output was why I originally wanted to add it. Although that usage of taskcontext shown above is better than my

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5921#issuecomment-99229134 Pretty sure this error [error] oro#oro;2.0.8!oro.jar origin location must be absolute: isn't related to the commit --- If your project is set u

[GitHub] spark pull request: [Streaming][Kafka] cleanup tests from SPARK-28...

2015-05-05 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/5921 [Streaming][Kafka] cleanup tests from SPARK-2808 see if requiring producer acks eliminates the need for waitUntilLeaderOffset calls in tests You can merge this pull request into a Git repository

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-05-05 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-99081192 I just want to re-iterate from the jira discussion that I am thumbs-down to the idea of overloading the meaning of group.id, and thumbs-down to adding options that

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-98228003 At any rate, I tested it out locally against 0.8.1.1 server install, with the IdempotentExample job from my blogpost. Seemed to work ok. --- If your project is set

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-98212160 scalastyle passes locally for me... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-05-01 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29523447 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -53,14 +53,16 @@ class KafkaRDDSuite extends FunSuite

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97930903 I think the code, the name of the conf, and the default setting are all correct. 1 retry means try initially, retry once, then give up, for a total of 2

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97907134 Yknow, I just looked at the code for other reasons, and it is the number of retries, not tries (so the default setting of 1 means a max of 2 consecutive attempts

[GitHub] spark pull request: [SPARK-7255] [Streaming] [Documentation] Added...

2015-04-30 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/5808#issuecomment-97883079 Somewhat nitpicky, but I'd say something like "maximum number of consecutive trials", just to make it clear it's not a limit for the

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97580085 No, it works fine locally: Running PySpark tests. Output is in python/unit-tests.log. Testing with Python version: Python 2.6.6 Run

[GitHub] spark pull request: [SPARK-4122][STREAMING] Add a library that can...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/2994#issuecomment-97578847 I have a branch of the directStream api that caches consumers. It had no noticeable impact on processing time. Even at 100 partitions / 200ms batches on a

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97547514 At this point, it looks like the java and scala tests are passing, but the python tests are timing out. Console output looks like

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97453300 I can do some basic testing against a non-embedded 0.8.1 kafka install. We're not likely to upgrade for our production jobs, so if some of the peopl

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29342107 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -53,14 +53,16 @@ class KafkaRDDSuite extends FunSuite

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-29 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4537#discussion_r29341062 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaCluster.scala --- @@ -220,12 +220,22 @@ class KafkaCluster(val kafkaParams

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97123137 @srowen I imagine things are pretty busy, but can you verify this to test on jenkins? @luisobo you could build and test it out --- If your project is set up for

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-28 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-97085929 @zzcclp the 1.4.0 deadline is Friday, I believe. I fixed the merge conflicts and resolved the MiMa issue (as far as I know), it still passes local tests for me

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-22 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-95180356 Glad to hear it works for you. Fixing the default arguments for mima was straightforward, but there's a lot that has changed in the test code. Work's

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-93478250 That maybe makes a certain amount of sense... I'll try replacing the default arguments with multiple overloaded methods, see if that passes mima. --- If your pr

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-15 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-93472230 As far as I can tell from jenkins output, it failed during MiMa checks. But if I run sbt '; project streaming-kafka; mima-report-binary-i

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-04-03 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-89301901 @zzcclp if you want to help try and figure out how reproduce this test failure outside of Jenkins, go for it. --- If your project is set up for it, you can reply to

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-03-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-83021357 Those tests have passed locally for me 3 times in a row... if I get time later I'll try to dig in --- If your project is set up for it, you can reply to this

[GitHub] spark pull request: [SPARK-4229] Create hadoop configuration in a ...

2015-03-18 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3543#issuecomment-82923505 I think its mostly a question of whether committers are comfortable with a PR that changes all of the uses of new Configuration. At this point it'd pro

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26150162 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-78053440 I am now confused about what the purpose of this PR is. The jira seemed to indicate that the problem was "several third-party offset monitoring tools fail to mo

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26120607 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -84,6 +83,11 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26120488 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26048624 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -118,6 +123,7 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26048829 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -84,6 +83,11 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-09 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4805#issuecomment-77882744 As it stands now, no offsets are stored by spark unless you're checkpointing. Does it really make sense to have an option to automatically store offse

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-08 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r26012160 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25991083 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990950 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990850 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/SparkKafkaUtils.scala --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990774 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/ReliableKafkaReceiver.scala --- @@ -239,21 +239,7 @@ class ReliableKafkaReceiver

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990749 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990695 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990683 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -158,4 +166,37 @@ class

[GitHub] spark pull request: [SPARK-6051][Streaming] Add ZooKeeper offest p...

2015-03-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4805#discussion_r25990568 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/DirectKafkaInputDStream.scala --- @@ -82,8 +83,12 @@ class

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-19 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-75187343 Spark 1.3.0 was already supposed to be frozen a while back as far as I know. My personal gut feeling is that it'd be better to wait for kafka 0.

[GitHub] spark pull request: [SPARK-5731][Streaming][Test] Fix incorrect te...

2015-02-13 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4597#issuecomment-74341376 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-12 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-74199642 I believe that it will not be merged into 1.3 On Feb 12, 2015 8:13 PM, "zzcclp" wrote: > Will this RP be merged into 1.3.0? > >

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-11 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73959340 @helena I updated it, pr is at https://github.com/apache/spark/pull/4537 --- If your project is set up for it, you can reply to this email and have your reply appear

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-11 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/4537 [SPARK-2808][Streaming][Kafka] update kafka to 0.8.2 i don't think this should be merged until after 1.3.0 is final You can merge this pull request into a Git repository by running:

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24472248 --- Diff: external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaRDDSuite.scala --- @@ -40,43 +41,70 @@ class KafkaRDDSuite extends

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24459792 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -154,6 +154,19 @@ object KafkaUtils { jssc.ssc

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4511#discussion_r24459700 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -211,12 +220,17 @@ object KafkaUtils { sc

[GitHub] spark pull request: [SPARK-4964] [Streaming] refactor createRDD to...

2015-02-10 Thread koeninger
GitHub user koeninger opened a pull request: https://github.com/apache/spark/pull/4511 [SPARK-4964] [Streaming] refactor createRDD to take leaders via map instead of array You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-10 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73728226 This will need some changes to KafkaCluster and possibly other things related to the new api... let me know if you want a hand. --- If your project is set up for it

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24268035 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24253794 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/OffsetRange.scala --- @@ -19,16 +19,35 @@ package

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on the pull request: https://github.com/apache/spark/pull/4384#issuecomment-73269084 Thanks for adding the java friendly kafka uitls methods. Your original reason for wanting Array[Leader] rather than Map[TopicAndPartition, Broker] was for java

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252952 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24252680 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala --- @@ -179,121 +182,190 @@ object KafkaUtils { errs

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-06 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24251761 --- Diff: external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaRDD.scala --- @@ -36,14 +36,12 @@ import kafka.utils.VerifiableProperties

[GitHub] spark pull request: [SPARK-4964][Streaming][Kafka] More updates to...

2015-02-04 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/4384#discussion_r24142354 --- Diff: examples/scala-2.10/src/main/scala/org/apache/spark/examples/streaming/DirectKafkaWordCount.scala --- @@ -0,0 +1,60 @@ +/* + * Licensed

<    1   2   3   4   5   6   7   >