python API in Spark-streaming-kafka spark 3.2.1

2022-03-07 Thread Wiśniewski Michał
Hi, I've read in the documentation that since Spark 3.2.1 the Python API for spark-streaming-kafka is back in the game. https://spark.apache.org/docs/3.2.1/streaming-programming-guide.html#advanced-sources But in the Kafka Integration Guide there is no documentation for the Python API. https

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Sethupathi T
ocument >> https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and >> re-iterating the issue again for better understanding. >> spark-streaming-kafka-0-10 >> <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> >> k

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Gabor Somogyi
need to use > spark streaming (KafkaUtils.createDirectStream) than structured streaming > by following this document > https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and > re-iterating the issue again for better understanding. > spark-streaming-kafka-0-10 > <https:/

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
for better understanding. spark-streaming-kafka-0-10 <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> kafka connector prefix "spark-executor" + group.id for executors, driver uses original group id. *Here is the code where executor construct executor specific g
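
The truncated code reference above corresponds to logic along these lines — a paraphrased sketch of how the 0-10 connector rewrites the executor's consumer config, not the verbatim Spark source (config keys follow the Kafka consumer API):

    import org.apache.kafka.clients.consumer.ConsumerConfig

    def fixKafkaParams(kafkaParams: java.util.HashMap[String, Object]): Unit = {
      // executors must not auto-commit or auto-reset offsets on their own
      kafkaParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false: java.lang.Boolean)
      kafkaParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none")
      // driver and executors are kept in different consumer groups
      val originalGroupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG)
      kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, "spark-executor-" + originalGroupId)
    }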

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Gabor Somogyi
pre-configured, authorized consumer group), there is a scenario where we > want to use spark streaming to consume from secured kafka. so we have > decided to use spark-streaming-kafka-0-10 > <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> > (it > supports SS

[Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
Hi Team, We have a secured Kafka cluster (which only allows consuming from the pre-configured, authorized consumer group). There is a scenario where we want to use Spark Streaming to consume from the secured Kafka, so we have decided to use spark-streaming-kafka-0-10 <https://spark.apache.org/d

Spark streaming kafka source delay occasionally

2019-08-15 Thread ans
Using the Kafka consumer with 2-minute batches, tasks take 2~5 seconds to process in general, but some tasks take more than 40 seconds. I guess *CachedKafkaConsumer#poll* could be the problem. private def poll(timeout: Long): Unit = { val p = consumer.poll(timeout) val r = p.records(topicPartition)
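
The truncated method above continues roughly as follows — a reconstruction sketched from the fragment; the logging line and the buffer field are assumptions about the surrounding class:

    private def poll(timeout: Long): Unit = {
      val p = consumer.poll(timeout)     // blocking fetch from Kafka, up to `timeout` ms
      val r = p.records(topicPartition)  // records for the one partition this cached consumer owns
      logDebug(s"Polled ${p.partitions()}  ${r.size}")
      buffer = r.iterator                // cache the batch for subsequent get() calls
    }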

Why "spark-streaming-kafka-0-10" is still experimental?

2019-04-04 Thread Doaa Medhat
Dears, I'm working on a project that should integrate Spark Streaming with Kafka using Java. Currently the official documentation is confusing: it's not clear whether "spark-streaming-kafka-0-10" is safe to use in a production environment or not. According to "Spark St

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-30 Thread Cody Koeninger
afka-clients-1.0.0.jar:na] >> > at >> > >> > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091) >> > ~[kafka-clients-1.0.0.jar:na] >> > at >> > >> > org.apache.spark.streaming.kafka010.DirectK

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
ava:1091) > > ~[kafka-clients-1.0.0.jar:na] > > at > > > org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169) > > ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] > > at > > > org.apache.spa

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Cody Koeninger
fka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091) > ~[kafka-clients-1.0.0.jar:na] > at > org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169) > ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.

Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
(DirectKafkaInputDStream.scala:169) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:188) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-06-28 Thread Peter Liu
Hello there, I just upgraded to spark 2.3.1 from spark 2.2.1, ran my streaming workload and got an error (java.lang.AbstractMethodError) never seen before; check the error stack attached in (a) below. Does anyone know if spark 2.3.1 works well with kafka spark-streaming-kafka-0-10? this link

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it's not displaying any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task

Spark Streaming Kafka

2017-11-10 Thread Frank Staszak
Hi All, I’m new to streaming Avro records. I am parsing Avro from a Kafka direct stream with Spark Streaming 2.1.1, and I was wondering if anyone could please suggest an API for decoding Avro records with Scala? I’ve found KafkaAvroDecoder, twitter/bijection and the Avro library; each seem to
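
For reference, a minimal decoder using just the plain Avro library mentioned above — a sketch, not from the thread; the generic-record approach and the schema handling are illustrative assumptions:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
    import org.apache.avro.io.DecoderFactory

    // Decode one Kafka value (a byte array) against a known writer schema.
    def decode(bytes: Array[Byte], schema: Schema): GenericRecord = {
      val reader = new GenericDatumReader[GenericRecord](schema)
      val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
      reader.read(null, decoder)
    }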

Spark Streaming + Kafka + Hive: delayed

2017-09-20 Thread toletum
Hello. I have a process (python) that reads a kafka queue, for each record it checks in a table. # Load table in memory table=sqlContext.sql("select id from table") table.cache() kafkaTopic.foreachRDD(processForeach) def processForeach (time, rdd): print(time) for k in rdd.collect (): if

Spark Streaming Kafka Job has strange behavior for certain tasks

2017-04-05 Thread Justin Miller
Greetings! I've been running various spark streaming jobs to persist data from kafka topics and one persister in particular seems to have issues. I've verified that the number of messages is the same per partition (roughly of course) and the volume of data is a fraction of the volume of other

Re: Spark streaming + kafka error with json library

2017-03-30 Thread Srikanth
Thanks for the tip. That worked. When would one use the assembly? On Wed, Mar 29, 2017 at 7:13 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) > > On Wed, Mar 29, 2017 at 9:59 AM, Srikan

Re: Spark streaming + kafka error with json library

2017-03-29 Thread Tathagata Das
Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) On Wed, Mar 29, 2017 at 9:59 AM, Srikanth <srikanth...@gmail.com> wrote: > Hello, > > I'm trying to use "org.json4s" % "json4s-native" library in a spark > streaming + k

Spark streaming + kafka error with json library

2017-03-29 Thread Srikanth
Hello, I'm trying to use the "org.json4s" % "json4s-native" library in a spark streaming + kafka direct app. When I use the latest version of the lib I get an error similar to this <https://github.com/json4s/json4s/issues/316> The workaround suggested there is to use ve

[ Spark Streaming & Kafka 0.10 ] Possible bug

2017-03-22 Thread Afshartous, Nick
Hi, I think I'm seeing a bug in the context of upgrading to using the Kafka 0.10 streaming API. Code fragments follow. -- Nick JavaInputDStream> rawStream = getDirectKafkaStream(); JavaDStream> messagesTuple =

Re: [Spark Streaming+Kafka][How-to]

2017-03-22 Thread Cody Koeninger
Glad you got it worked out. That's cool as long as your use case doesn't actually require e.g. partition 0 to always be scheduled to the same executor across different batches. On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami wrote: > So it worked quite well with a

Re: [Spark Streaming+Kafka][How-to]

2017-03-21 Thread OUASSAIDI, Sami
So it worked quite well with a coalesce, and I was able to find a solution to my problem: although not directly handling the executor, a good workaround was to assign the desired partition to a specific stream through the assign strategy, coalesce to a single partition, and then repeat the same process for

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Another option that would avoid a shuffle would be to use assign and coalesce, running two separate streams. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...") .option("assign", """{t0: {"0": }, t1:{"0": x}}""") .load() .coalesce(1) .writeStream
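
Unflattened, the suggestion reads roughly like this — a sketch assuming `spark` is the SparkSession; the bootstrap servers, partition-assignment JSON, and sink are placeholders for the values elided above:

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("assign", """{"t0": [0], "t1": [0]}""")  // pin this stream to specific partitions
      .load()
      .coalesce(1)  // keep the assigned partitions together in a single task
    // .writeStream ... one such stream per partition group, as described above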

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread OUASSAIDI, Sami
@Cody : Duly noted. @Michael Armbrust : A repartition is out of the question for our project as it would be a fairly expensive operation. We tried looking into targeting a specific executor so as to avoid this extra cost and directly have well partitioned data after consuming the kafka topics. Also

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Sorry, typo. Should be a repartition not a groupBy. > spark.readStream > .format("kafka") > .option("kafka.bootstrap.servers", "...") > .option("subscribe", "t0,t1") > .load() > .repartition($"partition") > .writeStream > .foreach(... code to write to cassandra ...) >

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Michael Armbrust
I think it should be straightforward to express this using structured streaming. You could ensure that data from a given partition ID is processed serially by performing a group by on the partition column. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...")

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Cody Koeninger
Spark just really isn't a good fit for trying to pin particular computation to a particular executor, especially if you're relying on that for correctness. On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami wrote: > > Hi all, > > So I need to specify how an executor

[Spark Streaming+Kafka][How-to]

2017-03-16 Thread OUASSAIDI, Sami
Hi all, So I need to specify how an executor should consume data from a kafka topic. Let's say I have 2 topics : t0 and t1 with two partitions each, and two executors e0 and e1 (both can be on the same node so assign strategy does not work since in the case of a multi executor node it works

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
This is running in YARN cluster mode. It was restarted automatically and continued fine. I was trying to see what went wrong. AFAIK there were no task failures. Nothing in the executor logs. The log I gave is from the driver. After some digging, I did see that there was a rebalance in the kafka logs around this

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Tathagata Das
Does restarting after a few minutes solve the problem? It could be a transient issue that lasts long enough for spark task-level retries to all fail. On Tue, Feb 7, 2017 at 4:34 PM, Srikanth wrote: > Hello, > > I had a spark streaming app that reads from kafka running for a

Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
Hello, I had a spark streaming app that reads from kafka running for a few hours after which it failed with error *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 148649785 ms java.lang.IllegalStateException: No current assignment for partition mt_event-5 at

Re: spark streaming kafka connector questions

2016-09-16 Thread 毅程
s for time > 1473491465000 ms (execution: 0.066 s) *(EVENT 2nd time process cost 0.066)* > > and the 2nd time processing of the event finished without really doing the > work. > > Help is hugely appreciated. > > > > -- > View this message in context: http://apache-spark-

Re: spark streaming kafka connector questions

2016-09-10 Thread Cody Koeninger
message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681p27687.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-

Re: spark streaming kafka connector questions

2016-09-10 Thread Cheng Yi
hed without really doing the work. Help is hugely appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681p27687.html Sent from the Apache Spark User List mailing list archive at Nabble.

Re: spark streaming kafka connector questions

2016-09-10 Thread 毅程
have tried to > pass > > following properties to KafkaUtils > > > > kafkaParams.put("auto.commit.enable", "true"); > > kafkaParams.put("auto.commit.interval.ms", "1000"); > > kafkaParams.put("zookeeper.session.timeout.ms", "6"); >

Re: spark streaming kafka connector questions

2016-09-08 Thread Cody Koeninger
auto.commit.enable", "true"); > kafkaParams.put("auto.commit.interval.ms", "1000"); > kafkaParams.put("zookeeper.session.timeout.ms", "6"); > kafkaParams.put("zookeeper.connection.timeout.ms", "6"); > > S

spark streaming kafka connector questions

2016-09-08 Thread Cheng Yi
t("auto.commit.interval.ms", "1000"); kafkaParams.put("zookeeper.session.timeout.ms", "6"); kafkaParams.put("zookeeper.connection.timeout.ms", "6"); Still not working. Help is greatly appreciated ! -- View this message in context

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
GMT+08:00 Cody Koeninger <c...@koeninger.org>: > For 2.0, the kafka dstream support is in two separate subprojects > depending on which version of Kafka you are using > > spark-streaming-kafka-0-10 > or > spark-streaming-kafka-0-8 > > corresponding to brokers

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Cody Koeninger
For 2.0, the kafka dstream support is in two separate subprojects depending on which version of Kafka you are using spark-streaming-kafka-0-10 or spark-streaming-kafka-0-8 corresponding to brokers that are version 0.10+ or 0.8+ On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin <r...@databricks.
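
In build terms that choice is one dependency line or the other — illustrative sbt coordinates; substitute your Scala and Spark versions:

    // build.sbt — pick the artifact matching your broker version
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"
    // or, for 0.8.x brokers:
    // libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"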

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Andy Davidson
support. We also use kafka Andy From: Marco Mistroni <mmistr...@gmail.com> Date: Monday, July 25, 2016 at 2:33 AM To: kevin <kiss.kevin...@gmail.com> Cc: "user @spark" <user@spark.apache.org>, "dev.spark" <d...@spark.apache.org> Subject: Re: where I can f

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Reynold Xin
iss.kevin...@gmail.com>: > >> hi,all : >> I try to run example org.apache.spark.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Marco Mistroni
rk.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
I have compiled it from source code 2016-07-25 12:05 GMT+08:00 kevin <kiss.kevin...@gmail.com>: > hi,all : > I try to run example org.apache.spark.examples.streaming.KafkaWordCount , > I got error : > Exception in thread "main" java.lang.NoClassDefFoundError: > o

where I can find spark-streaming-kafka for spark2.0

2016-07-24 Thread kevin
Hi all: I tried to run the example org.apache.spark.examples.streaming.KafkaWordCount and I got this error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$ at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread Rabin Banerjee
(belonging to same consumer group) > > Regards, > Sam > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Kafka-Direct-API-Multiple-consumers-tp27305.html > Sent from the

Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread SamyaMaiti
Hi Team, Is there a way we can consume from Kafka using the Spark Streaming direct API with multiple consumers (belonging to the same consumer group)? Regards, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Kafka-Direct-API-Multiple-consumers
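
For context, the 0-8 direct stream does no consumer-group coordination at all: each Kafka partition maps to exactly one RDD partition, so parallelism comes from the topic's partition count rather than from extra group members. A minimal sketch, assuming `ssc` and `kafkaParams` are defined and the topic name is a placeholder:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))
    stream.foreachRDD { rdd =>
      // one RDD partition per Kafka partition of "mytopic"
      println(s"batch has ${rdd.getNumPartitions} partitions")
    }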

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
you want to do this kind of thing, you will need to maintain your > >> own index from time to offset. > >> > >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: > >> > Hi all, > >> > > >> > Is there

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent proposal for it but it has gotten pushed back to at least >> 0.10.1 >> >> If you want to do this kind of thing, you will need to maintain your >> own index from time to offset. >> >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: &

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
m> wrote: > > Hi all, > > > > Is there any way to re-compute using Spark Streaming - Kafka Direct > Approach > > from specific time? > > > > In some cases, I want to re-compute again from specific time (e.g > beginning > > of day)? is that possible? > > > > > > > > -- > > Thanks > > Kien >

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent...@gmail.com> wrote: > Hi all, > > Is there any way to re-compute using Spark Streaming - Kafka Direct Approach > from specific time? > > In some cases, I want to re-compute again from specific time (e.g beginning > of day)? is that possible?

Re: Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Cody Koeninger
low error while trying to consume message from Kafka > through Spark streaming (Kafka direct API). This used to work OK when using > Spark standalone cluster manager. We're just switching to using Cloudera 5.7 > using Yarn to manage Spark cluster and started to see the below error. > >

Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Hi all, Is there any way to re-compute using the Spark Streaming - Kafka Direct Approach from a specific time? In some cases, I want to re-compute again from a specific time (e.g. the beginning of the day). Is that possible? -- Thanks Kien
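
The approach suggested in the replies — keep your own time-to-offset index and restart from looked-up offsets — sketches out like this for the 0-8 direct stream (offset value and topic are placeholders; `ssc` and `kafkaParams` are assumed):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // offsets looked up from your own time->offset index (hypothetical value)
    val fromOffsets = Map(TopicAndPartition("mytopic", 0) -> 12345L)
    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))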

Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Scott W
I'm running into below error while trying to consume message from Kafka through Spark streaming (Kafka direct API). This used to work OK when using Spark standalone cluster manager. We're just switching to using Cloudera 5.7 using Yarn to manage Spark cluster and started to see the below error

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
Let me be more detailed in my response: Kafka works on “at least once” semantics. Therefore, given your assumption that Kafka "will be operational", we can assume that at least once semantics will hold. At this point, it comes down to designing for consumer (really Spark Executor) resilience.

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
At that scale, it’s best not to do coordination at the application layer. How much of your data is transactional in nature {all, some, none}? By which I mean ACID-compliant. > On Apr 19, 2016, at 10:53 AM, Erwan ALLAIN wrote: > > Cody, you're right that was an

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
Cody, you're right, that was an example. The target architecture would be 3 DCs :) Good point on ZK, I'll have to check that. About Spark, both instances will run at the same time but on different topics. It would be quite useless to have 2 DCs working on the same set of data. I just want, in case

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Cody Koeninger
Maybe I'm missing something, but I don't see how you get a quorum in only 2 datacenters (without splitbrain problem, etc). I also don't know how well ZK will work cross-datacenter. As far as the spark side of things goes, if it's idempotent, why not just run both instances all the time. On

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
I'm describing a disaster recovery, but it can be used to make one datacenter offline for an upgrade, for instance. From my point of view, when DC2 crashes: *On Kafka side:* - the kafka cluster will lose one or more brokers (partition leader and replica) - partition leaders that are lost will be reelected in the

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
Is the main concern uptime or disaster recovery? > On Apr 19, 2016, at 9:12 AM, Cody Koeninger wrote: > > I think the bigger question is what happens to Kafka and your downstream data > store when DC2 crashes. > > From a Spark point of view, starting up a post-crash job in

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Cody Koeninger
I think the bigger question is what happens to Kafka and your downstream data store when DC2 crashes. From a Spark point of view, starting up a post-crash job in a new data center isn't really different from starting up a post-crash job in the original data center. On Tue, Apr 19, 2016 at 3:32

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
Thanks Jason and Cody. I'll try to explain the Multi DC case a bit better. As I mentioned before, I'm planning to use one kafka cluster and 2 or more distinct spark clusters. Let's say we have the following DC configuration in a nominal case. Kafka partitions are consumed uniformly by the 2

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Cody Koeninger
The current direct stream only handles exactly the partitions specified at startup. You'd have to restart the job if you changed partitions. https://issues.apache.org/jira/browse/SPARK-12177 has the ongoing work towards using the kafka 0.10 consumer, which would allow for dynamic topicpartitions

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Jason Nerothin
Hi Erwan, You might consider InsightEdge: http://insightedge.io . It has the capability of doing WAN between data grids and would save you the work of having to re-invent the wheel. Additionally, RDDs can be shared between developers in the same DC. Thanks, Jason >

Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Erwan ALLAIN
Hello, I'm currently designing a solution where 2 distinct Spark clusters (2 datacenters) share the same Kafka (Kafka rack aware or manual broker repartition). The aims are - preventing DC crash: using kafka resiliency and the consumer group mechanism (or else?) - keeping consistent offsets among

Re: java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread Cody Koeninger
at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at

java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread SRK
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-nio-channels-ClosedChannelException-in-Spark-Streaming-KafKa-Direct-tp26124.html Sent from the Apache Spark

Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException

2016-01-21 Thread Dan Dutrow
Hey Cody, I would have responded to the mailing list but it looks like this thread got aged off. I have the problem where one of my topics dumps more data than my spark job can keep up with. We limit the input rate with maxRatePerPartition. Eventually, when the data is aged off, I get the

Re: Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException

2016-01-21 Thread Cody Koeninger
Looks like this response did go to the list. As far as OffsetOutOfRange goes, right now that's an unrecoverable error, because it breaks the underlying invariants (e.g. that the number of messages in a partition is deterministic once the RDD is defined). If you want to do some hacking for your

Re: Spark Streaming + Kafka + scala job message read issue

2016-01-15 Thread vivek.meghanathan
nathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; duc.was.h...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Bryan, Yes we are using only 1 thread per topic as we have only one Kafka server with 1 partition. What kind of logs will tell us

RE: Spark Streaming + Kafka + scala job message read issue

2016-01-05 Thread vivek.meghanathan
Meghanathan (WT01 - NEP) Sent: 27 December 2015 11:08 To: Bryan <bryan.jeff...@gmail.com> Cc: Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; duc.was.h...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Bryan, Yes

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread Bryan
...@gmail.com Cc: duc.was.h...@gmail.com; vivek.meghanat...@wipro.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Brian,PhuDuc, All 8 jobs are consuming 8 different IN topics. 8 different Scala jobs running each topic map mentioned below has only 1 thread

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread vivek.meghanathan
l.com> Cc: duc.was.h...@gmail.com<mailto:duc.was.h...@gmail.com>; vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com>; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Brian,PhuDuc, All 8

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread vivek.meghanathan
Any help is highly appreciated; I am completely stuck here.. From: Vivek Meghanathan (WT01 - NEP) Sent: Thursday, December 24, 2015 7:50 PM To: Bryan; user@spark.apache.org Subject: RE: Spark Streaming + Kafka + scala job message read issue We are using

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread vivek.meghanathan
t; for Windows 10 phone From: PhuDuc Nguyen<mailto:duc.was.h...@gmail.com> Sent: Friday, December 25, 2015 3:35 PM To: vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com> Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Spark Streaming + Kafka +

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread Bryan
...@wipro.com Sent: Friday, December 25, 2015 2:18 PM To: bryan.jeff...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Any help is highly appreciated, i am completely stuck here.. From: Vivek Meghanathan (WT01 - NEP) Sent: Thursday, December 24

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread PhuDuc Nguyen
> parse(line._2).extract[Search]) > > > > > > Regards, > Vivek M > > *From:* Bryan [mailto:bryan.jeff...@gmail.com] > *Sent:* 24 December 2015 17:20 > *To:* Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; > user@spark.apache.org > *Subjec

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread Bryan
Agreed. I did not see that they were using the same group name. Sent from Outlook Mail for Windows 10 phone From: PhuDuc Nguyen Sent: Friday, December 25, 2015 3:35 PM To: vivek.meghanat...@wipro.com Cc: user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread Bryan
while missing data from other partitions. Regards, Bryan Jeffrey Sent from Outlook Mail for Windows 10 phone From: vivek.meghanat...@wipro.com Sent: Thursday, December 24, 2015 5:22 AM To: user@spark.apache.org Subject: Spark Streaming + Kafka + scala job message read issue Hi All, We

Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
Hi All, We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our Spark streaming job (consumer) is not receiving all the messages sent to the specific topic. It receives 1 out of ~50 messages (we added a log in the job stream and identified this). We are not seeing any errors in the
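
As the replies in this thread note, the root cause discussed was several jobs sharing one consumer group: receiver-based consumers with the same group.id split the topic's messages between themselves. A hedged sketch of the fix — give each independent job its own group (names are placeholders; in practice each line lives in its own job):

    import org.apache.spark.streaming.kafka.KafkaUtils

    val search  = KafkaUtils.createStream(ssc, "zk1:2181", "search-job-group",  Map("search_in" -> 1))
    val booking = KafkaUtils.createStream(ssc, "zk1:2181", "booking-job-group", Map("booking_in" -> 1))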

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
anathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; user@spark.apache.org Subject: RE: Spark Streaming + Kafka + scala job message read issue Are you using a direct stream consumer, or the older receiver based consumer? If the latter, do the number of partitions you’ve specified fo

Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Tao Li
I am using the spark streaming kafka direct approach these days. I found that when I start the application, it always starts consuming from the latest offset. I would like the application, when it starts, to consume from the offset where the last run with the same kafka consumer group left off. It means I have
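
A sketch of the usual pattern for this, per the replies: persist each batch's offsets yourself (e.g. in ZooKeeper) and pass them back as fromOffsets on restart. The storage helper below is hypothetical:

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch, then save the ranges atomically with the results ...
      ranges.foreach(r => saveToZk(r.topic, r.partition, r.untilOffset)) // hypothetical helper
    }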

RE: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Singh, Abhijeet
You need to maintain the offset yourself, and rightly so, in something like ZooKeeper. From: Tao Li [mailto:litao.bupt...@gmail.com] Sent: Tuesday, December 08, 2015 5:36 PM To: user@spark.apache.org Subject: Need to maintain the consumer offset by myself when using spark streaming kafka direct

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Dibyendu Bhattacharya
on't want to. If the 2 > solutions above don't satisfy your requirements, then consider writing your > own; otherwise I would recommend using the supported features in Spark. > > HTH, > Duc > > > > On Tue, Dec 8, 2015 at 5:05 AM, Tao Li <litao.bupt...@gmail.com> w

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread PhuDuc Nguyen
, 2015 at 5:05 AM, Tao Li <litao.bupt...@gmail.com> wrote: > I am using spark streaming kafka direct approach these days. I found that > when I start the application, it always start consumer the latest offset. I > hope that when application start, it consume from the offset las

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-03 Thread Cody Koeninger
>>>>>>>>> max retries are reached.. in that case there might be data loss. >>>>>>>>>>>> >>>>>>>>>>>> 2. Catch that exception and somehow force things to "reset" for >

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
>> org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@a48c5a8 >>>>>> rejected from java.util.concurrent.ThreadPoolExecutor@543258e0 >>>>>> [Terminated, >>>>>> pool size = 0, active threads = 0, queued tasks = 0, completed t

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
on: >>>>>>> kafka.common.NotLeaderForPartitionException >>>>>>> >>>>>>> at >>>>>>> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) >>>>>>> >>>>>>> >>>>>>> >

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
onitor the log for >>>>>>>>>> this error, >>>>>>>>>> and if it occurs more than X times, kill the job, remove the >>>>

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
; Set([test_stream,5])) >>>>>>>> >>>>>>>> >>>>>>>> java.lang.ClassNotFoundException: >>>>>>>> kafka.common.NotLeaderForPartitionException >>>>>>>> >>>

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
m: >>>>>>>>> ArrayBuffer(kafka.common.UnknownException, >>>>>>>>> org.apache.spark.SparkException: Couldn't find leader offsets for

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
sk 7 >> in >> stage 33.0 failed 4 times, most recent failure: Lost task 7.3 in stage >> 33.0 >> (TID 283, 172.16.97.103): UnknownReason >> >> java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRan

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
>>>> 10 >>>> in stage 52.0 failed 4 times, most recent failure: Lost task 10.3 in >>>> stage >>>> 52.0 (TID 255, 172.16.97.97): UnknownReason >>>> >>>> Exception in thread "streaming-job-executor-0" java.lang.Error: >>>> java.l

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
tion >>> >>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) >>> >>> >>> >>> org.apache.spark.SparkException: Job aborted due to stage failure: Task >>> 7 in >>> stage 33.0 failed 4 times, most recent failur

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
ang.Error: >>> java.lang.InterruptedException >>> >>> Caused by: java.lang.InterruptedException >>> >>> java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException >>> >>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Dibyendu Bhattacharya
= 0, active threads = 0, queued tasks = 0, completed tasks = >>>>> 12112] >>>>> >>>>> >>>>> >>>>> org.apache.spark.SparkException: Job aborted due to stage failure: >>>>> Task 10 >>>>> in stage 52.0 failed 4 times, most recent fa

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
n > > java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException > > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > > java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException > > at org.apache.spark.util.Utils$.logUncaughtExceptions(Uti

Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-11-30 Thread SRK
tOutOfRangeException at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Recovery-for-Spark-Streaming-Kafka-Direct-in-case-of-issues-with-Kafka-tp25524.html Sent from the Apache Spark User List

Re: Spark Streaming kafka directStream value decoder issue

2015-09-17 Thread srungarapu vamsi
data to be collected on the > driver (assuming you don’t want that…) > > val events = kafkaDStream.map { case(devId,byteArray)=> > KafkaGenericEvent.parseFrom(byteArray) } > > From: srungarapu vamsi > Date: Thursday, September 17, 2015 at 4:03 PM > To: user > Subject: Spa
