python API in Spark-streaming-kafka spark 3.2.1

2022-03-07 Thread Wiśniewski Michał
Hi, I've read in the documentation that since Spark 3.2.1 the Python API for spark-streaming-kafka is back in the game. https://spark.apache.org/docs/3.2.1/streaming-programming-guide.html#advanced-sources But in the Kafka Integration Guide there is no documentation for the Python API. https

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Sethupathi T
ocument >> https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and >> re-iterating the issue again for better understanding. >> spark-streaming-kafka-0-10 >> <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> >> k

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-06 Thread Gabor Somogyi
need to use > spark streaming (KafkaUtils.createDirectStream) than structured streaming > by following this document > https://spark.apache.org/docs/2.2.0/streaming-kafka-0-10-integration.html and > re-iterating the issue again for better understanding. > spark-streaming-kafka-0-10 > <https:/

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
for better understanding. spark-streaming-kafka-0-10 <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> kafka connector prefix "spark-executor" + group.id for executors, driver uses original group id. *Here is the code where executor construct executor specific g
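
The truncated code reference above corresponds to logic along these lines — a paraphrased sketch of how the 0-10 connector rewrites the executor's consumer config, not the verbatim Spark source (config keys follow the Kafka consumer API):

    import org.apache.kafka.clients.consumer.ConsumerConfig

    def fixKafkaParams(kafkaParams: java.util.HashMap[String, Object]): Unit = {
      // executors must not auto-commit or auto-reset offsets on their own
      kafkaParams.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, false: java.lang.Boolean)
      kafkaParams.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none")
      // driver and executors are kept in different consumer groups
      val originalGroupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG)
      kafkaParams.put(ConsumerConfig.GROUP_ID_CONFIG, "spark-executor-" + originalGroupId)
    }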

Re: [Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Gabor Somogyi
pre-configured, authorized consumer group), there is a scenario where we > want to use spark streaming to consume from secured kafka. so we have > decided to use spark-streaming-kafka-0-10 > <https://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html> > (it > supports SS

[Spark Streaming Kafka 0-10] - What was the reason for adding "spark-executor-" prefix to group id in executor configurations

2019-09-05 Thread Sethupathi T
Hi Team, We have a secured Kafka cluster (which only allows consuming from the pre-configured, authorized consumer group). There is a scenario where we want to use Spark Streaming to consume from the secured Kafka, so we have decided to use spark-streaming-kafka-0-10 <https://spark.apache.org/d

Spark streaming kafka source delay occasionally

2019-08-15 Thread ans
Using the Kafka consumer with 2-minute batches, tasks take 2~5 seconds to process in general, but some tasks take more than 40 seconds. I guess *CachedKafkaConsumer#poll* could be the problem. private def poll(timeout: Long): Unit = { val p = consumer.poll(timeout) val r = p.records(topicPartition)
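
The truncated method above continues roughly as follows — a reconstruction sketched from the fragment; the logging line and the buffer field are assumptions about the surrounding class:

    private def poll(timeout: Long): Unit = {
      val p = consumer.poll(timeout)     // blocking fetch from Kafka, up to `timeout` ms
      val r = p.records(topicPartition)  // records for the one partition this cached consumer owns
      logDebug(s"Polled ${p.partitions()}  ${r.size}")
      buffer = r.iterator                // cache the batch for subsequent get() calls
    }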

Why "spark-streaming-kafka-0-10" is still experimental?

2019-04-04 Thread Doaa Medhat
Dears, I'm working on a project that should integrate Spark Streaming with Kafka using Java. Currently the official documentation is confusing: it's not clear whether "spark-streaming-kafka-0-10" is safe to use in a production environment or not. According to "Spark St

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-30 Thread Cody Koeninger
afka-clients-1.0.0.jar:na] >> > at >> > >> > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091) >> > ~[kafka-clients-1.0.0.jar:na] >> > at >> > >> > org.apache.spark.streaming.kafka010.DirectK

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
ava:1091) > > ~[kafka-clients-1.0.0.jar:na] > > at > > > org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169) > > ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] > > at > > > org.apache.spa

Re: Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Cody Koeninger
fka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1091) > ~[kafka-clients-1.0.0.jar:na] > at > org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.paranoidPoll(DirectKafkaInputDStream.scala:169) > ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.

Spark Streaming - Kafka. java.lang.IllegalStateException: This consumer has already been closed.

2018-08-29 Thread Guillermo Ortiz Fernández
(DirectKafkaInputDStream.scala:169) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010.DirectKafkaInputDStream.latestOffsets(DirectKafkaInputDStream.scala:188) ~[spark-streaming-kafka-0-10_2.11-2.0.2.jar:2.0.2] at org.apache.spark.streaming.kafka010

Re: spark 2.3.1 with kafka spark-streaming-kafka-0-10 (java.lang.AbstractMethodError)

2018-06-28 Thread Peter Liu
Hello there, I just upgraded to spark 2.3.1 from spark 2.2.1, ran my streaming workload and got an error (java.lang.AbstractMethodError) never seen before; check the error stack attached in (a) below. Does anyone know if spark 2.3.1 works well with kafka spark-streaming-kafka-0-10? this link

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it's not displaying any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task

Spark Streaming Kafka

2017-11-10 Thread Frank Staszak
Hi All, I’m new to streaming Avro records. I am parsing Avro from a Kafka direct stream with Spark Streaming 2.1.1, and I was wondering if anyone could please suggest an API for decoding Avro records with Scala? I’ve found KafkaAvroDecoder, twitter/bijection and the Avro library; each seem to
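
For reference, a minimal decoder using just the plain Avro library mentioned above — a sketch, not from the thread; the generic-record approach and the schema handling are illustrative assumptions:

    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
    import org.apache.avro.io.DecoderFactory

    // Decode one Kafka value (a byte array) against a known writer schema.
    def decode(bytes: Array[Byte], schema: Schema): GenericRecord = {
      val reader = new GenericDatumReader[GenericRecord](schema)
      val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
      reader.read(null, decoder)
    }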

Spark Streaming + Kafka + Hive: delayed

2017-09-20 Thread toletum
Hello. I have a process (python) that reads a kafka queue, for each record it checks in a table. # Load table in memory table=sqlContext.sql("select id from table") table.cache() kafkaTopic.foreachRDD(processForeach) def processForeach (time, rdd): print(time) for k in rdd.collect (): if

Spark Streaming Kafka Job has strange behavior for certain tasks

2017-04-05 Thread Justin Miller
Greetings! I've been running various spark streaming jobs to persist data from kafka topics and one persister in particular seems to have issues. I've verified that the number of messages is the same per partition (roughly of course) and the volume of data is a fraction of the volume of other

Re: Spark streaming + kafka error with json library

2017-03-30 Thread Srikanth
Thanks for the tip. That worked. When would one use the assembly? On Wed, Mar 29, 2017 at 7:13 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote: > Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) > > On Wed, Mar 29, 2017 at 9:59 AM, Srikan

Re: Spark streaming + kafka error with json library

2017-03-29 Thread Tathagata Das
Try depending on "spark-streaming-kafka-0-10_2.11" (not the assembly) On Wed, Mar 29, 2017 at 9:59 AM, Srikanth <srikanth...@gmail.com> wrote: > Hello, > > I'm trying to use "org.json4s" % "json4s-native" library in a spark > streaming + k

Spark streaming + kafka error with json library

2017-03-29 Thread Srikanth
Hello, I'm trying to use the "org.json4s" % "json4s-native" library in a spark streaming + kafka direct app. When I use the latest version of the lib I get an error similar to this <https://github.com/json4s/json4s/issues/316> The workaround suggested there is to use ve

[ Spark Streaming & Kafka 0.10 ] Possible bug

2017-03-22 Thread Afshartous, Nick
Hi, I think I'm seeing a bug in the context of upgrading to using the Kafka 0.10 streaming API. Code fragments follow. -- Nick JavaInputDStream> rawStream = getDirectKafkaStream(); JavaDStream> messagesTuple =

Re: [Spark Streaming+Kafka][How-to]

2017-03-22 Thread Cody Koeninger
Glad you got it worked out. That's cool as long as your use case doesn't actually require e.g. partition 0 to always be scheduled to the same executor across different batches. On Tue, Mar 21, 2017 at 7:35 PM, OUASSAIDI, Sami wrote: > So it worked quite well with a

Re: [Spark Streaming+Kafka][How-to]

2017-03-21 Thread OUASSAIDI, Sami
So it worked quite well with a coalesce, and I was able to find a solution to my problem: although not directly handling the executor, a good workaround was to assign the desired partition to a specific stream through the assign strategy, coalesce to a single partition, and then repeat the same process for

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Another option that would avoid a shuffle would be to use assign and coalesce, running two separate streams. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...") .option("assign", """{t0: {"0": }, t1:{"0": x}}""") .load() .coalesce(1) .writeStream
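
Unflattened, the suggestion reads roughly like this — a sketch assuming `spark` is the SparkSession; the bootstrap servers, partition-assignment JSON, and sink are placeholders for the values elided above:

    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("assign", """{"t0": [0], "t1": [0]}""")  // pin this stream to specific partitions
      .load()
      .coalesce(1)  // keep the assigned partitions together in a single task
    // .writeStream ... one such stream per partition group, as described above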

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread OUASSAIDI, Sami
@Cody : Duly noted. @Michael Armbrust : A repartition is out of the question for our project as it would be a fairly expensive operation. We tried looking into targeting a specific executor so as to avoid this extra cost and directly have well partitioned data after consuming the kafka topics. Also

Re: [Spark Streaming+Kafka][How-to]

2017-03-17 Thread Michael Armbrust
Sorry, typo. Should be a repartition not a groupBy. > spark.readStream > .format("kafka") > .option("kafka.bootstrap.servers", "...") > .option("subscribe", "t0,t1") > .load() > .repartition($"partition") > .writeStream > .foreach(... code to write to cassandra ...) >

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Michael Armbrust
I think it should be straightforward to express this using structured streaming. You could ensure that data from a given partition ID is processed serially by performing a group by on the partition column. spark.readStream .format("kafka") .option("kafka.bootstrap.servers", "...")

Re: [Spark Streaming+Kafka][How-to]

2017-03-16 Thread Cody Koeninger
Spark just really isn't a good fit for trying to pin particular computation to a particular executor, especially if you're relying on that for correctness. On Thu, Mar 16, 2017 at 7:16 AM, OUASSAIDI, Sami wrote: > > Hi all, > > So I need to specify how an executor

[Spark Streaming+Kafka][How-to]

2017-03-16 Thread OUASSAIDI, Sami
Hi all, So I need to specify how an executor should consume data from a kafka topic. Let's say I have 2 topics : t0 and t1 with two partitions each, and two executors e0 and e1 (both can be on the same node so assign strategy does not work since in the case of a multi executor node it works

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
This is running in YARN cluster mode. It was restarted automatically and continued fine. I was trying to see what went wrong. AFAIK there were no task failures. Nothing in the executor logs. The log I gave is from the driver. After some digging, I did see that there was a rebalance in the kafka logs around this

Re: Exception in spark streaming + kafka direct app

2017-02-07 Thread Tathagata Das
Does restarting after a few minutes solve the problem? It could be a transient issue that lasts long enough for spark task-level retries to all fail. On Tue, Feb 7, 2017 at 4:34 PM, Srikanth wrote: > Hello, > > I had a spark streaming app that reads from kafka running for a

Exception in spark streaming + kafka direct app

2017-02-07 Thread Srikanth
Hello, I had a spark streaming app that reads from kafka running for a few hours after which it failed with error *17/02/07 20:04:10 ERROR JobScheduler: Error generating jobs for time 148649785 ms java.lang.IllegalStateException: No current assignment for partition mt_event-5 at

Re: spark streaming kafka connector questions

2016-09-16 Thread 毅程
s for time > 1473491465000 ms (execution: 0.066 s) *(EVENT 2nd time process cost 0.066)* > > and the 2nd time processing of the event finished without really doing the > work. > > Help is hugely appreciated. > > > > -- > View this message in context: http://apache-spark-

Re: spark streaming kafka connector questions

2016-09-10 Thread Cody Koeninger
message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681p27687.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-

Re: spark streaming kafka connector questions

2016-09-10 Thread Cheng Yi
hed without really doing the work. Help is hugely appreciated. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kafka-connector-questions-tp27681p27687.html Sent from the Apache Spark User List mailing list archive at Nabble.

Re: spark streaming kafka connector questions

2016-09-10 Thread 毅程
have tried to > pass > > following properties to KafkaUtils > > > > kafkaParams.put("auto.commit.enable", "true"); > > kafkaParams.put("auto.commit.interval.ms", "1000"); > > kafkaParams.put("zookeeper.session.timeout.ms", "6"); >

Re: spark streaming kafka connector questions

2016-09-08 Thread Cody Koeninger
auto.commit.enable", "true"); > kafkaParams.put("auto.commit.interval.ms", "1000"); > kafkaParams.put("zookeeper.session.timeout.ms", "6"); > kafkaParams.put("zookeeper.connection.timeout.ms", "6"); > > S

spark streaming kafka connector questions

2016-09-08 Thread Cheng Yi
t("auto.commit.interval.ms", "1000"); kafkaParams.put("zookeeper.session.timeout.ms", "6"); kafkaParams.put("zookeeper.connection.timeout.ms", "6"); Still not working. Help is greatly appreciated ! -- View this message in context

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
GMT+08:00 Cody Koeninger <c...@koeninger.org>: > For 2.0, the kafka dstream support is in two separate subprojects > depending on which version of Kafka you are using > > spark-streaming-kafka-0-10 > or > spark-streaming-kafka-0-8 > > corresponding to brokers

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Cody Koeninger
For 2.0, the kafka dstream support is in two separate subprojects depending on which version of Kafka you are using spark-streaming-kafka-0-10 or spark-streaming-kafka-0-8 corresponding to brokers that are version 0.10+ or 0.8+ On Mon, Jul 25, 2016 at 12:29 PM, Reynold Xin <r...@databricks.
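
In build terms that choice is one dependency line or the other — illustrative sbt coordinates; substitute your Scala and Spark versions:

    // build.sbt — pick the artifact matching your broker version
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"
    // or, for 0.8.x brokers:
    // libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"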

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Andy Davidson
support. We also use kafka Andy From: Marco Mistroni <mmistr...@gmail.com> Date: Monday, July 25, 2016 at 2:33 AM To: kevin <kiss.kevin...@gmail.com> Cc: "user @spark" <user@spark.apache.org>, "dev.spark" <d...@spark.apache.org> Subject: Re: where I can f

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Reynold Xin
iss.kevin...@gmail.com>: > >> hi,all : >> I try to run example org.apache.spark.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread Marco Mistroni
rk.examples.streaming.KafkaWordCount , >> I got error : >> Exception in thread "main" java.lang.NoClassDefFoundError: >> org/apache/spark/streaming/kafka/KafkaUtils$ >> at >> org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: where I can find spark-streaming-kafka for spark2.0

2016-07-25 Thread kevin
I have compiled it from source code 2016-07-25 12:05 GMT+08:00 kevin <kiss.kevin...@gmail.com>: > hi,all : > I try to run example org.apache.spark.examples.streaming.KafkaWordCount , > I got error : > Exception in thread "main" java.lang.NoClassDefFoundError: > o

where I can find spark-streaming-kafka for spark2.0

2016-07-24 Thread kevin
Hi all: I tried to run the example org.apache.spark.examples.streaming.KafkaWordCount and I got this error: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils$ at org.apache.spark.examples.streaming.KafkaWordCount$.main(KafkaWordCoun

Re: Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread Rabin Banerjee
(belonging to same consumer group) > > Regards, > Sam > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Kafka-Direct-API-Multiple-consumers-tp27305.html > Sent from the

Spark streaming Kafka Direct API + Multiple consumers

2016-07-07 Thread SamyaMaiti
Hi Team, Is there a way we can consume from Kafka using the Spark Streaming direct API with multiple consumers (belonging to the same consumer group)? Regards, Sam -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Kafka-Direct-API-Multiple-consumers
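
For context, the 0-8 direct stream does no consumer-group coordination at all: each Kafka partition maps to exactly one RDD partition, so parallelism comes from the topic's partition count rather than from extra group members. A minimal sketch, assuming `ssc` and `kafkaParams` are defined and the topic name is a placeholder:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))
    stream.foreachRDD { rdd =>
      // one RDD partition per Kafka partition of "mytopic"
      println(s"batch has ${rdd.getNumPartitions} partitions")
    }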

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
you want to do this kind of thing, you will need to maintain your > >> own index from time to offset. > >> > >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: > >> > Hi all, > >> > > >> > Is there

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent proposal for it but it has gotten pushed back to at least >> 0.10.1 >> >> If you want to do this kind of thing, you will need to maintain your >> own index from time to offset. >> >> On Wed, May 25, 2016 at 8:15 AM, trung kien <kient...@gmail.com> wrote: &

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
m> wrote: > > Hi all, > > > > Is there any way to re-compute using Spark Streaming - Kafka Direct > Approach > > from specific time? > > > > In some cases, I want to re-compute again from specific time (e.g > beginning > > of day)? is that possible? > > > > > > > > -- > > Thanks > > Kien >

Re: Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread Cody Koeninger
ent...@gmail.com> wrote: > Hi all, > > Is there any way to re-compute using Spark Streaming - Kafka Direct Approach > from specific time? > > In some cases, I want to re-compute again from specific time (e.g beginning > of day)? is that possible?

Re: Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Cody Koeninger
low error while trying to consume message from Kafka > through Spark streaming (Kafka direct API). This used to work OK when using > Spark standalone cluster manager. We're just switching to using Cloudera 5.7 > using Yarn to manage Spark cluster and started to see the below error. > >

Spark Streaming - Kafka Direct Approach: re-compute from specific time

2016-05-25 Thread trung kien
Hi all, Is there any way to re-compute using the Spark Streaming - Kafka Direct Approach from a specific time? In some cases, I want to re-compute again from a specific time (e.g. the beginning of the day). Is that possible? -- Thanks Kien
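
The approach suggested in the replies — keep your own time-to-offset index and restart from looked-up offsets — sketches out like this for the 0-8 direct stream (offset value and topic are placeholders; `ssc` and `kafkaParams` are assumed):

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // offsets looked up from your own time->offset index (hypothetical value)
    val fromOffsets = Map(TopicAndPartition("mytopic", 0) -> 12345L)
    val stream = KafkaUtils.createDirectStream[
        String, String, StringDecoder, StringDecoder, (String, String)](
      ssc, kafkaParams, fromOffsets,
      (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message))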

Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Scott W
I'm running into below error while trying to consume message from Kafka through Spark streaming (Kafka direct API). This used to work OK when using Spark standalone cluster manager. We're just switching to using Cloudera 5.7 using Yarn to manage Spark cluster and started to see the below error

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
Let me be more detailed in my response: Kafka works on “at least once” semantics. Therefore, given your assumption that Kafka "will be operational", we can assume that at least once semantics will hold. At this point, it comes down to designing for consumer (really Spark Executor) resilience.

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
At that scale, it’s best not to do coordination at the application layer. How much of your data is transactional in nature {all, some, none}? By which I mean ACID-compliant. > On Apr 19, 2016, at 10:53 AM, Erwan ALLAIN wrote: > > Cody, you're right that was an

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
Cody, you're right, that was an example. The target architecture would be 3 DCs :) Good point on ZK, I'll have to check that. About Spark, both instances will run at the same time but on different topics. It would be quite useless to have 2 DCs working on the same set of data. I just want, in case

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Cody Koeninger
Maybe I'm missing something, but I don't see how you get a quorum in only 2 datacenters (without splitbrain problem, etc). I also don't know how well ZK will work cross-datacenter. As far as the spark side of things goes, if it's idempotent, why not just run both instances all the time. On

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
I'm describing a disaster recovery, but it can be used to make one datacenter offline for an upgrade, for instance. From my point of view, when DC2 crashes: *On Kafka side:* - the kafka cluster will lose one or more brokers (partition leader and replica) - partition leaders that are lost will be reelected in the

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Jason Nerothin
Is the main concern uptime or disaster recovery? > On Apr 19, 2016, at 9:12 AM, Cody Koeninger wrote: > > I think the bigger question is what happens to Kafka and your downstream data > store when DC2 crashes. > > From a Spark point of view, starting up a post-crash job in

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Cody Koeninger
I think the bigger question is what happens to Kafka and your downstream data store when DC2 crashes. From a Spark point of view, starting up a post-crash job in a new data center isn't really different from starting up a post-crash job in the original data center. On Tue, Apr 19, 2016 at 3:32

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-19 Thread Erwan ALLAIN
Thanks Jason and Cody. I'll try to explain the Multi DC case a bit better. As I mentioned before, I'm planning to use one kafka cluster and 2 or more distinct spark clusters. Let's say we have the following DC configuration in a nominal case. Kafka partitions are consumed uniformly by the 2

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Cody Koeninger
The current direct stream only handles exactly the partitions specified at startup. You'd have to restart the job if you changed partitions. https://issues.apache.org/jira/browse/SPARK-12177 has the ongoing work towards using the kafka 0.10 consumer, which would allow for dynamic topicpartitions

Re: Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Jason Nerothin
Hi Erwan, You might consider InsightEdge: http://insightedge.io . It has the capability of doing WAN between data grids and would save you the work of having to re-invent the wheel. Additionally, RDDs can be shared between developers in the same DC. Thanks, Jason >

Spark Streaming / Kafka Direct Approach: Dynamic Partitionning / Multi DC Spark

2016-04-18 Thread Erwan ALLAIN
Hello, I'm currently designing a solution where 2 distinct Spark clusters (2 datacenters) share the same Kafka (Kafka rack aware or manual broker repartition). The aims are - preventing DC crash: using kafka resiliency and the consumer group mechanism (or else?) - keeping consistent offsets among

Re: java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread Cody Koeninger
at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at

java.nio.channels.ClosedChannelException in Spark Streaming KafKa Direct

2016-02-01 Thread SRK
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/java-nio-channels-ClosedChannelException-in-Spark-Streaming-KafKa-Direct-tp26124.html Sent from the Apache Spark

Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException

2016-01-21 Thread Dan Dutrow
Hey Cody, I would have responded to the mailing list but it looks like this thread got aged off. I have the problem where one of my topics dumps more data than my spark job can keep up with. We limit the input rate with maxRatePerPartition. Eventually, when the data is aged off, I get the

Re: Recovery for Spark Streaming Kafka Direct with OffsetOutOfRangeException

2016-01-21 Thread Cody Koeninger
Looks like this response did go to the list. As far as OffsetOutOfRange goes, right now that's an unrecoverable error, because it breaks the underlying invariants (e.g. that the number of messages in a partition is deterministic once the RDD is defined). If you want to do some hacking for your

Re: Spark Streaming + Kafka + scala job message read issue

2016-01-15 Thread vivek.meghanathan
nathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; duc.was.h...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Bryan, Yes we are using only 1 thread per topic as we have only one Kafka server with 1 partition. What kind of logs will tell us

RE: Spark Streaming + Kafka + scala job message read issue

2016-01-05 Thread vivek.meghanathan
Meghanathan (WT01 - NEP) Sent: 27 December 2015 11:08 To: Bryan <bryan.jeff...@gmail.com> Cc: Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; duc.was.h...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Bryan, Yes

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread Bryan
...@gmail.com Cc: duc.was.h...@gmail.com; vivek.meghanat...@wipro.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Brian,PhuDuc, All 8 jobs are consuming 8 different IN topics. 8 different Scala jobs running each topic map mentioned below has only 1 thread

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-26 Thread vivek.meghanathan
l.com> Cc: duc.was.h...@gmail.com<mailto:duc.was.h...@gmail.com>; vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com>; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Spark Streaming + Kafka + scala job message read issue Hi Brian,PhuDuc, All 8

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread vivek.meghanathan
Any help is highly appreciated; I am completely stuck here.. From: Vivek Meghanathan (WT01 - NEP) Sent: Thursday, December 24, 2015 7:50 PM To: Bryan; user@spark.apache.org Subject: RE: Spark Streaming + Kafka + scala job message read issue We are using

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread vivek.meghanathan
t; for Windows 10 phone From: PhuDuc Nguyen<mailto:duc.was.h...@gmail.com> Sent: Friday, December 25, 2015 3:35 PM To: vivek.meghanat...@wipro.com<mailto:vivek.meghanat...@wipro.com> Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: Spark Streaming + Kafka +

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread Bryan
...@wipro.com Sent: Friday, December 25, 2015 2:18 PM To: bryan.jeff...@gmail.com; user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue Any help is highly appreciated, i am completely stuck here.. From: Vivek Meghanathan (WT01 - NEP) Sent: Thursday, December 24

Re: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread PhuDuc Nguyen
> parse(line._2).extract[Search]) > > > > > > Regards, > Vivek M > > *From:* Bryan [mailto:bryan.jeff...@gmail.com] > *Sent:* 24 December 2015 17:20 > *To:* Vivek Meghanathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; > user@spark.apache.org > *Subjec

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-25 Thread Bryan
Agreed. I did not see that they were using the same group name. Sent from Outlook Mail for Windows 10 phone From: PhuDuc Nguyen Sent: Friday, December 25, 2015 3:35 PM To: vivek.meghanat...@wipro.com Cc: user@spark.apache.org Subject: Re: Spark Streaming + Kafka + scala job message read issue

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread Bryan
while missing data from other partitions. Regards, Bryan Jeffrey Sent from Outlook Mail for Windows 10 phone From: vivek.meghanat...@wipro.com Sent: Thursday, December 24, 2015 5:22 AM To: user@spark.apache.org Subject: Spark Streaming + Kafka + scala job message read issue Hi All, We

Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
Hi All, We are using Bitnami Kafka 0.8.2 + Spark 1.5.2 on Google Cloud Platform. Our Spark streaming job (consumer) is not receiving all the messages sent to the specific topic. It receives 1 out of ~50 messages (we added a log in the job stream and identified this). We are not seeing any errors in the
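
As the replies in this thread note, the root cause discussed was several jobs sharing one consumer group: receiver-based consumers with the same group.id split the topic's messages between themselves. A hedged sketch of the fix — give each independent job its own group (names are placeholders; in practice each line lives in its own job):

    import org.apache.spark.streaming.kafka.KafkaUtils

    val search  = KafkaUtils.createStream(ssc, "zk1:2181", "search-job-group",  Map("search_in" -> 1))
    val booking = KafkaUtils.createStream(ssc, "zk1:2181", "booking-job-group", Map("booking_in" -> 1))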

RE: Spark Streaming + Kafka + scala job message read issue

2015-12-24 Thread vivek.meghanathan
anathan (WT01 - NEP) <vivek.meghanat...@wipro.com>; user@spark.apache.org Subject: RE: Spark Streaming + Kafka + scala job message read issue Are you using a direct stream consumer, or the older receiver based consumer? If the latter, do the number of partitions you’ve specified fo

Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Tao Li
I am using the spark streaming kafka direct approach these days. I found that when I start the application, it always starts consuming from the latest offset. I would like the application, when it starts, to consume from the offset where the last run with the same kafka consumer group left off. It means I have
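
A sketch of the usual pattern for this, per the replies: persist each batch's offsets yourself (e.g. in ZooKeeper) and pass them back as fromOffsets on restart. The storage helper below is hypothetical:

    import org.apache.spark.streaming.kafka.HasOffsetRanges

    stream.foreachRDD { rdd =>
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... process the batch, then save the ranges atomically with the results ...
      ranges.foreach(r => saveToZk(r.topic, r.partition, r.untilOffset)) // hypothetical helper
    }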

RE: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Singh, Abhijeet
You need to maintain the offset yourself, and rightly so, in something like ZooKeeper. From: Tao Li [mailto:litao.bupt...@gmail.com] Sent: Tuesday, December 08, 2015 5:36 PM To: user@spark.apache.org Subject: Need to maintain the consumer offset by myself when using spark streaming kafka direct

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread Dibyendu Bhattacharya
on't want to. If the 2 > solutions above don't satisfy your requirements, then consider writing your > own; otherwise I would recommend using the supported features in Spark. > > HTH, > Duc > > > > On Tue, Dec 8, 2015 at 5:05 AM, Tao Li <litao.bupt...@gmail.com> w

Re: Need to maintain the consumer offset by myself when using spark streaming kafka direct approach?

2015-12-08 Thread PhuDuc Nguyen
, 2015 at 5:05 AM, Tao Li <litao.bupt...@gmail.com> wrote: > I am using spark streaming kafka direct approach these days. I found that > when I start the application, it always start consumer the latest offset. I > hope that when application start, it consume from the offset las

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-03 Thread Cody Koeninger
>>>>>>>>> max retries are reached.. in that case there might be data loss. >>>>>>>>>>>> >>>>>>>>>>>> 2. Catch that exception and somehow force things to "reset" for >

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
>> org.apache.spark.streaming.CheckpointWriter$CheckpointWriteHandler@a48c5a8 >>>>>> rejected from java.util.concurrent.ThreadPoolExecutor@543258e0 >>>>>> [Terminated, >>>>>> pool size = 0, active threads = 0, queued tasks = 0, completed t

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
on: >>>>>>> kafka.common.NotLeaderForPartitionException >>>>>>> >>>>>>> at >>>>>>> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) >>>>>>> >>>>>>> >>>>>>> >

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
onitor the log for >>>>>>>>>> this error, >>>>>>>>>> and if it occurs more than X times, kill the job, remove the >>>>

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
; Set([test_stream,5])) >>>>>>>> >>>>>>>> >>>>>>>> java.lang.ClassNotFoundException: >>>>>>>> kafka.common.NotLeaderForPartitionException >>>>>>>> >>>

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
m: >>>>>>>>> ArrayBuffer(kafka.common.UnknownException, >>>>>>>>> org.apache.spark.SparkException: Couldn't find leader offsets for

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
sk 7 >> in >> stage 33.0 failed 4 times, most recent failure: Lost task 7.3 in stage >> 33.0 >> (TID 283, 172.16.97.103): UnknownReason >> >> java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRan

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
>>>> 10 >>>> in stage 52.0 failed 4 times, most recent failure: Lost task 10.3 in >>>> stage >>>> 52.0 (TID 255, 172.16.97.97): UnknownReason >>>> >>>> Exception in thread "streaming-job-executor-0" java.lang.Error: >>>> java.l

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
tion >>> >>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) >>> >>> >>> >>> org.apache.spark.SparkException: Job aborted due to stage failure: Task >>> 7 in >>> stage 33.0 failed 4 times, most recent failur

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
ang.Error: >>> java.lang.InterruptedException >>> >>> Caused by: java.lang.InterruptedException >>> >>> java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException >>> >>> at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699)

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Dibyendu Bhattacharya
= 0, active threads = 0, queued tasks = 0, completed tasks = >>>>> 12112] >>>>> >>>>> >>>>> >>>>> org.apache.spark.SparkException: Job aborted due to stage failure: >>>>> Task 10 >>>>> in stage 52.0 failed 4 times, most recent fa

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
n > > java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException > > at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) > > java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException > > at org.apache.spark.util.Utils$.logUncaughtExceptions(Uti

Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-11-30 Thread SRK
tOutOfRangeException at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1699) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Recovery-for-Spark-Streaming-Kafka-Direct-in-case-of-issues-with-Kafka-tp25524.html Sent from the Apache Spark User List

Re: Spark Streaming kafka directStream value decoder issue

2015-09-17 Thread srungarapu vamsi
data to be collected on the > driver (assuming you don’t want that…) > > val events = kafkaDStream.map { case(devId,byteArray)=> > KafkaGenericEvent.parseFrom(byteArray) } > > From: srungarapu vamsi > Date: Thursday, September 17, 2015 at 4:03 PM > To: user > Subject: Spa
