Re: Amazon Elastic Cache + Spark Streaming

2017-09-23 Thread Saravanan Nagarajan
Sure, thanks! On Fri, Sep 22, 2017 at 5:36 PM, ayan guha wrote: > AWS Elastic Cache supports Memcached and Redis. Spark has a Redis connector > which I believe you can use to > connect to Elastic Cache. > > On Sat, Sep 23, 2017 at

Re: Amazon Elastic Cache + Spark Streaming

2017-09-22 Thread ayan guha
AWS Elastic Cache supports Memcached and Redis. Spark has a Redis connector which I believe you can use to connect to Elastic Cache. On Sat, Sep 23, 2017 at 5:08 AM, Saravanan Nagarajan wrote: > Hello, > > Anybody tried amazon elastic

Amazon Elastic Cache + Spark Streaming

2017-09-22 Thread Saravanan Nagarajan
Hello, Anybody tried Amazon Elastic Cache? Just give me some pointers. Thanks!

Re: Checkpoints not cleaned using Spark streaming + watermarking + kafka

2017-09-22 Thread MathieuP
The expected setting to clean these files is spark.sql.streaming.minBatchesToRetain. More info on structured streaming settings: https://github.com/jaceklaskowski/spark-structured-streaming-book/blob/master/spark-sql-streaming-properties.adoc
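A minimal sketch of where that property could be set (the value 10 is an arbitrary example; the default is 100):

    import org.apache.spark.sql.SparkSession

    // Retain fewer batches of metadata so old checkpoint files are cleaned up sooner.
    val spark = SparkSession.builder()
      .appName("checkpoint-retention")
      .config("spark.sql.streaming.minBatchesToRetain", "10")
      .getOrCreate()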

Checkpoints not cleaned using Spark streaming + watermarking + kafka

2017-09-21 Thread MathieuP
Hi Spark Users! :) I come to you with a question about checkpoints. I have a streaming application that consumes from and produces to Kafka. The computation requires a window and watermarking. Since this is a streaming application with a Kafka output, a checkpoint is expected. The application runs

How to get time slice or the batch time for which the current micro batch is running in Spark Streaming

2017-09-20 Thread SRK
Hi, How to get the time slice or the batch time for which the current micro batch is running in Spark Streaming? Currently I am using System time which is causing the clearing keys feature of reduceByKeyAndWindow to not work properly. Thanks, Swetha
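One approach (a sketch, not from the thread; `stream` is a placeholder DStream): the two-argument overload of foreachRDD exposes the batch time directly, which avoids keying state off the wall clock:

    stream.foreachRDD { (rdd, batchTime) =>
      // Use the batch's scheduled time rather than System.currentTimeMillis,
      // so key clearing in reduceByKeyAndWindow lines up with the batches.
      println(s"Batch time: ${batchTime.milliseconds}")
    }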

Spark Streaming + Kafka + Hive: delayed

2017-09-20 Thread toletum
Hello. I have a process (python) that reads a kafka queue; for each record it checks in a table.

    # Load table in memory
    table = sqlContext.sql("select id from table")
    table.cache()

    kafkaTopic.foreachRDD(processForeach)

    def processForeach(time, rdd):
        print(time)
        for k in rdd.collect():
            if
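One common pattern for this kind of per-record lookup (an assumption about the slowdown, not a diagnosis from the thread) is to collect the small id table once and broadcast it, so each batch does a cheap hash lookup on the executors. A sketch in Scala, assuming the stream carries plain string ids and `spark`/`kafkaStream` are defined elsewhere:

    val ids: Set[String] =
      spark.sql("select id from table").collect().map(_.getString(0)).toSet
    val idsBc = spark.sparkContext.broadcast(ids)

    kafkaStream.foreachRDD { rdd =>
      // Cheap set-membership test on the executors, no driver-side collect().
      val matched = rdd.filter(record => idsBc.value.contains(record))
      matched.foreach(println)
    }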

Re: Chaining Spark Streaming Jobs

2017-09-18 Thread Michael Armbrust
t to S3. In my laptop, they > all point to local filesystem. > > I am using Spark2.2.0 > > Appreciate your help. > > regards > > Sunita > > > On Wed, Aug 23, 2017 at 2:30 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> If you use str

Re: Chaining Spark Streaming Jobs

2017-09-13 Thread Sunita Arvind
sing Spark2.2.0 > > Appreciate your help. > > regards > > Sunita > > > On Wed, Aug 23, 2017 at 2:30 PM, Michael Armbrust <mich...@databricks.com> > wrote: > >> If you use structured streaming and the file sink, you can have a >> subsequent st

Re: Chaining Spark Streaming Jobs

2017-09-13 Thread vincent gromakowski
n Wed, Aug 23, 2017 at 2:30 PM, Michael Armbrust <mich...@databricks.com> wrote: > If you use structured streaming and the file sink, you can have a > subsequent stream read using the file source. This will maintain exactly > once processing even if there are hiccups or failures. > > On Mon
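A hedged sketch of the pattern Michael describes (paths and `stage1Schema` are placeholders): the first job writes with the file sink, and the second reads the same directory with the file source, which requires an explicit schema:

    // Job 1: write the intermediate results with the file sink.
    val upstream = df.writeStream
      .format("parquet")
      .option("path", "s3://bucket/stage1")
      .option("checkpointLocation", "s3://bucket/stage1-checkpoint")
      .start()

    // Job 2: read the same directory back as a stream.
    val downstream = spark.readStream
      .schema(stage1Schema)
      .parquet("s3://bucket/stage1")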

spark streaming executor number still increase

2017-09-12 Thread zhan8610189
I use a CDH spark (1.5.0-hadoop2.6.0) cluster, and wrote one spark streaming application, which I start with the following command: spark-submit --master spark://:7077 --conf spark.cores.max=4 --num-executors 4 --total-executor-cores 4 --executor-cores 4 --executor-memory 2g --class

Re: Chaining Spark Streaming Jobs

2017-09-12 Thread Sunita Arvind
d using the file source. This will maintain exactly > once processing even if there are hiccups or failures. > > On Mon, Aug 21, 2017 at 2:02 PM, Sunita Arvind <sunitarv...@gmail.com> > wrote: > >> Hello Spark Experts, >> >> I have a design question w.r.t Spa

[Spark Streaming] - Stopped worker throws FileNotFoundException

2017-09-10 Thread Davide.Mandrini
I am running a spark streaming application on a cluster composed of three nodes, each one with a worker and three executors (so a total of 9 executors). I am using the spark standalone mode (version 2.1.1). The application is run with a spark-submit command with option "--deploy-mode

[Spark Streaming] - Stopped worker throws FileNotFoundException

2017-09-09 Thread Davide.Mandrini
I am running a spark streaming application on a cluster composed of three nodes, each one with a worker and three executors (so a total of 9 executors). I am using the spark standalone mode (version 2.1.1). The application is run with a spark-submit command with option "-deploy-mode c

Spark Streaming - Stopped worker throws FileNotFoundException

2017-09-09 Thread Davide.Mandrini
I am running a spark streaming application on a cluster composed of three nodes, each one with a worker and three executors (so a total of 9 executors). I am using the spark standalone mode (version 2.1.1). The application is run with a spark-submit command with option "-deploy-mode c

Re: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-09-08 Thread Karthik Palaniappan
; t...@databricks.com Subject: Re: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN) Any ideas @Tathagata? I'd be happy to contribute a patch if you can point me in the right direction. From: Karthik Palaniappan <karthik...@hotmail.

Re: Chaining Spark Streaming Jobs

2017-09-08 Thread Sunita Arvind
ming and the file sink, you can have a >>> subsequent stream read using the file source. This will maintain exactly >>> once processing even if there are hiccups or failures. >>> >>> On Mon, Aug 21, 2017 at 2:02 PM, Sunita Arvind <sunitarv...@gmail.com> >>&

Re: Chaining Spark Streaming Jobs

2017-09-08 Thread Praneeth Gayam
on, Aug 21, 2017 at 2:02 PM, Sunita Arvind <sunitarv...@gmail.com> >> wrote: >> >>> Hello Spark Experts, >>> >>> I have a design question w.r.t Spark Streaming. I have a streaming job >>> that consumes protocol buffer encoded real time logs from a Kafk

Re: Chaining Spark Streaming Jobs

2017-09-07 Thread Sunita Arvind
tain exactly > once processing even if there are hiccups or failures. > > On Mon, Aug 21, 2017 at 2:02 PM, Sunita Arvind <sunitarv...@gmail.com> > wrote: > >> Hello Spark Experts, >> >> I have a design question w.r.t Spark Streaming. I have a streaming job >>

Re: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-09-01 Thread Karthik Palaniappan
ct: RE: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN) You have to set spark.executor.instances=0 in a streaming application with dynamic allocation: https://github.com/tdas/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/sch

spark streaming and reactive streams

2017-08-30 Thread Mich Talebzadeh
Hi, I just had the idea of reactive streams thrown in. I was wondering, in practical terms, what it adds to spark streaming. What have we been missing, so to speak? Thanks, Dr Mich Talebzadeh

[Spark Streaming] Application is stopped after stopping a worker

2017-08-28 Thread Davide.Mandrini
I am running a spark streaming application on a cluster composed of three nodes, each one with a worker and three executors (so a total of 9 executors). I am using the spark standalone mode. The application is run with a spark-submit command with option --deploy-mode client. The submit command

Re: Kill Spark Streaming JOB from Spark UI or Yarn

2017-08-27 Thread Matei Zaharia
The batches should all have the same application ID, so use that one. You can also find the application in the YARN UI to terminate it from there. Matei > On Aug 27, 2017, at 10:27 AM, KhajaAsmath Mohammed <mdkhajaasm...@gmail.com> > wrote: > > Hi, > > I a
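For reference, the YARN CLI equivalent (the application id below is a placeholder):

    yarn application -kill application_1234567890123_0001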

RE: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-08-25 Thread Karthik Palaniappan
You have to set spark.executor.instances=0 in a streaming application with dynamic allocation: https://github.com/tdas/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/scheduler/ExecutorAllocationManager.scala#L207. I originally had it set to a positive value
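A sketch of that configuration at the SparkConf level (equivalently passed as --conf flags to spark-submit; the streaming dynamic allocation flag is the one checked in the code linked above):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.dynamicAllocation.enabled", "true")
      // A positive spark.executor.instances trips the validation check.
      .set("spark.executor.instances", "0")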

Re: [Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-08-24 Thread Akhil Das
: Connecting > to ResourceManager at hadoop-m/10.240.1.92:8032 > 17/08/22 19:35:00 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: > Submitted application application_1503036971561_0022 > 17/08/22 19:35:04 WARN org.apache.spark.streaming.StreamingContext: > Dynamic Allocation is enabled for this application. E

Re: Chaining Spark Streaming Jobs

2017-08-23 Thread Michael Armbrust
k Experts, > > I have a design question w.r.t Spark Streaming. I have a streaming job > that consumes protocol buffer encoded real time logs from a Kafka cluster > on premise. My spark application runs on EMR (aws) and persists data onto > s3. Before I persist, I need to strip header

ReduceByKeyAndWindow checkpoint recovery issues in Spark Streaming

2017-08-23 Thread SRK
) at org.apache.spark.streaming.dstream.ReducedWindowedDStream$$anonfun$4.apply(ReducedWindowedDStream.scala:130) -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ReduceByKeyAndWindow-checkpoint-recovery-issues-in-Spark-Streaming-tp29100.html

[Spark Streaming] Streaming Dynamic Allocation is broken (at least on YARN)

2017-08-22 Thread Karthik Palaniappan
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl: Submitted application application_1503036971561_0022 17/08/22 19:35:04 WARN org.apache.spark.streaming.StreamingContext: Dynamic Allocation is enabled for this application. Enabling Dynamic allocation for Spark Streaming applications can cause data loss

Chaining Spark Streaming Jobs

2017-08-21 Thread Sunita Arvind
Hello Spark Experts, I have a design question w.r.t Spark Streaming. I have a streaming job that consumes protocol buffer encoded real time logs from a Kafka cluster on premise. My spark application runs on EMR (aws) and persists data onto s3. Before I persist, I need to strip header and convert

Fwd: Issues when trying to recover a textFileStream from checkpoint in Spark streaming

2017-08-11 Thread swetha kasireddy
Swetha -- Forwarded message -- From: SRK <swethakasire...@gmail.com> Date: Thu, Aug 10, 2017 at 5:04 PM Subject: Issues when trying to recover a textFileStream from checkpoint in Spark streaming To: user@spark.apache.org Hi, I am facing issues while trying to recov

Issues when trying to recover a textFileStream from checkpoint in Spark streaming

2017-08-10 Thread SRK
Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-trying-to-recover-a-textFileStream-from-checkpoint-in-Spark-streaming-tp29052.html

Spark streaming - Processing time keeps on increasing under following scenario

2017-08-10 Thread Ravi Gurram
Hi, I have a spark streaming task that basically does the following:
1. Read a batch using a custom receiver
2. Parse and apply transforms to the batch
3. Convert the raw fields to a bunch of features
4. Use a pre-built model to predict the class of each record

How to get the lag of a column in a spark streaming dataframe?

2017-08-08 Thread Prashanth Kumar Murali
I have data streaming into my spark scala application in this format:

    id     mark1  mark2  mark3  time
    uuid1  100    200    300    Tue Aug 8 14:06:02 PDT 2017
    uuid1  100    200    300    Tue Aug 8 14:06:22 PDT 2017
    uuid2  150    250    350    Tue Aug 8 14:06:32 PDT 2017
    uuid2  150    250    350    Tue Aug 8

Re: Spark streaming: java.lang.ClassCastException: org.apache.spark.util.SerializableConfiguration ... on restart from checkpoint

2017-08-08 Thread dcam
://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-java-lang-ClassCastException-org-apache-spark-util-SerializableConfiguration-on-restt-tp25698p29044.html

Re: Spark Streaming job statistics

2017-08-08 Thread KhajaAsmath Mohammed
ail.com> wrote: > >> Hi, >> >> I am running spark streaming job which receives data from azure iot hub. >> I am not sure if the connection was successful and receving any data. does >> the input column show how much data it has read if the connection was >> successful? >> >> [image: Inline image 1] >> > >

Re: Spark Streaming job statistics

2017-08-08 Thread Riccardo Ferrari
Hi, Have you tried to check the "Streaming" tab menu? Best, On Tue, Aug 8, 2017 at 4:15 PM, KhajaAsmath Mohammed < mdkhajaasm...@gmail.com> wrote: > Hi, > > I am running spark streaming job which receives data from azure iot hub. I > am not sure if the connection

Spark Streaming job statistics

2017-08-08 Thread KhajaAsmath Mohammed
Hi, I am running a spark streaming job which receives data from azure iot hub. I am not sure if the connection was successful and receiving any data. Does the input column show how much data it has read if the connection was successful? [image: Inline image 1]

Re: Spark Streaming: Async action scheduling inside foreachRDD

2017-08-04 Thread Sathish Kumaran Vairavelu
ForkJoinPool with task support would help in this case. You can create a thread pool with a configured number of threads (make sure you have enough cores) and submit jobs, i.e. actions, to the pool. On Fri, Aug 4, 2017 at 8:54 AM Raghavendra Pandey < raghavendra.pan...@gmail.com> wrote: > Did you
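A hedged sketch of that suggestion, assuming a DStream[String] and placeholder actions; the parallel collection's task support caps concurrency at the pool size:

    import scala.collection.parallel.ForkJoinTaskSupport
    import scala.concurrent.forkjoin.ForkJoinPool

    stream.foreachRDD { rdd =>
      rdd.cache()
      // Two placeholder actions, run concurrently on a 2-thread pool.
      val actions: Seq[() => Long] = Seq(
        () => rdd.filter(_.nonEmpty).count(),
        () => rdd.distinct().count()
      )
      val par = actions.par
      par.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(2))
      par.foreach(_.apply())
      rdd.unpersist()
    }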

Re: Spark Streaming: Async action scheduling inside foreachRDD

2017-08-04 Thread Raghavendra Pandey
Did you try SparkContext.addSparkListener? On Aug 3, 2017 1:54 AM, "Andrii Biletskyi" wrote: > Hi all, > > What is the correct way to schedule multiple jobs inside foreachRDD method > and importantly await on result to ensure those jobs have completed >

Spark Streaming failed jobs due to hardware issue

2017-08-04 Thread Alapati VenuGopal
Hi, We are running a Spark Streaming application with Kafka Direct Stream with Spark version 1.6. It ran for a few days without any error or failed tasks, and then there was an error creating a directory on one machine, as follows: Job aborted due to stage failure: Task 1 in stage 158757.0

Spark Streaming: Async action scheduling inside foreachRDD

2017-08-02 Thread Andrii Biletskyi
Hi all, What is the correct way to schedule multiple jobs inside the foreachRDD method and, importantly, await on the result to ensure those jobs have completed successfully? E.g.:

    kafkaDStream.foreachRDD { rdd =>
      val rdd1 = rdd.map(...)
      val rdd2 = rdd1.map(...)
      val job1Future = Future {
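One way to complete that pattern (a sketch, assuming a DStream[String] and placeholder actions): run each action in a Future and block the batch until all of them finish, so a failure surfaces in the batch rather than being lost:

    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration._

    implicit val ec: ExecutionContext = ExecutionContext.global

    kafkaDStream.foreachRDD { rdd =>
      rdd.cache()
      val job1 = Future { rdd.filter(_.nonEmpty).count() }
      val job2 = Future { rdd.distinct().count() }
      // Fails the batch if either job throws or the timeout elapses.
      Await.result(Future.sequence(Seq(job1, job2)), 10.minutes)
      rdd.unpersist()
    }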

Cannot connect to Python process in Spark Streaming

2017-08-01 Thread canan chen
I ran the pyspark streaming example queue_streaming.py, but ran into the following error. Does anyone know what might be wrong? Thanks ERROR [2017-08-02 08:29:20,023] ({Stop-StreamingContext} Logging.scala[logError]:91) - Cannot connect to Python process. It's probably dead. Stopping

Re: Spark Streaming with long batch / window duration

2017-07-28 Thread emceemouli
://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-long-batch-window-duration-tp10191p29005.html

Spark Streaming as a Service

2017-07-28 Thread ajit roshen
We have few Spark Streaming Apps running on our AWS Spark 2.1 Yarn cluster. We currently log on to the Master Node of the cluster and start the App using "spark-submit", calling the jar. We would like to open up this to our users, so that they can submit their own Apps, but we would n

[Spark streaming-Mesos-cluster mode] java.lang.RuntimeException: Stream jar not found

2017-07-26 Thread RCinna
Hello, I have a spark streaming job using hdfs and checkpointing components, running well on a standalone spark cluster with multiple nodes, in both client and cluster deploy mode. I would like to switch to the Mesos cluster manager and submit the job in cluster deploy mode. First launch of the app

Conflict resolution for data in spark streaming

2017-07-24 Thread Biplob Biswas
Hi, I have a situation where updates are coming from 2 different data sources; this data at times arrives in the same batch defined by the streaming context duration parameter of 500 ms (recommended in spark according to the documentation). Now that is not the problem, the problem is that when

Spark Streaming: Blocks and Partitions

2017-07-20 Thread Kalim, Faria
Hi, Just a quick clarification question: from what I understand, blocks in a batch together form a single RDD which is partitioned (usually using the HashPartitioner) across multiple tasks. First, is this correct? Second, the partitioner is called every single time a new task is created. Is

[Spark Streaming] How to make this code work?

2017-07-18 Thread Noppanit Charassinvichai
I'm super new to Spark and I'm writing this job to parse nginx logs into ORC file format so they can be read from Presto. We wrote LogLine2Json, which parses a line of nginx log to JSON. And that works fine.

    val sqs = streamContext.receiverStream(new SQSReceiver("elb")
      //.credentials("key",

Spark Streaming handling Kafka exceptions

2017-07-17 Thread Jean-Francois Gosselin
How can I handle an error with Kafka with my DirectStream (network issue, zookeeper or broker going down)? For example, when the consumer fails to connect with Kafka (at startup) I only get a DEBUG log (not even an ERROR) and no exception is thrown ... I'm using Spark 2.1.1 and spark-streaming

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-16 Thread Yuval.Itzchakov
Yes, you do. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-mapWithState-need-checkpointing-to-be-specified-in-Spark-Streaming-tp28858p28862.html
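A minimal sketch of what that looks like (placeholder path, key/value types and state function; `ssc` is the StreamingContext and `pairStream` a DStream of (String, Int) pairs):

    import org.apache.spark.streaming.{State, StateSpec}

    // Checkpointing must be enabled on the context, as for updateStateByKey.
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")

    // Placeholder state function: keep a running sum per key.
    val spec = StateSpec.function(
      (key: String, value: Option[Int], state: State[Int]) => {
        val sum = state.getOption.getOrElse(0) + value.getOrElse(0)
        state.update(sum)
        (key, sum)
      })

    val stateful = pairStream.mapWithState(spec)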

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread swetha kasireddy
gt; >> >> -- >> View this message in context: http://apache-spark-user-list. >> 1001560.n3.nabble.com/Does-mapWithState-need-checkpointing- >> to-be-specified-in-Spark-Streaming-tp28858.html >> Sent from the Apache Spark User List mailing list archive at N

Re: Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread Tathagata Das
y checkpointing for mapWithState just like we do for > updateStateByKey? > > Thanks, > Swetha > > > > -- > View this message in context: http://apache-spark-user-list. > 1001560.n3.nabble.com/Does-mapWithState-need- > checkpointing-to-be-specified-in-Spark-Streaming-tp2

Does mapWithState need checkpointing to be specified in Spark Streaming?

2017-07-13 Thread SRK
Hi, Do we need to specify checkpointing for mapWithState just like we do for updateStateByKey? Thanks, Swetha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-mapWithState-need-checkpointing-to-be-specified-in-Spark-Streaming-tp28858.html

UnpicklingError while using spark streaming

2017-07-13 Thread lovemoon
spark 2.1.1 & python 2.7.11. I want to union another rdd in Dstream.transform() like below:

    sc = SparkContext()
    ssc = StreamingContext(sc, 1)
    init_rdd = sc.textFile('file:///home/zht/PycharmProjects/test/text_file.txt')
    lines = ssc.socketTextStream('localhost', )

Implementing Dynamic Sampling in a Spark Streaming Application

2017-07-12 Thread N B
Hi all, Spark has had a backpressure implementation since 1.5 that helps to stabilize a Spark Streaming application by keeping the processing time per batch under control and less than the batch interval. This implementation leaves excess records in the source (Kafka, Flume etc) and they get
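For reference, the existing knobs mentioned above (a sketch; values are examples, and the Kafka cap applies to the direct stream):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .set("spark.streaming.backpressure.enabled", "true")
      .set("spark.streaming.kafka.maxRatePerPartition", "1000")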

Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions even after checkpointing th

2017-07-11 Thread SRK
Hi, Spark streaming does not seem to clear MapPartitionsRDD and ShuffledRDD that are persisted after the use of updateStateByKey and reduceByKeyAndWindow with inverse functions, even after checkpointing the data. Any idea as to why this happens? Is there a way that I can set a time out to clear

[Spark Streaming] - ERROR Error cleaning broadcast Exception

2017-07-11 Thread Nipun Arora
Hi All, I get the following error while running my spark streaming application, we have a large application running multiple stateful (with mapWithState) and stateless operations. It's getting difficult to isolate the error since spark itself hangs and the only error we see is in the spark log

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread Cody Koeninger
ext, some configuration may not take effect. > I wanted to restart the spark streaming app, so stopped the running > and issued a new spark submit. Why and how it will use a existing > SparkContext? > => you are using checkpoint to restore the sparkcontext. > =>

Spark streaming application is failing after running for few hours

2017-07-10 Thread shyla deshpande
My Spark streaming application is failing after running for a few hours. After it fails, when I check the storage tab, I see that MapWithStateRDD is less than 100%. Is this the reason why it is failing? What does MapWithStateRDD 90% cached mean? Does this mean I lost 10%, or the 10% is spilled

Re: Event time aggregation is possible in Spark Streaming ?

2017-07-10 Thread Swapnil Chougule
ail.com> > wrote: > >> Hello, >> >> I want to know whether event time aggregation in spark streaming. I could >> see it's possible in structured streaming. As I am working on conventional >> spark streaming, I need event time aggregation in it. I checked bu

Re: Event time aggregation is possible in Spark Streaming ?

2017-07-10 Thread Michael Armbrust
Event-time aggregation is only supported in Structured Streaming. On Sat, Jul 8, 2017 at 4:18 AM, Swapnil Chougule <the.swapni...@gmail.com> wrote: > Hello, > > I want to know whether event time aggregation in spark streaming. I could > see it's possible in structured streamin
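A hedged sketch of what that looks like in Structured Streaming (`events` and the column names are placeholders):

    import org.apache.spark.sql.functions.{col, window}

    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window(col("eventTime"), "5 minutes"), col("key"))
      .count()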

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread shyla deshpande
WARN Use an existing SparkContext, some configuration may not take effect. I wanted to restart the spark streaming app, so stopped the running and issued a new spark submit. Why and how it will use a existing SparkContext? => you are using checkpoint to restore the sparkcont

Re: Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-10 Thread ??????????
It seems you are using kafka 0.10. See my comments below. ---Original--- From: "shyla deshpande"<deshpandesh...@gmail.com> Date: 2017/7/10 08:17:10 To: "user"<user@spark.apache.org>; Subject: Spark streaming giving me a bunch of WARNINGS, please help me understand them

Spark streaming giving me a bunch of WARNINGS, please help me understand them

2017-07-09 Thread shyla deshpande
WARN Use an existing SparkContext, some configuration may not take effect. I wanted to restart the spark streaming app, so stopped the running and issued a new spark submit. Why and how it will use a existing SparkContext? WARN Spark is not running in local mode, therefore

Re: Spark streaming, Storage tab questions

2017-07-09 Thread anna stax
On Sun, Jul 9, 2017 at 4:33 PM, anna stax <annasta...@gmail.com> wrote: > Does each row represent the state of my app at a different time? > > When the fraction cached is 90% and the size on disk is 0, does that mean > 10% of the data is lost? It's neither in memory nor disk? >

Event time aggregation is possible in Spark Streaming ?

2017-07-08 Thread Swapnil Chougule
Hello, I want to know whether event time aggregation is possible in spark streaming. I could see it's possible in structured streaming. As I am working on conventional spark streaming, I need event time aggregation in it. I checked but didn't get any relevant documentation. Thanks in advance Regards

Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

2017-07-03 Thread Yuval.Itzchakov
-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798p28820.html

Structured Streaming UI similar to Spark Streaming

2017-07-02 Thread Yuval.Itzchakov
Hi, Today Spark Streaming exposes extensive, detailed graphs of input rate, processing time and delay. I was wondering, is there any plan to integrate such a graph for Structured Streaming? Now with Kafka support and the implementation of stateful aggregations in Spark 2.2, it's becoming a very

Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

2017-07-02 Thread Yuval.Itzchakov
You can't. Spark doesn't let you fiddle with the data being checkpointed, as it's an internal implementation detail. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark

Re: spark streaming socket read issue

2017-06-30 Thread Shixiong(Ryan) Zhu
Could you show the code that starts the StreamingQuery from the Dataset? If you don't call `writeStream.start(...)`, it won't run anything. On Fri, Jun 30, 2017 at 6:47 AM, pradeepbill <pradeep.b...@gmail.com> wrote: > hi there, I have a spark streaming issue that i am not able to f
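A minimal sketch of the missing step (`ds` is a placeholder streaming Dataset): nothing executes until a sink is started:

    val query = ds.writeStream
      .format("console")
      .start()

    query.awaitTermination()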

spark streaming socket read issue

2017-06-30 Thread pradeepbill
hi there, I have a spark streaming issue that i am not able to figure out. The code below reads from a socket, but I don't see any input going into the job. I have nc -l running and dumping data, though; not sure why my spark job is not able to read data from 10.176.110.112:. Please advise

How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

2017-06-27 Thread SRK
Hi, I have checkpoints enabled in Spark streaming and I use updateStateByKey and reduceByKeyAndWindow with inverse functions. How do I reduce the amount of data that I am writing to the checkpoint, or clear out the data that I don't care about? Thanks! -- View this message in context: http://apache

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-26 Thread ??????????
m>;"vincent gromakowski"<vincent.gromakow...@gmail.com>; Subject: Re: What is the real difference between Kafka streaming and Spark Streaming? Also another difference I see is some thing like Spark Sql where there are logical plans, physical plans, Code generation and all those opti

Re: Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread ??????????
uot;user"<user@spark.apache.org>; Subject: Spark Streaming reduceByKeyAndWindow with inverse function seems toiterate over all the keys in the window even though they are not presentin the current batch Hi, We have reduceByKeyAndWindow with inverse function feature in our Streaming job to

Re: Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread Tathagata Das
t; > > -- > View this message in context: http://apache-spark-user-list. > 1001560.n3.nabble.com/Spark-Streaming-reduceByKeyAndWindow-with- > inverse-function-seems-to-iterate-over-all-the-keys-in-theh-tp287

Spark Streaming reduceByKeyAndWindow with inverse function seems to iterate over all the keys in the window even though they are not present in the current batch

2017-06-26 Thread SRK
in the current batch. Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-reduceByKeyAndWindow-with-inverse-function-seems-to-iterate-over-all-the-keys-in-theh-tp28792.html

Re: Exception which using ReduceByKeyAndWindow in Spark Streaming.

2017-06-26 Thread N B
et1 ++set2". I suggest creating a >>>>> new HashMap in the function (and add both maps into it), rather than >>>>> mutating one of them. >>>>> >>>>> On Tue, Jun 6, 2017 at 11:30 AM, SRK <swethakasire...@gmail.com> >>>>

Re: Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
We are also doing transformations; that's the reason we are using spark streaming. Does Spark streaming support tumbling windows? I was thinking I can use a window operation for writing into HDFS. Thanks On Sun, Jun 25, 2017 at 10:23 PM, ayan guha <guha.a...@gmail.com> wrote: > I would sugge
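A sketch of the tumbling-window idea (durations and the path are placeholders; `stream` is a DStream[String]): with the window length equal to the slide interval, output lands once per window instead of once per batch:

    import org.apache.spark.streaming.Minutes

    stream.window(Minutes(10), Minutes(10)).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) {
        rdd.saveAsTextFile(s"hdfs:///data/out-${time.milliseconds}")
      }
    }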

Re: Spark streaming persist to hdfs question

2017-06-25 Thread ayan guha
I would suggest using Flume, if possible, as it has built-in HDFS log rolling capabilities. On Mon, Jun 26, 2017 at 1:09 PM, Naveen Madhire <vmadh...@umail.iu.edu> wrote: > Hi, > > I am using spark streaming with 1 minute duration to read data from kafka > topic, app

Spark streaming persist to hdfs question

2017-06-25 Thread Naveen Madhire
Hi, I am using spark streaming with a 1 minute duration to read data from a kafka topic, apply transformations and persist into HDFS. The application is creating a new directory every 1 minute with many partition files (= number of partitions). What parameter do I need to change/configure to persist

Re: Exception which using ReduceByKeyAndWindow in Spark Streaming.

2017-06-22 Thread Tathagata Das
ture, and it >>>>> seems you are actually mutating it in "set1 ++set2". I suggest creating a >>>>> new HashMap in the function (and add both maps into it), rather than >>>>> mutating one of them. >>>>> >>>>> On Tue, Jun 6, 20

Re: Exception which using ReduceByKeyAndWindow in Spark Streaming.

2017-06-22 Thread swetha kasireddy
hMap in the function (and add both maps into it), rather than >>>> mutating one of them. >>>> >>>> On Tue, Jun 6, 2017 at 11:30 AM, SRK <swethakasire...@gmail.com> wrote: >>>> >>>>> Hi, >>>>> >>>>> I see

Re: Spark Streaming - Increasing number of executors slows down processing rate

2017-06-20 Thread Biplob Biswas
400 and everything else remains the same > except we reduce memory to 11GB, we see the time to process 0 records(end > of each batch) increases 10times to 50Second and some cases it goes to 103 > seconds. > > > Spark Streaming configs that we are setting are > > Batchw

Spark Streaming - Increasing number of executors slows down processing rate

2017-06-19 Thread Mal Edwin
see the time to process 0 records (end of each batch) increases 10 times to 50 seconds, and in some cases it goes to 103 seconds. Spark Streaming configs that we are setting are: Batch window = 60 seconds, Backpressure.enabled = true, spark.memory.fraction = 0.3 (we store more data in our own data struct

Re: how many topics spark streaming can handle

2017-06-19 Thread Bryan Jeffrey
Hello Ashok, We're consuming from more than 10 topics in some Spark streaming applications. Topic management is a concern (what is read from where, etc), but I have seen no issues from Spark itself. Regards, Bryan Jeffrey On Mon, Jun 19, 2017
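A hedged sketch of a single direct stream subscribed to many topics (Kafka 0.10 integration; `ssc` and `kafkaParams` are assumed to be defined elsewhere):

    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

    val topics = Seq("topic1", "topic2", "topic3")
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](topics, kafkaParams))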

Re: how many topics spark streaming can handle

2017-06-19 Thread Ashok Kumar
lusters and the type of processing you are trying to do. On Mon, Jun 19, 2017 at 12:00 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote: Hi Gurus, Within one Spark streaming process how many topics can be handled? I have not tried more than one topic. Thanks

Re: how many topics spark streaming can handle

2017-06-19 Thread Michael Armbrust
I don't think that there is really a Spark specific limit here. It would be a function of the size of your spark / kafka clusters and the type of processing you are trying to do. On Mon, Jun 19, 2017 at 12:00 PM, Ashok Kumar <ashok34...@yahoo.com.invalid> wrote: > Hi Gurus, > > W

how many topics spark streaming can handle

2017-06-19 Thread Ashok Kumar
Hi Gurus, Within one Spark streaming process how many topics can be handled? I have not tried more than one topic. Thanks

Spark streaming data loss

2017-06-19 Thread vasanth kumar
Hi, I have spark kafka streaming job running in Yarn cluster mode with spark.task.maxFailures=4 (default) spark.yarn.max.executor.failures=8 number of executor=1 spark.streaming.stopGracefullyOnShutdown=false checkpointing enabled - When there is RuntimeException in a batch in executor then

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-15 Thread Michael Armbrust
ka to Kinesis and had to only change >> a few lines of code! I think its likely that in the future Spark will also >> have connectors for Google's PubSub and Azure's streaming offerings. >> >> Regarding latency, there has been a lot of discussion about the inherent >

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-15 Thread kant kodali
re's streaming offerings. > > Regarding latency, there has been a lot of discussion about the inherent > latencies of micro-batch. Fortunately, we were very careful to leave > batching out of the user facing API, and as we demo'ed last week, this > makes it possible for the Spark Stream

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-14 Thread Michael Armbrust
r facing API, and as we demo'ed last week, this makes it possible for the Spark Streaming to achieve sub-millisecond latencies <https://www.youtube.com/watch?v=qAZ5XUz32yM>. Watch SPARK-20928 <https://issues.apache.org/jira/browse/SPARK-20928> for more on this effort to eliminate

Re: Spark Streaming Design Suggestion

2017-06-14 Thread Shashi Vishwakarma
I agree, Jörn and Satish. I think I should start grouping similar kinds of messages into a single topic, with some kind of id attached to them which can be pulled from the spark streaming application. I can try reducing the number of topics to significantly fewer, but still at the end I can expect 50+ topics

Re: Spark Streaming Design Suggestion

2017-06-14 Thread satish lalam
gt; > > On 13. Jun 2017, at 22:03, Shashi Vishwakarma <shashi.vish...@gmail.com> > wrote: > > > > Hi > > > > I have to design a spark streaming application with below use case. I am > looking for best possible approach for this. > > > > I have

Re: Spark Streaming Design Suggestion

2017-06-13 Thread Jörn Franke
be a nightmare to operate > On 13. Jun 2017, at 22:03, Shashi Vishwakarma <shashi.vish...@gmail.com> > wrote: > > Hi > > I have to design a spark streaming application with below use case. I am > looking for best possible approach for this. > > I ha

Spark Streaming Design Suggestion

2017-06-13 Thread Shashi Vishwakarma
Hi I have to design a spark streaming application with the below use case. I am looking for the best possible approach for this. I have an application which pushes data into 1000+ different topics, each with a different purpose. Spark streaming will receive data from each topic and after processing

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-13 Thread Paolo Patierno
I think a big advantage of not using Spark Streaming, when your solution is already based on Kafka, is that you don't have to deal with another cluster. I mean ... Imagine that your solution is already based on Kafka as the ingestion system for your events and then you need to do some real time

RE: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-12 Thread Mohammed Guller
at the latency requirements of my application. For most applications, the near real time capabilities of Spark Streaming might be good enough. For others, it may not. For example, if I was building a high-frequency trading application, where I want to process individual trades as soon

Re: What is the real difference between Kafka streaming and Spark Streaming?

2017-06-11 Thread kant kodali
d send the > results out to many sinks. Look up "Kafka Connect" > > Regarding Event at a time vs Micro-batch. I hear arguments from a group of > people saying Spark Streaming is real time and other group of people is > Kafka streaming is the true real time. so do we say Micr
