Debugging which RDD has which replication level

2016-01-23 Thread gaurav sharma
Hi All, I have enabled replication for my RDDs. On the Storage tab of the Spark UI I can see the replication level, 2x or 1x, but the names shown are mappedRDD, shuffledRDD, so I am not able to tell which of my RDDs is 2x replicated and which one is 1x. Please help. Regards, Gaurav
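
A minimal sketch of how to make the Storage tab identifiable: RDD.setName replaces the generic class-name label (mappedRDD, shuffledRDD) with your own, and the StorageLevel chosen at persist time determines the 1x/2x shown there. The input path and transformations below are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object NamedRddDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("named-rdds").setMaster("local[2]"))

    // setName makes the Storage tab show this label instead of the
    // generic mappedRDD / shuffledRDD class name.
    val words = sc.textFile("input.txt")            // hypothetical input path
      .flatMap(_.split("\\s+"))
      .setName("words")
      .persist(StorageLevel.MEMORY_ONLY_2)          // shown as 2x replicated

    val counts = words.map((_, 1)).reduceByKey(_ + _)
      .setName("wordCounts")
      .persist(StorageLevel.MEMORY_ONLY)            // shown as 1x

    counts.count()                                  // force materialisation
    sc.stop()
  }
}
```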

Spark RDD DAG behaviour understanding in case of checkpointing

2016-01-23 Thread gaurav sharma
Hi Tathagata/Cody, I am facing a challenge in production with DAG behaviour during checkpointing in Spark Streaming. Step 1: read data from Kafka every 15 min; call this KafkaStreamRDD, ~100 GB of data. Step 2: repartition KafkaStreamRdd from 5 to 100 partitions to parallelise processing …
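
A sketch of the pipeline being described, written against the Spark 1.x Kafka 0.8 direct API; the broker, topic, and checkpoint path are hypothetical:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Minutes, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-15min")
val ssc = new StreamingContext(conf, Minutes(15))        // 15-minute batches
ssc.checkpoint("hdfs:///checkpoints/kafka-15min")        // hypothetical path

val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")   // hypothetical broker
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("events"))                       // hypothetical topic

// Step 2: widen from the topic's 5 partitions to 100 to parallelise processing.
val repartitioned = stream.repartition(100)
repartitioned.foreachRDD(rdd => println(s"batch size: ${rdd.count()}"))

ssc.start()
ssc.awaitTermination()
```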

Re: Multiple Spark Streaming Jobs on Single Master

2015-10-23 Thread gaurav sharma
… AM, Augustus Hong wrote: > How did you specify the number of cores each executor can use? > > Be sure to use this when submitting jobs with spark-submit: *--total-executor-cores 100.* > > Other options won't work, in my experience. > > On Fri, Oct 23, 2015 at …
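
The flag caps the total cores an application takes from a standalone cluster, which is what lets a second job be scheduled at all. The same cap can be set programmatically via spark.cores.max, the config behind --total-executor-cores; a minimal sketch:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Equivalent to `spark-submit --total-executor-cores 4`:
// cap the cores this app grabs so a second app can still be scheduled.
val conf = new SparkConf()
  .setAppName("job-a")
  .set("spark.cores.max", "4")   // leave the remaining cores free
val sc = new SparkContext(conf)
```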

Re: Multiple Spark Streaming Jobs on Single Master

2015-10-23 Thread gaurav sharma
Hi, I created 2 workers on the same machine, each with 4 cores and 6 GB RAM. I submitted the first job, and it allocated 2 cores on each of the worker processes and utilized the full 4 GB RAM of each executor process. When I submit my second job, it always stays in WAITING state. Cheers!! On Tue, Oct 20, 20…

Re: Worker Machine running out of disk for Long running Streaming process

2015-09-15 Thread gaurav sharma
…periodically (say every 10 mins) run System.gc() on the driver. >> The cleaning up of shuffles is tied to garbage collection. >> >> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma wrote: >>> Hi All, >>> >>> I have a…
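
A sketch of the quoted advice for a long-running driver: Spark's ContextCleaner deletes shuffle files only after the driver garbage-collects the objects referencing them, so a periodic System.gc() forces that cleanup to happen:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// A long-running driver with little allocation pressure may rarely GC,
// so shuffle files pile up on the workers. Workaround sketch: force a
// GC on the driver every 10 minutes.
val scheduler = Executors.newSingleThreadScheduledExecutor()
scheduler.scheduleAtFixedRate(
  new Runnable { def run(): Unit = System.gc() },
  10, 10, TimeUnit.MINUTES)
```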

Worker Machine running out of disk for Long running Streaming process

2015-08-21 Thread gaurav sharma
Hi All, I have a 24x7 running streaming process, which runs on 2-hour windowed data. The issue I am facing is that my worker machines are running OUT OF DISK space. I checked that the SHUFFLE FILES are not getting cleaned up: /log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438…

Re: Writing streaming data to cassandra creates duplicates

2015-08-04 Thread gaurav sharma
Ideally the 2 messages read from Kafka must differ on some parameter at least, or else they are logically the same. As a solution to your problem, if the message content is the same, you could create a new field, a UUID, which might play the role of partition key while inserting the 2 messages in Cassandra. Msg1 -…
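
A sketch of the suggested fix using the DataStax spark-cassandra-connector; the keyspace, table, and schema here are hypothetical:

```scala
import java.util.UUID
import com.datastax.spark.connector._            // spark-cassandra-connector
import org.apache.spark.streaming.dstream.DStream

case class Event(id: UUID, payload: String)      // hypothetical schema

// Attach a fresh UUID to each message before the write, so two
// otherwise-identical messages land as distinct rows instead of the
// second upsert overwriting the first.
def save(stream: DStream[String]): Unit =
  stream.map(msg => Event(UUID.randomUUID(), msg))
    .foreachRDD(_.saveToCassandra("ks", "events", SomeColumns("id", "payload")))
```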

Re: Spark Streaming Kafka could not find leader offset for Set()

2015-07-30 Thread gaurav sharma
I have run into similar exceptions: ERROR DirectKafkaInputDStream: ArrayBuffer(java.net.SocketTimeoutException, org.apache.spark.SparkException: Couldn't find leader offsets for Set([AdServe,1])) and the issue happened on the Kafka side, where my broker offsets go out of sync, or do not return l…
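
One common mitigation (not this thread's resolution, just a standard setting): when saved offsets fall outside what the broker still retains, the 0.8 direct stream's starting point can be controlled through auto.offset.reset in the Kafka params:

```scala
// Hedged sketch: when offsets go out of range (e.g. retention deleted
// them), auto.offset.reset controls where the direct stream restarts.
val kafkaParams = Map(
  "metadata.broker.list" -> "broker1:9092,broker2:9092",  // hypothetical brokers
  "auto.offset.reset"    -> "smallest")                   // restart from earliest retained offset
```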

Re: createDirectStream and Stats

2015-07-12 Thread gaurav sharma
Hi guys, I too am facing a similar challenge with the direct stream. I have 3 Kafka partitions and am running Spark on 18 cores, with parallelism level set to 48. I am running a simple map-reduce job on the incoming stream. Though the reduce stage takes milliseconds to seconds for around 15 million packets, the…
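
Worth noting for this setup: the direct stream creates exactly one RDD partition per Kafka partition, so only 3 tasks read the data regardless of the 48-way parallelism setting; a repartition after the read is what spreads the work. Per-batch stats can be read straight from the offset ranges, as in this sketch (`stream` is a direct stream as returned by KafkaUtils.createDirectStream):

```scala
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

// Offset ranges must be read from the first RDD of each batch,
// before any repartition/shuffle discards them.
def logBatchStats(stream: InputDStream[(String, String)]): Unit =
  stream.foreachRDD { rdd =>
    val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
    ranges.foreach { r =>
      println(s"${r.topic}/partition ${r.partition}: ${r.untilOffset - r.fromOffset} records")
    }
  }
```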

Re: Worker dies with java.io.IOException: Stream closed

2015-07-12 Thread gaurav sharma
…write to /opt/ on that machine as it's the one machine always > throwing up. > > Thanks > Best Regards > > On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma > wrote: >> Hi All, >> >> I am facing this issue in my production environment. >> >> My worke…

Worker dies with java.io.IOException: Stream closed

2015-07-11 Thread gaurav sharma
Hi All, I am facing this issue in my production environment. My worker dies by throwing this exception, but I see that space is available on all the partitions of my disk. I did NOT see any abrupt increase in disk IO which might have choked the executor writing to the stderr file. But still m…

Re: How does one decide no of executors/cores/memory allocation?

2015-06-15 Thread gaurav sharma
When you submit a job, Spark breaks it down into stages, as per the DAG. The stages run transformations or actions on the RDDs. Each RDD consists of N partitions. The tasks created by Spark to execute a stage equal the number of partitions. Every task is executed on the cores utilized by…
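
A minimal illustration of that relationship, with arbitrary numbers:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("partitions-demo").setMaster("local[4]"))

// 8 partitions -> the count() stage runs as 8 tasks, scheduled across
// however many cores the executors offer (here 4, so two waves).
val rdd = sc.parallelize(1 to 1000, numSlices = 8)
println(rdd.partitions.length)   // 8
rdd.count()
```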

Spark Streaming - Can i BIND Spark Executor to Kafka Partition Leader

2015-06-12 Thread gaurav sharma
Hi, I am using a Kafka/Spark cluster for a real-time aggregation analytics use case in production. *Cluster details* *6 nodes*, each node running 1 Spark and 1 Kafka process. Node1 -> 1 Master, 1 Worker, 1 Driver, 1 Kafka process; Nodes 2,3,4,5,6 -> 1 Worker process each

Re: How to pass arguments dynamically, that needs to be used in executors

2015-06-12 Thread gaurav sharma
…reference the broadcast variable in your workers. It will get > shipped once to all nodes in the cluster and can be referenced by them. > > HTH. > > -Todd > > On Thu, Jun 11, 2015 at 8:23 AM, gaurav sharma > wrote: >> Hi, >> >> I am using Kafka Spark cluste…
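
A minimal broadcast sketch along the lines of Todd's advice; the lookup map is hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("broadcast-demo").setMaster("local[2]"))

// The lookup table is shipped to each node once and then referenced
// read-only inside closures, instead of being serialised with every task.
val rates = sc.broadcast(Map("INR" -> 0.012, "EUR" -> 1.09))   // hypothetical lookup data

val amounts = sc.parallelize(Seq(("INR", 100.0), ("EUR", 20.0)))
val inUsd = amounts.map { case (ccy, amt) => amt * rates.value.getOrElse(ccy, 1.0) }
inUsd.collect().foreach(println)
```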

How to pass arguments dynamically, that needs to be used in executors

2015-06-11 Thread gaurav sharma
Hi, I am using a Kafka/Spark cluster for a real-time aggregation analytics use case in production. Cluster details: 6 nodes, each node running 1 Spark and 1 Kafka process. Node1 -> 1 Master, 1 Worker, 1 Driver, 1 Kafka process; Nodes 2,3,4,5,6 -> 1 Worker process each