Hi All,
I have enabled replication for my RDDs.
On the Storage tab of the Spark UI, I can see the replication level (2x or 1x).
But the names shown are generic (MappedRDD, ShuffledRDD), so I am not able to
tell which of my RDDs is 2x replicated and which one is 1x.
Please help.
Regards,
Gaurab
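For reference, the usual fix for the generic MappedRDD/ShuffledRDD names is to call `rdd.setName("...")` before persisting, so your own name shows up on the Storage tab. As for telling replication levels apart: Spark's StorageLevel constants ending in `_2` (e.g. `MEMORY_ONLY_2`) replicate each partition on two nodes. A minimal plain-Python sketch of that naming convention (the helper itself is hypothetical, not a Spark API):

```python
def replication_of(storage_level_name: str) -> int:
    """Spark StorageLevel constants ending in "_2" (e.g. MEMORY_ONLY_2)
    replicate each partition on two nodes; the others keep a single copy."""
    return 2 if storage_level_name.endswith("_2") else 1

# Names as they appear among Spark's StorageLevel constants
print(replication_of("MEMORY_ONLY_2"))    # replicated twice -> shown as 2x
print(replication_of("MEMORY_AND_DISK")) # single copy -> shown as 1x
```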
Hi Tathagata/Cody,
I am facing a challenge in production with DAG behaviour during
checkpointing in Spark Streaming -
Step 1 : Read data from Kafka every 15 min - call this KafkaStreamRDD, ~
100 GB of data
Step 2 : Repartition KafkaStreamRDD from 5 to 100 partitions to parallelise
processing - c
AM, Augustus Hong
wrote:
> How did you specify the number of cores each executor can use?
>
> Be sure to use this when submitting jobs with spark-submit:
> *--total-executor-cores
> 100.*
>
> Other options won't work from my experience.
>
> On Fri, Oct 23, 2015 at
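For what it's worth, the flag quoted above can be assembled into a spark-submit command line like this (the master URL and application file are hypothetical placeholders; only `--total-executor-cores` comes from the advice above):

```python
# Hypothetical sketch of a spark-submit invocation using the quoted flag.
cmd = [
    "spark-submit",
    "--master", "spark://master:7077",  # assumed standalone master URL
    "--total-executor-cores", "100",    # cap on total cores across all executors
    "my_app.py",                        # hypothetical application
]
print(" ".join(cmd))
```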
Hi,
I created 2 workers on the same machine, each with 4 cores and 6 GB RAM.
I submitted the first job, and it allocated 2 cores on each of the worker
processes and utilized the full 4 GB RAM for each executor process.
When I submit my second job, it always stays in the WAITING state.
Cheers!!
On Tue, Oct 20, 20
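A plain-Python sketch of why the second job can sit in WAITING with the numbers above (2 workers, 4 cores and 6 GB each; the first job holding 2 cores and 4 GB per worker). The check itself is hypothetical, but it mirrors the idea that the standalone master only launches an executor on a worker with enough free cores *and* free memory:

```python
def can_launch_executor(free_cores, free_mem_gb, need_cores, need_mem_gb):
    """An executor fits on a worker only if both cores and memory are free."""
    return free_cores >= need_cores and free_mem_gb >= need_mem_gb

# Each worker: 4 cores, 6 GB. The first job holds 2 cores + 4 GB per worker.
free_cores, free_mem = 4 - 2, 6 - 4  # 2 cores and 2 GB left on each worker

# If the second job also asks for 2-core / 4 GB executors, memory is the
# bottleneck, so no executor can launch and the job waits:
print(can_launch_executor(free_cores, free_mem, 2, 4))
```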
Periodically (say every 10 mins) run System.gc() on the driver.
>> The cleaning up shuffles is tied to the garbage collection.
>>
>>
>> On Fri, Aug 21, 2015 at 2:59 AM, gaurav sharma
>> wrote:
>>
>>> Hi All,
>>>
>>>
>>> I have a
Hi All,
I have a 24x7 running streaming process, which runs on 2-hour windowed data.
The issue I am facing is that my worker machines are running OUT OF DISK space.
I checked that the SHUFFLE FILES are not getting cleaned up:
/log/spark-2b875d98-1101-4e61-86b4-67c9e71954cc/executor-5bbb53c1-cee9-4438
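The suggestion quoted earlier in the thread, to periodically run System.gc() on the driver so that shuffle cleanup (which is tied to garbage collection) kicks in, can be sketched as a plain-Python periodic timer. The GC trigger here is only a placeholder callback, since the real call is JVM-side:

```python
import threading
import time

def run_periodically(action, interval_s, stop_event):
    """Invoke `action` every `interval_s` seconds until stop_event is set."""
    def loop():
        while not stop_event.wait(interval_s):
            action()
    worker = threading.Thread(target=loop, daemon=True)
    worker.start()
    return worker

calls = []
stop = threading.Event()
# In a real driver the action would trigger System.gc(); here it just counts.
run_periodically(lambda: calls.append(1), 0.01, stop)
time.sleep(0.1)
stop.set()
```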
Ideally the two messages read from Kafka should differ on at least some
parameter; otherwise they are logically the same.
As a solution to your problem, if the message content is the same, you could
create a new UUID field, which might play the role of the partition key while
inserting the two messages into Cassandra.
Msg1 -
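A minimal plain-Python sketch of that suggestion (the field names are hypothetical): attach a random UUID to each message before insert, so that two payload-identical messages still get distinct partition keys:

```python
import uuid

def with_partition_key(message: dict) -> dict:
    """Add a random UUID field so identical payloads remain distinct rows."""
    return {**message, "uuid": str(uuid.uuid4())}

msg1 = with_partition_key({"payload": "same-content"})
msg2 = with_partition_key({"payload": "same-content"})
# Same payload, but distinct keys, so both rows survive in Cassandra:
print(msg1["uuid"] != msg2["uuid"])
```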
I have run into similar exceptions:
ERROR DirectKafkaInputDStream: ArrayBuffer(java.net.SocketTimeoutException,
org.apache.spark.SparkException: Couldn't find leader offsets for
Set([AdServe,1]))
and the issue has happened on the Kafka side, where my broker offsets go out of
sync, or do not return l
Hi guys,
I too am facing a similar challenge with the direct stream.
I have 3 Kafka partitions,
and am running Spark on 18 cores, with the parallelism level set to 48.
I am running a simple map-reduce job on the incoming stream.
Though the reduce stage takes milliseconds to seconds for around 15 million
packets, the
ite to /opt/ on that machine, as it's the one machine always
> throwing up.
>
> Thanks
> Best Regards
>
> On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma
> wrote:
>
>> Hi All,
>>
>> I am facing this issue in my production environment.
>>
>> My worke
Hi All,
I am facing this issue in my production environment.
My worker dies, throwing this exception.
But I see that space is available on all the partitions of my disk.
I did NOT see any abrupt increase in disk I/O that might have choked the
executor writing to the stderr file.
But still m
When you submit a job, Spark breaks it down into stages, as per the DAG. The
stages run transformations or actions on the RDDs. Each RDD consists of N
partitions. The number of tasks created by Spark to execute a stage is equal
to the number of partitions. Every task is executed on the cores utilized by
Hi,
I am using a Kafka + Spark cluster for a real-time aggregation analytics use
case in production.
*Cluster details*
*6 nodes*, each running one Spark process and one Kafka process.
Node1 -> 1 Master, 1 Worker, 1 Driver,
1 Kafka process
Nodes 2,3,4,5,6 -> 1 Worker process each
ence the broadcast variable in your workers. It will get
> shipped once to all nodes in the cluster and can be referenced by them.
>
> HTH.
>
> -Todd
>
> On Thu, Jun 11, 2015 at 8:23 AM, gaurav sharma
> wrote:
>
>> Hi,
>>
>> I am using Kafka Spark cluste