Re: how can evenly distribute my records in all partition

2015-11-18 Thread prateek arora
Hi
Thanks for the help.
In my case, I want to process 30 records per second using Spark Streaming.
The keys of consecutive records differ by roughly 33-34 ms, and the RDD
holding those 30 records already has 4 partitions.
Right now my algorithm takes about 400 ms to process one record, so I want
to distribute the records evenly, so that every executor processes only one
record and each 1-second batch completes without delay.
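
The numbers above can be sanity-checked with a little arithmetic (the record
count, partition count, and per-record time come from the message; the rest
is just math): records that land in the same partition are processed
sequentially by one task, so the batch time is driven by the fullest partition.

```python
import math

records_per_batch = 30  # records arriving per 1-second batch
per_record_ms = 400     # processing time per record

def worst_case_batch_ms(num_partitions: int) -> int:
    # Records in the same partition are processed sequentially by one task,
    # so the batch time is set by the partition holding the most records.
    records_in_fullest = math.ceil(records_per_batch / num_partitions)
    return records_in_fullest * per_record_ms

print(worst_case_batch_ms(4))   # 8 records * 400 ms = 3200 ms -> batch falls behind
print(worst_case_batch_ms(30))  # 1 record  * 400 ms =  400 ms -> fits in 1 s
```

So with only 4 partitions the batch needs roughly 3.2 s, which is why spreading
the 30 records over 30 partitions matters.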




Re: how can evenly distribute my records in all partition

2015-11-17 Thread Sonal Goyal
Think about how you want to distribute your data and how your keys are
currently spread. Do you want to compute something per day, per week, etc.?
Based on that, return a partition number; you could use mod 30 or a similar
function to choose the partition.
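
A minimal sketch of the mod idea in plain Python (the class name
`ModPartitioner` and the `num_partitions`/`get_partition` shape are modelled
on Spark's Scala `Partitioner` contract for illustration only; in PySpark you
would instead pass a plain function as `partitionFunc` to `rdd.partitionBy`):

```python
class ModPartitioner:
    """Spark-style partitioner sketch: route each integer key to key mod n."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions

    def get_partition(self, key: int) -> int:
        # Python's % with a positive modulus is already non-negative,
        # so every key maps into [0, num_partitions).
        return key % self.num_partitions

p = ModPartitioner(30)
print(p.get_partition(0), p.get_partition(31), p.get_partition(59))  # 0 1 29
```

This gives an even spread only when the keys themselves are evenly spaced
modulo the partition count, which is the caveat behind "think about how your
keys are spread".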


Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi
I am trying to implement a custom partitioner following this link:
http://stackoverflow.com/questions/23127329/how-to-define-custom-partitioner-for-spark-rdds-of-equally-sized-partition-where
(In the linked example, the key values run from 0 to noOfElement - 1.)

However, I am not able to see how to implement a custom partitioner in my
case: my parent RDD has 4 partitions, its keys are timestamps, and its
values are JPEG byte arrays.


Regards
Prateek
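
Since the keys here are timestamps rather than 0..(n-1), one way to adapt the
linked example is to rank the keys and use each key's rank as its partition
number. This is a sketch, assuming the batch's keys can be gathered up front
(e.g. via `rdd.keys().collect()` in PySpark); `partitioner_for_keys` is a
hypothetical helper, not a Spark API:

```python
def partitioner_for_keys(keys):
    """Map each distinct key to its rank among the keys,
    so that every key gets its own partition."""
    rank = {k: i for i, k in enumerate(sorted(keys))}
    return lambda key: rank[key]

# 30 hypothetical timestamp keys ~33 ms apart, as described in the thread.
timestamps = [1447776000000 + 33 * i for i in range(30)]
get_partition = partitioner_for_keys(timestamps)
print(get_partition(timestamps[0]))   # 0
print(get_partition(timestamps[29]))  # 29
```

Because the mapping is built from the actual keys, it works no matter how the
timestamps are spaced, unlike a plain mod over the raw timestamp values.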




Re: how can evenly distribute my records in all partition

2015-11-17 Thread Ted Yu
Please take a look at the following files for examples:

./core/src/main/scala/org/apache/spark/api/python/PythonPartitioner.scala
./core/src/main/scala/org/apache/spark/Partitioner.scala

Cheers
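
The second file contains Spark's default `HashPartitioner`, whose core is a
non-negative modulo over the key's hash. A plain-Python rendering of that idea
(simplified: Python's `hash` is not the JVM's `hashCode`, so this only mirrors
the shape of the logic) shows why hash-based partitioning rarely spreads 30
keys evenly over 30 buckets:

```python
def non_negative_mod(x: int, mod: int) -> int:
    # Mirrors Spark's Utils.nonNegativeMod. In Python, % with a positive
    # modulus is already non-negative; the guard matters on the JVM,
    # where % can return negative values.
    r = x % mod
    return r + mod if r < 0 else r

def hash_partition(key, num_partitions: int) -> int:
    return non_negative_mod(hash(key), num_partitions)

keys = range(0, 3000, 100)  # 30 distinct integer keys
buckets = {hash_partition(k, 30) for k in keys}
print(len(buckets))  # 3 -- these 30 keys collapse into just 3 of the 30 buckets
```

Collisions like this are exactly why `rdd.repartition(30)` can leave some
partitions with 2 records and others with none, and why a key-aware custom
partitioner is needed for a guaranteed one-record-per-partition layout.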



Re: how can evenly distribute my records in all partition

2015-11-17 Thread prateek arora
Hi
Thanks
I am new to Spark development, so could you provide some help with writing a
custom partitioner to achieve this? If you have a link to, or an example of,
a custom partitioner, please share it.



Re: how can evenly distribute my records in all partition

2015-11-16 Thread Sabarish Sasidharan
You can write your own custom partitioner to achieve this.

Regards
Sab


how can evenly distribute my records in all partition

2015-11-16 Thread prateek arora
Hi

I have an RDD with 30 records (key/value pairs) and I am running 30 executors.
I want to repartition this RDD into 30 partitions so that every partition gets
exactly one record and is assigned to one executor.

When I use rdd.repartition(30), it repartitions my RDD into 30 partitions, but
some partitions get 2 records, some get 1 record, and some get no records at
all.

Is there any way in Spark to distribute my records evenly across all
partitions?

Regards
Prateek
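
One way to get exactly one record per partition is to key each record by its
position (`rdd.zipWithIndex()` in PySpark) and then partition by that index
with an identity partition function. The following is a pure-Python sketch of
the resulting assignment, run without Spark; `assign_partitions` is a
hypothetical helper illustrating the effect, not a Spark API:

```python
def assign_partitions(records):
    """Sketch of the effect of, in PySpark:
        rdd.zipWithIndex()
           .map(lambda kv: (kv[1], kv[0]))
           .partitionBy(n, lambda i: i)
    i.e. record i goes to partition i (not executed here)."""
    n = len(records)
    partitions = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        partitions[i].append(rec)
    return partitions

# 30 hypothetical (timestamp, JPEG bytes) records, as in the thread.
parts = assign_partitions([("ts%d" % i, b"jpeg") for i in range(30)])
print(all(len(p) == 1 for p in parts))  # True -- every partition holds one record
```

Unlike `repartition(30)`, which routes records through hashing and can leave
partitions empty, this index-based layout is exact by construction.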



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/how-can-evenly-distribute-my-records-in-all-partition-tp25394.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org