Try repartitioning to a higher number of partitions (at least 3-4 times the
total number of CPU cores). What operation are you doing? If it is a
join/groupBy-style operation, the task that takes so long may be the one
receiving nearly all the values for a single key (data skew); in that case
you need a Partitioner that distributes the keys evenly across the machines
to speed things up.
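The skew problem and one common fix ("salting" the hot key before the
shuffle) can be sketched in plain Python; this is illustrative only, not the
Spark API, and the key names and `SALT_BUCKETS` value are made up. In
PySpark you would either salt the keys like this before a groupByKey/join,
or pass a custom partition function to `rdd.partitionBy(numPartitions,
partitionFunc)`:

```python
from collections import Counter

# Sketch (plain Python, not Spark): with a skewed dataset, grouping by the
# raw key puts almost everything in one group, so one task does most of the
# work. "Salting" splits the hot key into several sub-keys that can land in
# different partitions; a second, cheap aggregation then merges the per-salt
# partial results.

SALT_BUCKETS = 4  # assumed value; tune to the observed skew

# Skewed data: the "hot" key owns 90 of 100 records.
records = [("hot", i) for i in range(90)] + [("cold", i) for i in range(10)]

# Group sizes with the raw key: one group holds 90% of the records.
raw_groups = Counter(key for key, _ in records)

# Group sizes after salting: the hot key is split across 4 sub-keys.
salted_groups = Counter((key, i % SALT_BUCKETS) for key, i in records)

print(max(raw_groups.values()))     # 90 -> one task gets 90% of the data
print(max(salted_groups.values()))  # 23 -> largest group shrinks about 4x
```

After the salted aggregation you strip the salt and aggregate once more per
original key; that second pass is cheap because each group is already small.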

Thanks
Best Regards

On Tue, Jul 14, 2015 at 11:12 AM, shahid ashraf <sha...@trialx.com> wrote:

> hi
>
> I have a 10-node cluster. I loaded the data onto HDFS, so the number of
> partitions I get is 9. I am running a Spark application, and it gets stuck
> on one of the tasks; looking at the UI, it seems the application is not
> using all nodes for the calculations. Attached is a screenshot of the
> tasks; it seems tasks are placed on each node more than once. 8 tasks
> complete within 7-8 minutes, while one task takes around 30 minutes,
> causing the delay in the results.
>
>
> On Tue, Jul 14, 2015 at 10:48 AM, Shashidhar Rao <
> raoshashidhar...@gmail.com> wrote:
>
>> Hi,
>>
>> I am doing my PhD thesis on large-scale machine learning, e.g. online
>> learning, batch and mini-batch learning.
>>
>> Could somebody help me with ideas, especially in the context of Spark
>> applied to the above learning methods?
>>
>> Some ideas: improvements to existing algorithms, implementing new
>> features for the above learning methods, algorithms that have not been
>> implemented yet, etc.
>>
>> If somebody could help me with some ideas it would really accelerate my
>> work.
>>
>> Also, a few pointers to research papers regarding Spark or Mahout would
>> be appreciated.
>>
>> Thanks in advance.
>>
>> Regards
>>
>
>
>
> --
> with Regards
> Shahid Ashraf
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>