That sounds right, Mayur. Also, in 0.8.1 I hear there's a new repartition method that you might be able to use to further distribute the data. But if your data is so small that it fits in just a couple of blocks, why are you using 20 machines just to process a quarter GB of data? Is the computation on each bit extremely intensive?
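If you want to try that, a rough sketch in the spark-shell might look like the following (untested, and assuming a SparkContext sc plus a placeholder input path):

    val data = sc.textFile("hdfs:///path/to/input")
    data.partitions.size              // a quarter-GB file likely gives ~4 partitions
    val spread = data.repartition(20) // new in 0.8.1: shuffle into 20 partitions
    spread.count()                    // tasks should now land on more machines

Keep in mind that repartition does a full shuffle, so it only pays off if the per-record work dominates the shuffle cost.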
On Thu, Jan 2, 2014 at 12:39 PM, Mayur Rustagi <[email protected]> wrote:

> I have experienced a similar issue. The easiest fix I found was to
> increase the replication of the data being used in the worker to the
> number of workers you want to use for processing. The RDDs seem to be
> created on all the machines where the blocks are replicated. Please
> correct me if I am wrong.
>
> Regards,
> Mayur
>
> Mayur Rustagi
> Ph: +919632149971
> http://www.sigmoidanalytics.com
> https://twitter.com/mayur_rustagi
>
>
> On Thu, Jan 2, 2014 at 10:46 PM, Andrew Ash <[email protected]> wrote:
>
>> Hi lihu,
>>
>> Maybe the data you're accessing is in HDFS and only resides on 4 of
>> your 20 machines because it's only about 4 blocks (at the default
>> 64 MB per block, that's around a quarter GB). Where is your source
>> data located and how is it stored?
>>
>> Andrew
>>
>>
>> On Thu, Jan 2, 2014 at 7:53 AM, lihu <[email protected]> wrote:
>>
>>> Hi,
>>> I run Spark on a cluster with 20 machines, but when I start an
>>> application using the spark-shell, only 4 machines are working; the
>>> others just sit idle, with no memory or CPU used. I watched this
>>> through the web UI.
>>>
>>> I wondered whether the other machines might be busy, so I checked
>>> them using the "top" and "free" commands, but they are not.
>>>
>>> So I just wonder: why does Spark not assign work to all 20
>>> machines? This is not good resource usage.
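P.S. For lihu's original question, another option is to ask for more splits when the file is first loaded, rather than repartitioning afterwards. A small sketch, again with a placeholder path; the second argument to textFile is a hint for the minimum number of splits:

    val data = sc.textFile("hdfs:///path/to/input", 20)
    data.partitions.size  // should now be roughly 20 instead of ~4

That avoids the extra shuffle, since Hadoop cuts the file into smaller input splits up front instead of one split per HDFS block.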
