Ted, thanks for the reply.

Yeah, there were just three nodes with HDFS and Spark workers colocated.
There was actually one more node with the Spark master (standalone) and the
namenode. And I've added one more Spark worker node, which sees the whole
HDFS just fine, but doesn't have a colocated datanode process.
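As far as I understand, Spark won't rebalance already-cached RDD blocks onto
the new worker by itself, so the only way I see to spread them would be an
explicit reshuffle, roughly like this sketch (the RDD name and the partition
count are just illustrative, not from my actual code):

    // Drop the old cached blocks, reshuffle, and re-cache so that the new
    // executor can receive some of the partitions.
    data.unpersist(blocking = true)
    val rebalanced = data.repartition(16) // illustrative partition count
    rebalanced.persist()
    rebalanced.count() // materialize the new cache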

Unfortunately I can't provide a code snippet, as that's part of my
proprietary web service application, but the job that's running is just
MLlib (old API) RandomForest regression training.
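In general terms it's nothing more exotic than the following sketch (the
parameters here are illustrative, not the real ones):

    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.mllib.tree.RandomForest
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.rdd.RDD

    // `trainingData: RDD[LabeledPoint]` is assumed to be cached already.
    val categoricalFeaturesInfo = Map[Int, Int]() // all features continuous
    val model: RandomForestModel = RandomForest.trainRegressor(
      trainingData,
      categoricalFeaturesInfo,
      numTrees = 100,                 // illustrative
      featureSubsetStrategy = "auto",
      impurity = "variance",          // required for regression
      maxDepth = 10,                  // illustrative
      maxBins = 32,
      seed = 42)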

--
Be well!
Jean Morozov

On Sun, Mar 6, 2016 at 2:20 AM, Ted Yu <[email protected]> wrote:

> bq. I didn't add one more HDFS node to the Hadoop cluster
>
> Does each of the three nodes colocate with an HDFS data node?
> The absence of a 4th data node might have something to do with the
> partition allocation.
>
> Can you show a code snippet?
>
> Thanks
>
> On Sat, Mar 5, 2016 at 2:54 PM, Eugene Morozov <[email protected]> wrote:
>
>> Hi,
>>
>> My cluster (standalone deployment), consisting of 3 worker nodes, was in
>> the middle of a computation when I added one more worker node. I can see
>> that the new worker is registered with the master and that my job actually
>> gets one more executor. I have configured the default parallelism as 12,
>> and thus I see that each of the three nodes holds 4 RDD blocks. I expected
>> that by adding one more node those partitions would be reassigned across 4
>> nodes instead of 3, but I don't see that. Even though all partitions are
>> cached in memory on those 3 nodes and most probably have their locations
>> or preferred locations known, I'd still expect the partitions to be
>> reassigned. Are my expectations incorrect?
>>
>> I also have HDFS set up on those same 3 worker nodes, but when I added
>> one more node to the Spark cluster I didn't add one more HDFS node to the
>> Hadoop cluster. To my mind that shouldn't be the reason, but who knows?
>>
>> Thanks.
>> --
>> Be well!
>> Jean Morozov
>>
>
>
