> It would be hard to give you any pointers without seeing the script that
> you are using.
I did link the script:

>> There are about 15 thousand JSON files totalling 2.1gb (uncompressed), so
>> it's not that big. And the code is, I think, pretty simple. Take a look:
>> http://pastebin.com/3y7e2ZTq . The loader mentioned there is pretty
>> simple too; it's basically a hack of ElephantBird's JSON loader to dive
>> deeper into the JSON and make bags out of JSON lists in addition to the
>> simpler maps that EB does: http://pastebin.com/dFKX3AJc

> Lastly, do not consider execution speed on your laptop as a benchmark.
> Hadoop gets its power by running in distributed mode on multiple
> nodes. Local mode will generally perform much worse than a single-threaded
> process, since it's trying to mimic what happens on the cluster, which
> requires quite a bit of coordination between mappers and reducers.

5+ hours over 10 nodes in mapreduce mode is pretty damning, local mode slowness or no.
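In case those pastes go away: the loader change boils down to a recursive JSON-to-Pig conversion along these lines. This is only a sketch of the approach described above, with illustrative class and method names rather than the actual code from the paste:

import java.util.HashMap;
import java.util.Map;

import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.TupleFactory;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;

// Illustrative helper: turn a json-simple parse tree into Pig-friendly types.
public class NestedJsonConverter {
    private static final TupleFactory tupleFactory = TupleFactory.getInstance();
    private static final BagFactory bagFactory = BagFactory.getInstance();

    // JSON objects become Pig maps, JSON lists become bags of one-field
    // tuples, and scalars pass through unchanged.
    public static Object wrap(Object value) {
        if (value instanceof JSONObject) {
            Map<String, Object> map = new HashMap<String, Object>();
            for (Object o : ((JSONObject) value).entrySet()) {
                Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
                map.put(e.getKey().toString(), wrap(e.getValue()));
            }
            return map;
        } else if (value instanceof JSONArray) {
            DataBag bag = bagFactory.newDefaultBag();
            for (Object item : (JSONArray) value) {
                bag.add(tupleFactory.newTuple(wrap(item)));
            }
            return bag;
        }
        return value;
    }
}

Only the container types get rewrapped; the point is that nested lists come back as bags, which Pig can then FLATTEN, instead of the flat maps the stock EB loader produces.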
