> It would be hard to give you any pointers without seeing the script that
> you are using.
I did link the script:

>> There are about 15 thousand JSON files totalling 2.1gb (uncompressed), so
>> it's not that big. And the code is, I think, pretty simple. Take a look:
>> http://pastebin.com/3y7e2ZTq . The loader mentioned there is pretty
>> simple too; it's basically a hack of ElephantBird's JSON loader to dive
>> deeper into the JSON and make bags out of JSON lists in addition to the
>> simpler maps that EB does: http://pastebin.com/dFKX3AJc

> Lastly, do not consider execution speed on your laptop as a benchmark.
> Hadoop gets its power by running in distributed mode on multiple
> nodes. Local mode will generally perform much worse than a single-threaded
> process, since it's trying to mimic what happens on the cluster, which
> requires quite a bit of coordination between mappers and reducers.

5+ hours over 10 nodes in mapreduce mode is pretty damning, local mode slowness or no.
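In case those pastes go away: the loader change boils down to a recursive JSON-to-Pig conversion along these lines. This is only a sketch of the approach described above, with illustrative class and method names rather than the actual code from the paste:

import java.util.HashMap;
import java.util.Map;

import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.TupleFactory;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;

// Illustrative helper: turn a json-simple parse tree into Pig-friendly types.
public class NestedJsonConverter {
    private static final TupleFactory tupleFactory = TupleFactory.getInstance();
    private static final BagFactory bagFactory = BagFactory.getInstance();

    // JSON objects become Pig maps, JSON lists become bags of one-field
    // tuples, and scalars pass through unchanged.
    public static Object wrap(Object value) {
        if (value instanceof JSONObject) {
            Map<String, Object> map = new HashMap<String, Object>();
            for (Object o : ((JSONObject) value).entrySet()) {
                Map.Entry<?, ?> e = (Map.Entry<?, ?>) o;
                map.put(e.getKey().toString(), wrap(e.getValue()));
            }
            return map;
        } else if (value instanceof JSONArray) {
            DataBag bag = bagFactory.newDefaultBag();
            for (Object item : (JSONArray) value) {
                bag.add(tupleFactory.newTuple(wrap(item)));
            }
            return bag;
        }
        return value;
    }
}

Only the container types get rewrapped; the point is that nested lists come back as bags, which Pig can then FLATTEN, instead of the flat maps the stock EB loader produces.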
