I'm seeing some weird behavior and wanted some feedback.
I have a fairly large, multi-hour job that operates over about 5TB of data.
It builds a ranked category index of about 25,000 categories, sorted by
rank in descending order.
I want to write this to a file, but it's not actually writing any data.
If I run myrdd.take(100) ... that works fine and prints data to a file.
If I run myrdd.write.json(), it takes the same amount of time and then
writes local output containing a SUCCESS file but no actual partition
data. That one small SUCCESS file is the only file there.
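To make the symptom concrete, here is a minimal sketch of how the output directory could be inspected. It assumes the usual Spark/Hadoop output layout (a `_SUCCESS` marker next to `part-*` data files); the helper name `summarize_output_dir` and the throwaway demo directory are hypothetical and only for illustration.

```python
import os
import tempfile


def summarize_output_dir(path):
    """Return (has_success_marker, [(part_file_name, size_in_bytes), ...])."""
    names = sorted(os.listdir(path))
    # Assumption: the job writes a "_SUCCESS" marker and "part-*" data files.
    has_success = "_SUCCESS" in names
    parts = [
        (name, os.path.getsize(os.path.join(path, name)))
        for name in names
        if name.startswith("part-")
    ]
    return has_success, parts


# Demo on a temporary directory that mimics the symptom:
# the marker file exists, but there are no part files.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "_SUCCESS"), "w").close()

has_success, parts = summarize_output_dir(demo_dir)
print(has_success, parts)  # True []
```

Pointing the helper at the real output path instead of `demo_dir` would show whether any part files were written at all, or whether they exist but are empty.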
Any advice on how to debug this?