I'm seeing some weird behavior and wanted some feedback.

I have a fairly large, multi-hour job that operates over about 5TB of data.

It builds the data into a ranked category index of about 25,000 categories,
sorted by rank in descending order.

I want to write this to a file but it's not actually writing any data.

If I run myrdd.take(100), that works fine and prints data to a file.

But if I run myrdd.write.json(), it takes the same amount of time and then
writes local output containing a SUCCESS file but no actual partition data.
The small SUCCESS file is the only file there.
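
To make "no partition data" concrete, here is a minimal sketch of how the
output location could be inspected. It uses only the Python standard library
and does not touch Spark. The helper name summarize_output_dir is hypothetical,
and the sketch assumes the usual Hadoop-style layout: a marker file named
_SUCCESS next to data files whose names start with part-. The demo directory
is simulated, not the real job output.

```python
import os
import tempfile


def summarize_output_dir(path):
    """Return (has_success, part_files) for a Spark-style output directory.

    part_files is a list of (name, size_in_bytes) for files named part-*.
    Assumes the Hadoop-style layout described in the lead-in.
    """
    names = sorted(os.listdir(path))
    has_success = "_SUCCESS" in names
    part_files = [
        (name, os.path.getsize(os.path.join(path, name)))
        for name in names
        if name.startswith("part-")
    ]
    return has_success, part_files


# Simulate the situation described above: a marker file but no part files.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "_SUCCESS"), "w").close()

has_success, part_files = summarize_output_dir(demo_dir)
print(has_success, part_files)
```

Running the same check against the real output path, on the driver and on
each worker machine, would show where any part files ended up, if they were
written at all.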

Any advice on how to debug this?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
