I'm seeing some weird behavior and wanted some feedback.

I have a fairly large, multi-hour job that operates over about 5TB of data.

It builds the data out into a ranked category index of about 25,000
categories, sorted by rank in descending order.

I want to write this to a file but it's not actually writing any data.

If I run myrdd.take(100), that works fine and prints data to a file.

If I run myrdd.write.json(), it takes the same amount of time and then
writes a local output directory with a SUCCESS file but no actual
partition data. There's only one small file, the SUCCESS marker.
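To make the symptom concrete, here is a rough sketch of how the output directory could be inspected. It is not part of the job itself. It assumes the usual Hadoop-style layout (data in files named part-*, a _SUCCESS marker, hidden .crc checksum files) and it only sees the local filesystem of the machine it runs on. The name summarize_output_dir is made up for illustration.

```python
from pathlib import Path


def summarize_output_dir(path):
    """Split a job's output directory into data files and marker files.

    Assumes Hadoop-style naming: data lives in files starting with
    "part-"; files starting with "_" or "." (for example _SUCCESS or
    .crc checksums) carry no rows. Anything else is ignored.
    """
    data_files = []    # (file name, size in bytes)
    marker_files = []  # names only
    for entry in sorted(Path(path).iterdir()):
        if not entry.is_file():
            continue
        if entry.name.startswith(("_", ".")):
            marker_files.append(entry.name)
        elif entry.name.startswith("part-"):
            data_files.append((entry.name, entry.stat().st_size))
    return data_files, marker_files
```

Calling summarize_output_dir on the directory passed to the write should return an empty first list in the situation described above, with only the marker file in the second list.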

Any advice on how to debug this?



Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
