Re: take() works on RDD but .write.json() does not work in 2.0.0

2016-09-19 Thread Kevin Burton
I tried with both write.json and write.csv.  The write.text method won't
work because I have more than one column, so it refuses to execute.

It doesn't seem to work on any data.



-- 

We’re hiring if you know of any awesome Java Devops or Linux Operations
Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile



Re: take() works on RDD but .write.json() does not work in 2.0.0

2016-09-17 Thread Hyukjin Kwon
Hi Kevin,

I have a few questions on this.

Does it fail only with write.json()? I wonder whether write.text, csv, or
another API also fails, or whether it is a JSON-specific issue.

Also, does it work with small data? I want to check whether this happens
only on large data.

Thanks!





take() works on RDD but .write.json() does not work in 2.0.0

2016-09-17 Thread Kevin Burton
I'm seeing some weird behavior and wanted some feedback.

I have a fairly large, multi-hour job that operates over about 5TB of data.

It builds it out into a ranked category index of about 25000 categories
sorted by rank, descending.

I want to write this to a file but it's not actually writing any data.

If I run myrdd.take(100) ... that works fine and prints data to a file.

If I run

myrdd.write.json(), it takes the same amount of time, and then writes a
local output directory containing a SUCCESS file but no actual partition
data. There's only one small file with SUCCESS.

Any advice on how to debug this?
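
One possible first step, sketched here under assumptions rather than taken from the thread: check exactly what the writer left on disk. The helper below is plain Python (no Spark needed) and the name summarize_output_dir is hypothetical. It assumes the usual Hadoop-style layout, where data lands in files whose names start with "part-" and the completion marker is named "_SUCCESS". It only sees the filesystem of the machine it runs on, so if the job ran on a cluster and the path was a local one, it may be worth running it on the worker nodes as well as the driver.

```python
import os


def summarize_output_dir(path):
    """Report which marker and part files exist under a writer's output directory.

    Assumes the conventional layout: data in files named "part-*" and a
    "_SUCCESS" marker file. Hidden checksum files (".part-*.crc") are ignored.
    """
    part_files = []
    has_success = False
    for root, _dirs, files in os.walk(path):
        for name in files:
            if name == "_SUCCESS":
                has_success = True
            elif name.startswith("part-"):
                full = os.path.join(root, name)
                # Record the path relative to the output directory and its size in bytes.
                part_files.append((os.path.relpath(full, path), os.path.getsize(full)))
    part_files.sort()
    return {
        "has_success": has_success,
        "part_files": part_files,
        "total_bytes": sum(size for _, size in part_files),
    }
```

Calling summarize_output_dir on the directory that was passed to write.json() separates two cases: an empty "part_files" list means no partition output reached this machine at all, while part files with a total of 0 bytes mean the tasks ran here but produced no rows.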
