I've put the CSV on the worker node, since the job runs on the worker. I
didn't put the CSV on the master because I believe the master doesn't run jobs.

If I also put the CSV on the Zeppelin node at the same path as on the worker,
the job reads the CSV and writes a _SUCCESS file locally on the Zeppelin node.
The job runs on the worker too but never completes: the result stays under a
_temporary directory on the worker.

worker - ls -laRt /data/02.csv/


02.csv/:
total 0
drwxr-xr-x. 3 root root 24 Apr 28 09:55 .
drwxr-xr-x. 3 root root 15 Apr 28 09:55 _temporary
drwxr-xr-x. 3 root root 64 Apr 28 09:55 ..

02.csv/_temporary:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 0
drwxr-xr-x. 3 root root  15 Apr 28 09:55 .
drwxr-xr-x. 3 root root  24 Apr 28 09:55 ..

02.csv/_temporary/0:
total 0
drwxr-xr-x. 5 root root 106 Apr 28 09:56 .
drwxr-xr-x. 2 root root   6 Apr 28 09:56 _temporary
drwxr-xr-x. 2 root root 129 Apr 28 09:56 task_20170428095632_0005_m_000000
drwxr-xr-x. 2 root root 129 Apr 28 09:55 task_20170428095516_0002_m_000000
drwxr-xr-x. 3 root root  15 Apr 28 09:55 ..

02.csv/_temporary/0/_temporary:
total 0
drwxr-xr-x. 2 root root   6 Apr 28 09:56 .
drwxr-xr-x. 5 root root 106 Apr 28 09:56 ..

02.csv/_temporary/0/task_20170428095632_0005_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:56 .part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:56 part-00000-e39ebc76-5343-407e-b42e-c33e69b8fd1a.csv
drwxr-xr-x. 2 root root   129 Apr 28 09:56 .

02.csv/_temporary/0/task_20170428095516_0002_m_000000:
total 52
drwxr-xr-x. 5 root root   106 Apr 28 09:56 ..
-rw-r--r--. 1 root root   376 Apr 28 09:55 .part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv.crc
-rw-r--r--. 1 root root 46605 Apr 28 09:55 part-00000-c2ac5299-26f6-4b23-a74b-b3dc96464271.csv


zeppelin - ls -laRt 02.csv/


02.csv/:
total 12
drwxr-sr-x    2 root     10000700      4096 Apr 28 09:56 .
-rw-r--r--    1 root     10000700         8 Apr 28 09:56 ._SUCCESS.crc
-rw-r--r--    1 root     10000700         0 Apr 28 09:56 _SUCCESS
drwxrwsr-x    5 root     10000700      4096 Apr 28 09:56 ..
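
The two listings are consistent with Hadoop's FileOutputCommitter protocol, which Spark uses for file output: each task writes its part file under _temporary/0/task_*, and only the job-commit step promotes those files to the output root and writes _SUCCESS. Here the commit ran against the driver (Zeppelin) node's local filesystem, which cannot see the worker's files, so only _SUCCESS appears on the Zeppelin node while the part files stay stranded under _temporary on the worker. A minimal stdlib sketch of the two-phase protocol (directory and file names illustrative, not Spark's exact attempt IDs):

```python
import os
import shutil
import tempfile

# Sketch of the FileOutputCommitter two-phase protocol: a task writes
# its part file into _temporary/0/task_*, and the job commit promotes
# the file to the output root, deletes _temporary, and writes _SUCCESS.
# If the commit runs on a node whose filesystem doesn't contain the task
# output, the promotion finds nothing and the part files stay stranded
# under _temporary, as in the worker listing above.

out = tempfile.mkdtemp()  # stands in for /data/02.csv

# --- task side: write the part file into a task attempt directory ---
task_dir = os.path.join(out, "_temporary", "0", "task_000000")
os.makedirs(task_dir)
with open(os.path.join(task_dir, "part-00000.csv"), "w") as f:
    f.write("a,b\n1,2\n")

# --- job commit (driver side): promote part files, write _SUCCESS ---
for name in os.listdir(task_dir):
    shutil.move(os.path.join(task_dir, name), os.path.join(out, name))
shutil.rmtree(os.path.join(out, "_temporary"))
open(os.path.join(out, "_SUCCESS"), "w").close()

print(sorted(os.listdir(out)))  # ['_SUCCESS', 'part-00000.csv']
```

When both phases run against the same filesystem (a shared mount, or HDFS/S3), the commit succeeds and the output root contains the part files plus _SUCCESS, which is exactly what's missing when driver and worker have separate local disks.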




On Wed, May 10, 2017 at 14:06, Meethu Mathew <meethu.mat...@flytxt.com>
wrote:

> Try putting the csv in the same path in all the nodes or in a mount point
> path which is accessible by all the nodes
>
> Regards,
>
>
> Meethu Mathew
>
>
> On Wed, May 10, 2017 at 3:36 PM, Sofiane Cherchalli <sofian...@gmail.com>
> wrote:
>
>> Yes, I already tested with spark-shell and pyspark , with the same result.
>>
>> Can't I use the Linux filesystem to read the CSV, such as file:///data/file.csv?
>> My understanding is that the job is sent to and interpreted on the worker,
>> isn't it?
>>
>> Thanks.
>>
>> On Tue, May 9, 2017 at 20:23, Jongyoul Lee <jongy...@gmail.com>
>> wrote:
>>
>>> Could you test if it works with spark-shell?
>>>
>>> On Sun, May 7, 2017 at 5:22 PM, Sofiane Cherchalli <sofian...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a standalone cluster, one master and one worker, running in
>>>> separate nodes. Zeppelin is running is in a separate node too in client
>>>> mode.
>>>>
>>>> When I run a notebook that reads a CSV file located on the worker
>>>> node with the spark-csv package, Zeppelin tries to read the CSV locally and
>>>> fails because the CSV is on the worker node, not on the Zeppelin node.
>>>>
>>>> Is this the expected behavior?
>>>>
>>>> Thanks.
>>>>
>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>
