Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Vadim Semenov
The error message seems self-explanatory: figure out what disk quota is set for your user.
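If it helps, a quick way to see how much space the filesystem actually exposes for the directory Spark is spilling to is something like this (a rough, untested sketch; the path is taken from the stack trace in your mail and may need adjusting):

import java.nio.file.{Files, Paths}

object QuotaProbe {
  def main(args: Array[String]): Unit = {
    // Any path on the volume Spark spills shuffle files to; taken from
    // the temp_shuffle path in the quoted stack trace -- adjust as needed.
    val dir   = Paths.get("/mapr/chetan/local")
    val store = Files.getFileStore(dir)
    val gib   = 1024L * 1024 * 1024
    println(s"Store:  ${store.name()} (${store.`type`()})")
    println(s"Total:  ${store.getTotalSpace / gib} GiB")
    println(s"Usable: ${store.getUsableSpace / gib} GiB")
  }
}

Keep in mind that per-user quotas are often not reflected in these numbers, so the cluster's own quota tooling (quota(1) on Linux, or MapR volume quotas given the /mapr path) is the authoritative check: if "Usable" looks fine but writes still fail with "Disk quota exceeded" (EDQUOT), the limit is a quota, not free space.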

On Wed, Nov 22, 2017 at 8:23 AM, Chetan Khatri wrote:

> Anybody reply on this?
>
> On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri <chetan.opensou...@gmail.com> wrote:
>
>> [quoted message and stack trace trimmed; see the original post below]

Re: Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-22 Thread Chetan Khatri
Anybody reply on this?

On Tue, Nov 21, 2017 at 3:36 PM, Chetan Khatri wrote:

> [quoted message and stack trace trimmed; see the original post below]


Spark Writing to parquet directory : java.io.IOException: Disk quota exceeded

2017-11-21 Thread Chetan Khatri
Hello Spark Users,

I am getting the below error when I try to write a dataset to a parquet
location. I have enough disk space available. The last time I hit this kind
of error, it was resolved by increasing the number of cores in the job's
hyperparameters. The result-set size this time is almost 400 GB, with the
hyperparameters below (a rough sketch of the equivalent Spark conf follows
the list):

Driver memory: 4g
Executor memory: 16g
Executor cores: 12
Num executors: 8
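
Roughly, these map onto Spark conf keys as below (an illustrative sketch, not my exact job; the app name and write path are placeholders):

import org.apache.spark.sql.SparkSession

// Resource settings from above expressed as Spark conf keys. Note that
// spark.driver.memory only takes effect if set before the driver JVM
// starts (spark-submit --driver-memory or spark-defaults.conf); setting
// it here is too late in client mode.
val spark = SparkSession.builder()
  .appName("parquet-write")                // placeholder app name
  .config("spark.driver.memory", "4g")
  .config("spark.executor.memory", "16g")
  .config("spark.executor.cores", "12")
  .config("spark.executor.instances", "8") // "num executors" on YARN
  .getOrCreate()

// ds.write.parquet("/path/to/output")     // placeholder dataset and path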

It is still failing. Any idea whether increasing executor memory and the
number of executors could resolve it?


17/11/21 04:29:37 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /mapr/chetan/local/david.com/tmp/hadoop/nm-local-dir/usercache/david-khurana/appcache/application_1509639363072_10572/blockmgr-008604e6-37cb-421f-8cc5-e94db75684e7/12/temp_shuffle_ae885911-a1ef-404f-9a6a-ded544bb5b3c
java.io.IOException: Disk quota exceeded
        at java.io.FileOutputStream.close0(Native Method)
        at java.io.FileOutputStream.access$000(FileOutputStream.java:53)
        at java.io.FileOutputStream$1.close(FileOutputStream.java:356)
        at java.io.FileDescriptor.closeAll(FileDescriptor.java:212)
        at java.io.FileOutputStream.close(FileOutputStream.java:354)
        at org.apache.spark.storage.TimeTrackingOutputStream.close(TimeTrackingOutputStream.java:72)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:178)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
        at org.apache.spark.sql.execution.UnsafeRowSerializerInstance$$anon$2.close(UnsafeRowSerializer.scala:96)
        at org.apache.spark.storage.DiskBlockObjectWriter$$anonfun$close$2.apply$mcV$sp(DiskBlockObjectWriter.scala:108)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1316)
        at org.apache.spark.storage.DiskBlockObjectWriter.close(DiskBlockObjectWriter.scala:107)
        at org.apache.spark.storage.DiskBlockObjectWriter.revertPartialWritesAndClose(DiskBlockObjectWriter.scala:159)
        at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.stop(BypassMergeSortShuffleWriter.java:234)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
        at org.apache.spark.scheduler.Task.run(Task.scala:86)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
17/11/21 04:29:37 WARN netty.OneWayOutboxMessage: Failed to send one-way RPC.
java.io.IOException: Failed to connect to /192.168.123.43:58889
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /192.168.123.43:58889
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more