Hi Sarath,

By any chance, have you resolved this issue?

Thanks,
Padma CH

On Tue, Apr 28, 2015 at 11:20 PM, sarath [via Apache Spark User List] <
ml-node+s1001560n22694...@n3.nabble.com> wrote:

>
> I am trying to train a model on a large dataset of 8 million data points
> with 20 million features using SVMWithSGD, but the job fails after running
> for some time. I tried increasing num-partitions, driver-memory,
> executor-memory, and driver-max-resultSize. I also tried reducing the
> dataset from 8 million points to 25K (keeping the number of features at
> 20 million), but after consuming the entire 64 GB of driver memory for 20
> to 30 minutes it still failed.
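>
> For reference, a minimal sketch of this kind of SVMWithSGD training job
> (the HDFS path, partition count, and iteration count below are
> placeholders, not the values used in the failing job):
>
>     import org.apache.spark.{SparkConf, SparkContext}
>     import org.apache.spark.mllib.classification.SVMWithSGD
>     import org.apache.spark.mllib.util.MLUtils
>
>     val sc = new SparkContext(new SparkConf().setAppName("svm-train"))
>
>     // Placeholder path; assumes the 20M-feature vectors are stored in sparse LibSVM format
>     val training = MLUtils
>       .loadLibSVMFile(sc, "hdfs:///path/to/training_data")
>       .repartition(512)               // placeholder partition count
>       .cache()
>
>     // SVMWithSGD.train runs gradient-descent iterations over the cached RDD
>     val model = SVMWithSGD.train(training, 100)   // placeholder iteration count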
>
> I'm using a cluster of 8 nodes (each with 8 cores and 64 GB RAM):
> executor-memory - 60G
> driver-memory - 60G
> num-executors - 64
> and other default settings.
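>
> Expressed as a SparkConf, those non-default settings would look roughly
> like the sketch below (the driver-max-resultSize value is illustrative,
> since the exact figure was not given, and driver memory itself has to be
> passed to spark-submit before the driver JVM starts):
>
>     import org.apache.spark.SparkConf
>
>     val conf = new SparkConf()
>       .setAppName("SVMWithSGD-large")
>       .set("spark.executor.memory", "60g")
>       .set("spark.executor.instances", "64")     // num-executors
>       .set("spark.driver.maxResultSize", "4g")   // illustrative value only
>     // spark.driver.memory (60g) must be supplied on the spark-submit command
>     // line, since the driver JVM is already running by the time this executes.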
>
> This is the error log:
>
> 15/04/20 11:51:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 15/04/20 11:51:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
> 15/04/20 11:51:29 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
> 15/04/20 11:56:11 WARN TransportChannelHandler: Exception in connection from xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         .......
> 15/04/20 11:56:11 ERROR TransportResponseHandler: Still have 7 requests outstanding when connection from xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029 is closed
> 15/04/20 11:56:11 ERROR OneForOneBlockFetcher: Failed while starting block fetches
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         .......
> 15/04/20 11:56:11 ERROR OneForOneBlockFetcher: Failed while starting block fetches
> java.io.IOException: Connection reset by peer
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>         ...........
> 15/04/20 11:56:12 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>         at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
>         at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:87)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.sendRequest(ShuffleBlockFetcherIterator.scala:149)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:290)
>         at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:53)
>         at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>         at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>         at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:91)
>         at org.apache.spark.shuffle.hash.HashShuffleReader.read(HashShuffleReader.scala:44)
>         at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:92)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>         at org.apache.spark.scheduler.Task.run(Task.scala:64)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         ... 1 more
> 15/04/20 11:56:15 ERROR RetryingBlockFetcher: Exception while beginning fetch of 1 outstanding blocks
> java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>
> Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         ... 1 more
> 15/04/20 11:56:27 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from xxx.xxx.xxx.net:41029
> java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>         at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         ... 1 more
> 15/04/20 11:56:30 ERROR ShuffleBlockFetcherIterator: Failed to get block(s) from xxx.xxx.xxx.net:41029
> java.io.IOException: Failed to connect to xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:191)
>         at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:156)
>         at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:78)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
>         at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused: xxx.xxx.xxx.net/xxx.xxx.xxx.xxx:41029
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>         at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:208)
>         at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:287)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         ... 1 more
>