astralidea created ZEPPELIN-652:
-----------------------------------

             Summary: Zeppelin using spark-sql to read Hive leaves too many connections in CLOSE_WAIT
                 Key: ZEPPELIN-652
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-652
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.6.0
         Environment: spark-1.6.0
net.ipv4.ip_forward = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.default.accept_source_route = 0
kernel.core_uses_pid = 1
kernel.sysrq = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmall = 30455753932
kernel.shmmax = 124746768108748
kernel.core_uses_pid = 0
net.core.somaxconn = 10240
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_syn_retries = 3
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.ipv4.tcp_no_metrics_save = 0
net.core.netdev_max_backlog = 2500
net.ipv4.tcp_max_syn_backlog = 4196
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
vm.swappiness = 0
ulimit -n 65536

            Reporter: astralidea


When Zeppelin runs a spark-sql job that reads about 300 GB of data from Hive (the Hive cluster has roughly 300 machines), the number of CLOSE_WAIT connections grows rapidly during the first run. The next run then fails with an error like the one below (a sketch of the kind of paragraph that triggers this is shown after the stack trace):
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:411)
        at sun.nio.ch.Net.socket(Net.java:404)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:145)
        at org.apache.hadoop.net.StandardSocketFactory.createSocket(StandardSocketFactory.java:62)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:523)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
        at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
        at org.apache.hadoop.ipc.Client.call(Client.java:1318)
        at org.apache.hadoop.ipc.Client.call(Client.java:1300)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy15.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:188)
        at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy16.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1064)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1054)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1044)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:235)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:202)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:195)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1212)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:290)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:286)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:286)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:763)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:355)
        at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
        at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1204)
        at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1113)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
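
For reference, a minimal sketch of the kind of Zeppelin paragraph that triggers this; the table name, partition column, and filter value are hypothetical stand-ins for the real ORC-backed Hive table (roughly 300 GB scanned):

%spark
// Hypothetical table and partition names; the real query scans ~300 GB of ORC data in Hive.
val df = sqlContext.sql("SELECT * FROM some_orc_table WHERE dt = '2016-01-01'")
println(df.count())

After the first run of a paragraph like this, the CLOSE_WAIT count stays high instead of dropping back, and the second run hits the "Too many open files" error above.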
So how can I find out where the socket should be closed?
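
One way to confirm that descriptors are leaking is to sample the open file-descriptor count of the interpreter JVM before and after the query. A sketch using the standard UnixOperatingSystemMXBean; note it only covers the JVM the paragraph runs in (the driver), so the same check would have to run inside a task to cover executors:

%spark
import java.lang.management.ManagementFactory
import com.sun.management.UnixOperatingSystemMXBean

// Print open vs. maximum file descriptors for this JVM. If the open count keeps
// climbing across runs of the query, the sockets are not being closed.
ManagementFactory.getOperatingSystemMXBean match {
  case os: UnixOperatingSystemMXBean =>
    println(s"open fds: ${os.getOpenFileDescriptorCount} / max: ${os.getMaxFileDescriptorCount}")
  case _ =>
    println("file descriptor counts not available on this JVM")
}

Mapping the leaked descriptors back to code still needs something like lsof or /proc/<pid>/fd on the executor hosts, since the stack trace above shows the sockets being opened by the HDFS client inside the ORC reader on the executor side.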



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
