Looks like it's an HDFS issue, and a pretty new one: https://issues.apache.org/jira/browse/HDFS-6999

Jianshi

On Thu, Sep 25, 2014 at 12:10 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
> Hi Ted,
>
> See my previous reply to Debasish: all region servers are idle, so I don't
> think it's caused by hotspotting.
>
> Besides, only 6 out of 3000 tasks were stuck, and their inputs are only
> about 80MB each.
>
> Jianshi
>
> On Wed, Sep 24, 2014 at 11:58 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> I was thinking along the same line.
>>
>> Jianshi:
>> See
>> http://hbase.apache.org/book.html#d0e6369
>>
>> On Wed, Sep 24, 2014 at 8:56 AM, Debasish Das <debasish.da...@gmail.com>
>> wrote:
>>
>>> The HBase region servers need to be balanced. You might have some skew
>>> in your row keys, putting one region server under pressure; try finding
>>> that key and spreading it out with a random salt.
>>>
>>> On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang <jianshi.hu...@gmail.com>
>>> wrote:
>>>
>>>> Hi Ted,
>>>>
>>>> It converts RDD[Edge] to HBase row keys and columns and inserts them
>>>> into HBase (in batches).
>>>>
>>>> BTW, I found batched Puts actually faster than generating HFiles...
>>>>
>>>> Jianshi
>>>>
>>>> On Wed, Sep 24, 2014 at 11:49 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>
>>>>> bq. at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179)
>>>>>
>>>>> Can you reveal what HbaseRDDBatch.scala does?
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Wed, Sep 24, 2014 at 8:46 AM, Jianshi Huang <jianshi.hu...@gmail.com> wrote:
>>>>>
>>>>>> One of my big Spark programs always gets stuck at 99%, where a few
>>>>>> tasks never finish.
>>>>>>
>>>>>> I debugged it by printing out thread stack traces, and found there are
>>>>>> workers stuck at parquet.hadoop.ParquetFileReader.readNextRowGroup.
>>>>>>
>>>>>> Has anyone had a similar problem? I'm using Spark 1.1.0 built for
>>>>>> HDP 2.1. The Parquet files are generated by Pig using the latest
>>>>>> parquet-pig-bundle v1.6.0rc1.
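[Editor's note] Debasish's salting suggestion above can be sketched roughly as follows. This is a minimal illustration, not code from the thread; `numBuckets` and `saltedKey` are hypothetical names, and a real scheme would have to apply the same salt on reads and pre-split the table accordingly:

```scala
// Hypothetical sketch of row-key salting for HBase, as suggested above.
// A stable hash-derived prefix spreads otherwise-sequential keys across
// regions instead of concentrating writes on one region server.
object RowKeySalting {
  val numBuckets = 16 // illustrative; match the number of region splits

  // Prepend a two-digit salt derived from the key itself, so the same
  // logical key always lands in the same bucket (needed for reads).
  def saltedKey(key: String): String = {
    val salt = math.abs(key.hashCode) % numBuckets
    f"$salt%02d-$key"
  }
}
```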
>>>>>>
>>>>>> From Spark 1.1.0's pom.xml, Spark is using Parquet v1.4.3. Will this
>>>>>> be problematic?
>>>>>>
>>>>>> One weird behavior is that another program reads and sorts data from
>>>>>> the same Parquet files, and it works fine. The only difference seems
>>>>>> to be that the buggy program uses foreachPartition while the working
>>>>>> program uses map.
>>>>>>
>>>>>> Here's the full stacktrace:
>>>>>>
>>>>>> "Executor task launch worker-3"
>>>>>>    java.lang.Thread.State: RUNNABLE
>>>>>>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>>>>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:257)
>>>>>>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>>>>>>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>>>>>>         at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
>>>>>>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>>>>         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>>>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
>>>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
>>>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
>>>>>>         at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
>>>>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:173)
>>>>>>         at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:138)
>>>>>>         at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:683)
>>>>>>         at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:739)
>>>>>>         at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:796)
>>>>>>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:837)
>>>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:195)
>>>>>>         at java.io.DataInputStream.readFully(DataInputStream.java:169)
>>>>>>         at parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:599)
>>>>>>         at parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:360)
>>>>>>         at parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:100)
>>>>>>         at parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:172)
>>>>>>         at parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:130)
>>>>>>         at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:139)
>>>>>>         at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>>>>>>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>>>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>>>>>>         at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>>>>>>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>>>>>>         at scala.collection.Iterator$GroupedIterator.takeDestructively(Iterator.scala:913)
>>>>>>         at scala.collection.Iterator$GroupedIterator.go(Iterator.scala:929)
>>>>>>         at scala.collection.Iterator$GroupedIterator.fill(Iterator.scala:969)
>>>>>>         at scala.collection.Iterator$GroupedIterator.hasNext(Iterator.scala:972)
>>>>>>         at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>>         at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>>         at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:179)
>>>>>>         at com.paypal.risk.rds.dragon.storage.hbase.HbaseRDDBatch$$anonfun$batchInsertEdges$3.apply(HbaseRDDBatch.scala:167)
>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:767)
>>>>>>         at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:767)
>>>>>>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1103)
>>>>>>         at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1103)
>>>>>>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:54)
>>>>>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>>>         at java.lang.Thread.run(Thread.java:724)
>>>>>>
>>>>>> --
>>>>>> Jianshi Huang
>>>>>>
>>>>>> LinkedIn: jianshi
>>>>>> Twitter: @jshuang
>>>>>> Github & Blog: http://huangjs.github.com/
>>>>>
>>>>
>>>
>>
>

--
Jianshi Huang

LinkedIn: jianshi
Twitter: @jshuang
Github & Blog: http://huangjs.github.com/
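[Editor's note] For context on the trace: the GroupedIterator frames come from Scala's Iterator#grouped, which pulls an entire chunk from the upstream Parquet-backed iterator before each batched write, so a stalled HDFS read surfaces under foreachPartition. A minimal sketch of that batching shape, with hypothetical names (`batchSize`, `putBatch`; the actual internals of HbaseRDDBatch.scala are not shown in the thread):

```scala
// Rough shape of a "batched Put" partition writer, as implied by the
// stack trace. This is an illustration, not the author's code.
object BatchedWrites {
  val batchSize = 500 // illustrative batch size

  // Stand-in for a real client call such as HTable.put(List[Put]);
  // here it just reports how many rows it was handed.
  def putBatch(rows: Seq[String]): Int = rows.size

  // grouped(batchSize) is what produces the GroupedIterator frames:
  // each chunk forces batchSize reads from the source iterator, so any
  // hang in the underlying read blocks here, inside foreachPartition.
  def writePartition(rows: Iterator[String]): Int =
    rows.grouped(batchSize).map(putBatch).sum
}
```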