Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
Hi Calvin,

I am running Spark KMeans on 24GB of data in a c3.2xlarge AWS instance with 30GB of physical memory. Spark caches data off-heap to Tachyon, and the input data is also stored in Tachyon. Tachyon is configured to use 15GB of memory with a tiered store, and its underFS is /tmp. The only configuration I've changed is the Tachyon data block size.

The above experiment is part of a research project.

Best Regards,
Jia

On Thursday, January 28, 2016 at 9:11:19 PM UTC-6, Calvin Jia wrote:
> Hi,
>
> Thanks for the detailed information. How large is the dataset you are
> running against? Also did you change any Tachyon configurations?
>
> Thanks,
> Calvin

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
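[Editor's note: the setup described above (15GB worker memory, /tmp underFS, a changed block size) would typically be expressed in Tachyon 0.8.x's conf/tachyon-env.sh. The sketch below is a reconstruction from that description, not the poster's actual file; the block-size property name is from the 0.8 docs and worth verifying against conf/tachyon-default.properties.]

```shell
# conf/tachyon-env.sh -- hypothetical sketch matching the thread's description.
# Values (15GB, /tmp, 64MB) are assumptions, not taken from the poster's config.

# Amount of worker memory to use for the top (MEM) storage tier.
export TACHYON_WORKER_MEMORY_SIZE=15GB

# Under filesystem backing CACHE_THROUGH writes.
export TACHYON_UNDERFS_ADDRESS=/tmp

# Default block size seen by clients (the one setting the poster says was changed).
export TACHYON_JAVA_OPTS+=" -Dtachyon.user.block.size.bytes.default=64MB"
```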
Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
Hey Jia Zou,

I'm curious about this exception. The error log you showed indicates that the exception is related to unlockBlock. Could you upload your full master.log and worker.log from the tachyon/logs directory?

Best,
Cheng

On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote:
> Hi,
>
> Thanks for the detailed information. How large is the dataset you are
> running against? Also did you change any Tachyon configurations?
>
> Thanks,
> Calvin
Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
Hi,

Thanks for the detailed information. How large is the dataset you are running against? Also did you change any Tachyon configurations?

Thanks,
Calvin
TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
Dears,

I keep getting the exception below when using Spark 1.6.0 on top of Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.

Any suggestions will be appreciated, thanks!

=

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 0.0 failed 4 times, most recent failure: Lost task 13.3 in stage 0.0 (TID 33, ip-10-73-198-35.ec2.internal): java.io.IOException: tachyon.org.apache.thrift.transport.TTransportException
	at tachyon.worker.WorkerClient.unlockBlock(WorkerClient.java:416)
	at tachyon.client.block.LocalBlockInStream.close(LocalBlockInStream.java:87)
	at tachyon.client.file.FileInStream.close(FileInStream.java:105)
	at tachyon.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:171)
	at java.io.DataInputStream.readInt(DataInputStream.java:388)
	at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2325)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2356)
	at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2493)
	at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:246)
	at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
	at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
	at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:193)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at org.apache.spark.rdd.RDD$$anonfun$zip$1$$anonfun$apply$31$$anon$1.hasNext(RDD.scala:851)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
	at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
	at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.org.apache.thrift.transport.TTransportException
	at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
	at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at tachyon.org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
	at tachyon.org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
	at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
	at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
	at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
	at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at tachyon.thrift.WorkerService$Client.recv_unlockBlock(WorkerService.java:455)
	at tachyon.thrift.WorkerService$Client.unlockBlock(WorkerService.java:441)
	at tachyon.worker.WorkerClient.unlockBlock(WorkerClient.java:413)
	... 28 more
Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
BTW, the Tachyon worker log says the following:

2015-12-27 01:33:44,599 ERROR WORKER_LOGGER (WorkerBlockMasterClient.java:getId) - java.net.SocketException: Connection reset
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
	at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
	at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
	at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
	at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
	at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
	at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
	at tachyon.thrift.BlockMasterService$Client.recv_workerGetWorkerId(BlockMasterService.java:235)
	at tachyon.thrift.BlockMasterService$Client.workerGetWorkerId(BlockMasterService.java:222)
	at tachyon.client.WorkerBlockMasterClient.getId(WorkerBlockMasterClient.java:103)
	at tachyon.worker.WorkerIdRegistry.registerWithBlockMaster(WorkerIdRegistry.java:59)
	at tachyon.worker.block.BlockWorker.<init>(BlockWorker.java:200)
	at tachyon.worker.TachyonWorker.main(TachyonWorker.java:42)
Caused by: java.net.SocketException: Connection reset
	at java.net.SocketInputStream.read(SocketInputStream.java:196)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
	at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
	... 15 more
Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
BTW, at the end of the log, I also find a lot of errors like the one below:

=

2016-01-27 11:47:18,515 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run) - Error occurred during processing of message.
java.lang.NullPointerException
	at tachyon.worker.block.BlockLockManager.unlockBlock(BlockLockManager.java:142)
	at tachyon.worker.block.TieredBlockStore.unlockBlock(TieredBlockStore.java:148)
	at tachyon.worker.block.BlockDataManager.unlockBlock(BlockDataManager.java:476)
	at tachyon.worker.block.BlockServiceHandler.unlockBlock(BlockServiceHandler.java:232)
	at tachyon.thrift.WorkerService$Processor$unlockBlock.getResult(WorkerService.java:1150)
	at tachyon.thrift.WorkerService$Processor$unlockBlock.getResult(WorkerService.java:1135)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2
BTW, the error happens when Spark is configured to read its input file from Tachyon, like the following:

/home/ubuntu/spark-1.6.0/bin/spark-submit \
  --properties-file /home/ubuntu/HiBench/report/kmeans/spark/java/conf/sparkbench/spark.conf \
  --class org.apache.spark.examples.mllib.JavaKMeans \
  --master spark://ip-10-73-198-35:7077 \
  /home/ubuntu/HiBench/src/sparkbench/target/sparkbench-5.0-SNAPSHOT-MR2-spark1.5-jar-with-dependencies.jar \
  tachyon://localhost:19998/Kmeans/Input/samples 10 5
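[Editor's note: for readers reproducing this setup, Spark 1.6 exposes its off-heap external block store (Tachyon in this release) through the spark.externalBlockStore.* properties documented in the Spark 1.6 configuration guide. The sketch below shows where those settings would attach to a spark-submit invocation like the one above; the URL, base directory, and trailing arguments are placeholders, not values from this thread.]

```shell
# Hypothetical sketch: pointing Spark 1.6's external block store at Tachyon,
# so RDDs persisted with StorageLevel.OFF_HEAP land in the Tachyon worker.
/home/ubuntu/spark-1.6.0/bin/spark-submit \
  --conf spark.externalBlockStore.url=tachyon://localhost:19998 \
  --conf spark.externalBlockStore.baseDir=/spark_offheap \
  --class org.apache.spark.examples.mllib.JavaKMeans \
  --master spark://ip-10-73-198-35:7077 \
  <application-jar> <args...>
```

Reading input via a tachyon:// path (as in the command above) only needs the Tachyon client on the classpath; the externalBlockStore settings matter specifically for the off-heap caching the poster describes.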