Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-02-01 Thread Jia Zou
Hi Calvin, I am running Spark KMeans on a 24GB dataset on a c3.2xlarge AWS
instance with 30GB of physical memory.
Spark caches data off-heap in Tachyon, and the input data is also stored in
Tachyon.
Tachyon is configured to use 15GB of memory with tiered storage.
The Tachyon underFS is /tmp.

The only configuration I changed is the Tachyon data block size.

The experiment above is part of a research project.
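For reference, a setup like the one described above would normally be expressed in conf/tachyon-env.sh roughly as below. This is only a sketch: the property names are from my reading of the Tachyon 0.8 tiered-storage docs and should be verified against your version, and the paths, sizes, and second-tier quota are illustrative, not Jia's actual values.

```shell
# conf/tachyon-env.sh (sketch; property names per Tachyon 0.8 docs -- verify)
export TACHYON_WORKER_MEMORY_SIZE=15GB     # size of the memory tier
export TACHYON_UNDERFS_ADDRESS=/tmp        # under filesystem

# Tiered storage: MEM on top, a disk tier below (paths/quotas illustrative).
export TACHYON_JAVA_OPTS="${TACHYON_JAVA_OPTS}
  -Dtachyon.worker.tieredstore.levels=2
  -Dtachyon.worker.tieredstore.level0.alias=MEM
  -Dtachyon.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
  -Dtachyon.worker.tieredstore.level0.dirs.quota=15GB
  -Dtachyon.worker.tieredstore.level1.alias=HDD
  -Dtachyon.worker.tieredstore.level1.dirs.path=/mnt/hdd
  -Dtachyon.worker.tieredstore.level1.dirs.quota=100GB
  -Dtachyon.user.block.size.bytes.default=512MB
"
```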

Best Regards,
Jia

On Thursday, January 28, 2016 at 9:11:19 PM UTC-6, Calvin Jia wrote:

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-29 Thread cc
Hi Jia Zou,

I'm curious about this exception. The error log you posted shows that the
exception is related to unlockBlock. Could you upload your full master.log
and worker.log from the tachyon/logs directory?
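Before uploading, a quick way to confirm the relevant entries are present is to pull the ERROR lines (plus the line after each, which is usually the exception class) out of the log. The snippet below fabricates a tiny stand-in worker.log so it is self-contained; in practice you would point grep at tachyon/logs/worker.log and tachyon/logs/master.log.

```shell
# Create a small fabricated worker.log stand-in (illustration only).
cat > worker.log.sample <<'EOF'
2016-01-27 11:47:18,515 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run) - Error occurred during processing of message.
java.lang.NullPointerException
2016-01-27 11:47:19,001 INFO  worker.WorkerStorage - heartbeat ok
EOF

# Show each ERROR entry plus the following line (the exception class).
grep -A1 ' ERROR ' worker.log.sample
```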

Best,
Cheng

On Friday, January 29, 2016 at 11:11:19 AM UTC+8, Calvin Jia wrote:


Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-28 Thread Calvin Jia
Hi,

Thanks for the detailed information. How large is the dataset you are 
running against? Also did you change any Tachyon configurations?

Thanks,
Calvin


TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
Dear all, I keep getting the exception below when running Spark 1.6.0 on top of
Tachyon 0.8.2. Tachyon is 93% used and configured as CACHE_THROUGH.

Any suggestions will be appreciated, thanks!

=

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 13 in stage 0.0 failed 4 times, most recent failure: Lost task 13.3 in stage 0.0 (TID 33, ip-10-73-198-35.ec2.internal): java.io.IOException: tachyon.org.apache.thrift.transport.TTransportException
    at tachyon.worker.WorkerClient.unlockBlock(WorkerClient.java:416)
    at tachyon.client.block.LocalBlockInStream.close(LocalBlockInStream.java:87)
    at tachyon.client.file.FileInStream.close(FileInStream.java:105)
    at tachyon.hadoop.HdfsFileInputStream.read(HdfsFileInputStream.java:171)
    at java.io.DataInputStream.readInt(DataInputStream.java:388)
    at org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2325)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2356)
    at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2493)
    at org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:246)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.getNext(HadoopRDD.scala:208)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at scala.collection.Iterator$JoinIterator.hasNext(Iterator.scala:193)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.rdd.RDD$$anonfun$zip$1$$anonfun$apply$31$$anon$1.hasNext(RDD.scala:851)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:369)
    at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1595)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
    at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:1143)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: tachyon.org.apache.thrift.transport.TTransportException
    at tachyon.org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
    at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at tachyon.org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at tachyon.org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at tachyon.org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at tachyon.org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at tachyon.org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at tachyon.thrift.WorkerService$Client.recv_unlockBlock(WorkerService.java:455)
    at tachyon.thrift.WorkerService$Client.unlockBlock(WorkerService.java:441)
    at tachyon.worker.WorkerClient.unlockBlock(WorkerClient.java:413)
    ... 28 more


Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, the Tachyon worker log says the following:



2015-12-27 01:33:44,599 ERROR WORKER_LOGGER (WorkerBlockMasterClient.java:getId) - java.net.SocketException: Connection reset
org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
    at org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
    at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
    at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
    at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
    at org.apache.thrift.protocol.TProtocolDecorator.readMessageBegin(TProtocolDecorator.java:135)
    at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
    at tachyon.thrift.BlockMasterService$Client.recv_workerGetWorkerId(BlockMasterService.java:235)
    at tachyon.thrift.BlockMasterService$Client.workerGetWorkerId(BlockMasterService.java:222)
    at tachyon.client.WorkerBlockMasterClient.getId(WorkerBlockMasterClient.java:103)
    at tachyon.worker.WorkerIdRegistry.registerWithBlockMaster(WorkerIdRegistry.java:59)
    at tachyon.worker.block.BlockWorker.<init>(BlockWorker.java:200)
    at tachyon.worker.TachyonWorker.main(TachyonWorker.java:42)
Caused by: java.net.SocketException: Connection reset
    at java.net.SocketInputStream.read(SocketInputStream.java:196)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
    ... 15 more
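Since the trace above shows the worker failing while registering with the block master ("Connection reset" during workerGetWorkerId), a first sanity check is whether the master RPC port (19998 by default) is reachable from the worker host. A minimal probe, using bash's /dev/tcp redirection (host and port below are illustrative):

```shell
# Probe a TCP port; prints "open" if a connection succeeds, "closed" otherwise.
check_port() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null && echo "open" || echo "closed"
}

# Default Tachyon master RPC port; replace localhost with the master host.
check_port localhost 19998
```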

On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou  wrote:


Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, at the end of the log I also see many errors like the one below:

=

2016-01-27 11:47:18,515 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run) - Error occurred during processing of message.
java.lang.NullPointerException
    at tachyon.worker.block.BlockLockManager.unlockBlock(BlockLockManager.java:142)
    at tachyon.worker.block.TieredBlockStore.unlockBlock(TieredBlockStore.java:148)
    at tachyon.worker.block.BlockDataManager.unlockBlock(BlockDataManager.java:476)
    at tachyon.worker.block.BlockServiceHandler.unlockBlock(BlockServiceHandler.java:232)
    at tachyon.thrift.WorkerService$Processor$unlockBlock.getResult(WorkerService.java:1150)
    at tachyon.thrift.WorkerService$Processor$unlockBlock.getResult(WorkerService.java:1135)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


On Wed, Jan 27, 2016 at 5:53 AM, Jia Zou  wrote:


Re: TTransportException when using Spark 1.6.0 on top of Tachyon 0.8.2

2016-01-27 Thread Jia Zou
BTW, the error happens when Spark is configured to read its input file from
Tachyon, as follows:

/home/ubuntu/spark-1.6.0/bin/spark-submit \
  --properties-file /home/ubuntu/HiBench/report/kmeans/spark/java/conf/sparkbench/spark.conf \
  --class org.apache.spark.examples.mllib.JavaKMeans \
  --master spark://ip-10-73-198-35:7077 \
  /home/ubuntu/HiBench/src/sparkbench/target/sparkbench-5.0-SNAPSHOT-MR2-spark1.5-jar-with-dependencies.jar \
  tachyon://localhost:19998/Kmeans/Input/samples 10 5

On Wed, Jan 27, 2016 at 5:02 AM, Jia Zou  wrote:
