From: Aaron Davidson [mailto:ilike...@gmail.com]
Sent: Tuesday, March 17, 2015 3:06 PM
To: Imran Rashid
Cc: Shuai Zheng; user@spark.apache.org
Subject: Re: Spark will process _temporary folder on S3 is very slow and always cause failure
Actually, this is the more relevant JIRA (which is resolved):
https://issues.apache.org/jira/browse/SPARK-3595
SPARK-6352 is about saveAsParquetFile, which is not in use here.
Here is a DirectOutputCommitter implementation:
https://gist.github.com/aarondav/c513916e72101bbe14ec
and it can be configured by setting the Hadoop property mapred.output.committer.class.
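For reference, this is roughly how such a committer could be wired in (a sketch; the unqualified DirectOutputCommitter class name is an assumption -- substitute the fully qualified name of the class you build from the gist):

```shell
# Sketch: point the old mapred API at a direct committer.
# The spark.hadoop.* prefix forwards the property into the Hadoop conf
# that Spark's save* actions use.
spark-submit \
  --conf spark.hadoop.mapred.output.committer.class=DirectOutputCommitter \
  your-app.jar
```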
I'm not super familiar w/ S3, but I think the issue is that you want to use
a different output committer with "object" stores, which don't have a
simple move operation. There have been a few other threads on S3 &
output committers. I think the most relevant for you is probably this
open JIRA:
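To see why the default committer hurts on an object store: tasks write their output under _temporary/ and the commit step then relies on rename. A minimal Python sketch of that commit step (on a plain filesystem the move is a cheap metadata rename; on S3 there is no rename, so each move becomes a full copy + delete of the object's bytes):

```python
import os
import shutil
import tempfile

def commit_task(output_dir, task_id):
    # Move every file written under _temporary/<task_id> into the final
    # output directory, then remove the now-empty task directory.
    tmp = os.path.join(output_dir, "_temporary", str(task_id))
    for name in os.listdir(tmp):
        shutil.move(os.path.join(tmp, name), os.path.join(output_dir, name))
    os.rmdir(tmp)

# Simulate one task's output, then commit it.
out = tempfile.mkdtemp()
os.makedirs(os.path.join(out, "_temporary", "0"))
with open(os.path.join(out, "_temporary", "0", "part-00000"), "w") as f:
    f.write("data")

commit_task(out, "0")
print(sorted(p for p in os.listdir(out) if not p.startswith("_")))  # -> ['part-00000']
```

A "direct" committer sidesteps this by writing straight to the final location, at the cost of leaving partial output behind if the job fails.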
If you use fileStream, there's an option to filter out files. In your case
you can easily create a filter to remove _temporary files. In that case,
you will have to move your code inside foreachRDD of the DStream, since the
application will become a streaming app.
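The actual Scala fileStream API takes a Path => Boolean predicate; the logic of such a filter is simple enough to sketch in Python (the "skip any component starting with an underscore" rule is an assumption that also happens to cover _SUCCESS markers):

```python
def accept_path(path: str) -> bool:
    # Reject any path with a component starting with "_", e.g. files
    # still sitting under a _temporary task directory.
    return not any(part.startswith("_")
                   for part in path.rstrip("/").split("/"))

print(accept_path("s3://bucket/out/part-00000"))           # True
print(accept_path("s3://bucket/out/_temporary/0/part-0"))  # False
```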
Thanks
Best Regards
On Sat, Mar
And one thing I forgot to mention: even though I get this exception, the result
is not well formatted in my target folder (part of the files are there; the rest
are under a different folder structure inside the _temporary folder). In the
web UI of spark-shell, it is still marked as a successful step. I think this is
a bug?