Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Imran Rashid
The temp file creation is controlled by a Hadoop OutputCommitter, which is FileOutputCommitter by default. It's used in SparkHadoopWriter (which in turn is used by PairRDDFunctions.saveAsHadoopDataset). You could change the output committer to not use tmp files (e.g. use this from Aaron
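To illustrate the general write-to-temp-then-rename pattern that a file-based output committer follows, here is a minimal, self-contained Python sketch. It is not Hadoop's or Spark's actual code: the function names (`write_task_attempt`, `commit_task`, `demo`) and the attempt directory naming are hypothetical, and the real FileOutputCommitter has more steps than this.

```python
import os
import tempfile


def write_task_attempt(output_dir, partition, attempt, lines):
    """Write one task attempt's output under a temporary subdirectory.

    The "_temporary/attempt_<partition>_<attempt>" layout is a simplification
    chosen for this sketch, not the exact layout Hadoop uses.
    """
    attempt_dir = os.path.join(
        output_dir, "_temporary", f"attempt_{partition}_{attempt}"
    )
    os.makedirs(attempt_dir, exist_ok=True)
    tmp_path = os.path.join(attempt_dir, f"part-{partition:05d}")
    with open(tmp_path, "w") as f:
        for line in lines:
            f.write(line + "\n")
    return tmp_path


def commit_task(output_dir, tmp_path):
    """Promote a finished attempt's file into the final output directory."""
    final_path = os.path.join(output_dir, os.path.basename(tmp_path))
    # Rename within the same filesystem, so readers never see a partial file.
    os.replace(tmp_path, final_path)
    return final_path


def demo():
    """Run one attempt and report what is visible before and after commit."""
    out = tempfile.mkdtemp()
    tmp = write_task_attempt(out, 0, 0, ["a", "b"])
    visible_before = sorted(n for n in os.listdir(out) if not n.startswith("_"))
    final = commit_task(out, tmp)
    visible_after = sorted(n for n in os.listdir(out) if not n.startswith("_"))
    return visible_before, visible_after, final
```

The point of the pattern is that a failed or speculative attempt leaves nothing in the final directory; only a committed attempt's file becomes visible. A committer that skips the temp step trades that safety for fewer renames.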

Re: Query regarding inferring data types in pyspark

2015-04-15 Thread Davies Liu
It does not work now; could you file a JIRA for it? On Wed, Apr 15, 2015 at 9:29 AM, Suraj Shetiya surajshet...@gmail.com wrote: Thank you :) That worked. I had another query regarding a date being used as a filter. With the new df, which has the column cast as date, I am unable to apply a filter

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-15 Thread Joseph Bradley
+1 On Wed, Apr 15, 2015 at 5:40 PM, Tom Graves tgraves...@yahoo.com.invalid wrote: +1 tested on spark on yarn on hadoop 2.6 cluster with security. Tom On Sunday, April 5, 2015 6:25 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as

Re: [VOTE] Release Apache Spark 1.2.2

2015-04-15 Thread Sean McNamara
Ran tests on OS X +1 Sean On Apr 14, 2015, at 10:59 PM, Patrick Wendell pwend...@gmail.com wrote: I'd like to close this vote to coincide with the 1.3.1 release, however, it would be great to have more people test this release first. I'll leave it open for a bit longer and see if others

Re: saveAsTextFile and tmp files generations in tasks

2015-04-15 Thread Gil Vernik
Thanks a lot for the info. Does this explain why 2 temp files are generated per task (one temp file that is renamed to another)? I understand why there is one temp file per task, but I'm still not sure why there were 2 per task. Thanks, Gil. From: Imran Rashid iras...@cloudera.com To: