I think you could use `repartition` to ensure there are no empty partitions.
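A minimal sketch of that approach, assuming a PySpark RDD of strings; the helper name and the `num_partitions` choice are illustrative, not from this thread:

```python
def save_without_empty_parts(rdd, path, num_partitions):
    """Drop None/'' records, then repartition before saving so that
    saveAsTextFile does not write empty part-* files for partitions
    whose contents were entirely filtered out.

    Sketch only: `rdd` is any PySpark RDD; the function name and
    `num_partitions` are hypothetical choices for illustration.
    """
    # keep only records that survived extraction
    cleaned = rdd.filter(lambda x: x is not None and x != '')
    # repartition shuffles the surviving records evenly across
    # num_partitions, so no output partition is left empty
    # (as long as there are at least num_partitions records)
    cleaned.repartition(num_partitions).saveAsTextFile(path)
```

Because `repartition` triggers a full shuffle, pick `num_partitions` small enough that each partition actually receives data; `coalesce(n)` avoids the shuffle but may still leave empty partitions.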
You could also try `coalesce` to combine partitions, but it does not guarantee that no empty partitions remain.

Best Regards,
Yi Tian
tianyi.asiai...@gmail.com

On Oct 18, 2014, at 20:30, jan.zi...@centrum.cz wrote:

> Hi,
>
> I am developing a program using Spark in which I filter records like this:
>
> cleanedData = distData.map(json_extractor.extract_json).filter(lambda x: x != None and x != '')
> cleanedData.saveAsTextFile(sys.argv[3])
>
> It turns out that a lot of empty files get saved (probably from the partitions whose contents were entirely filtered out). Is there some way to prevent Spark from saving these empty files?
>
> Thank you in advance for any help.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> ---------------------------------------------------------------------