I think you could use `repartition` to ensure there are no empty 
partitions.

You could also try `coalesce` to merge partitions, but it cannot guarantee 
that no empty partitions remain.
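To see why this helps, here is a minimal sketch in plain Python (no Spark required) of why a per-partition `filter` leaves empty partitions behind, and why a full shuffle like `repartition(n)` avoids them. The `partitions` list and the slicing scheme are only an illustration of the idea, not Spark's actual partitioner; in the original program the fix would look roughly like `cleanedData.repartition(n).saveAsTextFile(sys.argv[3])`.

```python
# Illustrative model: each inner list stands for one RDD partition.
partitions = [["a", "", None], [None, ""], ["b", "c"], [""]]

# filter() runs independently per partition, so a partition whose
# records are all filtered out stays around as an empty partition --
# and saveAsTextFile writes one (empty) part file per partition.
filtered = [[x for x in p if x not in (None, "")] for p in partitions]
assert [] in filtered  # some partitions are now empty

# repartition(n) shuffles every surviving record into n new partitions,
# so none of them ends up empty as long as there are enough records.
records = [x for p in filtered for x in p]
n = 2
repartitioned = [records[i::n] for i in range(n)]
assert all(len(p) > 0 for p in repartitioned)
```

`coalesce`, by contrast, only merges adjacent partitions without a full shuffle, which is cheaper but is why it cannot guarantee the empties are gone.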

Best Regards,

Yi Tian
tianyi.asiai...@gmail.com




On Oct 18, 2014, at 20:30, jan.zi...@centrum.cz wrote:

> Hi,
> 
> I am developing program using Spark where I am using filter such as:
>  
> cleanedData = distData.map(json_extractor.extract_json).filter(lambda x: x != 
> None and x != '')
> cleanedData.saveAsTextFile(sys.argv[3])
>  
>  
> It happens that a lot of empty files get saved (probably from those 
> partitions whose records were all filtered out). Is there some way to 
> prevent Spark from saving these empty files?
>  
> Thank you in advance for any help.
>  
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 

