conditional dataset output

lars . bachmann Thu, 08 Dec 2016 07:40:03 -0800

Hi,

let's assume I have a dataset that, depending on the input data and different filter operations, can be empty. Now I want to output the dataset to disk, but I want files to be created only if the dataset is not empty. If the dataset is empty I don't want any files. The default way, dataset.write(...), will always create as many files as the parallelism configured for this operator, so for an empty dataset all of those files would be empty as well. I thought about doing something like:


if (dataset.count() > 0) {
   dataset.write(...)
}

but I don't think that's the way to go, because dataset.count() triggers an execution of the (sub)program.
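
One possible direction is a writer that opens its file lazily, on the first record, so that a parallel instance which receives no records never creates a file. Below is a minimal plain-Java sketch of that idea only. LazyFileWriter and writeRecord are hypothetical names, not Flink API, and how such a writer would be plugged into a Flink output format is left open here.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of Flink): creates its file only when the first record arrives. */
class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter writer; // stays null until the first record is written

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void writeRecord(String record) throws IOException {
        if (writer == null) {
            // First record: only now is the file actually created.
            writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }
        writer.write(record);
        writer.newLine();
    }

    @Override
    public void close() throws IOException {
        // Nothing to close (and no file on disk) if no record was ever written.
        if (writer != null) {
            writer.close();
        }
    }
}
```

With this approach no count() is needed up front: a writer that is opened and closed without any writeRecord call leaves no file behind.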

Is there a simple way to avoid creating empty files for empty datasets?


Regards,

Lars
