Hi there,

I have a Pig script similar to the one below. When I test it on a cluster
with an empty input file, it takes ages to complete: it goes through all the
commands and runs the jobs across the tracker. In our test environment we
will (for various reasons) have an input directory containing an empty file.
Is there any way to tell Pig to exit the script if there is no data available
(something like EXIT IF data EMPTY)? My UDF loader returns null as the tuple
when it reads an empty file, so ideally I'd like processing to stop at that
point. Is there any way to do this?

Thanks,
John.

-- Custom loader; it returns null as the tuple when the input file is empty.
DEFINE MyDataLoader com.example.test.MyDataLoader('config1');
raw = LOAD '$inputdir' USING MyDataLoader AS (date:chararray,
      values:bag{t:(policy:chararray)});

data1    = FOREACH raw GENERATE date, .....
data2    = GROUP data1 BY (date, .....
-- Project from the grouped relation (data2), not from data3 itself.
data3    = FOREACH data2 GENERATE group.date, .....

STORE data3 INTO 'MyDataStore';
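
One possible workaround, offered only as a sketch and not as a built-in Pig feature, is to check the input directory before launching the script at all, so that no jobs are submitted when there is nothing to process. The example below assumes the input directory is reachable as an ordinary filesystem path (on HDFS the same check would have to go through the Hadoop filesystem tools instead). The function names `has_nonempty_file`, `build_pig_command` and `run_if_data` are hypothetical; only the `pig -param name=value script` invocation and the `$inputdir` parameter come from Pig and the script above.

```python
import os
import subprocess


def has_nonempty_file(input_dir):
    """Return True if any regular file under input_dir is larger than 0 bytes."""
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            if os.path.getsize(os.path.join(root, name)) > 0:
                return True
    return False


def build_pig_command(script, input_dir):
    """Build the pig command line; -param fills in $inputdir in the script."""
    return ["pig", "-param", "inputdir=" + input_dir, script]


def run_if_data(script, input_dir):
    """Run the Pig script only when there is data; return an exit code."""
    if not has_nonempty_file(input_dir):
        # Nothing to process, so skip the cluster jobs entirely.
        print("No data in %s, skipping Pig run" % input_dir)
        return 0
    return subprocess.call(build_pig_command(script, input_dir))


# Hypothetical usage (script name and path are placeholders):
#   exit_code = run_if_data("myscript.pig", "/path/to/input")
```

Doing the check in a wrapper keeps the Pig script itself unchanged, at the cost of one extra directory scan before each run.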
