Hi there,
I have a Pig script similar to the one below. When I test it on a cluster
with an empty input file, it takes a very long time to complete: it still goes
through every command and runs all the jobs on the job tracker. In our test
environment we will (for various reasons) have an input directory containing
an empty file.
Is there any way to tell Pig to exit the script if there is no data available
(something like EXIT IF data EMPTY)? My UDF loader returns null as the tuple
when it reads an empty file, so ideally I'd like processing to stop there.
Thanks,
John.
define MyDataLoader com.example.test.MyDataLoader('config1');
raw = LOAD '$inputdir' USING MyDataLoader AS (date:chararray,
values:bag{t:(policy:chararray)});
data1 = FOREACH raw GENERATE date, .....
data2 = GROUP data1 BY (date, .....
data3 = FOREACH data2 GENERATE group.date, .....
STORE data3 INTO 'MyDataStore';
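
One possible workaround outside Pig itself is a small wrapper that checks the input before launching the script at all. The following is only a rough sketch under stated assumptions: the function names, the three arguments, and the idea of checking one known file are all hypothetical and not part of the script above. It relies on `hadoop fs -test -z`, which exits with 0 when the file is zero length, and on `pig -param` to pass `$inputdir`.

```python
import subprocess


def input_is_empty(path, run=subprocess.call):
    """Return True if `hadoop fs -test -z` reports a zero-length file."""
    # `hadoop fs -test -z PATH` exits with status 0 when PATH is zero length.
    return run(["hadoop", "fs", "-test", "-z", path]) == 0


def main(argv, run=subprocess.call):
    # argv is [input_file, input_dir, pig_script]; all three are
    # placeholders chosen for this sketch.
    input_file, input_dir, pig_script = argv
    if input_is_empty(input_file, run):
        print("Input %s is empty, skipping the Pig run" % input_file)
        return 0
    # Pass the directory to the script as the $inputdir parameter.
    return run(["pig", "-param", "inputdir=" + input_dir, pig_script])
```

It would be called as `main([file, directory, script])`, for example from a cron job or a scheduler step. The `run` argument exists only so the command invocation can be swapped out when trying the logic without a cluster. This skips the jobs entirely rather than stopping the script part-way, which may or may not be what is wanted.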