I have a complex script that works fine on Pig 0.8.1. It defines 200+ relations, which are reused over and over throughout the script. At the end I store 20+ selected relations to output locations using the STORE command.
When I run the same script on the same data on Pig 0.11, it fails with the error TOO MANY FILES OPENED. I have checked the Hadoop configuration: the maximum number of threads for reading from the network is set much higher than required (50k+), and none of my relations creates that many output files.

In the log I see entries like the ones below repeated thousands of times before the job fails with the TOO MANY FILES OPENED error:

2013-01-02 01:14:26,521 INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat: Total input paths to process : 1
2013-01-02 01:14:26,521 INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil: Total input paths to process : 1
2013-01-02 01:14:26,524 WARN org.apache.pig.impl.builtin.ReadScalars: No scalar field to read, returning null

The job that fails this way is trying to read the output of an intermediate job. I copied that temp output to HDFS and read it through InterStorage in a test script. It worked fine: I was able to read the whole file without any failure, and the message above did not appear even once.

One more thing about the temp output: the schema of the temp part file does not match any schema I have defined or used in my Pig script. (I think it must have been created by the Pig engine for internal optimization.)

Finally, I tried each STORE command individually. I found that there is a small relation, XYZ, which holds just a single row. If I omit the STORE for XYZ, the script runs fine. If I run it together with the other STOREs, the script starts generating the logs above and then fails. The STORE for XYZ on its own works fine, and all the other STOREs without XYZ work fine, but when I execute them all together, it fails.

I am not able to work out which part of the script is being converted into something that causes the above error. How can I debug this further to find the root cause of "No scalar field to read, returning null"?
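For illustration only, here is a minimal Pig Latin sketch of the kind of setup described above. It is not the real script: every relation name other than XYZ, every field name, and every path is hypothetical. It also assumes that XYZ is referenced as a scalar (XYZ.total) by other relations, which the ReadScalars warning suggests but which has not been confirmed.

```pig
-- Hypothetical input; the real script has 200+ relations.
A = LOAD 'input/data' AS (id:int, value:double);

-- XYZ holds a single row (assumption: it is built from a GROUP ALL).
G   = GROUP A ALL;
XYZ = FOREACH G GENERATE SUM(A.value) AS total;

-- Assumption: XYZ is used as a scalar here, which Pig implements
-- internally through the ReadScalars UDF.
B = FOREACH A GENERATE id, value / XYZ.total AS share;

-- Storing B alone or XYZ alone works; storing both in one run
-- is the combination that fails in the real script.
STORE B   INTO 'output/b';
STORE XYZ INTO 'output/xyz';
```

Running EXPLAIN on the script with and without the last STORE might show how the combined plan differs from the individual ones.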
Any help/suggestion would be appreciated.

--
Regards,
Hitesh Patel
Bardoli
Fix:- 91 2622 222299
Mob:- 91 942 835 7400
