I'm running into an issue with Pig 0.9.1. My top-level data directory contains several files and directories with restricted permissions, and my LoadFunc and input format skip any directory the user does not have permission to read. Unfortunately, Pig's execution engine still throws an exception.
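
For context, the input format filters out unreadable directories roughly like this (a simplified sketch; the class name, the probe-via-listStatus trick, and the surrounding details are illustrative, not my exact code):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.security.AccessControlException;

public class PermissionAwareInputFormat extends TextInputFormat {
    @Override
    protected List<FileStatus> listStatus(JobContext job) throws IOException {
        List<FileStatus> readable = new ArrayList<FileStatus>();
        for (FileStatus status : super.listStatus(job)) {
            if (status.isDir()) {
                try {
                    // Probe the directory: listing an unreadable directory
                    // throws AccessControlException, so we skip it.
                    FileSystem fs =
                        status.getPath().getFileSystem(job.getConfiguration());
                    fs.listStatus(status.getPath());
                } catch (AccessControlException e) {
                    continue; // user cannot read this directory; drop it
                }
            }
            readable.add(status);
        }
        return readable;
    }
}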

Example:

$ hadoop fs -ls /data
Found 2 items
drwxr-xr-x   - owner users            0 2011-11-16 06:47 /data/readable
drwxr-x---   - owner secure          0 2011-11-16 06:48 /data/secure

The /data/secure directory is readable only by users in the 'secure' group. Non-secure users encounter the following Pig exception, even though the loader and input format never touch the secure data:

REGISTER my-jar;
data = LOAD '/data' USING myLoader();
-- (do something...)

Caused by: org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=<removed>, access=READ_EXECUTE, inode="secure":owner:secure:rwxr-x---
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
        at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:669)
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:280)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:791)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getPathLength(JobControlCompiler.java:794)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getTotalInputFileSize(JobControlCompiler.java:779)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.estimateNumberOfReducers(JobControlCompiler.java:739)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:587)
        ... 12 more


I think Pig should probably catch this exception and ignore unreadable directories when estimating the number of reducers.
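
Something along these lines in JobControlCompiler.getPathLength would do it. This is a rough sketch based on what the stack trace suggests the method does, not a patch against the actual Pig source:

import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.AccessControlException;

public class PathLengthSketch {
    // Sum file sizes under 'status', treating unreadable directories as
    // empty instead of propagating AccessControlException up to getJob().
    static long getPathLength(FileSystem fs, FileStatus status)
            throws IOException {
        if (!status.isDir()) {
            return status.getLen();
        }
        long size = 0;
        try {
            for (FileStatus child : fs.listStatus(status.getPath())) {
                size += getPathLength(fs, child);
            }
        } catch (AccessControlException e) {
            // Permission denied: count this directory as zero bytes for
            // the reducer estimate rather than failing the whole job.
        }
        return size;
    }
}

Since the total is only used to estimate the number of reducers, undercounting an unreadable directory seems harmless compared to failing the job outright.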

Thanks,
--Adam
