You might be interested in https://issues.apache.org/jira/browse/HIVE-1662, which uses a predicate on the file-name virtual column (INPUT__FILE__NAME) to filter out inputs. For example,

select key, INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME rlike '.*/srcbucket2[03].txt'

But it's not committed yet.

Thanks,

2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem):
> Hi,
>
> I have implemented a few custom input formats in Hive. It seems like only
> the getRecordReader() method of these input formats is being called, though,
> i.e. there is no way of overriding the listStatus() method and providing a
> custom input filter. The only way I can set a file filter is by using the
> mapred.input.pathFilter.class property, which leaves me using the same
> filter for all input formats. I would like a way to specify a filter per
> input format. Is there a way around this limitation?
>
> I am on Hive 0.10. I think I have seen that, when running jobs locally,
> the listStatus() method of my input formats is called, but not when handing
> the job over to a Hadoop cluster. It seems like listStatus() is called on
> Hadoop's CombineFileInputFormat instead.
>
> Thanks,
> Petter
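As a rough illustration of what the HIVE-1662 example query would select, the sketch below applies the same regular expression to a few file paths in plain Java. It assumes Hive's RLIKE behaves like java.util.regex `Matcher.find()` (true if the pattern occurs anywhere in the string); the class name `FileNameFilter` and the sample HDFS paths are hypothetical, and this is not the HIVE-1662 implementation itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch: mimics the predicate
//   INPUT__FILE__NAME rlike '.*/srcbucket2[03].txt'
// outside Hive, assuming RLIKE has Matcher.find() semantics.
public class FileNameFilter {
    // Same pattern as the HiveQL example. The unescaped '.' before "txt"
    // matches any character, not only a literal dot.
    private static final Pattern PATTERN = Pattern.compile(".*/srcbucket2[03].txt");

    // True if the path would pass the file-name predicate.
    static boolean matches(String path) {
        return PATTERN.matcher(path).find();
    }

    // Keeps only the paths that pass the predicate, preserving order.
    static List<String> select(List<String> paths) {
        List<String> selected = new ArrayList<>();
        for (String p : paths) {
            if (matches(p)) {
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical bucket files of table srcbucket2.
        List<String> paths = List.of(
                "hdfs://nn/warehouse/srcbucket2/srcbucket20.txt",
                "hdfs://nn/warehouse/srcbucket2/srcbucket21.txt",
                "hdfs://nn/warehouse/srcbucket2/srcbucket22.txt",
                "hdfs://nn/warehouse/srcbucket2/srcbucket23.txt");
        // Prints the srcbucket20.txt and srcbucket23.txt paths.
        for (String p : select(paths)) {
            System.out.println(p);
        }
    }
}
```

If an exact ".txt" suffix is intended, escaping the dot (`\\.txt` in a Java string literal, `\.txt` inside the HiveQL string, depending on how Hive unescapes it) would tighten the match.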
