Hi Navis, thanks for pointing this one out! That would certainly be one way around it. In my use case it would require adding this extra where clause for particular tables; I guess I could create a view to make this more transparent.
Do you know why my input format is not used on the Hadoop side? I'm sure this is by design, but I wanted to understand why. Also, are you aware of any discussions supporting partitioning on file level rather than on directory level?

Thanks,
Petter

2014-03-04 7:32 GMT+01:00 Navis류승우 <[email protected]>:

> You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
> using a predicate on the file-name virtual column to filter out inputs.
> For example:
>
>   select key, INPUT__FILE__NAME from srcbucket2
>   where INPUT__FILE__NAME rlike '.*/srcbucket2[03].txt'
>
> But it's not committed yet.
>
> Thanks,
>
> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:
>
>> Hi,
>>
>> I have implemented a few custom input formats in Hive. It seems like only
>> the getRecordReader() method of these input formats is being called,
>> i.e. there is no way to override the listStatus() method and provide a
>> custom input filter. The only way I can set a file filter is via the
>> mapred.input.pathFilter.class property, which leaves me using the same
>> filter for all input formats. I would like a way to specify a filter per
>> input format. Is there a way around this limitation?
>>
>> I am on Hive 0.10. I think I have seen that, when running jobs locally,
>> the listStatus() method of my input formats is called, but not when
>> handing the job over to a Hadoop cluster. It seems listStatus() is
>> called on Hadoop's CombineFileInputFormat instead.
>>
>> Thanks,
>> Petter
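As a side note, the filename predicate Navis quotes boils down to a regex match on the full input path. A minimal sketch of that matching logic in plain Java is below; in a real Hadoop deployment the same logic would typically live in a class implementing org.apache.hadoop.fs.PathFilter and be registered via the mapred.input.pathFilter.class property. The class name here is hypothetical and the Hadoop wiring is omitted so the snippet stays self-contained.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the filename-filter logic behind the rlike predicate
// '.*/srcbucket2[03].txt' from the thread. In Hadoop this would implement
// org.apache.hadoop.fs.PathFilter (boolean accept(Path path)) and be set
// through mapred.input.pathFilter.class; here it is plain Java for clarity.
public class BucketFileFilter {
    // Matches paths ending in srcbucket20.txt or srcbucket23.txt
    // (dot escaped here; in the original rlike pattern it matched any char).
    private static final Pattern NAME_PATTERN =
            Pattern.compile(".*/srcbucket2[03]\\.txt");

    public boolean accept(String path) {
        return NAME_PATTERN.matcher(path).matches();
    }

    public static void main(String[] args) {
        BucketFileFilter filter = new BucketFileFilter();
        // Accepted: matches the [03] character class
        System.out.println(filter.accept("/user/hive/warehouse/srcbucket2/srcbucket20.txt"));
        // Rejected: '1' is not in [03]
        System.out.println(filter.accept("/user/hive/warehouse/srcbucket2/srcbucket21.txt"));
    }
}
```

Note that a filter configured this way applies to every input format in the job, which is exactly the per-input-format limitation discussed above.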
