You can override the input format by setting hive.input.format=xxx. But *HiveInputFormat does some internal work (predicates, IO contexts, etc.) for Hive, so it would not be easy to implement a new one (or to override some of its methods). But you can try.
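For example, switching the input format for a session would look roughly like this (a sketch only; com.example.MyInputFormat is a hypothetical custom class that would have to be on Hive's classpath, and the default value is usually org.apache.hadoop.hive.ql.io.CombineHiveInputFormat):

  -- fall back to the plain Hive input format
  set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  -- or point Hive at a custom implementation (hypothetical class name)
  set hive.input.format=com.example.MyInputFormat;
  select key from srcbucket2;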
I thought I saw an issue for supporting a custom location provider for partitioned tables, but I cannot find it. Might be a bogus signal.

Thanks,
Navis


2014-03-05 0:00 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:

> Hi Navis,
>
> thanks for pointing this one out! It would for sure be one way around it.
> In my use case it would require adding this extra where clause for
> particular tables. I guess I can create a view to make this more
> transparent.
>
> Do you know why my input format is not used on the Hadoop side? I'm sure
> this is by design, but I wanted to understand why. Also, are you aware of
> any discussions about supporting partitioning on the file level rather
> than on the directory level?
>
> Thanks,
> Petter
>
>
> 2014-03-04 7:32 GMT+01:00 Navis류승우 <[email protected]>:
>
>> You might be interested in
>> https://issues.apache.org/jira/browse/HIVE-1662, which uses a predicate
>> on the file-name virtual column to filter out inputs. For example,
>>
>>   select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME
>>   rlike '.*/srcbucket2[03].txt'
>>
>> But it's not committed yet.
>>
>> Thanks,
>>
>>
>> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:
>>
>>> Hi,
>>>
>>> I have implemented a few custom input formats in Hive. It seems like
>>> only the getRecordReader() method of these input formats is being
>>> called, though, i.e. there is no way of overriding the listStatus()
>>> method and providing a custom input filter. The only way I can set a
>>> file filter is by using the mapred.input.pathFilter.class property,
>>> which leaves me using the same filter for all input formats. I would
>>> like a way to specify a filter per input format. Is there a way around
>>> this limitation?
>>>
>>> I am on Hive 0.10. I think I have seen that, when running jobs locally,
>>> the listStatus() method of my input formats is called, but not when
>>> handing the job over to a Hadoop cluster. It seems like listStatus() is
>>> called on Hadoop's CombineFileInputFormat instead.
>>>
>>> Thanks,
>>> Petter
>>>
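For reference, the mapred.input.pathFilter.class workaround mentioned in the original question would look roughly like this (a sketch only; com.example.MyPathFilter is a hypothetical class implementing org.apache.hadoop.fs.PathFilter that would have to be on the classpath, and the filter applies to every input of the job, not per input format):

  set mapred.input.pathFilter.class=com.example.MyPathFilter;
  select key, INPUT__FILE__NAME from srcbucket2;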
