You can override the input format by setting hive.input.format=xxx. But *HiveInputFormat does some internal work (predicates, IO contexts, etc.) for Hive, so it would not be easy to implement a new one (or to override some of its methods). But you can try.
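For example, switching the input format for a session would look roughly like this (a sketch only; com.example.MyInputFormat is a hypothetical custom class that would have to be on Hive's classpath, and the default value is usually org.apache.hadoop.hive.ql.io.CombineHiveInputFormat):

  -- fall back to the plain Hive input format
  set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
  -- or point Hive at a custom implementation (hypothetical class name)
  set hive.input.format=com.example.MyInputFormat;
  select key from srcbucket2;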
I thought I saw an issue for supporting a custom location provider for partitioned tables, but I cannot find it. Might be a bogus signal.

Thanks,
Navis


2014-03-05 0:00 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:

> Hi Navis,
>
> thanks for pointing this one out! It would for sure be one way around it.
> In my use case it would require adding this extra where clause for
> particular tables. I guess I can create a view to make this more
> transparent.
>
> Do you know why my input format is not used on the Hadoop side? I'm sure
> this is by design, but I wanted to understand why. Also, are you aware of
> any discussions about supporting partitioning on the file level rather
> than on the directory level?
>
> Thanks,
> Petter
>
>
> 2014-03-04 7:32 GMT+01:00 Navis류승우 <[email protected]>:
>
>> You might be interested in
>> https://issues.apache.org/jira/browse/HIVE-1662, which uses a predicate
>> on the file-name virtual column to filter out inputs. For example,
>>
>>   select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME
>>   rlike '.*/srcbucket2[03].txt'
>>
>> But it's not committed yet.
>>
>> Thanks,
>>
>>
>> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:
>>
>>> Hi,
>>>
>>> I have implemented a few custom input formats in Hive. It seems like
>>> only the getRecordReader() method of these input formats is being
>>> called, though, i.e. there is no way of overriding the listStatus()
>>> method and providing a custom input filter. The only way I can set a
>>> file filter is by using the mapred.input.pathFilter.class property,
>>> which leaves me using the same filter for all input formats. I would
>>> like a way to specify a filter per input format. Is there a way around
>>> this limitation?
>>>
>>> I am on Hive 0.10. I think I have seen that, when running jobs locally,
>>> the listStatus() method of my input formats is called, but not when
>>> handing the job over to a Hadoop cluster. It seems like listStatus() is
>>> called on Hadoop's CombineFileInputFormat instead.
>>>
>>> Thanks,
>>> Petter
>>>
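For reference, the mapred.input.pathFilter.class workaround mentioned in the original question would look roughly like this (a sketch only; com.example.MyPathFilter is a hypothetical class implementing org.apache.hadoop.fs.PathFilter that would have to be on the classpath, and the filter applies to every input of the job, not per input format):

  set mapred.input.pathFilter.class=com.example.MyPathFilter;
  select key, INPUT__FILE__NAME from srcbucket2;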
