Hi Navis, thanks for pointing this one out! That would certainly be one way around it. In my use case it would require adding this extra where clause for particular tables; I guess I could create a view to make this more transparent.
Do you know why my input format is not used on the Hadoop side? I'm sure this is by design, but I wanted to understand why. Also, are you aware of any discussions supporting partitioning on file level rather than on directory level?

Thanks,
Petter

2014-03-04 7:32 GMT+01:00 Navis류승우 <[email protected]>:

> You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
> using a predicate on the file-name virtual column to filter out inputs.
> For example:
>
>   select key, INPUT__FILE__NAME from srcbucket2
>   where INPUT__FILE__NAME rlike '.*/srcbucket2[03].txt'
>
> But it's not committed yet.
>
> Thanks,
>
> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <[email protected]>:
>
>> Hi,
>>
>> I have implemented a few custom input formats in Hive. It seems like only
>> the getRecordReader() method of these input formats is being called,
>> i.e. there is no way to override the listStatus() method and provide a
>> custom input filter. The only way I can set a file filter is via the
>> mapred.input.pathFilter.class property, which leaves me using the same
>> filter for all input formats. I would like a way to specify a filter per
>> input format. Is there a way around this limitation?
>>
>> I am on Hive 0.10. I think I have seen that, when running jobs locally,
>> the listStatus() method of my input formats is called, but not when
>> handing the job over to a Hadoop cluster. It seems listStatus() is
>> called on Hadoop's CombineFileInputFormat instead.
>>
>> Thanks,
>> Petter
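As a side note, the filename predicate Navis quotes boils down to a regex match on the full input path. A minimal sketch of that matching logic in plain Java is below; in a real Hadoop deployment the same logic would typically live in a class implementing org.apache.hadoop.fs.PathFilter and be registered via the mapred.input.pathFilter.class property. The class name here is hypothetical and the Hadoop wiring is omitted so the snippet stays self-contained.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the filename-filter logic behind the rlike predicate
// '.*/srcbucket2[03].txt' from the thread. In Hadoop this would implement
// org.apache.hadoop.fs.PathFilter (boolean accept(Path path)) and be set
// through mapred.input.pathFilter.class; here it is plain Java for clarity.
public class BucketFileFilter {
    // Matches paths ending in srcbucket20.txt or srcbucket23.txt
    // (dot escaped here; in the original rlike pattern it matched any char).
    private static final Pattern NAME_PATTERN =
            Pattern.compile(".*/srcbucket2[03]\\.txt");

    public boolean accept(String path) {
        return NAME_PATTERN.matcher(path).matches();
    }

    public static void main(String[] args) {
        BucketFileFilter filter = new BucketFileFilter();
        // Accepted: matches the [03] character class
        System.out.println(filter.accept("/user/hive/warehouse/srcbucket2/srcbucket20.txt"));
        // Rejected: '1' is not in [03]
        System.out.println(filter.accept("/user/hive/warehouse/srcbucket2/srcbucket21.txt"));
    }
}
```

Note that a filter configured this way applies to every input format in the job, which is exactly the per-input-format limitation discussed above.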
