Hi Navis,

Thanks for pointing this one out! That would certainly be one way around it.
In my use case it would mean adding this extra WHERE clause for particular
tables. I guess I can create a view to make this more transparent.
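For reference, a view along these lines could hide the filter. This is just a
sketch assuming the HIVE-1662 INPUT__FILE__NAME predicate is available; the
view name is made up, and the table and pattern are taken from your example
below:

```sql
-- Hypothetical view hiding the file-name filter (assumes HIVE-1662 applied).
-- srcbucket2 and the RLIKE pattern come from the example further down.
CREATE VIEW srcbucket2_filtered AS
SELECT key, value
FROM srcbucket2
WHERE INPUT__FILE__NAME RLIKE '.*/srcbucket2[03].txt';
```

Queries would then go against srcbucket2_filtered without repeating the
WHERE clause.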

Do you know why my input format is not used on the Hadoop side? I'm sure
this is by design, but I wanted to understand why. Also, are you aware of
any discussions about supporting partitioning at the file level rather than
at the directory level?
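For anyone else following the thread, the global filter referred to further
down is configured like this (a sketch; com.example.MyPathFilter is a made-up
class name standing in for an implementation of
org.apache.hadoop.fs.PathFilter):

```sql
-- Applies one path filter to ALL input formats in the job (Hive 0.10 era),
-- which is exactly the limitation discussed below.
SET mapred.input.pathFilter.class=com.example.MyPathFilter;
```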

Thanks,
Petter


2014-03-04 7:32 GMT+01:00 Navis류승우 <[email protected]>:

> You might be interested in https://issues.apache.org/jira/browse/HIVE-1662,
> using a predicate on the file-name virtual column to filter out inputs. For example,
>
> select key,INPUT__FILE__NAME from srcbucket2 where INPUT__FILE__NAME rlike
> '.*/srcbucket2[03].txt'
>
> But it's not committed yet.
>
> Thanks,
>
>
>
> 2014-03-03 23:14 GMT+09:00 Petter von Dolwitz (Hem) <
> [email protected]>:
>
>> Hi,
>>
>> I have implemented a few custom input formats in Hive. It seems like only
>> the getRecordReader() method of these input formats is being called, though,
>> i.e. there is no way of overriding the listStatus() method and providing a
>> custom input filter. The only way I can set a file filter is by using the
>> mapred.input.pathFilter.class property, which leaves me using the same
>> filter for all input formats. I would like a way to specify a filter per
>> input format. Is there a way around this limitation?
>>
>> I am on Hive 0.10. I think I have seen that, when running jobs locally,
>> the listStatus() method of my input formats is called, but not when
>> handing the job over to a Hadoop cluster. It seems like listStatus() is
>> called on Hadoop's CombineFileInputFormat instead.
>>
>> Thanks,
>> Petter
>>
>
>
