Re: Hive Cli ORC table read error with limit option

Biswajit Nayak Mon, 18 Apr 2016 17:45:07 -0700

Hi All,

I seriously need help with this. Any reference or pointer to
troubleshoot or fix it would be helpful.


Regards
Biswa

On Fri, Mar 25, 2016 at 11:24 PM, Biswajit Nayak <biswa...@altiscale.com>
wrote:

> Prashanth,
>
> Apologies for the delay in response.
>
> Below is the orcfiledump of the empty orc file from a broken partition.
>
> $ hive --orcfiledump /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
> Structure for /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0
> File Version: 0.12 with HIVE_8732
> 16/03/25 17:49:09 INFO orc.ReaderImpl: Reading ORC rows from
> /hive/testdb.db/table_orc/year=2016/month=1/day=29/000000_0 with {include:
> null, offset: 0, length: 9223372036854775807}
> 16/03/25 17:49:09 INFO orc.RecordReaderFactory: Schema is not specified
> on read. Using file schema.
> Rows: 0
> Compression: SNAPPY
> Compression size: 262144
> Type: struct<>
>
> Stripe Statistics:
>
> File Statistics:
>   Column 0: count: 0 hasNull: false
>
> Stripes:
>
> File length: 49 bytes
> Padding length: 0 bytes
> Padding ratio: 0%
> $
>
>
> I am still not able to figure out what is causing this odd behaviour.
>
>
> Regards
> Biswa
>
> On Thu, Mar 10, 2016 at 3:12 PM, Prasanth Jayachandran <
> pjayachand...@hortonworks.com> wrote:
>
>> Alternatively, you can send the orcfiledump output for the empty ORC file
>> from the broken partition.
>>
>> Thanks
>> Prasanth
>>
>> On Mar 10, 2016, at 5:11 PM, Prasanth Jayachandran <
>> pjayachand...@hortonworks.com> wrote:
>>
>> Could you attach the empty ORC files from one of the broken partitions
>> somewhere? I can run some tests on them to see why it's happening.
>>
>> Thanks
>> Prasanth
>>
>> On Mar 8, 2016, at 12:02 AM, Biswajit Nayak <biswa...@altiscale.com>
>> wrote:
>>
>> Both the parameters are set to false by default.
>>
>> hive> set hive.optimize.index.filter;
>> hive.optimize.index.filter=false
>> hive> set hive.orc.splits.include.file.footer;
>> hive.orc.splits.include.file.footer=false
>> hive>
>>
>> >>> I suspect this might be related to having 0 row files in the buckets
>> >>> not having any recorded schema.
>>
>> Yes, there are a few files with 0 rows, but the query works with other
>> partitions (which also have 0 row files). Out of 30 partitions (for a
>> month), 3-4 partitions have this issue. Even reloading the data does not
>> help. The query works fine in MR now, but has the issue in Tez.
>>
>>
>>
>> On Tue, Mar 8, 2016 at 2:43 AM, Gopal Vijayaraghavan <gop...@apache.org>
>> wrote:
>>
>>>
>>> > c                varchar(2)
>>> ...
>>> > Num Buckets:         7
>>>
>>> I suspect this might be related to having 0 row files in the buckets not
>>> having any recorded schema.
>>>
>>> You can also experiment with hive.optimize.index.filter=false, to see if
>>> the zero row case is artificially produced via predicate push-down.
>>>
>>>
>>> That shouldn't be a problem unless you've turned on
>>> hive.orc.splits.include.file.footer=true (recommended to be false).
>>>
>>> Your row-locations don't actually match any Apache source jar in my
>>> builds, are there any other patches to consider?
>>>
>>> Cheers,
>>> Gopal
>>>
>>>
>>>
>>
>>
>>
>
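
Since only 3-4 of the 30 partitions fail, it may help to check which files match the shape in the dump quoted above (zero rows with an empty `struct<>` schema). Below is a minimal sketch in Python, assuming the orcfiledump text for each file has already been captured as a string. The function names are hypothetical, and the parser relies only on the `Rows:` and `Type:` labels visible in that dump. It identifies files matching the case Gopal suspected; it does not confirm that such files cause the Tez failure.

```python
import re


def parse_orcfiledump(text):
    """Return (rows, schema) taken from the 'Rows:' and 'Type:' lines.

    Either value is None when the corresponding line is absent.
    """
    rows = None
    schema = None
    for raw in text.splitlines():
        # Tolerate mail quoting ('>') and bold markers ('*') around a line.
        line = raw.strip().lstrip(">").strip().strip("*").strip()
        m = re.match(r"Rows:\s*(\d+)$", line)
        if m:
            rows = int(m.group(1))
            continue
        m = re.match(r"Type:\s*(.+)$", line)
        if m:
            schema = m.group(1).strip()
    return rows, schema


def is_empty_schemaless(text):
    """True when the dump reports zero rows and an empty struct<> schema."""
    rows, schema = parse_orcfiledump(text)
    return rows == 0 and schema == "struct<>"


if __name__ == "__main__":
    # Fields copied from the dump in this thread.
    dump = "Rows: 0\nCompression: SNAPPY\nCompression size: 262144\nType: struct<>\n"
    print(is_empty_schemaless(dump))
```

Running the same check over the zero-row files in a working partition would show whether they also report `struct<>`, which is the comparison the thread leaves open.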
