Pig and Hive both have support for compressed sequence files.

Regarding the best format: if it's just text log data (i.e. no
types/structure), then the best way to keep it is as compressed text.
SequenceFiles make the data splittable but add a small overhead in
space and efficiency, and none of the good codecs out there are
splittable on their own (LZO is good, but needs pre-indexing to be
treated as splittable).
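
To illustrate the splittability point, here is a toy Python sketch. It is
not Hadoop code and does not reproduce the SequenceFile or LZO index
formats; the helper names (can_read_from, compress_blocks, read_split) and
the sample log lines are made up for this example. It only shows that a
single gzip stream cannot be decoded from an arbitrary offset, while
independently compressed blocks plus an offset index can be read in parts.

```python
import gzip
import zlib

# Hypothetical sample log lines, generated only for this illustration.
lines = [f"2013-04-09 10:21:{i % 60:02d} GET /page/{i}\n".encode()
         for i in range(2000)]
data = b"".join(lines)

# Case 1: one gzip stream over the whole file, like a plain compressed log.
whole = gzip.compress(data)


def can_read_from(blob, offset):
    """Return True if gzip decompression can start at the given byte offset."""
    try:
        # wbits=31 tells zlib to expect a gzip header at the start.
        zlib.decompressobj(wbits=31).decompress(blob[offset:])
        return True
    except zlib.error:
        return False


# Case 2: compress fixed-size groups of lines independently and record
# where each compressed block starts (a stand-in for an index).
def compress_blocks(all_lines, lines_per_block):
    blocks, index, pos = [], [], 0
    for i in range(0, len(all_lines), lines_per_block):
        block = gzip.compress(b"".join(all_lines[i:i + lines_per_block]))
        index.append(pos)
        blocks.append(block)
        pos += len(block)
    return b"".join(blocks), index


def read_split(blob, index, first_block, last_block):
    """Decompress blocks [first_block, last_block) without touching the rest."""
    start = index[first_block]
    end = index[last_block] if last_block < len(index) else len(blob)
    return gzip.decompress(blob[start:end])


if __name__ == "__main__":
    print("single stream readable from middle:",
          can_read_from(whole, len(whole) // 2))
    blob, index = compress_blocks(lines, 500)
    second_half = read_split(blob, index, 2, 4)
    print("second split starts with:", second_half[:40])
```

With the block layout, two readers could each take half of the index
entries and work independently, which is roughly what a splittable
container or a pre-built index buys you at the cost of some extra space.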

On Tue, Apr 9, 2013 at 10:21 PM, Mark <[email protected]> wrote:
> Actually, compressed sequence files may not work with Pig or Hive then right?
>
> On Apr 9, 2013, at 9:50 AM, Mark <[email protected]> wrote:
>
>> Forgetting Impala, what format would be best to use with daily logs?
>>
>> Block-compressed sequence files?
>>
>> On Apr 8, 2013, at 8:12 PM, Harsh J <[email protected]> wrote:
>>
>>> Hey Mark,
>>>
>>> Gzip codec creates extension .gz, not .deflate (which is
>>> DeflateCodec). You may want to re-check your settings.
>>>
>>> Impala questions are best resolved at its current user and developer
>>> community at 
>>> https://groups.google.com/a/cloudera.org/forum/#!forum/impala-user.
>>> Impala does currently support LZO (and also Indexed LZO) compressed
>>> text files, however, so you may want to try that, as it's splittable
>>> (compared to Gzip ones).
>>>
>>> On Tue, Apr 9, 2013 at 5:18 AM, Mark <[email protected]> wrote:
>>>> Trying to determine the best format to use for storing daily logs. We
>>>> recently switched from snappy (.snappy) to gzip (.deflate) but I'm wondering
>>>> if there is something better? Our main clients for these daily logs are 
>>>> pig and hive using an external table. We were thinking about testing out 
>>>> impala but we see that it doesn't work with compressed text files. Any 
>>>> suggestions?
>>>>
>>>> Thanks
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>



-- 
Harsh J
