[Zebra] Can Zebra use HAR to reduce file/block count for namenode
-----------------------------------------------------------------

                 Key: PIG-1411
                 URL: https://issues.apache.org/jira/browse/PIG-1411
             Project: Pig
          Issue Type: New Feature
          Components: impl
    Affects Versions: 0.8.0
            Reporter: Gaurav Jain
            Priority: Minor
             Fix For: 0.8.0



Because of its column group structure, Zebra can create extra files that the 
namenode must track. As a result, the namenode uses more memory for 
Zebra-related files.

The goal is to reduce the number of files and blocks.

Among the various options, the idea is to use HAR (Hadoop Archive). A Hadoop 
Archive reduces the block and file count by copying the data from many small 
files (1 MB, 2 MB, ...) into larger files that fill HDFS blocks more 
completely, thus reducing the total number of blocks and files the namenode 
has to track.
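
The arithmetic behind this can be sketched roughly as follows. This is only a 
back-of-the-envelope illustration, not Zebra or HAR code. The function names, 
the 128 MB block size, the single part file, and the two index files are all 
assumptions made for the example; an actual archive may lay out its part and 
index files differently.

```python
MB = 1024 * 1024


def blocks_for_file(size_bytes, block_size):
    """Blocks needed for one file: ceiling division, so a 1 MB file
    still takes a whole block entry on the namenode."""
    return -(-size_bytes // block_size)


def count_before(file_sizes, block_size):
    """(files, blocks) when every small file is stored on its own."""
    files = len(file_sizes)
    blocks = sum(blocks_for_file(s, block_size) for s in file_sizes)
    return files, blocks


def count_after(file_sizes, block_size, index_files=2):
    """(files, blocks) after packing everything into an archive.

    Assumption: all data lands in a single part file, and the archive
    adds `index_files` small index files of one block each.
    """
    part_blocks = blocks_for_file(sum(file_sizes), block_size)
    files = 1 + index_files
    blocks = part_blocks + index_files
    return files, blocks


if __name__ == "__main__":
    # Hypothetical workload: 100 files of 1 MB and 100 files of 2 MB.
    sizes = [1 * MB] * 100 + [2 * MB] * 100
    block_size = 128 * MB  # assumed HDFS block size
    print(count_before(sizes, block_size))  # (200, 200)
    print(count_after(sizes, block_size))   # (3, 5)
```

Under these assumptions, 200 files and 200 blocks shrink to 3 files and 5 
blocks, which is the kind of saving in namenode memory this issue is after.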


 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
