[jira] Commented: (PIG-1411) [Zebra] Can Zebra use HAR to reduce file/block count for namenode

Gaurav Jain (JIRA) Tue, 11 May 2010 10:44:03 -0700

    [ https://issues.apache.org/jira/browse/PIG-1411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866225#action_12866225 ]


Gaurav Jain commented on PIG-1411:
----------------------------------


-- Without HAR: 128 files, one 128M block each (128 files, 128 blocks)
-- ~128K bytes taken in namenode

-- With 2GB HAR block size: 128 files --> 8 files (16 blocks of 128M in one HAR
part file)
-- ~80K bytes taken in namenode
-- The total number of HDFS blocks stays the same (128 blocks of 128M); only the
file count drops
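
The arithmetic above can be reproduced with a rough sketch like the one below. The per-file and per-block namenode costs (400 and 600 bytes) are not from this comment; they are hypothetical values picked only because they land on the 128K and ~80K figures quoted here, and real costs depend on the Hadoop version and path lengths. The helper names are made up for illustration, and HAR index files are ignored.

```python
MB = 1024 * 1024


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def har_layout(num_files, file_size, hdfs_block_size, har_part_size):
    """Return (part_files, hdfs_blocks) after packing files into HAR part files."""
    total = num_files * file_size
    full_parts, remainder = divmod(total, har_part_size)
    # Every full part file is split into HDFS blocks of the usual size.
    blocks = full_parts * ceil_div(har_part_size, hdfs_block_size)
    parts = full_parts
    if remainder:
        # A final, partially filled part file holds the leftover bytes.
        parts += 1
        blocks += ceil_div(remainder, hdfs_block_size)
    return parts, blocks


def namenode_bytes(files, blocks, bytes_per_file=400, bytes_per_block=600):
    """Estimate namenode memory. The default costs are ASSUMED, not measured."""
    return files * bytes_per_file + blocks * bytes_per_block


# Without HAR: 128 files, one 128M block each.
before = namenode_bytes(128, 128)

# With HAR: the same data packed into 2GB part files.
parts, blocks = har_layout(128, 128 * MB, 128 * MB, 2048 * MB)
after = namenode_bytes(parts, blocks)

print(parts, blocks, before, after, 1 - after / before)
```

Changing `file_size` to 1M or 2M shows why small files benefit much more: both the file count and the block count shrink, while for 128M files only the file count does.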

So it's a ~40% improvement in namespace (128K down to ~80K bytes), which is not
huge and needs to be evaluated against the performance loss of using HAR.

With larger files, the savings are not huge, and performance should be taken into
account before using HAR.

With a larger block size for both HAR and HDFS, further gains are expected, but
those have their own tradeoffs.

> [Zebra] Can Zebra use HAR to reduce file/block count for namenode
> -----------------------------------------------------------------
>
>                 Key: PIG-1411
>                 URL: https://issues.apache.org/jira/browse/PIG-1411
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.8.0
>            Reporter: Gaurav Jain
>            Assignee: Gaurav Jain
>            Priority: Minor
>             Fix For: 0.8.0
>
>
> Due to its column group structure, Zebra can create extra files for the namenode
> to remember. That means the namenode uses more memory for Zebra-related files.
> The goal is to reduce the number of files/blocks.
> One of the options is to use HAR (Hadoop Archive). Hadoop
> Archive reduces the block and file count by copying data from small files
> (1M, 2M ...) into an HDFS block of larger size, thus reducing the total number of
> blocks and files.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
