Mayuresh's suggestion is how I am dealing with this problem.

I save metadata about big files in an HBase row, including the path to the
Hadoop file containing the actual video -- or any other binary -- data.

I save large files (> 2GB) as single files in HDFS.

I save "medium sized" files (between 1MB and 2GB) as Hadoop TFile records.
I periodically merge smaller TFiles into one large TFile, and update the
corresponding HBase row to point to that new TFile.

Files <= 1MB are stored in the HBase table row.

Each HBase row holds
(a) the actual data -- for files <= 1MB,
or (b) HDFS.File.Path -- for files > 2GB,
or (c) HDFS TFile.Path and TFile.Key -- for files between 1MB and 2GB in
size.
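
The three cases above amount to a simple size-based router. A minimal
sketch, assuming hypothetical names (BlobRouter, Tier); the real code
would of course write the HBase row and the HDFS/TFile data rather than
just return a tag:

```java
// Sketch of the size-based routing described above. Thresholds match the
// post; the Tier names are hypothetical placeholders for the three row
// layouts (inline data, TFile path+key, plain HDFS path).
public class BlobRouter {

    static final long ONE_MB = 1L << 20;   // inline-in-HBase threshold
    static final long TWO_GB = 2L << 30;   // single-HDFS-file threshold

    enum Tier { HBASE_INLINE, TFILE_RECORD, HDFS_FILE }

    /** Decide where a blob of the given size should live. */
    static Tier route(long sizeBytes) {
        if (sizeBytes <= ONE_MB) return Tier.HBASE_INLINE; // (a) data in the row
        if (sizeBytes > TWO_GB)  return Tier.HDFS_FILE;    // (b) row stores HDFS path
        return Tier.TFILE_RECORD;                          // (c) row stores TFile path + key
    }

    public static void main(String[] args) {
        System.out.println(route(512L * 1024));      // -> HBASE_INLINE
        System.out.println(route(100L << 20));       // -> TFILE_RECORD  (100MB)
        System.out.println(route(3L << 30));         // -> HDFS_FILE     (3GB)
    }
}
```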

Bundling the 'medium sized' files (1MB - 2GB) into Hadoop TFiles is my
solution to the 'small files problem'; it helps reduce the Hadoop
namenode's workload.
Regards,
Stan


2011/10/31 xtliwen <[email protected]>

> Hi everybody,
> When a client views a video on a website through my website, it is
> transcoded by our video codec server. As time goes on, the quantity
> of videos becomes quite large. Usually, a video can be transcoded
> to several levels, so an original video corresponds to multiple transcoded
> videos. Now, we plan to record the video files with HBase. There are two
> problems with HBase:
> 1. the video file is too large (100M avg.)
> 2. we require that the transcoded video can be read while it is being
> written
>
> So, can anybody give some suggestions? Thanks.
>
> 2011-10-31
>
>
>
> regards
> xtliwen
>