Mayuresh's suggestion is essentially how I deal with this problem: I save metadata about each large file in an HBase row, including the path to the HDFS file that holds the actual video (or other binary) data.
I save large files (> 2 GB) as single files in HDFS, and "medium-sized" files (1 MB to 2 GB) as Hadoop TFile records. I periodically merge smaller TFiles into one large TFile and update the corresponding HBase rows to point to the new TFile. Files <= 1 MB are stored directly in the HBase row.

Each HBase row therefore holds one of:
(a) the actual data -- for files <= 1 MB;
(b) an HDFS file path -- for files > 2 GB; or
(c) an HDFS TFile path and TFile key -- for files between 1 MB and 2 GB.

Bundling the medium-sized files (1 MB to 2 GB) into Hadoop TFiles is my answer to the "small files problem"; it helps reduce the Hadoop NameNode's workload.

Regards,
Stan

2011/10/31 xtliwen <[email protected]>

> Hi everybody,
> When a client views a video on a website through my website, it is
> transcoded by our video codec server. As time goes on, the number of
> videos grows quite large. Usually a video can be transcoded to several
> quality levels, so one original video corresponds to multiple transcoded
> videos. Now we plan to store the video files in HBase. There are two
> problems:
> 1. The video files are too large (100 MB on average).
> 2. The transcoded video must be readable while it is still being written.
>
> So, can anybody give some suggestions? Thanks.
>
> 2011-10-31
>
> regards
> xtliwen
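For what it's worth, the size-based routing described above can be sketched as a small helper. This is only an illustration of the tier boundaries, not actual HBase/Hadoop client code; the function and tier names are made up:

```python
# Sketch of the three-tier routing: which store a file of a given
# size lands in, and what the HBase row would record for it.
#   <= 1 MB      -> data stored inline in the HBase row
#   1 MB - 2 GB  -> record appended to a bundled TFile; row stores
#                   the TFile path and the record key
#   > 2 GB       -> standalone HDFS file; row stores its path

MB = 1024 * 1024
GB = 1024 * MB

def storage_tier(size_bytes):
    """Return the storage tier for a file of the given size (illustrative)."""
    if size_bytes <= 1 * MB:
        return "inline"   # actual bytes kept in the HBase cell
    elif size_bytes <= 2 * GB:
        return "tfile"    # TFile.Path + TFile.Key kept in the row
    else:
        return "hdfs"     # HDFS file path kept in the row
```

For example, a 100 MB transcoded video (the average size mentioned below) would fall into the TFile tier, so only a small path/key pair lands in HBase and the NameNode sees one bundled file instead of many.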
