Hi, I'm planning to crawl thousands of news RSS feeds via MapReduce and save each news article directly into HBase.
My concern is that Hadoop does not handle large numbers of small files well, and every news article is obviously small. If I insert each article directly into HBase (without separately storing it in HDFS), will I end up with millions of files that are only a few kilobytes each? Or does HBase automatically append the articles into a single file, so that it ends up with only a few large files?
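For concreteness, here is roughly the insert path I have in mind, written against the standard HBase client API. This is just a minimal sketch: the table name "articles", column family "a", qualifier "body", and the row-key scheme are placeholders I made up, not a final design.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ArticleWriter {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // Connection and Table are Closeable, so try-with-resources cleans up.
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("articles"))) {
                // One Put per article: row key = feed URL + timestamp
                // (placeholder scheme), article body stored as a single cell.
                Put put = new Put(Bytes.toBytes("example.com/feed|20240101120000"));
                put.addColumn(Bytes.toBytes("a"), Bytes.toBytes("body"),
                              Bytes.toBytes("<article text here>"));
                table.put(put);
            }
        }
    }

So the question is what millions of such Puts end up looking like on disk.

Ed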
