On Thursday, 25.10.2012 at 22:10 +0100, Steve Loughran wrote:
> I quite like the new Hadoop in Practice for a lot of that, especially
> the answer to #2, "how to store the data", where he looks at all the
> options

Part 3, "Big Data Patterns", looks very interesting. I am going to read the book.

On Thursday, 25.10.2012 at 22:10 +0100, Steve Loughran wrote:
> Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> and Hive can work with that as well as rawer data kept in HDFS
> directly

But is that the best idea? HBase is great for random reads and small range scans, but Hive (SQL) queries on HBase are 4-5x slower than on plain HDFS. [0] I guess keeping the first (raw) data in HDFS and the final data in HBase is a good idea. But how should the data be stored between the individual MapReduce jobs?

[0] Todd Lipcon, http://de.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction, p. 19. I have not benchmarked the performance myself.
