Re: reference architecture

Daniel Käfer Thu, 25 Oct 2012 15:17:50 -0700

On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> I quite like the new Hadoop in Practice for a lot of that, especially
> the answer to #2, "how to store the data", where he looks at all the
> options


Part 3, "Big Data Patterns", looks very interesting. I am going to read
the book.

On Thursday, 25.10.2012, at 22:10 +0100, Steve Loughran wrote:
> Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig
> and Hive can work with that as well as rawer data kept in HDFS
> directly

But is that the best idea? HBase is great for random reads and short
range scans, but Hive (SQL) queries on HBase are 4-5x slower than on
plain HDFS. [0]

I guess keeping the first (raw) data in HDFS and the final data in HBase
is a good idea. But how should the data be stored between individual
MapReduce jobs?


[0] Todd Lipcon,
http://de.slideshare.net/cloudera/chicago-data-summit-apache-hbase-an-introduction
p. 19. I have not benchmarked the performance myself.