On 25 October 2012 20:24, Daniel Käfer <[email protected]> wrote:
> Hello all,
>
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda architecture from Nathan Marz[0].

I quite like the new Hadoop in Practice for a lot of that, especially the answer to #2, "how to store the data", where he looks at all the options. Joining is the other big issue.

http://steveloughran.blogspot.co.uk/2012/10/hadoop-in-practice-applied-hadoop.html

Regarding storing DB data, HBase-on-HDFS is where people keep it; Pig and Hive can work with that as well as with rawer data kept in HDFS directly.

> With architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf
> - How should I model the data? ER-Model, star schema, something new?
> - normalized or denormalized or both (master data normalized, then
>   transformation to denormalized, like ETL)
> - How should I combine database and HDFS files?
>
> Are there any other documented architectures for Hadoop?
>
> Regards
> Daniel Käfer
>
>
> [0] http://www.manning.com/marz/ just a preprint yet, not completed
