Russell Jurney http://datasyndrome.com
On Oct 25, 2012, at 12:24 PM, "Daniel Käfer" <[email protected]> wrote:

> Hello all,
>
> I'm looking for a reference architecture for Hadoop. The only result I
> found is the Lambda architecture from Nathan Marz[0].
>
> With architecture I mean answers to questions like:
> - How should I store the data? CSV, Thrift, ProtoBuf

You should use Avro.

> - How should I model the data? ER-Model, Starschema, something new?

You should use a document format.

> - normalized or denormalized or both (master data normalized, then
>   transformation to denormalized, like ETL)

Denormalize fully, into document format.

> - How should I combine database and HDFS-Files?

Don't. Put everything on HDFS.

>
> Are there any other documented architectures for Hadoop?

I did work through an example in my book. It is just one example, but you wanted answers to questions that always 'depend.' You can check it out in slides: http://www.slideshare.net/mobile/hortonworks/agile-analytics-applications-on-hadoop

>
> Regards
> Daniel Käfer
>
>
> [0] http://www.manning.com/marz/ just a preprint yet, not completed
>
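To make the "denormalize fully, into document format" and "use Avro" advice concrete, here is a minimal sketch. The customer/order tables, the field names, and the `denormalize` helper are all hypothetical and chosen only for illustration; they do not come from the book or the slides. It uses only the Python standard library: the schema is the plain JSON form that Avro schemas take, and the function folds normalized rows into one nested document per customer.

```python
import json

# Hypothetical normalized inputs (illustrative data, not from the thread).
customers = {1: {"name": "Ada", "city": "Berlin"}}
orders = [
    {"order_id": 100, "customer_id": 1, "item": "book", "amount": 12.5},
    {"order_id": 101, "customer_id": 1, "item": "pen", "amount": 1.2},
]

# An Avro schema is plain JSON: here, one record with a nested array of records.
CUSTOMER_SCHEMA = {
    "type": "record",
    "name": "CustomerDocument",
    "fields": [
        {"name": "customer_id", "type": "long"},
        {"name": "name", "type": "string"},
        {"name": "city", "type": "string"},
        {"name": "orders", "type": {
            "type": "array",
            "items": {
                "type": "record",
                "name": "Order",
                "fields": [
                    {"name": "order_id", "type": "long"},
                    {"name": "item", "type": "string"},
                    {"name": "amount", "type": "double"},
                ],
            },
        }},
    ],
}


def denormalize(customers, orders):
    """Fold order rows into their customer, yielding one document per customer."""
    docs = {
        cid: {"customer_id": cid, **attrs, "orders": []}
        for cid, attrs in customers.items()
    }
    for order in orders:
        # Drop the foreign key: the nesting now carries the relationship.
        nested = {k: v for k, v in order.items() if k != "customer_id"}
        docs[order["customer_id"]]["orders"].append(nested)
    return list(docs.values())


documents = denormalize(customers, orders)
print(json.dumps(documents[0], indent=2))
```

Each resulting document is self-contained, so a job reading it from HDFS needs no join against a separate database. Writing the documents to an Avro container file would additionally need an Avro library, which this sketch deliberately leaves out.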
