I need help clearing something up. So I read this: http://nathanmarz.com/blog/how-to-beat-the-cap-theorem.html
And in it he says: “Likewise, writing bad data has a clear path to recovery: delete the bad data and precompute the queries again. Since data is immutable and the master dataset is append-only, writing bad data does not override or otherwise destroy good data.”

That sentence makes no sense to me. Data is immutable -> the master dataset is append-only -> delete the bad data? What?

He gives an example where, in the batch layer, you store raw files in HDFS. My understanding is that you can't do row-level deletes on files in HDFS (because it's append-only). What am I missing here? (I've put a rough sketch of what I mean by a row-level delete in the P.S. below.)

Adaryl "Bob" Wakefield, MBA
Principal
Mass Street Analytics
913.938.6685
www.linkedin.com/in/bobwakefieldmba
Twitter: @BobLovesData
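P.S. To make my confusion concrete, here is roughly what I picture a "row-level delete" would have to look like against an immutable, append-only store: read the whole file, skip the records you want gone, write a brand-new file, and swap it in. This is only a toy sketch against the local filesystem; the path and the is_bad() predicate are made up, and on HDFS the equivalent would presumably be a job that rewrites the affected files rather than an in-place edit.

import os

def is_bad(record: str) -> bool:
    # Hypothetical predicate: however you identify the bad data.
    return record.startswith("corrupt|")

def delete_bad_records(path: str) -> None:
    # "Deleting" a row here means copying every good record into a
    # brand-new file and then swapping the new file in for the old one.
    tmp_path = path + ".rewrite"
    with open(path) as src, open(tmp_path, "w") as dst:
        for record in src:
            if not is_bad(record):
                dst.write(record)
    os.replace(tmp_path, path)

# e.g. delete_bad_records("/data/master/2014-08-13.log")  # made-up path

If that is really the model, then "delete the bad data" would seem to mean "rewrite the files minus the bad records and recompute the views," which is what I'm trying to confirm.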
