In the old world, data cleaning was a large part of the data warehouse
load. Now that we’re working in a schemaless environment, I’m not sure where
data cleaning is supposed to take place. NoSQL sounds fun because,
theoretically, you just drop everything in, but the transactional systems that
generate the data are still full of bugs and still create junk data.

My question is, where does data cleaning/master data management/CDI belong in a
modern data architecture? Before it hits Hadoop? After?

B.
