I have noticed that data consumers prefer flat records because they
are easier to query. I have yet to find a good tool for querying
nested, semi-structured records like JSON, so a large amount of time
and effort goes into the ETL process.
Maybe one could fork the data flow: send raw records to a "raw" bin,
and send the other fork through a process that conforms each record
to a schema from a schema library.
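A minimal sketch of that fork in Python (the schema here is just a
field-to-default map standing in for a real schema library; the names
are made up):

    import json

    def fork(record, schema, raw_sink, clean_sink):
        # Fork 1: keep the raw record untouched, for later reprocessing.
        raw_sink.append(json.dumps(record))
        # Fork 2: conform the record to the schema -- keep only known
        # fields and fill in defaults for anything missing.
        clean_sink.append({f: record.get(f, d) for f, d in schema.items()})

    user_schema = {"id": None, "name": "", "email": None}
    raw, clean = [], []
    fork({"id": 1, "name": "Ann", "extra": "ignored"}, user_schema, raw, clean)
    # clean[0] == {"id": 1, "name": "Ann", "email": None}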
On 2/10/15 5:01 PM, Wai Yip Tung wrote:
While developing our schema-based data pipeline, we often run into a
debate. Should we make the schema tight and strict, so that
application errors can be tested and caught early? Or should we
design the schema to be lenient, because the schema will inevitably
evolve, and the data we find in our system often contains variations
despite our efforts to constrain it?
Over time I have observed that the difference in schools of thought
is largely related to role. The data producers, mainly the
application developers, want the schema to be strict (e.g. required
attributes, no union with 'null'). They see the schema as a debugging
tool: they expect errors to be caught by the encoder during unit
tests, and they expect the production system to raise an alarm loudly
if a bad build breaks things.
The consumers, mainly the backend data developers and the analysts,
want the schema to be lenient. The backend developers often have to
reprocess historical data, and a strict schema is often incompatible
with it, causing big problems when reading old records. They argue
that having some data, even if slightly broken, is better than having
no data.
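To make the two positions concrete (assuming Avro-style schemas, with
a made-up field, written as Python dicts rather than schema JSON),
the producer would declare the field as plainly required, while the
consumer would rather have a nullable union with a default:

    strict_field  = {"name": "email", "type": "string"}
    lenient_field = {"name": "email", "type": ["null", "string"], "default": None}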
We have been having difficulty striking a balance. It leads me to
think we may need more than a single schema in operation: perhaps the
application developer creates a strict schema, and the backend
application derives a lenient version from it in order to load all
historical data successfully.
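A rough sketch of that derivation, assuming Avro-style record schemas
represented as Python dicts (the Event record and its fields are made
up):

    import copy

    def loosen(strict_schema):
        # Derive a lenient schema: make every field optional by adding
        # "null" to its type union and giving it a null default, so
        # records written under older, different schemas still load.
        lenient = copy.deepcopy(strict_schema)
        for field in lenient["fields"]:
            t = field["type"]
            if isinstance(t, list):
                if "null" not in t:
                    field["type"] = ["null"] + t
            else:
                field["type"] = ["null", t]
            field.setdefault("default", None)
        return lenient

    strict = {
        "type": "record", "name": "Event",
        "fields": [
            {"name": "id",    "type": "long"},
            {"name": "email", "type": "string"},
        ],
    }
    lenient = loosen(strict)
    # lenient["fields"][1] == {"name": "email",
    #                          "type": ["null", "string"], "default": None}

In Avro terms, the strict schema would remain the writer schema and
something like this derived one would serve as the reader schema when
loading historical data.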
I am wondering if others have seen this kind of tension. Any thoughts
on how to address it?
Wai Yip