On Sun, Aug 12, 2012 at 7:42 PM, Bill Graham <[email protected]> wrote:
> The benefit of having a schema associated with your data should not be
> understated. I think when debating whether to use JSON or some other data
> serialization format that has a schema (like Avro), you should choose the
> later. The schema support alone will pay dividends over the long run.

I would argue it is one of those things that is overstated because it is
intuitively attractive. Keep in mind that an explicit external schema is
another cost, not just in designing the system but also in maintaining it.
As such, it is most useful for closely coupled internal systems, where one
controls both ends. This may be the case for computing pipelines that a
single team owns.

Put another way: both the benefits and the costs of schemas accumulate over
the long run, and the ratio between them ultimately determines which one
wins. Yet that ratio is very hard to forecast in advance. What can be said
is that maintaining a schemaless format is cheaper than maintaining a
schema. The value of a schema, on the other hand, is much harder to
estimate a priori.

-+ Tatu +-
