I don't think Drill is supposed to "ignore" data. My understanding is that the reader will read the new fields, which will cause a schema change. Depending on the query (whether all the operators involved can handle the schema change or not), the query should either succeed or fail. As far as I know, Drill will most likely fail rather than display incorrect results; if it does return incorrect results, that's a bug that needs to be fixed. Sometimes the reader itself may fail. For example, if you have a list of numbers and the first 1000 values are ints, any later value that is a double or a string will cause the JSON reader to fail.
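The two cases described above (a new field causing a schema change, and a conflicting value type causing the reader to fail) can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not Drill's actual reader: `read_records` and `SchemaConflictError` are hypothetical names, each field's type is locked the first time the field is seen, and details of the real reader such as batching are ignored.

```python
# Toy model of a schema-inferring JSON reader. NOT Drill's implementation;
# all names here are hypothetical and exist only for illustration.
from typing import Any, Dict, Iterable, List, Tuple


class SchemaConflictError(Exception):
    """Raised when a field's value no longer matches its recorded type."""


def read_records(
    records: Iterable[Dict[str, Any]],
) -> Tuple[Dict[str, type], List[int]]:
    """Return the final schema and the record indexes where it changed.

    A field seen for the first time is added to the schema (a "schema
    change") instead of being ignored. A value whose type differs from
    the type already recorded for its field raises SchemaConflictError.
    """
    schema: Dict[str, type] = {}
    changes: List[int] = []
    for i, record in enumerate(records):
        changed = False
        for field, value in record.items():
            seen = schema.get(field)
            if seen is None:
                # New field: extend the schema rather than dropping the data.
                schema[field] = type(value)
                changed = True
            elif type(value) is not seen:
                # e.g. an int column that later receives a double or a string.
                raise SchemaConflictError(
                    f"record {i}: field {field!r} was {seen.__name__}, "
                    f"got {type(value).__name__}"
                )
        # The first record establishes the schema, so it is not a "change".
        if changed and i > 0:
            changes.append(i)
    return schema, changes


if __name__ == "__main__":
    ok = [{"a": n} for n in range(1000)] + [{"a": 1000, "b": "new"}]
    print(read_records(ok))  # new field "b" reported as a change at record 1000

    bad = [{"a": n} for n in range(1000)] + [{"a": 2.5}]
    try:
        read_records(bad)
    except SchemaConflictError as err:
        print("reader failed:", err)
```

Raising on the conflicting value, instead of coercing or dropping it, mirrors the point above that failing is preferable to silently returning incorrect results.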
On Thu, Jul 23, 2015 at 9:16 AM, Matt wrote:

> On 23 Jul 2015, at 10:53, Abdel Hakim Deneche wrote:
>
>> When you try to read schema-less data, Drill will first investigate the
>> 1000 rows to figure out a schema for your data, then it will use this
>> schema for the remaining of the query.
>
> To clarify, if the JSON schema changes on the 1001st 1MMth record, is
> Drill supposed to report an error, or ignore new data elements and only
> consider those discovered in the first 1000 objects?

--
Abdelhakim Deneche
Software Engineer
<http://www.mapr.com/>
