- never returns this: "yes", {"other":"true","all":"false","sometimes":"yes"}
should have been:
- never returns this: "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
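
For anyone who wants to replicate this without the original dataset, a file with the same shape can be generated with a short script. This is only a minimal sketch in Python: the function names and the default counts (800k records without "additional", then 100k with it) are my own choices, taken from the simplified description in the report, and the real data is more complex.

```python
import json

# Record shapes copied from the simplified example in the report.
BASE = {"some": "yes",
        "others": {"other": "true", "all": "false", "sometimes": "yes"}}
EXTENDED = {"some": "yes",
            "others": {"other": "true", "all": "false", "sometimes": "yes",
                       "additional": "last entries only"}}


def records(n_base=800_000, n_extended=100_000):
    """Yield one JSON document per record: first the base shape,
    then the shape that carries the extra "additional" key."""
    for _ in range(n_base):
        yield json.dumps(BASE, separators=(",", ":"))
    for _ in range(n_extended):
        yield json.dumps(EXTENDED, separators=(",", ":"))


def write_test_file(path, n_base=800_000, n_extended=100_000):
    """Write the records to `path`, one JSON document per line."""
    with open(path, "w") as f:
        for line in records(n_base, n_extended):
            f.write(line + "\n")
```

Calling write_test_file("test.json") and moving the result to wherever the dfs.tmp workspace points should give a file on which the two queries from the report can be run. I have not verified that this synthetic file triggers the problem in the same way as the original data.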
Regards,
-Stefan
On Wed, Jul 22, 2015 at 10:52 PM, Stefán Baxter <[email protected]>
wrote:
> Hi,
>
> I keep coming across *quirks* in Drill that are quite time-consuming to
> deal with and are now causing mounting concerns.
>
> This last one, though, is far more serious than the previous ones because it
> deals with loss of data.
>
> I'm working with a small(ish) dataset of around 1M records (which I'm more
> than happy to hand over so this can be replicated).
>
> The problem goes like this:
>
> 1. with dfs.tmp.`/test.json`
> - containing a structure like this (simplified):
> - 800k x {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
> - 100k x {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
>
> 2. selecting: select some, t.others from dfs.tmp.`/test.json` as t;
> - returns only this for all the records: "yes",
> {"other":"true","all":"false","sometimes":"yes"}
> - never returns this:
> "yes", {"other":"true","all":"false","sometimes":"yes"}
>
> The query never returns this:
> "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}
> so the last entries in the file are incorrectly represented.
>
> To make matters a lot worse, the property is completely ignored in:
> create table X as select * from dfs.tmp.`/test.json`
> and the resulting Parquet file does not include it at all.
>
> It looks to me as if the dynamic schema discovery has stopped looking for
> schema changes and is quite set in its ways; so set, in fact, that it's
> ignoring data.
>
> I'm guessing that this is potentially affecting more people than me.
>
> I believe I have produced this under 1.1 and 1.2-SNAPSHOT.
>
> Regards,
> -Stefan
>