- never returns this: "yes", {"other":"true","all":"false","sometimes":"yes"}

should have been:

- never returns this: "yes",
{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}

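For anyone who wants to replicate this without the original dataset, here is a
minimal sketch of a generator for a file with the structure described below.
The function names and the output path are hypothetical, and the counts simply
mirror the 800k/100k split from the report; this is not the actual data.

```python
import json


def build_records(n_base, n_extended):
    """Yield n_base records without "additional", then n_extended with it."""
    base_others = {"other": "true", "all": "false", "sometimes": "yes"}
    # Same nested map plus the extra key that only the last records carry.
    extended_others = dict(base_others, additional="last entries only")
    for _ in range(n_base):
        yield {"some": "yes", "others": base_others}
    for _ in range(n_extended):
        yield {"some": "yes", "others": extended_others}


def write_test_file(path, n_base=800_000, n_extended=100_000):
    """Write one compact JSON object per line (hypothetical helper)."""
    with open(path, "w") as f:
        for rec in build_records(n_base, n_extended):
            f.write(json.dumps(rec, separators=(",", ":")) + "\n")
```

Calling write_test_file("/tmp/test.json") should produce a comparable file,
assuming the dfs.tmp workspace points at /tmp.
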
Regards,
 -Stefan

On Wed, Jul 22, 2015 at 10:52 PM, Stefán Baxter <[email protected]>
wrote:

> Hi,
>
> I keep coming across *quirks* in Drill that are quite time consuming to
> deal with and are now causing mounting concerns.
>
> This last one, though, is far more serious than the previous ones because it
> deals with loss of data.
>
> I'm working with a small(ish) dataset of around 1m records (which I'm more
> than happy to hand over to replicate this)
>
> The problem goes like this:
>
>    1. with dfs.tmp.`/test.json`
>    - containing a structure like this (simplified):
>    - 800k x
>    {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes"}}
>    - 100k x
>    {"some":"yes","others":{"other":"true","all":"false","sometimes":"yes","additional":"last entries only"}}
>
>    2. selecting: select some, t.others from dfs.tmp.`/test.json` as t;
>    - returns only this for all the records: "yes",
>    {"other":"true","all":"false","sometimes":"yes"}
>    - never returns this:
>    "yes", {"other":"true","all":"false","sometimes":"yes"}
>
> The query never returns this:
> "yes", {"other":"true","all":"false","sometimes":"yes","additional":"last
> entries only"} so the last entries in the file are incorrectly represented.
>
> To make matters a lot worse, the property is completely ignored in:
> create table X as select * from dfs.tmp.`/test.json` and the resulting
> Parquet file does not include it at all.
>
> It looks, to me, like the dynamic schema discovery has stopped looking for
> schema changes and is quite set in its ways; so set, in fact, that it's
> ignoring data.
>
> I'm guessing that this is potentially affecting more people than me.
>
> I believe I have produced this under 1.1 and 1.2-SNAPSHOT.
>
> Regards,
>  -Stefan
>
