I'm going to look into this next week. Here is the JIRA where you can follow the issue, comment, and suggest solutions: https://issues.apache.org/jira/browse/DRILL-3806
On Fri, Sep 18, 2015 at 3:10 PM, Stefán Baxter <[email protected]> wrote:

> Hi,
>
> I have nothing meaningful to add, but I had to share that this BigInt
> assumption has caused more grief than any other single feature in Drill.
>
> I would go so far as to say that the "type intolerance" and lack of
> "sensible conversion" is the biggest hurdle on the way to fulfilling the
> "eliminate ETL and start working on your data right away" promise.
>
> Don't get me wrong. I love Drill and we plan on using it quite a lot.
>
> Regards,
> -Stefan
>
> On Fri, Sep 18, 2015 at 9:59 PM, Jacques Nadeau <[email protected]> wrote:
>
> > This is a limitation of really late appearing fields. Right now, depending
> > on the situation, if a value doesn't show up in the first ~4k, we assume
> > that the value is BigInt. I think a developer is working on improving this
> > behavior right now. I'll ping them to see when we might have a fix.
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Sep 18, 2015 at 1:04 AM, Mustafa Engin Sözer <[email protected]> wrote:
> >
> > > Hi Andries,
> > >
> > > I've already tried writing where and case statements but none of them
> > > worked unfortunately. It's still:
> > >
> > > DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type
> > > when you are using a ValueWriter of type NullableIntWriterImpl.
> > >
> > > When I try this on sqlline, I can confirm that the output of the query is
> > > written until the 5665th row and then the Varchar value comes up and the
> > > query fails.
> > >
> > > Used queries:
> > >
> > > 1) SELECT CASE WHEN field_a IS NULL THEN 1 ELSE 0 END from
> > > dfs.poc.`poc.json` t;
> > > 2) SELECT field_a FROM dfs.poc.`poc.json` where field_a is not null; (I
> > > actually also need those rows but anyway, this is also not working)
> > >
> > > As you see from the first query, even if I do not select the field itself
> > > in the statement as part of the output, the problem occurs during the
> > > reading of the json file. (Specifically this field)
> > >
> > > On 17 September 2015 at 17:15, Andries Engelbrecht <[email protected]> wrote:
> > >
> > > > Don't know what the use case is for the queries you are trying to run.
> > > >
> > > > See if these 2 workarounds can work for your needs.
> > > >
> > > > 1. The simplest, if you are not interested in the records where the field
> > > > value is null or not null.
> > > > Use a predicate to filter out the records.
> > > >
> > > > 2. Use a case statement before casting to a specific data type to handle
> > > > the records with the null field differently.
> > > >
> > > > -Andries
> > > >
> > > > > On Sep 16, 2015, at 5:22 AM, Mustafa Engin Sözer <[email protected]> wrote:
> > > > >
> > > > > By the way, I forgot to mention that we use Drill 1.0 currently.
> > > > > Additional point:
> > > > >
> > > > > I did some other tests and the point is that even if I change that
> > > > > varchar field into an integer (e.g. 123456) without the double quotes, it
> > > > > still does not work. The only way it works is if I set that also to null.
> > > > > That's really weird. I might be missing something here but can't figure
> > > > > out what at the moment.
> > > > >
> > > > > On 16 September 2015 at 10:57, Mustafa Engin Sözer <[email protected]> wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I'm having an issue here.
> > > > >> I have the following sample set as json:
> > > > >>
> > > > >> {
> > > > >>   "field_a":null
> > > > >> }
> > > > >> {
> > > > >>   "field_a":"e900511b2bff6b9d33cc"
> > > > >> }
> > > > >>
> > > > >> Due to some problems in the dataset, I had to already set:
> > > > >>
> > > > >> store.json.all_text_mode to true
> > > > >>
> > > > >> But even now, when I try to query the dataset, the following error is
> > > > >> thrown:
> > > > >>
> > > > >> DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type
> > > > >> when you are using a ValueWriter of type NullableIntWriterImpl.
> > > > >>
> > > > >> There are more than 40K rows already in this dataset (I've just summarized
> > > > >> the related part here). The thing is up to
> > > > >> "field_a":"e900511b2bff6b9d33cc", the value of field_a was always null. And
> > > > >> I presume that Drill already assigned a NullableInt type during schema
> > > > >> recovery, thus when it faces a Varchar value at the 5665th record, it just
> > > > >> fails.
> > > > >>
> > > > >> As far as I know, if I enclose the null value with double quotes, then
> > > > >> it's not really a true null representation. At the end, the question is: do
> > > > >> you know what might be the problem and is there any workaround or setting
> > > > >> to overcome this issue?
> > > > >>
> > > > >> Another thing is: when store.json.all_text_mode is set to true, why does
> > > > >> Drill still recognize this field as NullableInt? Shouldn't it consider
> > > > >> everything as Varchar already?
> > > > >>
> > > > >> Thanks a lot for your help.
> > > > >>
> > > > >> Cheers,
> > > > >> --
> > > > >>
> > > > >> *M. Engin Sözer*
> > > > >> Junior Datawarehouse Manager
> > > > >> [email protected]
> > > > >>
> > > > >> Goodgame Studios
> > > > >> Theodorstr. 42-90, House 9
> > > > >> 22761 Hamburg, Germany
> > > > >> Phone: +49 (0)40 219 880 -0
> > > > >> *www.goodgamestudios.com <http://www.goodgamestudios.com>*
> > > > >>
> > > > >> Goodgame Studios is a branch of Altigi GmbH
> > > > >> Altigi GmbH, District court Hamburg, HRB 99869
> > > > >> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek, Fabian
> > > > >> Ritter

-- Julien
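
For readers following the thread, the failure mode Jacques describes (a field that is null throughout the first batch gets a guessed numeric type, and a later string value no longer fits) can be illustrated with a toy model. This is only a sketch under stated assumptions and not Drill's actual reader: the names `BATCH_SIZE`, `SchemaChangeError`, `infer_type` and `read_column` are hypothetical, and the batch size of 4096 is an assumption standing in for the "first ~4k" mentioned above.

```python
import json

# Assumption: stands in for the "first ~4k" mentioned in the thread.
BATCH_SIZE = 4096


class SchemaChangeError(Exception):
    """Raised when a value does not fit the type fixed from the first batch."""


def infer_type(values):
    """Pick a column type from the first batch (hypothetical helper).

    The first non-null value decides the type. If the batch holds only
    nulls, fall back to BIGINT, mirroring the assumption described above.
    """
    for value in values:
        if value is not None:
            return "VARCHAR" if isinstance(value, str) else "BIGINT"
    return "BIGINT"


def read_column(lines, field, batch_size=BATCH_SIZE):
    """Read one field from newline-delimited JSON with a type fixed up front."""
    values = [json.loads(line).get(field) for line in lines]
    column_type = infer_type(values[:batch_size])
    for row, value in enumerate(values, start=1):
        if value is None:
            continue  # nulls fit any nullable column
        if column_type == "BIGINT" and isinstance(value, str):
            raise SchemaChangeError(
                f"row {row}: VarChar value for a column typed as {column_type}"
            )
    return column_type, values


if __name__ == "__main__":
    # 5664 null records, then the first string value at record 5665.
    records = ['{"field_a": null}'] * 5664
    records.append('{"field_a": "e900511b2bff6b9d33cc"}')
    try:
        read_column(records, "field_a")
    except SchemaChangeError as exc:
        print(exc)  # row 5665: VarChar value for a column typed as BIGINT
```

In this toy model the same data reads cleanly if a non-null string appears anywhere within the first batch, which is why the problem only shows up for late-appearing values. It does not model the all_text_mode setting or the integer case reported for Drill 1.0 above.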
