I'm going to look into this next week.
Here is a JIRA to follow the issue, comment, and suggest solutions:
https://issues.apache.org/jira/browse/DRILL-3806

On Fri, Sep 18, 2015 at 3:10 PM, Stefán Baxter <[email protected]>
wrote:

> Hi,
>
> I have nothing meaningful to add, but I had to share that this BigInt
> assumption has caused more grief than any other single feature in Drill.
>
> I would go so far as to say that the "type intolerance" and lack of
> "sensible conversion" is the biggest hurdle on the way to fulfilling the
> "eliminate ETL and start working on your data right away" promise.
>
> Don't get me wrong. I love Drill, and we plan on using it quite a lot.
>
> Regards,
>   -Stefan
>
> On Fri, Sep 18, 2015 at 9:59 PM, Jacques Nadeau <[email protected]>
> wrote:
>
> > This is a limitation with fields that appear very late. Right now,
> > depending on the situation, if a value doesn't show up in the first ~4k
> > records, we assume that the value is BigInt. I think a developer is
> > working on improving this behavior right now. I'll ping them to see when
> > we might have a fix.
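A minimal Python model of the behavior described above, to make the failure mode concrete. It is not Drill's implementation: the 4096-record sample size, the function names, and the INT fallback for an all-null sample are assumptions (Jacques mentions BigInt, but the error reported in this thread names `NullableIntWriterImpl`, so the model uses INT). JSON integers are modeled as BIGINT.

```python
from typing import Any, List, Optional

# Assumed number of records inspected before a column type is fixed
# ("the first ~4k" in the explanation above).
SAMPLE_SIZE = 4096
# Assumed type chosen when every sampled value of a field is null.
DEFAULT_NULL_TYPE = "INT"


def type_of(value: Any) -> Optional[str]:
    """Map a JSON scalar to a column type name; None carries no type information."""
    if value is None:
        return None
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR"


def infer_column_type(values: List[Any], sample_size: int = SAMPLE_SIZE) -> str:
    """Choose a column type by looking only at the first sample_size values."""
    for value in values[:sample_size]:
        value_type = type_of(value)
        if value_type is not None:
            return value_type
    # Nothing but nulls in the sample: fall back to the default guess.
    return DEFAULT_NULL_TYPE


def read_column(values: List[Any], sample_size: int = SAMPLE_SIZE) -> List[Any]:
    """Write every value into a column whose type was fixed by the sample."""
    column_type = infer_column_type(values, sample_size)
    written = []
    for row, value in enumerate(values, start=1):
        value_type = type_of(value)
        if value_type is not None and value_type != column_type:
            raise TypeError(
                f"row {row}: tried to write {value_type} into a column of type {column_type}"
            )
        written.append(value)
    return written
```

In this model, `read_column([None] * 5664 + ["e900511b2bff6b9d33cc"])` raises on row 5665, and so does a late unquoted `123456` (BIGINT against the INT fallback), while the same string placed within the first 4096 records loads fine.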
> >
> > --
> > Jacques Nadeau
> > CTO and Co-Founder, Dremio
> >
> > On Fri, Sep 18, 2015 at 1:04 AM, Mustafa Engin Sözer <
> > [email protected]> wrote:
> >
> > > Hi Andries,
> > >
> > > I've already tried writing WHERE and CASE statements, but unfortunately
> > > none of them worked. It's still:
> > >
> > > DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type
> > > when you are using a ValueWriter of type NullableIntWriterImpl.
> > >
> > > When I try this in sqlline, I can confirm that the query output is
> > > printed until the VarChar value appears at row 5665, and then the query
> > > fails.
> > >
> > > Queries used:
> > >
> > > 1) SELECT CASE WHEN field_a IS NULL THEN 1 ELSE 0 END FROM
> > > dfs.poc.`poc.json` t;
> > > 2) SELECT field_a FROM dfs.poc.`poc.json` WHERE field_a IS NOT NULL;
> > > (I actually need those rows too, but in any case this one does not work
> > > either.)
> > >
> > > As you can see from the first query, even if I do not select the field
> > > itself as part of the output, the problem occurs while reading the JSON
> > > file (specifically, this field).
> > >
> > > On 17 September 2015 at 17:15, Andries Engelbrecht <
> > > [email protected]> wrote:
> > >
> > > > I don't know what the use case is for the queries you are trying to run.
> > > >
> > > > See if these 2 workarounds work for your needs.
> > > >
> > > > 1. The simplest, if you are not interested in the records where the field
> > > > value is null (or not null): use a predicate to filter those records out.
> > > >
> > > > 2. Use a CASE statement before casting to a specific data type, to handle
> > > > the records with the null field differently.
> > > >
> > > > —Andries
> > > >
> > > >
> > > > > On Sep 16, 2015, at 5:22 AM, Mustafa Engin Sözer <
> > > > [email protected]> wrote:
> > > > >
> > > > > By the way, I forgot to mention that we currently use Drill 1.0.
> > > > > An additional point:
> > > > >
> > > > > I did some other tests, and even if I change that VarChar field into an
> > > > > integer (e.g. 123456) without the double quotes, it still does not
> > > > > work. The only way it works is if I set that value to null as well.
> > > > > That's really weird. I might be missing something here, but I can't
> > > > > figure out what at the moment.
> > > > >
> > > > > On 16 September 2015 at 10:57, Mustafa Engin Sözer <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> Hi everyone,
> > > > >>
> > > > >> I'm having an issue here. I have the following sample data set in JSON:
> > > > >>
> > > > >> {
> > > > >> "field_a":null
> > > > >> }
> > > > >> {
> > > > >> "field_a":"e900511b2bff6b9d33cc"
> > > > >> }
> > > > >>
> > > > >> Due to some problems in the dataset, I already had to set:
> > > > >>
> > > > >> store.json.all_text_mode to true
> > > > >>
> > > > >> But even so, when I try to query the dataset, the following error is
> > > > >> thrown:
> > > > >>
> > > > >> DATA_READ ERROR: Error parsing JSON - You tried to write a VarChar type
> > > > >> when you are using a ValueWriter of type NullableIntWriterImpl.
> > > > >>
> > > > >>
> > > > >> There are more than 40K rows in this dataset (I've just summarized
> > > > >> the relevant part here). The thing is, up to
> > > > >> "field_a":"e900511b2bff6b9d33cc", the value of field_a was always null.
> > > > >> I presume that Drill had already assigned a NullableInt type during
> > > > >> schema inference, so when it encounters a VarChar value at the 5665th
> > > > >> record, it just fails.
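A sketch for producing a file with the same shape, so the failure can be reproduced without the production data (for example when testing a fix for the JIRA above). The function name `write_repro`, the total of 40,000 records, the trailing null records after the string, and the one-record-per-line layout are all assumptions based on the description, not the real dataset.

```python
import json
from pathlib import Path


def write_repro(path: Path, leading_nulls: int = 5664, total: int = 40000,
                value: str = "e900511b2bff6b9d33cc") -> int:
    """Write `total` JSON records, one per line, where field_a is null in every
    record except one string record placed after `leading_nulls` nulls.

    Returns the 1-based record number of the string value.
    """
    if not 0 <= leading_nulls < total:
        raise ValueError("leading_nulls must be in the range [0, total)")
    with path.open("w", encoding="utf-8") as out:
        for index in range(total):
            field_a = value if index == leading_nulls else None
            # json.dumps writes Python None as JSON null.
            out.write(json.dumps({"field_a": field_a}) + "\n")
    return leading_nulls + 1
```

Calling `write_repro(Path("poc.json"))` returns 5665; querying the resulting file through a workspace such as `dfs.poc` should then show whether a given Drill build still fails on that record.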
> > > > >>
> > > > >> As far as I know, if I enclose the null value in double quotes, it's
> > > > >> no longer a true null. In the end, the question is: do you know what
> > > > >> the problem might be, and is there any workaround or setting to
> > > > >> overcome this issue?
> > > > >>
> > > > >> Another thing: when store.json.all_text_mode is set to true, why does
> > > > >> Drill still recognize this field as NullableInt? Shouldn't it treat
> > > > >> everything as VarChar already?
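One possible explanation, sketched as a Python model rather than Drill's actual code: text mode can turn every non-null scalar into a string, but a null has no value to convert, so a sample made up only of nulls still carries no type information and the fallback type is used. The INT fallback and the function names are assumptions, chosen to match the error message in this thread.

```python
from typing import Any, List, Optional

# Assumed type used when a sample contains no non-null value.
DEFAULT_NULL_TYPE = "INT"


def text_mode_type(value: Any) -> Optional[str]:
    """With all-text mode on, every non-null scalar is read as VARCHAR."""
    return None if value is None else "VARCHAR"


def infer_text_mode(sample: List[Any]) -> str:
    """Return the first type seen in the sample, or the fallback if all are null."""
    for value in sample:
        value_type = text_mode_type(value)
        if value_type is not None:
            return value_type
    return DEFAULT_NULL_TYPE
```

Under this model, `infer_text_mode([None] * 4096)` still returns "INT", so a later "e900511b2bff6b9d33cc" or 123456 (both VARCHAR in text mode) conflicts with it, which is consistent with what is reported in this thread.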
> > > > >>
> > > > >> Thanks a lot for your help.
> > > > >>
> > > > >> Cheers,
> > > > >> --
> > > > >>
> > > > >> *M. Engin Sözer*
> > > > >> Junior Datawarehouse Manager
> > > > >> [email protected]
> > > > >>
> > > > >> Goodgame Studios
> > > > >> Theodorstr. 42-90, House 9
> > > > >> 22761 Hamburg, Germany
> > > > >> Phone: +49 (0)40 219 880 -0
> > > > >> *www.goodgamestudios.com <http://www.goodgamestudios.com>*
> > > > >>
> > > > >> Goodgame Studios is a branch of Altigi GmbH
> > > > >> Altigi GmbH, District court Hamburg, HRB 99869
> > > > >> Board of directors: Dr. Kai Wawrzinek, Dr. Christian Wawrzinek,
> > Fabian
> > > > >> Ritter
> > > > >>
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
>



-- 
Julien
