Hi,

Thanks for your reply. I would like to submit my changes, but my attempts to push to the Drill repo are being denied. How do I open a pull request in Git? Are there any permissions I need to obtain before pushing to the repo?

On Wed, Dec 28, 2022 at 15:46, Charles Givre <cgi...@gmail.com> wrote:

> Hi Marc,
> Thanks for this. Here's the thing... Let's say you have JSON that looks
> like this:
>
> {
>   "foo": null
> },{
>   "foo": 3.5
> }
>
> If you take the approach that `null` is treated like a string, you will
> get a schema change exception when you read the next row. Our current
> approach is to basically ignore fields that Drill cannot figure out what
> they are in terms of data type. Once Drill encounters a data type, it will
> then assign a data type to that column. See the example below which is
> from DRILL-5033. I added a second row to demonstrate what happens once
> Drill is able to determine a data type. Note that for the columns with a
> defined value in the second row, Drill returns 'null' as the value.
>
> [{
>   "intKey" : null,
>   "bgintKey": null,
>   "strKey": null,
>   "boolKey": null,
>   "fltKey": null,
>   "dblKey": null,
>   "timKey": null,
>   "dtKey": null,
>   "tmstmpKey": null,
>   "intrvldyKey": null,
>   "intrvlyrKey": null
> },
> {
>   "intKey" : 1,
>   "bgintKey": 3666565464,
>   "strKey": "hithere",
>   "boolKey": true,
>   "fltKey": 3.5,
>   "dblKey": 4.2,
>   "timKey": null,
>   "dtKey": null,
>   "tmstmpKey": null,
>   "intrvldyKey": null,
>   "intrvlyrKey": null
> }]
>
> select * from dfs.test.`nulls.json`;
>
> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
> | intKey | bgintKey      | strKey  | boolKey | fltKey | dblKey | timKey | dtKey | tmstmpKey | intrvldyKey | intrvlyrKey |
> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
> | null   | null          | null    | null    | null   | null   | []     | []    | []        | []          | []          |
> | 1.0    | 3.666565464E9 | hithere | true    | 3.5    | 4.2    | []     | []    | []        | []          | []          |
> +--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
> 2 rows selected (0.232 seconds)
>
> You are definitely welcome to submit a pull request, however this area is
> extremely complex, and I'd suspect that what you propose will break other
> unit tests. Another option which you might not be aware of is providing a
> schema. If you do that from the beginning, then Drill will know what data
> types to expect.
>
> Best,
> -- C
>
> > On Dec 28, 2022, at 8:57 AM, marc nicole <mk1853...@gmail.com> wrote:
> >
> > Hello Drillers :)
> >
> > I came across the aforementioned bug (DRILL-5033) and wanted to contribute.
> > My attempt is to consider a *null* token as a *string* and print the "null"
> > as the column value instead of omitting the key in the output
> > resultset. Details of the fix attempt are below:
> >
> > *1)* In JsonReader.java (java-exec/drill-exec/vector/complex/fn/) at line
> > 283 I add the following:
> >
> >> ...
> >> case VALUE_NULL:
> >>   // handle null as string
> >>   handleString(parser, map, fieldName);
> >>   break;
> >> ...
> >
> > *2)* then at line 415 the handleString() becomes:
> >
> >> private void handleString(JsonParser parser, MapWriter writer, String fieldName) throws IOException {
> >>   try {
> >>     // added the following if
> >>     if (parser.nextToken() == VALUE_NULL)
> >>       writer.varChar(fieldName)
> >>           .writeVarChar(0, workingBuffer.prepareVarCharHolder("null"),
> >>               workingBuffer.getBuf());
> >>     else
> >>       writer.varChar(fieldName)
> >>           .writeVarChar(0,
> >>               workingBuffer.prepareVarCharHolder(parser.getText()),
> >>               workingBuffer.getBuf());
> >>   } catch (IllegalArgumentException e) {
> >>     if (parser.getText() == null || parser.getText().isEmpty()) {
> >>       // return;
> >>     }
> >>     throw e;
> >>   }
> >> }
> >
> > Is this a possible fix to the mentioned bug?
> > If yes, should I open a pull request?
> >
> > Thanks.
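
For reference, the typing conflict Charles describes above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not Drill's actual reader code: the class NullTypingSketch, the ColType enum and the inferColumnType method are hypothetical names made up for this example. It shows that deferring the type decision on null lets the column settle on a numeric type, whereas treating null as a string fixes the column as VARCHAR and then conflicts with the 3.5 in the next row.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Drill.
class NullTypingSketch {

    // Simplified stand-ins for column types.
    enum ColType { UNKNOWN, VARCHAR, FLOAT8 }

    // Type of a single value. With nullAsString == true, a null is
    // typed as VARCHAR (the proposed fix); otherwise it stays UNKNOWN.
    static ColType typeOf(Object value, boolean nullAsString) {
        if (value == null) {
            return nullAsString ? ColType.VARCHAR : ColType.UNKNOWN;
        }
        return (value instanceof Number) ? ColType.FLOAT8 : ColType.VARCHAR;
    }

    // Walks the values of one column in row order and returns the column
    // type, or throws if a later row disagrees with an earlier one.
    static ColType inferColumnType(List<Object> values, boolean nullAsString) {
        ColType current = ColType.UNKNOWN;
        for (Object value : values) {
            ColType seen = typeOf(value, nullAsString);
            if (seen == ColType.UNKNOWN) {
                continue; // defer the decision until a typed value appears
            }
            if (current == ColType.UNKNOWN) {
                current = seen;
            } else if (current != seen) {
                throw new IllegalStateException(
                    "schema change: " + current + " -> " + seen);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // The "foo" column from the two-row example: null, then 3.5.
        List<Object> foo = new ArrayList<>();
        foo.add(null);
        foo.add(3.5);

        System.out.println(inferColumnType(foo, false)); // prints FLOAT8
        try {
            inferColumnType(foo, true);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints schema change: VARCHAR -> FLOAT8
        }
    }
}
```

Supplying a schema up front, as suggested above, avoids the problem differently: the column type is known before the first row is read, so nothing has to be inferred from a null.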