Hi Marc

Pushing directly to the main repo is locked down. The procedure is to fork the main repo to your own GitHub account, push your changes to your fork, and then open a PR from your fork to the main repo.

On 12/29/22 14:39, marc nicole wrote:
Hi,

Thanks for your reply,
I actually want to submit my changes, but I am being denied permission to
push any changes to the Drill repo. How do I open a pull request in Git? Are
there any permissions I need to obtain before pushing to the repo?


On Wed, Dec 28, 2022 at 15:46, Charles Givre <cgi...@gmail.com> wrote:

Hi Marc,
Thanks for this.  Here's the thing... Let's say you have json that looks
like this:

{
         "foo":null
},{
         "foo": 3.5
}

If you take the approach that `null` is treated like a string, you will
get a schema change exception when you read the next row.  Our current
approach is to basically ignore fields whose data type Drill cannot
determine.  Once Drill encounters a value with a data type, it will
then assign that data type to the column.  See the example below, which is
from DRILL-5033.  I added a second row to demonstrate what happens once
Drill is able to determine a data type.  Note that for the columns with a
defined value in the second row, Drill returns 'null' as the first-row
value, while the columns that are null in both rows come back as '[]'.


[{
"intKey" : null,
"bgintKey": null,
"strKey": null,
"boolKey": null,
"fltKey": null,
"dblKey": null,
"timKey": null,
"dtKey": null,
"tmstmpKey": null,
"intrvldyKey": null,
"intrvlyrKey": null
},
{
"intKey" : 1,
"bgintKey": 3666565464,
"strKey": "hithere",
"boolKey": true,
"fltKey": 3.5,
"dblKey": 4.2,
"timKey": null,
"dtKey": null,
"tmstmpKey": null,
"intrvldyKey": null,
"intrvlyrKey": null
}]


select * from dfs.test.`nulls.json`;

+--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
| intKey |   bgintKey    | strKey  | boolKey | fltKey | dblKey | timKey | dtKey | tmstmpKey | intrvldyKey | intrvlyrKey |
+--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
| null   | null          | null    | null    | null   | null   | []     | []    | []        | []          | []          |
| 1.0    | 3.666565464E9 | hithere | true    | 3.5    | 4.2    | []     | []    | []        | []          | []          |
+--------+---------------+---------+---------+--------+--------+--------+-------+-----------+-------------+-------------+
2 rows selected (0.232 seconds)
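
To make the trade-off above concrete, here is a minimal, self-contained sketch of per-column type inference. It is a toy model, not Drill's reader code: the class name, the `inferSchema` and `sampleRows` helpers, the `nullAsString` flag, and the local `SchemaChangeException` class are all hypothetical, and the type labels are only illustrative. It shows that skipping nulls lets a later row fix the column type, while writing nulls as strings pins the column to a string type and makes the next numeric value conflict.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of per-column type inference; NOT Drill's actual implementation. */
class NullTypeInferenceSketch {

    /** Hypothetical stand-in for a schema change error. */
    static class SchemaChangeException extends RuntimeException {
        SchemaChangeException(String message) {
            super(message);
        }
    }

    /** Maps a decoded JSON value to an illustrative type label. */
    static String typeOf(Object value) {
        if (value instanceof String) return "VARCHAR";
        if (value instanceof Boolean) return "BIT";
        if (value instanceof Number) return "FLOAT8";
        throw new IllegalArgumentException("unsupported value: " + value);
    }

    /**
     * Assigns each column the type of the first typed value seen.
     * With nullAsString = false, nulls are skipped, so the type stays undecided.
     * With nullAsString = true, a null is recorded as the string "null".
     */
    static Map<String, String> inferSchema(List<Map<String, Object>> rows, boolean nullAsString) {
        Map<String, String> schema = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            for (Map.Entry<String, Object> field : row.entrySet()) {
                Object value = field.getValue();
                if (value == null) {
                    if (!nullAsString) {
                        continue; // defer: no type is assigned yet
                    }
                    value = "null"; // the proposed change: treat null as a string
                }
                String seen = typeOf(value);
                String known = schema.putIfAbsent(field.getKey(), seen);
                if (known != null && !known.equals(seen)) {
                    throw new SchemaChangeException(
                        field.getKey() + ": " + known + " -> " + seen);
                }
            }
        }
        return schema;
    }

    /** The two-row example from the message: foo is null, then 3.5. */
    static List<Map<String, Object>> sampleRows() {
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("foo", null);
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("foo", 3.5);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(first);
        rows.add(second);
        return rows;
    }

    public static void main(String[] args) {
        // Nulls skipped: the second row decides the column type.
        System.out.println(inferSchema(sampleRows(), false));
        // Nulls written as strings: the second row conflicts with VARCHAR.
        try {
            inferSchema(sampleRows(), true);
        } catch (SchemaChangeException e) {
            System.out.println("schema change: " + e.getMessage());
        }
    }
}
```

In this sketch the conflict arises only because the first row commits the column to a type too early, which is the reason given above for leaving undetermined fields alone until a typed value appears.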

You are definitely welcome to submit a pull request, however this area is
extremely complex, and I'd suspect that what you propose will break other
unit tests.  Another option which you might not be aware of is providing a
schema.  If you do that from the beginning, then Drill will know what data
types to expect.

Best,
-- C


On Dec 28, 2022, at 8:57 AM, marc nicole <mk1853...@gmail.com> wrote:

Hello Drillers :)

I came across the aforementioned bug (DRILL-5033) and wanted to
contribute.
My attempt is to consider a *null* token as a *string* and print "null"
as the column value instead of omitting the key in the output
result set. Details of the fix attempt are below:


*1)* In JsonReader.java (java-exec/drill-exec/vector/complex/fn/) at line
283 I add the following:

...
case VALUE_NULL:
          // handle null as string
          handleString(parser, map, fieldName);
          break;
...

*2)* then at line 415 the handleString() becomes:

private void handleString(JsonParser parser, MapWriter writer, String fieldName) throws IOException {
  try {
    // added the following if
    if (parser.nextToken() == VALUE_NULL) {
      writer.varChar(fieldName)
          .writeVarChar(0, workingBuffer.prepareVarCharHolder("null"), workingBuffer.getBuf());
    } else {
      writer.varChar(fieldName)
          .writeVarChar(0, workingBuffer.prepareVarCharHolder(parser.getText()), workingBuffer.getBuf());
    }
  } catch (IllegalArgumentException e) {
    if (parser.getText() == null || parser.getText().isEmpty()) {
      // return;
    }
    throw e;
  }
}


Is this a possible fix for the mentioned bug?
If so, should I open a pull request?

Thanks.

