Hello all, I am struggling both to define what the issue is with my JSON file and to zero in on where it is. I figured I'd post my thought process now, as verbose and scary as that is, to help folks see some issues with error messages and (hopefully while helping me solve my problem) determine ways we can make the error messages and troubleshooting better.

First, the description. I have a JSON file with one object per line. It's a nasty structured/nested thing without many human-readable fields, which makes this process harder. The original file (let's call it orig.json) has 66240 lines in it. This is Drill 1.10.0.

When I run the following query as is:

select * from `/path/to/orig.json` limit 10

I get:

Error Returned - Code: 500
Error Text: SYSTEM ERROR: IllegalStateException: You tried to start when you are using a ValueWriter of type NullableBigIntWriterImpl.

Fragment 0:0

[Error Id: de1d4c54-8a6d-4765-9f5f-2b75fdfd4b49 on zeta8.brewingintel.com:20005]

I look at this error and already I am lost (as a user who hasn't lived in this user group for the past X years). It's just not helpful for troubleshooting the issue. I don't know where to look in the offending file, and if this were a directory of files, I wouldn't know which file had the issue. Even knowing the file (as I do), I am lost as to what I am actually looking for.

*Point 1: Where are we looking for the problem? Can that be in the error message? Filename and line number at a minimum, character location of the error if possible. Field names if it knows them!*

*Point 2: If a dev-based error message is used (that's what I am calling this error message), can there be some human-readable explanation too? Like "A value you are reading may have changed types from X to Y, and that's causing this issue."*

*Point 3: If there are options to try as a fix, can those be included as well?*

Ok, so going on to my own troubleshooting, I decided to guess and check to narrow the problem down to a line number:

> head -5000 orig.json > test.json # Run query, still errors
> head -4000 orig.json > test.json # Run query, still errors
> head -3000 orig.json > test.json # Run query, no error!
> head -3500 orig.json > test.json # Error
...
...
> head -3425 orig.json > test.json # Error
> head -3424 orig.json > test.json # No error!

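To make Point 2 concrete, here is a minimal sketch of the kind of check that could point at a file, line, and field. It rests on an assumption, based only on the wording of the error above: that a field first read as an integer later shows up as an object or array. The helper names (json_type, walk, find_type_changes, scan_file) are made up for illustration and are not part of Drill. The script reads a one-object-per-line JSON file and reports every field whose type differs from the first non-null type seen for that field path.

```python
import json


def json_type(value):
    """Return a coarse JSON type name for a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def walk(value, path=""):
    """Yield (field_path, type_name) for a value and everything nested in it."""
    yield path, json_type(value)
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for child in value:
            yield from walk(child, path + "[]")


def find_type_changes(lines):
    """Compare each field's type with the first non-null type seen for it.

    Returns (line_no, path, first_type, first_line_no, new_type) tuples.
    """
    first_seen = {}  # path -> (type_name, line_no)
    changes = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        for path, type_name in walk(json.loads(line)):
            if type_name == "null":
                continue  # nulls are skipped; this is a simplification
            if path not in first_seen:
                first_seen[path] = (type_name, line_no)
            elif first_seen[path][0] != type_name:
                old_type, old_line = first_seen[path]
                changes.append((line_no, path, old_type, old_line, type_name))
    return changes


def scan_file(path):
    """Print one report line per type change found in the file at `path`."""
    with open(path) as handle:
        for line_no, field, old_type, old_line, new_type in find_type_changes(handle):
            print(f"line {line_no}: {field} is {new_type}, "
                  f"but was {old_type} on line {old_line}")
```

Calling scan_file("orig.json") would print one line per mismatch. Note that the script reports two line numbers for each mismatch (where the type was first seen and where it changed), so a problem of this kind would involve a pair of lines rather than a single bad line. Whether that matches what Drill is actually tripping over here is untested.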
Ok, so I think the error is in line 3425 of the file, so now I try:

> tail -1 test.json > test1.json

And run the query. No error. That makes sense, there's only one record there... so then:

> tail -10 test.json > test1.json # No error

Huh?

> tail -500 test.json > test1.json # no error!!!
> tail -1000 test.json > test1.json # no error!!!

(This is getting frustrating)

> tail -3348 test.json > test1.json # No error
> tail -3349 test.json > test1.json # Error

So I guess I am stumped. Why does something change, and why does the error seem to depend on where I cut the file, yet inconsistently? And through this, how can we help improve this process for all users?

Thanks!

John
