Hello, We are trying to 'flatten' out a multi level (nested) json file that is the result of a REST API call. We were going to try flatten out this to a CSV, load it into Oracle tables and then do traditional SQL queries. The initial approach was going to be done using a combination of CSVKIT (in2csv) and Perl. Until I came across Apache Drill....
I have spent the past 4 days going through the setup and tutorials but have hit a roadlock on a number of different front which I have listed below and hope you can help. 1) Where does Drill get the employee.json file from? I have searched my whole computer after downloading the tar file and looked in the sample data directory and cannot find it anywhere. I have attached a screenshot after doing a find command which shows the content of the sample data directory. 2) I have the json file I wish to query in the bin directory but whenever I try to query it errors - based upon reading the tutes and also looking at the structure of the select statement agains the employee.json file I THINK it's right but it fails every time. I have attached a cut down version of the sample json i am trying to query. It is called sample_json.json 3) The aforementioned json file contains nested data and the problem is the nested level also contain similar names as some of the higher levels. This is the way the log information is returned in JSON format from the REST API call. How can one flatten this? The deepest level is 2 considering that leve one starts at 0. ie. there are 3 levels at most. In cases where there are 3 levels, we want to flatten it out, in cases where there are 2, then only 2 will be flattened out. 4) When trying to executed drill_embedded from the bin directory it starts up but then when trying to run the same query against employee.json that works from the UI, the drill_embedded command simply hangs until i have to kill the process. See the screenshot "drill embedded under cygwin hangs". Note JAVA_HOME has been set. 5)Then when trying sqlline from the bin directory it starts up, I run the same employee query, it hangs and then starts togive log4j errors. See attached screenshot "sqlline_hangs_then_gives_log4j_error" 6) Finally the sqlline bat file works under the Windows command prompt and returns results when querying employee.json (but no other file) but I cannot exit despite entering the !quit command. see attached screenshot: "Unable_to_exit_sqlline_when_running_from_Windows_command_promtp" Please note I've gone through the documentation and believe I've followed the instructions as best I can and managed to solve a few issues by searching online. However, even after trying towatch some Drill youtube videos and looking on forums like stackoverflow still can't it to work as nicely as the tutes say and videos demonstrated. Basically all we want to do is flatten out a nested JSON file and Drill seems far more suited to this task - if only we can get it working properly - than coding Python or Perl scripts. THe csvkit (in2csv) works fine but not for nested JSON data which is our use case. One final question - how can one 'spool' results to a file? We've tried giving it a go under Windows, Mac OS and also cygwin but each time have failed somewhere despite trying to follow the documentation and tutorial material to the best of our ability. It just looks soooooo much easier reaing the online doco then actually trying to implement. We'd gratefully welcome some feedback on the aforementioned teething issues we're experiencing as we'd certainly like to fly the Apache Drill flag. thanks and regards, Rob.
sample_json.json
Description: application/json
