Rob
Before answering specific questions, I just wanted to check how did you
setup Drill.
Did you just download the tar from the Drill site and launching in embedded
mode?
I just did the same and am able to query the file you attached. I just put
in my local file system under the directory and referenced in the query.
0: jdbc:drill:zk=local> select flatten(t.entries.entries) from
dfs.`/Users/nrentachintala/Downloads/yelp/sample_json.json` t limit 5;
*+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+*
*| **
EXPR$0
** |*
*+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+*
*|
*{"@type":"activityLogEntry","id":"444312323","type":"BBBDDD","objectName":"lll","runId":3,"runtimeEnvironmentId":"333K2E24333333333332","startTime":"2344-34-13T33:33:41.333Z","endTime":"2344-34-13T33:33:11.333Z","state":4,"failedSourceRows":3,"successSourceRows":4,"failedTargetRows":3,"successTargetRows":4,"errorMsg":"No
errors encountered."}* |*
*|
*{"@type":"activityLogEntry","id":"444312322","type":"BBBDDD","objectName":"mmm","runId":3,"runtimeEnvironmentId":"333K2E24333333333332","startTime":"2344-34-13T33:33:11.333Z","endTime":"2344-34-13T33:33:13.333Z","state":4,"failedSourceRows":3,"successSourceRows":4,"failedTargetRows":3,"successTargetRows":4,"errorMsg":"No
errors encountered."}* |*
*|
*{"@type":"activityLogEntry","id":"444312322","type":"FFFGGG","objectName":"","runId":3,"runtimeEnvironmentId":"333K2E24333333333332","startTime":"2344-34-13T33:33:13.333Z","endTime":"2344-34-13T33:33:24.333Z","state":4,"failedSourceRows":3,"successSourceRows":4,"failedTargetRows":3,"successTargetRows":4,"errorMsg":"No
errors encountered."}* |*
*|
*{"@type":"activityLogEntry","id":"444312322","type":"FFFGGG","objectName":"","runId":3,"runtimeEnvironmentId":"333K2E24333333333332","startTime":"2344-34-13T33:33:24.333Z","endTime":"2344-34-13T33:33:22.333Z","state":4,"failedSourceRows":3,"successSourceRows":3,"failedTargetRows":3,"successTargetRows":3,"errorMsg":"No
errors encountered."}* |*
*|
*{"@type":"activityLogEntry","id":"444312323","type":"BBBDDD","objectName":"qqq","runId":3,"runtimeEnvironmentId":"333K2E24333333333332","startTime":"2344-34-13T33:33:22.333Z","endTime":"2344-34-13T33:31:14.333Z","state":4,"failedSourceRows":3,"successSourceRows":422,"failedTargetRows":3,"successTargetRows":422,"errorMsg":"No
errors encountered."}* |*
*+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+*
0: jdbc:drill:zk=local> select t1.entries.objectName, t1.entries.runId from
(select flatten(t.entries.entries) entries from `sample_json.json` t) t1;
*+---------+---------+*
*| **EXPR$0 ** | **EXPR$1 ** |*
*+---------+---------+*
*| *lll * | *3 * |*
*| *mmm * | *3 * |*
*| * * | *3 * |*
*| * * | *3 * |*
*| *qqq * | *3 * |*
*| *ooo * | *3 * |*
*| *ppp * | *3 * |*
*| * * | *3 * |*
*| * * | *3 * |*
*| * * | *3 * |*
*+---------+---------+*
On Tue, Jun 9, 2015 at 12:41 AM, Rob E <[email protected]> wrote:
> Hello,
>
> We are trying to 'flatten' out a multi level (nested) json file that is
> the result of a REST API call. We were going to try flatten out this to a
> CSV, load it into Oracle tables and then do traditional SQL queries. The
> initial approach was going to be done using a combination of CSVKIT
> (in2csv) and Perl. Until I came across Apache Drill....
>
> I have spent the past 4 days going through the setup and tutorials but
> have hit a roadlock on a number of different front which I have listed
> below and hope you can help.
>
> 1) Where does Drill get the employee.json file from? I have searched my
> whole computer after downloading the tar file and looked in the sample data
> directory and cannot find it anywhere. I have attached a screenshot after
> doing a find command which shows the content of the sample data directory.
>
> 2) I have the json file I wish to query in the bin directory but whenever
> I try to query it errors - based upon reading the tutes and also looking at
> the structure of the select statement agains the employee.json file I THINK
> it's right but it fails every time. I have attached a cut down version of
> the sample json i am trying to query. It is called sample_json.json
>
> 3) The aforementioned json file contains nested data and the problem is
> the nested level also contain similar names as some of the higher levels.
> This is the way the log information is returned in JSON format from the
> REST API call. How can one flatten this? The deepest level is 2 considering
> that leve one starts at 0. ie. there are 3 levels at most. In cases where
> there are 3 levels, we want to flatten it out, in cases where there are 2,
> then only 2 will be flattened out.
>
> 4) When trying to executed drill_embedded from the bin directory it starts
> up but then when trying to run the same query against employee.json that
> works from the UI, the drill_embedded command simply hangs until i have to
> kill the process. See the screenshot "drill embedded under cygwin hangs".
> Note JAVA_HOME has been set.
>
> 5)Then when trying sqlline from the bin directory it starts up, I run the
> same employee query, it hangs and then starts togive log4j errors. See
> attached screenshot "sqlline_hangs_then_gives_log4j_error"
>
> 6) Finally the sqlline bat file works under the Windows command prompt and
> returns results when querying employee.json (but no other file) but I
> cannot exit despite entering the !quit command. see attached screenshot:
> "Unable_to_exit_sqlline_when_running_from_Windows_command_promtp"
>
> Please note I've gone through the documentation and believe I've followed
> the instructions as best I can and managed to solve a few issues by
> searching online. However, even after trying towatch some Drill youtube
> videos and looking on forums like stackoverflow still can't it to work as
> nicely as the tutes say and videos demonstrated.
>
> Basically all we want to do is flatten out a nested JSON file and Drill
> seems far more suited to this task - if only we can get it working properly
> - than coding Python or Perl scripts. THe csvkit (in2csv) works fine but
> not for nested JSON data which is our use case.
>
> One final question - how can one 'spool' results to a file?
>
> We've tried giving it a go under Windows, Mac OS and also cygwin but each
> time have failed somewhere despite trying to follow the documentation and
> tutorial material to the best of our ability.
>
> It just looks soooooo much easier reaing the online doco then actually
> trying to implement.
>
> We'd gratefully welcome some feedback on the aforementioned teething
> issues we're experiencing as we'd certainly like to fly the Apache Drill
> flag.
>
> thanks and regards,
> Rob.
>