Thank you, Jason, for your detailed answer. I will try FLATTEN on the data column and let you know the status.
The error message I got from ODBC is:

"ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query: SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100 [30024]Query execution error. Details:[ Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ] ]"

Is there any way to normalise or convert this nested data to simpler JSON so that I can play with Drill?

*Regards*
*Muthupandi.K*

Think before you print.

On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <[email protected]> wrote:

> To answer Andries' question, with an enhancement in the 0.8 release, there
> should be no hard limit on the size of Drill records supported. That being
> said, Drill is not fundamentally set up for processing enormous rows, so we
> do not have a clear idea of the performance impact of working with such
> datasets.
>
> This document is going to be read as a single record originally, and I
> think the 0.8 release should be able to read it in. From there, flatten
> should be able to produce individual records suitable for further analysis;
> these records will be a more reasonable size and get you good
> performance for further analysis.
>
> -Jason
>
> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <[email protected]>
> wrote:
>
> > Hi Muthu,
> >
> > Welcome to the Drill community!
> >
> > Unfortunately the mailing list does not allow attachments, please send
> > along the error log copied into a mail message.
> >
> > If you are working with the 0.7 version of Drill, I would recommend
> > upgrading to the new 0.8 release that just came out, there were a lot of
> > bug fixes and enhancements in the release.
> >
> > We're glad to hear you have been successful with your previous efforts
> > with Drill. Unfortunately Drill is not well suited for exploring datasets
> > like the one you have linked to.
> > By default Drill supports records of the
> > format accepted by Mongo DB for bulk import, where individual records take
> > the form of a JSON object.
> >
> > Looking at this dataset, it follows a pattern we have seen before, but
> > one that Drill is currently not well suited for working with. All of the
> > data is in a single JSON object, at the top of the object are a number of
> > dataset-wide metadata fields. These are all nested under a field "view",
> > with the main data I am guessing you want to analyze nested under the field
> > "data" in an array. While this format is not ideal for Drill, with the size
> > of the dataset you might be able to get it working with an operator in
> > Drill that could help make the data more accessible.
> >
> > The operator is called flatten, and is designed to take an array and
> > produce individual records for each element in the array. Optionally other
> > fields from the record can be included alongside each of the newly spawned
> > records to maintain a relationship between the incoming fields in the
> > output of flatten.
> >
> > For more info on flatten, see this page in the wiki:
> > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
> >
> > For this dataset, you might be able to get access to the data simply by
> > running the following:
> >
> > select flatten(data) from dfs.`/path/to/file.json`;
> >
> > If you need to have access to some of the other fields from the top of the
> > dataset, you can include them alongside flatten and they will be copied
> > into each record produced by the flatten operation:
> >
> > select flatten(data), view.id, view.category from
> > dfs.`/path/to/file.json`;
> >
> > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <[email protected]> wrote:
> >
> >> Hi All
> >>
> >> I am new to the JSON format and exploring it.
> >> I had used Drill to analyse simple JSON files, which worked like a
> >> charm, but I am not able to load this
> >> "https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD"
> >> JSON file for analysis.
> >>
> >> I am using the ODBC connector to connect to Drill 0.8. Kindly find the
> >> attachment for the error.
> >>
> >> *Regards*
> >> *Muthupandi.K*
> >>
> >> Think before you print.
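
On the question above about converting the nested file into simpler JSON: one option outside Drill is to split the top-level array into one JSON object per line, which is the record-per-object layout Jason describes. The Python below is only a minimal sketch under assumptions. It assumes the file is a single object with its rows in an array field named "data" (as described in the thread) and that each row is itself an array. The function name `to_json_lines`, the `column_names` parameter, and the fallback key "row" are hypothetical names chosen for illustration; they are not part of Drill or of the dataset.

```python
import json


def to_json_lines(doc, array_field="data", column_names=None):
    """Return one JSON object string per element of doc[array_field].

    column_names is a hypothetical, caller-supplied list of field names;
    without it, array rows are wrapped under the placeholder key "row".
    """
    lines = []
    for row in doc[array_field]:
        if column_names is not None:
            # Pair each value with its column name to get a flat object.
            record = dict(zip(column_names, row))
        elif isinstance(row, dict):
            # Already an object: emit it unchanged.
            record = row
        else:
            # Array row without names: wrap it so each line is an object.
            record = {"row": row}
        lines.append(json.dumps(record))
    return lines


if __name__ == "__main__":
    # Tiny made-up sample in the same shape: metadata plus a "data" array.
    sample = {"view": {"id": "abc"}, "data": [[1, "x"], [2, "y"]]}
    for line in to_json_lines(sample, column_names=["id", "name"]):
        print(line)
```

For the real file, the document would be loaded with `json.load` and the returned lines written to a new file, one per line, which can then be queried as ordinary records. Values are written with their original types, so a column that mixes numbers and strings stays mixed in the output.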
