Hi Gayathri,

Please provide a few more details. Drill handles JSON in many forms; you did 
not indicate which form you are trying to use.

The most common form of JSON is in a file, with each record as an object. This 
is not true JSON, but is a very common format. Example:

{a: 10, b: "fred"}{a: 20, b: "Wilma"}

Drill also handles JSON as one big array. This is true JSON, but seems to not 
occur very often in practice:

[ {a: 10, b: "fred"}, {a: 20, b: "Wilma"} ]

In both cases, the JSON parser which Drill uses correctly parses one record at 
a time. That is, even if we had a file with 10 billion records, Drill reads 
them one at a time. Drill batches up data into "record batches" of something 
like a few thousand records; so the amount of memory used to hold the data is 
independent of file size on disk.
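
To make the idea concrete, here is a minimal Python sketch of reading 
concatenated JSON objects one record at a time and grouping them into 
fixed-size batches. This is not Drill's code (Drill is written in Java and has 
its own reader); the names iter_records and iter_batches, the chunk size, and 
the batch size are all made up for illustration. Note that Python's json 
module requires quoted keys, so the sample data below quotes them.

```python
import io
import json
from typing import Iterable, Iterator, List, TextIO


def iter_records(stream: TextIO, chunk_size: int = 4096) -> Iterator[dict]:
    """Yield one JSON object at a time from a stream of concatenated objects.

    Illustrative only: memory use depends on the largest single record
    plus one chunk, not on the total size of the input.
    """
    decoder = json.JSONDecoder()
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        buffer += chunk
        while True:
            buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
            if not buffer:
                break
            try:
                record, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # record is incomplete so far; read another chunk
            yield record
            buffer = buffer[end:]
        if not chunk:  # end of input
            if buffer.strip():
                raise ValueError("input ended in the middle of a record")
            return


def iter_batches(records: Iterable[dict], batch_size: int = 4096) -> Iterator[List[dict]]:
    """Group records into lists of at most batch_size (a stand-in for a record batch)."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    sample = io.StringIO('{"a": 10, "b": "fred"}{"a": 20, "b": "Wilma"}')
    for batch in iter_batches(iter_records(sample), batch_size=1):
        print(batch)
```

Only the current batch and the partially read record are held at any moment, 
which is why the total file size does not matter. The sketch retries the parse 
after every chunk, which is simple but slow for very large single records; a 
real reader would use an incremental parser instead.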

The doc link you listed is interesting; I don't believe what it says is correct. 
The passage says "Drill cannot manage lengthy JSON objects, such as a gigabit 
JSON file. Finding the beginning and end of records can be time consuming and 
require scanning the whole file." This is plain wrong. Drill does not attempt 
to split JSON into blocks: each file is read by a single "record reader" which 
starts at byte 0 and works its way to the end, one byte at a time, as described 
above. It is true that Drill cannot parallelize reads of huge JSON files, but 
the (single-threaded) read should work.

You cited an error about a string being too long. You also mentioned not 
storing JSON in the file system. Are you trying to read JSON from some other 
data source as a big long string? Can you provide a bit more detail?

Thanks,
- Paul

 

    On Friday, August 10, 2018, 11:48:50 AM PDT, Gayathri Selvaraj 
<gayathri.selvar...@gmail.com> wrote:  
 
 Hi Team,

I am using Apache drill to query JSON files. The size of JSON file which am
having is more than a GB. Because of that, Apache drill is throwing error
saying "string is too long".

In the following link, I learnt that Apache drill currently do not support
lengthy JSON (
https://drill.apache.org/docs/json-data-model/#lengthy-json-objects).

According to my requirement, I should not store the JSON in File system. It
should be in memory only.

Do you have any work around for this? Any solution is really appreciated.

Expecting a quick response from you.

Thanks,
Gayathri.
  
