Hi,

Have you considered batching them up into a nicely defined directory
structure and use directory pruning as part of your queries?

I ask because our batch processes do exactly that. Data is arranged into
Hour, Day, Month, Quarter, and Year structures (which we then roll up in
different ways, based on volume, from H->*->Y).
We then use simple directory pruning to decide which data is applicable to
each query.
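
For example, Drill exposes each directory level under the queried path as
an implicit column (dir0, dir1, ...), so a WHERE clause on those columns
prunes whole subtrees before any files are read. A sketch, assuming a
layout of /data/<year>/<month>/<day> (the path, workspace, and column
names here are illustrative):

```sql
-- Files stored as e.g. /data/2016/12/08/events.json
-- dir0 = year level, dir1 = month level, dir2 = day level
SELECT t.`timestamp`, t.payload
FROM dfs.`/data` t
WHERE t.dir0 = '2016'
  AND t.dir1 = '12';   -- only /data/2016/12/** is scanned
```

The same pruning works regardless of file format, so it applies equally to
the JSON batches you mentioned.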

Hope this helps,
 -Stefán

On Thu, Dec 8, 2016 at 5:13 PM, Alexander Reshetov <
alexander.v.reshe...@gmail.com> wrote:

> By the way, is it possible to append data to a Parquet data source?
> I'm looking for a way to append new rows to existing data, so that every
> query execution sees the latest rows.
>
> It's certainly possible with plain JSON, but I want a more efficient
> binary format that will give quicker reads (and query executions).
>
> On Wed, Dec 7, 2016 at 4:08 PM, Alexander Reshetov
> <alexander.v.reshe...@gmail.com> wrote:
> > Hello,
> >
> > I want to load batches of unstructured data in Drill. Mostly JSON data.
> >
> > Is there any batch API or other options to do so?
> >
> >
> > Thanks.
>
