Hi Stefán,

Yes, I'm considering this option now (since there are no better options at the moment).

I've run into a limitation, though: you cannot query a directory when
the schema differs between files.

Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
schema changes
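The batching layout Stefán describes below could be sketched roughly like this (a minimal, hypothetical example; the `write_batch` helper, directory names, and record fields are all made up, not Drill API -- Drill would then query the resulting tree and prune on the partition directories):

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def write_batch(base_dir, records, ts):
    """Write one batch of JSON records into a year/month/day partition.

    A query engine with directory pruning (e.g. Drill's dir0/dir1/dir2
    columns) can then restrict a query to only the partitions it needs,
    and "appending" is just dropping a new file into the tree.
    """
    part = os.path.join(base_dir, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}")
    os.makedirs(part, exist_ok=True)
    path = os.path.join(part, f"batch-{ts:%H%M%S}.json")
    with open(path, "w") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")  # one JSON object per line
    return path

base = tempfile.mkdtemp()
ts = datetime(2016, 12, 9, 9, 26, tzinfo=timezone.utc)
p = write_batch(base, [{"event": "click", "n": 1}], ts)
print(p)
```

Keeping every file in a partition on the same schema would also sidestep the hash-aggregate error above, since no single directory scan would see a schema change.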


On Fri, Dec 9, 2016 at 9:26 AM, Stefán Baxter <[email protected]> wrote:
> Hi,
>
> Have you considered batching them up into a nicely defined directory
> structure and use directory pruning as part of your queries?
>
> I ask because our batch processes do that. Data is arranged into Hour,
> Day, Month, Quarter, Year structures (which we then roll up in different
> ways, based on volume (from H->*->Y)).
> We then use simple directory pruning to decide what data is applicable for
> each query.
>
> Hope this helps,
>  -Stefán
>
> On Thu, Dec 8, 2016 at 5:13 PM, Alexander Reshetov <
> [email protected]> wrote:
>
>> By the way, is it possible to append data to a Parquet data source?
>> I'm looking for a way to update (append to) existing data with new
>> rows, so that every query execution sees the new rows.
>>
>> It's certainly possible with plain JSON, but I want a more efficient
>> binary format which will give quicker reads (and query execution).
>>
>> On Wed, Dec 7, 2016 at 4:08 PM, Alexander Reshetov
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I want to load batches of unstructured data in Drill. Mostly JSON data.
>> >
>> > Are there any batch APIs or other options to do so?
>> >
>> >
>> > Thanks.
>>
