Hi Stefán,

Yes, I'm considering this option now (since there are no better options at the moment).

I've run into a limitation, though: you cannot query a directory when
the schema differs between files.

Error: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support
schema changes
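The batching layout Stefán describes below could be sketched roughly like this (a minimal, hypothetical example; the `write_batch` helper, directory names, and record fields are all made up, not Drill API -- Drill would then query the resulting tree and prune on the partition directories):

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def write_batch(base_dir, records, ts):
    """Write one batch of JSON records into a year/month/day partition.

    A query engine with directory pruning (e.g. Drill's dir0/dir1/dir2
    columns) can then restrict a query to only the partitions it needs,
    and "appending" is just dropping a new file into the tree.
    """
    part = os.path.join(base_dir, f"{ts:%Y}", f"{ts:%m}", f"{ts:%d}")
    os.makedirs(part, exist_ok=True)
    path = os.path.join(part, f"batch-{ts:%H%M%S}.json")
    with open(path, "w") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")  # one JSON object per line
    return path

base = tempfile.mkdtemp()
ts = datetime(2016, 12, 9, 9, 26, tzinfo=timezone.utc)
p = write_batch(base, [{"event": "click", "n": 1}], ts)
print(p)
```

Keeping every file in a partition on the same schema would also sidestep the hash-aggregate error above, since no single directory scan would see a schema change.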


On Fri, Dec 9, 2016 at 9:26 AM, Stefán Baxter <[email protected]> wrote:
> Hi,
>
> Have you considered batching them up into a nicely defined directory
> structure and use directory pruning as part of your queries?
>
> I ask because our batch processes do that. Data is arranged into Hour,
> Day, Month, Quarter, Year structures (which we then roll up in different
> ways, based on volume (from H->*->Y)).
> We then use simple directory pruning to decide what data is applicable for
> each query.
>
> Hope this helps,
>  -Stefán
>
> On Thu, Dec 8, 2016 at 5:13 PM, Alexander Reshetov <
> [email protected]> wrote:
>
>> By the way, is it possible to append data to a Parquet data source?
>> I'm looking for a way to update (append to) existing data with new
>> rows, so that every query execution sees the new rows.
>>
>> It's certainly possible with plain JSON, but I want a more efficient
>> binary format which will give quicker reads (and query execution).
>>
>> On Wed, Dec 7, 2016 at 4:08 PM, Alexander Reshetov
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I want to load batches of unstructured data in Drill. Mostly JSON data.
>> >
>> > Are there any batch APIs or other options to do so?
>> >
>> >
>> > Thanks.
>>
