Hi Stefán,

Yes, I'm considering this option now (while there are no better options).
I faced a limitation, though. You cannot query a directory when the schema differs between files. Error:

    UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes

On Fri, Dec 9, 2016 at 9:26 AM, Stefán Baxter <[email protected]> wrote:

> Hi,
>
> Have you considered batching them up into a nicely defined directory
> structure and using directory pruning as part of your queries?
>
> I ask because our batch processes do that. Data is arranged into Hour,
> Day, Month, Quarter, Year structures (which we then roll up in different
> ways, based on volume (from H->*->Y)).
> We then use simple directory pruning to decide what data is applicable for
> each query.
>
> Hope this helps,
> -Stefán
>
> On Thu, Dec 8, 2016 at 5:13 PM, Alexander Reshetov <
> [email protected]> wrote:
>
>> By the way, is it possible to append data to a parquet data source?
>> I'm looking for the possibility of updating (appending to) existing data with
>> new rows, so every query execution will see the new rows.
>>
>> Surely it's possible with plain JSON, but I want a more efficient binary
>> format which will give quicker reads (and query executions).
>>
>> On Wed, Dec 7, 2016 at 4:08 PM, Alexander Reshetov
>> <[email protected]> wrote:
>> > Hello,
>> >
>> > I want to load batches of unstructured data into Drill. Mostly JSON data.
>> >
>> > Is there any batch API or other options to do so?
>> >
>> >
>> > Thanks.
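For the archive, a minimal sketch of the directory-pruning approach Stefán describes, using Drill's implicit `dir0`/`dir1` columns (the `/data/events` path and the year/month layout are hypothetical, just for illustration):

```sql
-- Hypothetical layout: /data/events/<year>/<month>/<day>/*.parquet
-- dir0, dir1, dir2 map to the first, second, and third directory levels.
SELECT *
FROM dfs.`/data/events`
WHERE dir0 = '2016'   -- year
  AND dir1 = '12';    -- month
```

With a filter like this, Drill only scans the matching subdirectories instead of the whole tree, which is what makes the batched layout efficient. Note that this still assumes all files under the scanned directories share a compatible schema, per the limitation above.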
