Hey Christian,

It sounds like you're looking for a bulk insert that is also transactional
from the external view of the table. Is that correct?

I'd like to spend some time on this, as I think we can provide something
useful in a reasonable amount of time, but first I wanted to make sure I
understand correctly.

I also noticed that you filed DRILL-4506 (I think). Is this related to what
you are doing here, or is that for an independent use case? Can you give a
little more overview of the pattern you're trying to achieve with that?

thanks,
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 14, 2016 at 1:14 PM, Christian Hellström <[email protected]>
wrote:

> So, the data is in HDFS, and from there I need to transform it into a
> structure that is appropriate for visualization, which I do with a set of
> views.
>
> Every day I want to pick up the newest files and pack them into Parquet
> files so that our BI tool runs a bit snappier, because running everything
> off JSON files that are accessed through views is too slow. While new data
> is being inserted (a process over which I have no control), I UNION ALL the
> previous days' data with the current day's data.
>
> I would normally write an INSERT INTO job to run every night that picks up
> only the recent data. Since this is not supported, I'm curious to see how
> else I can solve this. As I said, a CTAS on all the existing data plus the
> latest additions is not viable performance-wise, apart from the fact that
> it's an ugly solution. Similarly, doing a CTAS to a dummy table followed by
> a copy to the right directory is a hack that I don't consider acceptable
> for production purposes.
>
> Any thoughts are greatly appreciated!
>
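For reference, a minimal Drill SQL sketch of the nightly pattern described
above. The workspace names (dfs.raw, dfs.parquet, dfs.views), directory
layout, and column list are all hypothetical, not taken from the thread:
each night only the newest day's JSON is written into its own Parquet
directory, and a single view UNION ALLs the accumulated Parquet partitions
with the still-growing current day's JSON.

    -- Nightly job: convert only the newest day's JSON into its own Parquet
    -- directory, instead of re-running a CTAS over the full history.
    -- (All workspace, path, and column names here are hypothetical.)
    CREATE TABLE dfs.parquet.`events/2016-03-14` AS
    SELECT event_id, event_ts, payload
    FROM dfs.raw.`events/json/2016-03-14`;

    -- The BI tool queries a single view that stitches the historical
    -- Parquet partitions together with the current day's raw JSON.
    CREATE OR REPLACE VIEW dfs.views.events_all AS
    SELECT event_id, event_ts, payload FROM dfs.parquet.`events`
    UNION ALL
    SELECT event_id, event_ts, payload FROM dfs.raw.`events/json/2016-03-14`;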
