So as Insert into isn't really an option, I would love some feed back on options with Drill for ETL. I understand Drill is very good with raw text, at the same time, for a highly used table, the advantages of taking text data and "final" storing it as Parquet would be handy. I was thinking through some options, and I thought I'd lay them out here, partially to organize my thoughts, and partially to get feedback or look for other options.
We have data coming in on a regular basis, say every 30 minutes. It's quasi batched already, and we are looking to reduce the lag time of the data as much as possible. To start out with things, this data makes sense to have a "directory" partition based on the date. It can be fairly large data per data, and that will be the easiest to prune on. I "may" have this data in a format that is already loaded Parquet files from a Hadoop system (I think Impala) but I am not sure how well these files are formed, and Drill may have some advantage to loading them itself. So, with Drill, can I just copy the files into the "current" day partition and have it be good? Should I put subdirectories (dir1) with the time of the load or does that not give me anything? At the end of the day, is there anything in Drill like Hive where I can say "INSERT INTO table(partition=curday) select * from table where partition=curday that could read all the loaded Parquet files as a set and rewrite them, potentially optimized for queries (adding file level partitions via PARTITIONED BY to the directory level partitions?) Obviously without an INSERT INTO functionality , that wouldn't work, I could create table as a tmp partition like CREATE TABLE `tablename/dailyload` AS select * from `tablename` where dir0 = 'curday` If that completes, rm 'curday' and mv dailyload curday. That should work right? Will Drill work if I have all my days with only dir0 but the daily load has dir1? That may be a challenge, but maybe I don't have to have subdirectories for each hour. I will admit I am coming at this from a Hive perspective, but trying to explain my thinking to see if folks can point out other options or ideas in the Drill mindset. Thanks! John
