The data size for each hour is not big, but it will grow over time. Say I have two years of data arriving on an hourly basis: recreating the Parquet table every time is not a feasible solution. In Hive, I would create a partition and insert the data into that partition accordingly. I was looking for that kind of solution. Does Drill provide that kind of functionality?

Thanks,
Divya

On 26 July 2017 at 15:04, Saurabh Mahapatra <saurabhmahapatr...@gmail.com> wrote:

> I always recommend against using CTAS as a shortcut for a ETL type large
> workload. You will need to size your Drill cluster accordingly. Consider
> using Hive or Spark instead.
>
> What are the source file formats? For every hour, what is the size and the
> number of rows for that data? Are you doing any aggregations? And what is
> the lag between the streaming data and data available for analytics that
> you are willing to tolerate?
>
> On Tue, Jul 25, 2017 at 11:27 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > I am not aware of any clean way to do this. However if your data is
> > partitioned based on directories, then you can use the below hack which
> > leverages temporary tables [1]. Essentially, you backup your partition
> > to a temp table, then override it by taking the union of new partition
> > data and existing partition data. This way we are not over-writing the
> > entire table.
> >
> > create temporary table mytable_2017 (col1, col2....) as select col1, col2,
> > ......from mytable where dir0 = "2017";
> > drop table `mytable/2017`;
> > create table `mytable/2017` as
> > select col1, col2 .........from new_partition_data
> > union
> > select col1, col2 ......... from mytable_2017;
> > drop table mytable_2017;
> >
> > Caveat : Temporary tables get dropped automatically if the session ends
> > or the drillbit crashes. In the above sequence, if the connection gets
> > dropped (there are known issues causing this) between the client and
> > drillbit after executing the "DROP" statement, then your partition data
> > is lost forever. And since drill doesn't support transactions, the
> > mentioned approach is dangerous.
> >
> > [1] https://drill.apache.org/docs/create-temporary-table-as-cttas/
> >
> > On Tue, Jul 25, 2017 at 10:52 PM, Divya Gehlot <divya.htco...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I am naive to Apache drill.
> > > As I have data coming in every hour , when I searched I couldnt find
> > > the insert into partition command in Apache drill.
> > > How can we insert data to particular partition without rewriting the
> > > whole data set ?
> > >
> > > Appreciate the help.
> > > Thanks,
> > > Divya
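
For reference, here is a minimal sketch of a directory-per-hour variant of the approach discussed in this thread. It is not something proposed by the posters above. It assumes that each hourly batch can be written with CTAS into its own, not yet existing subdirectory under the table root, so nothing has to be dropped or rewritten. The workspace dfs.tmp, the directory layout mytable/<year>/<month>/<day>/<hour>, and the names new_partition_data, col1 and col2 are placeholders for illustration only, and whether the intermediate directories get created automatically should be checked on your own storage plugin.

```sql
-- Assumption: dfs.tmp is a writable workspace and all names are placeholders.
ALTER SESSION SET `store.format` = 'parquet';

-- Write one hour of data into its own subdirectory of the table root.
-- CTAS refuses to overwrite an existing table, so earlier hours stay untouched.
CREATE TABLE dfs.tmp.`mytable/2017/07/26/15` AS
SELECT col1, col2
FROM dfs.tmp.`new_partition_data`;

-- Query the table root; dir0..dir3 correspond to the nested directory levels
-- (year, month, day, hour in this assumed layout).
SELECT col1, col2
FROM dfs.tmp.`mytable`
WHERE dir0 = '2017' AND dir1 = '07' AND dir2 = '26' AND dir3 = '15';
```

Because no DROP is involved, a dropped connection in the middle of the CTAS would at worst leave one incomplete hourly directory rather than lose existing partition data. The trade-off is many small files over two years of hourly loads, which may call for periodic compaction.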