Re: append data to already existing table saved in parquet format

Divya Gehlot Wed, 26 Jul 2017 02:53:32 -0700

The data for each hour is not large, but the total will grow over time. If I
have two years of data arriving hourly, recreating the Parquet table every
time is not a feasible solution.
In Hive, I would create a partition and insert the data into that partition.
I was looking for that kind of solution.
Does Drill provide that kind of functionality?


Thanks,
Divya


On 26 July 2017 at 15:04, Saurabh Mahapatra <saurabhmahapatr...@gmail.com>
wrote:

> I always recommend against using CTAS as a shortcut for a large ETL-type
> workload. You will need to size your Drill cluster accordingly. Consider
> using Hive or Spark instead.
>
> What are the source file formats? For every hour, what is the size and the
> number of rows for that data? Are you doing any aggregations? And what is
> the lag between the streaming data and data available for analytics that
> you are willing to tolerate?
>
> On Tue, Jul 25, 2017 at 11:27 PM, rahul challapalli <
> challapallira...@gmail.com> wrote:
>
> > I am not aware of any clean way to do this. However, if your data is
> > partitioned by directory, you can use the hack below, which leverages
> > temporary tables [1]. Essentially, you back up your partition to a temp
> > table, then overwrite it with the union of the new partition data and the
> > existing partition data. This way we are not overwriting the entire table.
> >
> > create temporary table mytable_2017 (col1, col2....) as
> >     select col1, col2, ...... from mytable where dir0 = '2017';
> > drop table `mytable/2017`;
> > create table `mytable/2017` as
> >     select col1, col2 ......... from new_partition_data
> >     union
> >     select col1, col2 ......... from mytable_2017;
> > drop table mytable_2017;
> >
> > Caveat: Temporary tables are dropped automatically if the session ends or
> > the drillbit crashes. In the above sequence, if the connection between the
> > client and the drillbit is dropped (there are known issues causing this)
> > after the "DROP" of `mytable/2017` executes but before the new table is
> > written, your partition data is lost forever. And since Drill doesn't
> > support transactions, this approach is dangerous.
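> >
> > A variation that avoids the DROP step, offered only as a sketch: write
> > each new batch into its own subdirectory under the table root instead of
> > rewriting an existing one. The dfs.tmp workspace, the year/month/day/hour
> > directory layout, and the new_hourly_data source are assumptions for
> > illustration and do not come from this thread. Nothing existing is
> > dropped, so the failure window described in the caveat should not arise,
> > though a failed CTAS may still leave a partial directory to clean up.
> >
> > ```sql
> > -- Hypothetical workspace, layout and source name (assumptions).
> > -- Write one hour of data as a new subdirectory of mytable.
> > -- CTAS fails if the target directory already exists.
> > create table dfs.tmp.`mytable/2017/07/26/02` as
> >     select col1, col2 from new_hourly_data;
> >
> > -- Query the table root; dir0, dir1, dir2 are the directory levels
> > -- (year, month, day in this assumed layout).
> > select col1, col2
> > from dfs.tmp.`mytable`
> > where dir0 = '2017' and dir1 = '07' and dir2 = '26';
> > ```
> >
> > This relies on every batch having the same columns and types, and many
> > small hourly files may slow queries over time, so occasional compaction
> > of older directories may still be needed.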
> >
> > [1] https://drill.apache.org/docs/create-temporary-table-as-cttas/
> >
> >
> > On Tue, Jul 25, 2017 at 10:52 PM, Divya Gehlot <divya.htco...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I am new to Apache Drill.
> > > I have data coming in every hour, and when I searched I couldn't find an
> > > insert-into-partition command in Apache Drill.
> > > How can we insert data into a particular partition without rewriting the
> > > whole data set?
> > >
> > >
> > > Appreciate the help.
> > > Thanks,
> > > Divya
> > >
> >
>
