Uhh… there is already a ton of data in the table, so that would take forever.
Any better ways?
I just want to completely get rid of the last insert batch. Can I directly
delete the corresponding subdirectories in HDFS to do that?
> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>
> You may try the steps below to de-duplicate the data in table t:
> 1. insert into new_t select distinct tuple from t
> 2. drop table t
> 3. rename new_t to t
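The three steps above can be sketched as follows, using SQLite as a stand-in for HAWQ (HAWQ's actual rename syntax may differ, e.g. `ALTER TABLE new_t RENAME TO t`); the table name `t` and column `val` are illustrative:

```python
import sqlite3

# In-memory SQLite database as a stand-in for HAWQ; table and column
# names (t, new_t, val) are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (val INTEGER)")
cur.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (2,), (3,), (3,)])

# Step 1: copy only the distinct rows into a new table.
cur.execute("CREATE TABLE new_t AS SELECT DISTINCT val FROM t")
# Step 2: drop the original table.
cur.execute("DROP TABLE t")
# Step 3: rename the new table to the original name.
cur.execute("ALTER TABLE new_t RENAME TO t")

print(sorted(row[0] for row in cur.execute("SELECT val FROM t")))  # [1, 2, 3]
```

Note this rewrites the whole table, which is why it is slow on a large table; the cost is proportional to the total data size, not to the size of the duplicated batch.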
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]
> <mailto:[email protected]>> wrote:
> I inserted millions of rows into an existing table but found there are
> duplicates. I would like to delete the rows based on the date column, but
> found there is no DELETE command in HAWQ.
>
> Is there any way I can remove the records without rebuilding the table from
> scratch?
>