You mentioned that there are millions of rows. I guess that is not a big enough number to make it take forever :) What is the estimated number of rows? And what is the raw data size?
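In the meantime, here is a rough SQL sketch of the steps from my earlier mail. It uses CREATE TABLE AS instead of a separate CREATE plus INSERT; the table names t and new_t are from the earlier mail, and if t was created with storage or DISTRIBUTED BY options, you would want to add the same clauses to the CTAS:

    -- Build a de-duplicated copy; SELECT DISTINCT drops rows that are
    -- exact duplicates across every column.
    CREATE TABLE new_t AS SELECT DISTINCT * FROM t;

    -- Sanity-check the row counts before dropping anything.
    SELECT count(*) FROM t;
    SELECT count(*) FROM new_t;

    -- Swap the tables once the counts look right.
    DROP TABLE t;
    ALTER TABLE new_t RENAME TO t;

Note that DISTINCT only removes rows duplicated in every column; if you instead want to drop the last batch by its date, filter with a WHERE clause on the date column in the SELECT rather than using DISTINCT.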
Best regards,
Ruilong Huo

On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:

> Uhh… there is already a ton of data in the table; it will take forever to do that.
> Any better ways?
> I just want to completely get rid of the last insert batch. Can I directly delete some subdirectories in HDFS in order to do that?
>
> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>
> You may try the steps below to de-duplicate the data in table t:
> 1. insert into new_t select distinct tuple from t
> 2. drop table t
> 3. rename new_t to t
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>
>> I inserted millions of rows into an existing table but found there's duplication. I would like to delete the rows based on the date column, but found there's no DELETE command in HAWQ.
>>
>> Is there any way I can remove the records without rebuilding the table from scratch?
