Is the table partitioned by date to store the incremental data? If so, you can apply the steps I mentioned to just the partition that got the duplicate load.
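For illustration, a rough sketch of that per-partition approach (the table name t, the load_date partition column, and the 2016-04-07 load date are all placeholder assumptions; it relies on Greenplum-style ALTER TABLE ... EXCHANGE PARTITION being available in your HAWQ build, and it assumes the duplicated rows are exact copies so that SELECT DISTINCT removes them):

    -- 1. Build a de-duplicated copy of just the affected partition.
    CREATE TABLE t_fix AS
      SELECT DISTINCT * FROM t WHERE load_date = DATE '2016-04-07';

    -- 2. Swap the clean copy in for the damaged partition; after the
    --    exchange, t_fix holds the old rows including the duplicates.
    ALTER TABLE t EXCHANGE PARTITION FOR (DATE '2016-04-07') WITH TABLE t_fix;

    -- 3. Drop the leftover table that now contains the duplicated data.
    DROP TABLE t_fix;

If EXCHANGE PARTITION is not available in your release, the same create/drop/rename steps from the earlier mail can be applied to the whole table instead.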
Best regards,
Ruilong Huo

On Thu, Apr 7, 2016 at 9:16 PM, hawqstudy <[email protected]> wrote:

> Well, there are already tons of data in the table, like a billion records.
> Today our guy tried to insert the daily incremental data into the table,
> messed something up, and ran the insertion twice. Each incremental load is
> about several million rows. We hope there is some way to revert what we did
> today (there is a date column on each record) or just delete those rows.
> Cheers…
>
> On Apr 7, 2016, at 9:02 PM, Ruilong Huo <[email protected]> wrote:
>
> You mentioned that there are millions of rows. I guess that is not such a
> big number that it would take forever :)
> What is the estimated number of rows? What is the raw data size?
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:
>
>> Uhh… there are already tons of data in the table, so that would take
>> forever. Any better ways?
>> I just want to completely get rid of the last insert batch. Can I
>> directly delete some subdirectories in HDFS to do that?
>>
>> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>>
>> You may try the steps below to de-duplicate the data in table t:
>> 1. insert into new_t select distinct * from t
>> 2. drop table t
>> 3. rename new_t to t
>>
>> Best regards,
>> Ruilong Huo
>>
>> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>>
>>> I inserted millions of rows into an existing table but found there is
>>> duplication. I would like to delete the rows based on the date column,
>>> but found there is no DELETE command in HAWQ.
>>>
>>> Is there any way I can remove the records without rebuilding the table
>>> from scratch?
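For reference, the whole-table de-duplication steps quoted above map to roughly the following SQL (new_t is a placeholder name, and it again assumes the duplicated rows are exact copies):

    -- 1. Copy only the distinct rows into a new table.
    CREATE TABLE new_t AS SELECT DISTINCT * FROM t;

    -- 2. Drop the original table that contains the duplicates.
    DROP TABLE t;

    -- 3. Rename the de-duplicated copy back to the original name.
    ALTER TABLE new_t RENAME TO t;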
