Well, there is already a ton of data in the table, roughly a billion records. Today
one of our guys tried to insert the daily incremental into the table, but something
went wrong and the insertion ran twice. Each incremental is several million rows.
We hope there is some way to revert what we did today (there is a date column on
each record), or just delete those rows; something like the sketch below.
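
What we would run if DELETE were supported (load_date and the date value are just
placeholders for our actual date column and today's batch):

  DELETE FROM t WHERE load_date = DATE '2016-04-07';
  -- ...then re-run today's incremental load once
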
Cheers…

> On Apr 7, 2016, at 9:02 PM, Ruilong Huo <[email protected]> wrote:
> 
> You mentioned that there are millions of rows. I guess that is not a big number 
> that would make it take forever :)
> Roughly how many rows is it, and what is the raw data size?
> 
> Best regards,
> Ruilong Huo
> 
> On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:
> Uhh… there is already a ton of data in the table, so it will take forever to do 
> that.
> Are there any better ways?
> I just want to completely get rid of the last insert batch. Can I directly 
> delete some subdirectories in HDFS to do that?
> 
>> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>> 
>> You may try the steps below to de-duplicate the data in table t (spelled out 
>> in SQL right after the list):
>> 1. insert into new_t select distinct tuple from t
>> 2. drop table t
>> 3. rename new_t to t
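>> 
>> A concrete SQL sketch of those three steps (new_t is just a scratch name, and 
>> the rename assumes HAWQ's PostgreSQL-style ALTER TABLE):
>> 
>>   CREATE TABLE new_t AS SELECT DISTINCT * FROM t;  -- step 1: keep one copy of each row
>>   DROP TABLE t;                                    -- step 2: drop the original table
>>   ALTER TABLE new_t RENAME TO t;                   -- step 3: rename the new table back to t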
>> 
>> Best regards,
>> Ruilong Huo
>> 
>> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>> I inserted millions of rows into an existing table but found there is 
>> duplication. I would like to delete the rows based on the date column, but 
>> found there is no DELETE command in HAWQ.
>> 
>> Is there any way I can remove the records without rebuilding the table from 
>> scratch?
>> 
