You mentioned that there are millions of rows. I would guess that is not so
big a number that it would take forever. :)
What is the estimated number of rows, and what is the raw data size?

Best regards,
Ruilong Huo

On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:

> Uhh… there is already a ton of data in the table; it would take forever to
> do that…
> Are there any better ways?
> I just want to completely get rid of the last insert batch. Can I directly
> delete some subdirectories in HDFS in order to do that?
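>
> (To be concrete, what I want is the equivalent of the statement below,
> which HAWQ does not support; the column name and date here are only
> illustrative:)
>
>     -- Hypothetical: remove one batch by its date column.
>     -- HAWQ has no DELETE command, so this does not work there.
>     DELETE FROM t WHERE load_date = DATE '2016-04-07';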
>
> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>
> You may try the steps below to de-duplicate the data in table t (a SQL
> sketch follows):
> 1. insert into new_t select distinct * from t
> 2. drop table t
> 3. rename new_t to t
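>
> (A minimal SQL sketch of those steps, assuming the duplicates are exact
> row copies and that new_t does not exist yet; table names are
> illustrative:)
>
>     CREATE TABLE new_t (LIKE t);                 -- same columns as t
>     INSERT INTO new_t SELECT DISTINCT * FROM t;  -- step 1: keep one copy of each row
>     DROP TABLE t;                                -- step 2: drop the old table
>     ALTER TABLE new_t RENAME TO t;               -- step 3: take over the old name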
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>
>> I inserted millions of rows into an existing table but found there’s
>> duplication. I would like to delete the rows based on the date column, but
>> found there’s no DELETE command in HAWQ.
>>
>> Is there any way I can remove the records without rebuilding the table
>> from scratch?
>>
>
>
>
