Is the table partitioned by date to store the incremental data? If so,
you can apply the steps I mentioned to just the partition that was
messed up, as sketched below.
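
A rough sketch (untested; t, the date column rec_date, and the date value
are placeholders for your actual schema, and it assumes the duplicates are
exact row copies and that your HAWQ build supports EXCHANGE PARTITION):

CREATE TABLE t_fix (LIKE t);
-- keep a single copy of each row from the affected day
INSERT INTO t_fix SELECT DISTINCT * FROM t WHERE rec_date = DATE '2016-04-07';
-- swap the de-duplicated copy in place of the bad partition
ALTER TABLE t EXCHANGE PARTITION FOR (DATE '2016-04-07') WITH TABLE t_fix;
-- after the exchange, t_fix holds the old, duplicated rows
DROP TABLE t_fix;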

Best regards,
Ruilong Huo

On Thu, Apr 7, 2016 at 9:16 PM, hawqstudy <[email protected]> wrote:

> Well, there were already tons of data in the table, like a billion
> records. Today our guy tried to insert the daily incremental data into
> the table; they messed something up and the insertion happened twice.
> Each incremental batch is about several million rows. We hope there's
> some way to revert what we did today (there's a date column on each
> record) or just delete those rows.
> Cheers…
>
> On Apr 7, 2016, at 9:02 PM, Ruilong Huo <[email protected]> wrote:
>
> You mentioned that there are millions of rows. I guess that is not a
> big enough number to make it take forever. :)
> What's the estimated number of rows? What's the raw data size?
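>
> If it helps, a quick way to get those numbers (a rough sketch; the date
> column name rec_date is a guess at your schema):
>
> SELECT COUNT(*) FROM t;                                     -- total rows
> SELECT COUNT(*) FROM t WHERE rec_date = DATE '2016-04-07';  -- today's batch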
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:
>
>> Uhh… there is already a ton of data in the table; it will take forever
>> to do that…
>> Any better ways?
>> I just want to completely get rid of the last insert batch. Can I
>> directly delete some subdirectories in HDFS in order to do that?
>>
>> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>>
>> You may try the steps below to de-duplicate the data in table t:
>> 1. CREATE TABLE new_t AS SELECT DISTINCT * FROM t;
>> 2. DROP TABLE t;
>> 3. ALTER TABLE new_t RENAME TO t;
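>>
>> If you only need to touch today's batch, the same idea can be restricted
>> by the date column, e.g. (an untested sketch; rec_date and the date
>> literal are placeholders, and it assumes rows within the batch were
>> unique before the double insert):
>>
>> CREATE TABLE new_t AS
>>   SELECT * FROM t WHERE rec_date <> DATE '2016-04-07'           -- history, copied as-is
>>   UNION ALL
>>   SELECT DISTINCT * FROM t WHERE rec_date = DATE '2016-04-07';  -- today, de-duplicated
>> DROP TABLE t;
>> ALTER TABLE new_t RENAME TO t;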
>>
>> Best regards,
>> Ruilong Huo
>>
>> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>>
>>> I inserted millions of rows into an existing table but found there's
>>> duplication. I would like to delete the rows based on the date column,
>>> but found there's no DELETE command in HAWQ.
>>>
>>> Is there any way I can remove the records without rebuilding the table
>>> from scratch?
>>>
>>
>>
>>
>
>
