Is the table partitioned by date to store the incremental data? If so, you can apply the steps I mentioned to just the partition that got the duplicate load.
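For illustration, a rough sketch of that per-partition approach (the table name t, the load_date partition column, and the 2016-04-07 load date are all placeholder assumptions; it relies on Greenplum-style ALTER TABLE ... EXCHANGE PARTITION being available in your HAWQ build, and it assumes the duplicated rows are exact copies so that SELECT DISTINCT removes them):

    -- 1. Build a de-duplicated copy of just the affected partition.
    CREATE TABLE t_fix AS
      SELECT DISTINCT * FROM t WHERE load_date = DATE '2016-04-07';

    -- 2. Swap the clean copy in for the damaged partition; after the
    --    exchange, t_fix holds the old rows including the duplicates.
    ALTER TABLE t EXCHANGE PARTITION FOR (DATE '2016-04-07') WITH TABLE t_fix;

    -- 3. Drop the leftover table that now contains the duplicated data.
    DROP TABLE t_fix;

If EXCHANGE PARTITION is not available in your release, the same create/drop/rename steps from the earlier mail can be applied to the whole table instead.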
Best regards,
Ruilong Huo

On Thu, Apr 7, 2016 at 9:16 PM, hawqstudy <[email protected]> wrote:

> Well, there are already tons of data in the table, like a billion records.
> Today our guy tried to insert the daily incremental data into the table,
> messed something up, and ran the insertion twice. Each incremental load is
> about several million rows. We hope there is some way to revert what we did
> today (there is a date column on each record) or just delete those rows.
> Cheers…
>
> On Apr 7, 2016, at 9:02 PM, Ruilong Huo <[email protected]> wrote:
>
> You mentioned that there are millions of rows. I guess that is not such a
> big number that it would take forever :)
> What is the estimated number of rows? What is the raw data size?
>
> Best regards,
> Ruilong Huo
>
> On Thu, Apr 7, 2016 at 6:54 PM, hawqstudy <[email protected]> wrote:
>
>> Uhh… there are already tons of data in the table, so that would take
>> forever. Any better ways?
>> I just want to completely get rid of the last insert batch. Can I
>> directly delete some subdirectories in HDFS to do that?
>>
>> On Apr 7, 2016, at 6:41 PM, Ruilong Huo <[email protected]> wrote:
>>
>> You may try the steps below to de-duplicate the data in table t:
>> 1. insert into new_t select distinct * from t
>> 2. drop table t
>> 3. rename new_t to t
>>
>> Best regards,
>> Ruilong Huo
>>
>> On Thu, Apr 7, 2016 at 6:36 PM, hawqstudy <[email protected]> wrote:
>>
>>> I inserted millions of rows into an existing table but found there is
>>> duplication. I would like to delete the rows based on the date column,
>>> but found there is no DELETE command in HAWQ.
>>>
>>> Is there any way I can remove the records without rebuilding the table
>>> from scratch?
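For reference, the whole-table de-duplication steps quoted above map to roughly the following SQL (new_t is a placeholder name, and it again assumes the duplicated rows are exact copies):

    -- 1. Copy only the distinct rows into a new table.
    CREATE TABLE new_t AS SELECT DISTINCT * FROM t;

    -- 2. Drop the original table that contains the duplicates.
    DROP TABLE t;

    -- 3. Rename the de-duplicated copy back to the original name.
    ALTER TABLE new_t RENAME TO t;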
