RE: Repair daily refreshed table

2018-08-20 Thread Per Otterström
Hi Maxim.

Assuming all your update operations are successful and that you only delete 
data by TTL in that table, then you shouldn’t have to do repairs on it.

You may also consider lowering the gc_grace_seconds value on that table, but 
be aware of how this affects hints and logged batches: 
https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlCreateTable.html#tabProp__cqlTableGc_grace_seconds
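
As an illustration of the suggestion above, here is a minimal Python sketch that
builds the CQL statement for lowering gc_grace_seconds. The keyspace and table
names (my_ks, daily_refresh) and the one-day value are assumptions made up for
the example, not values from this thread; the Cassandra default is 864000
seconds (10 days).

```python
def alter_gc_grace_cql(keyspace: str, table: str, days: float) -> str:
    """Build an ALTER TABLE statement that sets gc_grace_seconds.

    The keyspace/table names passed in are hypothetical; this only formats
    the statement, it does not talk to a cluster.
    """
    seconds = int(days * 86400)  # 86400 seconds per day
    if seconds < 0:
        raise ValueError("gc_grace_seconds must not be negative")
    return f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {seconds};"


# Example: lower the grace period from the 10-day default to 1 day.
print(alter_gc_grace_cql("my_ks", "daily_refresh", 1))
```

Whatever value is chosen should be checked against the hint and logged-batch
behaviour described in the documentation linked above.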

/pelle



Re: Repair daily refreshed table

2018-08-20 Thread Maxim Parkachov
Hi Rahul,

I cannot afford to delete and then load, as this would create downtime for
the records. That's why I'm upserting with TTL today()+7days, as I mentioned
in my original question. At the moment I don't have an issue with either
loading or access times. My question is: should I repair such a table or
not, and if yes, before the load or after it (or does it not matter)?
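
To make the upsert pattern above concrete, here is a small Python sketch that
computes the TTL in seconds and formats a parameterised INSERT. In Cassandra an
INSERT on an existing primary key overwrites the cells, so re-inserting with
USING TTL resets the expiry. The keyspace, table and column names are
hypothetical, chosen only for the example.

```python
from datetime import timedelta


def ttl_seconds(days: int) -> int:
    """Convert a TTL expressed in days to the seconds that USING TTL expects."""
    return int(timedelta(days=days).total_seconds())


def upsert_with_ttl_cql(keyspace: str, table: str, columns: list, ttl_days: int) -> str:
    """Format an INSERT (an upsert in Cassandra) with bind markers and a TTL.

    All names passed in are illustrative; nothing is sent to a cluster.
    """
    cols = ", ".join(columns)
    marks = ", ".join("?" for _ in columns)
    return (
        f"INSERT INTO {keyspace}.{table} ({cols}) "
        f"VALUES ({marks}) USING TTL {ttl_seconds(ttl_days)};"
    )


# Example: the daily load refreshes every row with a 7-day TTL.
print(upsert_with_ttl_cql("my_ks", "daily_refresh", ["id", "payload"], 7))
```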

Thanks,
Maxim.



Re: Repair daily refreshed table

2018-08-19 Thread Rahul Singh
If you want to be certain that all replicas acknowledge receipt of the data, 
then you could use ALL or EACH_QUORUM (if you have multiple DCs), but you 
must really want high consistency if you do that.
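
For a rough sense of what those levels cost, here is a Python sketch of how many
replica acknowledgements the coordinator waits for at each write consistency
level, using the usual quorum rule floor(RF / 2) + 1. The datacenter names and
the replication factor of 3 per DC are assumptions for the example only.

```python
def quorum(replicas: int) -> int:
    """Quorum size for a given number of replicas: floor(n / 2) + 1."""
    return replicas // 2 + 1


def required_acks(level: str, rf_by_dc: dict, local_dc: str) -> int:
    """Replica acknowledgements needed for a write at the given level.

    rf_by_dc maps datacenter name -> replication factor (hypothetical values).
    """
    total = sum(rf_by_dc.values())
    if level == "ALL":
        return total
    if level == "QUORUM":
        return quorum(total)
    if level == "LOCAL_QUORUM":
        return quorum(rf_by_dc[local_dc])
    if level == "EACH_QUORUM":
        # A quorum is required in every datacenter.
        return sum(quorum(rf) for rf in rf_by_dc.values())
    raise ValueError(f"unsupported consistency level: {level}")


rf = {"dc1": 3, "dc2": 3}  # assumed topology
for level in ("LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, rf, "dc1"))
```

The write is still sent to every replica at any level; the level only decides
how many acknowledgements are awaited before the write is reported successful.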

You should avoid deliberately creating tombstones if possible. They make 
reads slower because they have to be accounted for until they are 
compacted / garbage collected out.

Tombstones are created when data is either deleted or nulled. When marking 
data with a TTL, the actual delete is not done until after the TTL has expired.

When you say you are overwriting, are you deleting and then loading? That’s the 
only way you should see tombstones, or maybe you are setting nulls?

Rahul


Re: Repair daily refreshed table

2018-08-19 Thread Maxim Parkachov
Hi Rahul,

I'm already using LOCAL_QUORUM in the batch process, and it runs every day.
As far as I understand, because I'm overwriting the whole table with a new
TTL, the process creates tons of tombstones, and I'm more concerned about them.

Regards,
Maxim.



Re: Repair daily refreshed table

2018-08-18 Thread Rahul Singh
Are you loading using a batch process? What’s the frequency of the data ingest, 
and does it have to be very fast? If it is not too frequent and can be a little 
slower, you may consider a higher consistency level to ensure the data is on 
the replicas.

Rahul


Repair daily refreshed table

2018-08-18 Thread Maxim Parkachov
Hi community,

I'm currently puzzled by the following challenge. I have a CF with a 7-day
TTL on all rows. Daily there is a process which loads the current data with
a +7 days TTL. Thus records which are not present in the last 7 days of
loads expire. The number of these expired records is very small (< 1%). I
have a daily repair process, which takes a considerable amount of time and
resources, and a snapshot after that. Obviously I'm concerned only with the
last loaded data. Basically, my question: should I run repair before the
load, after the load, or maybe I don't need to repair such a table at all?

Regards,
Maxim.