Guys, thank you very much. For my scenario, I'm going to change my data model a bit by splitting my row into N pieces and adding some further control over them. That should mitigate the problem. I'll also try LeveledCompaction afterwards.
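To make the "split my row into N pieces" idea concrete, here is a minimal sketch (my own illustration, not code from this thread): bucket the row key by a time window, so each row stays small and old buckets, full of tombstones after the messages are consumed, are simply never read again. The names `bucket_row_key`, `keys_for_range`, and the 10-minute window are assumptions for the example.

```python
BUCKET_SECONDS = 600  # assumption: one row per 10-minute window

def bucket_row_key(queue_name: str, ts: float,
                   bucket_seconds: int = BUCKET_SECONDS) -> str:
    """Derive the row key for a message inserted at unix time `ts`."""
    bucket = int(ts) // bucket_seconds
    return f"{queue_name}:{bucket}"

def keys_for_range(queue_name: str, start_ts: float, end_ts: float,
                   bucket_seconds: int = BUCKET_SECONDS) -> list[str]:
    """All row keys a reader must slice to cover [start_ts, end_ts]."""
    first = int(start_ts) // bucket_seconds
    last = int(end_ts) // bucket_seconds
    return [f"{queue_name}:{b}" for b in range(first, last + 1)]

# Messages written 30 minutes apart land in different rows:
print(bucket_row_key("jobs", 0))     # jobs:0
print(bucket_row_key("jobs", 1800))  # jobs:3
print(keys_for_range("jobs", 0, 1800))
# ['jobs:0', 'jobs:1', 'jobs:2', 'jobs:3']
```

Readers only ever touch the buckets for the window they care about, which is effectively Aaron's "don't use the same row all day" suggestion applied at a finer granularity.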
Thanks!

On Mon, Mar 4, 2013 at 3:25 AM, aaron morton <aa...@thelastpickle.com> wrote:

> I need something to keep the deleted columns away from my query fetch. Not
> only the tombstones.
> It looks like the min compaction might help on this. But I'm not sure yet
> what would be a reasonable value for its threshold.
>
> Your tombstones will not be purged in a compaction until after gc_grace,
> and only if all fragments of the row are in the compaction. You are right
> that you would probably want to run repair during the day if you are going
> to dramatically reduce gc_grace, to avoid deleted data coming back to life.
>
> If you are using a single Cassandra row as a queue, you are going to have
> trouble. Levelled compaction may help a little.
>
> If you are reading the "most recent" entries in the row, assuming the
> columns are sorted by some timestamp, use the Reverse Comparator and issue
> slice commands to get the first X cols. That will remove tombstones from
> the problem. (Am guessing this is not something you do, just mentioning
> it.)
>
> Your next option is to change the data model so you don't use the same row
> all day.
>
> After that, consider a message queue.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/03/2013, at 12:03 PM, Víctor Hugo Oliveira Molinar <
> vhmoli...@gmail.com> wrote:
>
>> Tombstones stay around until gc grace so you could lower that to see if
>> that fixes the performance issues.
>
> If the tombstones get collected, the column will live again, causing data
> inconsistency, since I can't run a repair during the regular operations.
> Not sure if I got your thoughts on this.
>
>> Size tiered or leveled compaction?
>
> I'm actually running Size Tiered Compaction, but I've been looking into
> changing it for Leveled. It seems to be the case.
> Although even if I achieve some performance, I would still have the same
> problem with the deleted columns.
>
> I need something to keep the deleted columns away from my query fetch. Not
> only the tombstones.
> It looks like the min compaction might help on this. But I'm not sure yet
> what would be a reasonable value for its threshold.
>
> On Sat, Mar 2, 2013 at 4:22 PM, Michael Kjellman
> <mkjell...@barracuda.com> wrote:
>
>> Tombstones stay around until gc grace so you could lower that to see if
>> that fixes the performance issues.
>>
>> Size tiered or leveled compaction?
>>
>> On Mar 2, 2013, at 11:15 AM, "Víctor Hugo Oliveira Molinar" <
>> vhmoli...@gmail.com> wrote:
>>
>> What is your gc_grace set to? Sounds like as the number of tombstone
>> records increases your performance decreases. (Which I would expect)
>>
>> gc_grace is default.
>>
>> Cassandra's data files are write once. Deletes are another write. Until
>> compaction they all live on disk. Making really big rows has these
>> problems.
>>
>> Oh, so it looks like I should lower the min_compaction_threshold for this
>> column family. Right?
>> What does this threshold value really mean?
>>
>> Guys, thanks for the help so far.
>>
>> On Sat, Mar 2, 2013 at 3:42 PM, Michael Kjellman <mkjell...@barracuda.com>
>> wrote:
>>
>>> What is your gc_grace set to? Sounds like as the number of tombstone
>>> records increases your performance decreases. (Which I would expect)
>>>
>>> On Mar 2, 2013, at 10:28 AM, "Víctor Hugo Oliveira Molinar" <
>>> vhmoli...@gmail.com> wrote:
>>>
>>> I have a daily maintenance of my cluster where I truncate this column
>>> family, because its data doesn't need to be kept for more than a day.
>>> Since all the regular operations on it finish around 4 hours before the
>>> end of the day, I regularly run a truncate on it followed by a repair at
>>> the end of the day.
>>>
>>> And every day, when the operations are started (when there are only a
>>> few deleted columns), the performance looks pretty good.
>>> Unfortunately it degrades along the day.
>>>
>>> On Sat, Mar 2, 2013 at 2:54 PM, Michael Kjellman <
>>> mkjell...@barracuda.com> wrote:
>>>
>>>> When is the last time you did a cleanup on the cf?
>>>>
>>>> On Mar 2, 2013, at 9:48 AM, "Víctor Hugo Oliveira Molinar" <
>>>> vhmoli...@gmail.com> wrote:
>>>>
>>>> > Hello guys.
>>>> > I'm investigating the reasons for performance degradation in my
>>>> > scenario, which follows:
>>>> >
>>>> > - I have a column family which is filled with thousands of columns
>>>> > inside a unique row (varies between 10k ~ 200k). And I also have
>>>> > thousands of rows, not much more than 15k.
>>>> > - These rows are constantly updated, but the write load is not that
>>>> > intensive. I estimate it at 100 writes/sec on the column family.
>>>> > - Each column represents a message which is read and processed by
>>>> > another process. After reading it, the column is marked for deletion
>>>> > in order to keep it out of the next query on this row.
>>>> >
>>>> > OK, so, I've figured out that after many insertions plus deletion
>>>> > updates, my queries (column slice queries) are taking more time to be
>>>> > performed, even if there are only a few columns, fewer than 100.
>>>> >
>>>> > So it looks like the larger the number of columns being deleted, the
>>>> > longer the time spent on a query.
>>>> > -> Internally in C*, does a column slice query range over deleted
>>>> > columns?
>>>> > If so, how can I mitigate the impact on my queries? Or, how can I
>>>> > avoid those deleted columns?
>>>>
>>>> Copy, by Barracuda, helps you store, protect, and share all your
>>>> amazing things. Start today: www.copy.com.
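The question that started the thread, whether a column slice query ranges over deleted columns, can be illustrated with a toy model. This is a rough sketch of the idea, an assumption for illustration and not Cassandra's actual read path: a row is a sorted list of cells where `None` marks a tombstone, a forward slice must step over every tombstone it meets before collecting the requested live columns, and Aaron's reverse-comparator trick reads the newest cells first so consumed (deleted) columns are never touched.

```python
def slice_live(row, count, reverse=False):
    """Return up to `count` live columns plus how many cells were examined.

    `row` is a list of (column_name, value) pairs sorted by column name;
    value None represents a tombstone left by a delete.
    """
    cells = reversed(row) if reverse else iter(row)
    out, examined = [], 0
    for name, value in cells:
        examined += 1
        if value is not None:          # tombstones are skipped, but still scanned
            out.append((name, value))
            if len(out) == count:
                break
    return out, examined

# 10,000 consumed (deleted) messages followed by 5 fresh ones:
row = [(i, None) for i in range(10_000)] + \
      [(i, f"msg{i}") for i in range(10_000, 10_005)]

live_fwd, cost_fwd = slice_live(row, 5)                # wades through tombstones
live_rev, cost_rev = slice_live(row, 5, reverse=True)  # newest-first
print(cost_fwd, cost_rev)  # 10005 5
```

The forward slice examines 10,005 cells to return 5 columns, while the reversed slice examines only 5, which is why performance degrades over the day as deletes accumulate, and why reading newest-first (or retiring the row entirely, as in the bucketing approach) sidesteps the problem.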