Re: Getting partition min/max timestamp

2018-01-16 Thread Brian Hess
Jeremiah, this might be the exception, since the value that is being aggregated is exactly the same value that determines liveness of the data, and more so since the aggregation requested is the *max* of the timestamp, given that Cassandra is Last-Write-Wins (so, looks at the maximum

Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
It's a long time since I looked at the code, but I'm pretty sure that comment is explaining why we translate *no* timestamp to *epoch*, to save space when serializing the encoding stats. Not stipulating that the data may be inaccurate. However, being such a long time since I looked, I forgot we

Re: Getting partition min/max timestamp

2018-01-14 Thread Jeremiah Jordan
Finding the max timestamp of a partition is an aggregation. Doing that calculation purely on the replica (whether pre-calculated or not) is problematic for any CL > 1 in the face of deletions or updates that are missing. As the contents of the partition on a given replica are different than what
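
To illustrate the point about replica-local aggregation, here is a minimal sketch in Python. It is a toy model, not Cassandra code: the `Cell` layout, `merge`, and `max_live_timestamp` are hypothetical names, and reconciliation is reduced to "highest timestamp wins, tombstone wins ties". A write that reached only one replica and a delete that reached only the other give different answers per replica and in the merged view.

```python
from typing import Dict, Optional, Tuple

# Toy cell: (write timestamp, is_tombstone). Hypothetical layout for illustration only.
Cell = Tuple[int, bool]
Replica = Dict[str, Cell]


def merge(*replicas: Replica) -> Replica:
    """Last-write-wins merge: keep the cell with the highest timestamp per row.

    On equal timestamps the tombstone wins (a simplification of real reconciliation).
    """
    merged: Replica = {}
    for replica in replicas:
        for key, cell in replica.items():
            current = merged.get(key)
            if current is None or cell[0] > current[0] or (
                cell[0] == current[0] and cell[1]
            ):
                merged[key] = cell
    return merged


def max_live_timestamp(rows: Replica) -> Optional[int]:
    """Max write timestamp over rows that are not tombstones."""
    live = [ts for ts, is_tombstone in rows.values() if not is_tombstone]
    return max(live) if live else None


# Row r2 was written at ts=30 but only reached replica A;
# its delete at ts=40 only reached replica B.
replica_a: Replica = {"r1": (10, False), "r2": (30, False)}
replica_b: Replica = {"r1": (10, False), "r2": (40, True)}

local_a = max_live_timestamp(replica_a)                        # sees r2 as live
local_b = max_live_timestamp(replica_b)                        # sees only r1 as live
reconciled = max_live_timestamp(merge(replica_a, replica_b))   # r2 is deleted
```

In this toy model replica A alone would report a timestamp belonging to a row that the reconciled view considers deleted, which is why the aggregation has to run on merged data.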

Re: Getting partition min/max timestamp

2018-01-14 Thread arhel...@gmail.com
First of all, thanks for all the ideas. Benedict Elliott Smith, in code comments I found a notice that data in EncodingStats can be wrong, so I am not sure that it's a good idea to use it for accurate results. As I understand it, incorrect data is not a problem for its current use case, but not for my

Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
(Obviously, not to detract from the points that Jon and Jeremiah make, i.e. that if TTLs or tombstones are involved the metadata we have, or can add, is going to be worthless in most cases anyway) On 14 January 2018 at 16:11, Benedict Elliott Smith wrote: > We already store
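
A small sketch of why stored metadata stops being useful once TTLs are involved. This is a toy model under stated assumptions: `Cell`, `stored_max`, and `live_max` are hypothetical names, and timestamps and TTLs share one unit (seconds) for simplicity, which is not how Cassandra stores them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    write_ts: int               # write time, in seconds (simplified)
    ttl: Optional[int] = None   # time to live in seconds, None means no expiry

    def is_live(self, now: int) -> bool:
        return self.ttl is None or now < self.write_ts + self.ttl


def stored_max(cells: List[Cell]) -> int:
    """Max timestamp as it would be recorded once, at write or flush time."""
    return max(c.write_ts for c in cells)


def live_max(cells: List[Cell], now: int) -> Optional[int]:
    """Max timestamp over cells that are still live at read time."""
    live = [c.write_ts for c in cells if c.is_live(now)]
    return max(live) if live else None


cells = [Cell(write_ts=100), Cell(write_ts=200, ttl=50)]

recorded = stored_max(cells)            # fixed when the metadata is written
before_expiry = live_max(cells, 220)    # the TTL'd cell is still live
after_expiry = live_max(cells, 300)     # the TTL'd cell has expired
```

The recorded value never changes, while the correct answer depends on the read time, so the metadata can only serve as an upper bound here.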

Re: Getting partition min/max timestamp

2018-01-14 Thread Benedict Elliott Smith
We already store the minimum timestamp in the EncodingStats of each partition, to support more efficient encoding of atom timestamps. This just isn't exposed beyond UnfilteredRowIterator, though it probably could be. Storing the max alongside would still require justification, though its cost
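
As a rough illustration of why a stored minimum timestamp makes encoding cheaper, the sketch below delta-encodes timestamps against the minimum and compares variable-length integer sizes. It is not Cassandra's serialization format: `vint_size`, `encode_deltas`, and `decode_deltas` are hypothetical helpers, and the varint is a generic 7-bits-per-byte scheme chosen for illustration.

```python
from typing import List, Tuple


def vint_size(value: int) -> int:
    """Bytes needed for an unsigned varint using 7 payload bits per byte."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def encode_deltas(timestamps: List[int]) -> Tuple[int, List[int]]:
    """Return the minimum timestamp and each timestamp's offset from it."""
    base = min(timestamps)
    return base, [ts - base for ts in timestamps]


def decode_deltas(base: int, deltas: List[int]) -> List[int]:
    """Rebuild the original timestamps from the base and the offsets."""
    return [base + d for d in deltas]


# Microsecond timestamps that are close together, as within one partition.
timestamps = [1515945600000000, 1515945600000500, 1515945601000000]

base, deltas = encode_deltas(timestamps)
raw_bytes = sum(vint_size(ts) for ts in timestamps)
delta_bytes = sum(vint_size(d) for d in deltas)
```

The base is stored once, and each cell then only pays for its small offset, so `delta_bytes` comes out well below `raw_bytes` for timestamps that cluster together.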

Re: Getting partition min/max timestamp

2018-01-14 Thread Jeremiah Jordan
Don’t forget about deleted and missing data. The bane of all on-replica aggregation optimizations. > On Jan 14, 2018, at 12:07 AM, Jeff Jirsa wrote: > > > You’re right it’s not stored in metadata now. Adding this to metadata isn’t > hard, it’s just hard to do it right

Re: Getting partition min/max timestamp

2018-01-13 Thread Jeff Jirsa
You’re right it’s not stored in metadata now. Adding this to metadata isn’t hard, it’s just hard to do it right where it’s useful to people with other data models (besides yours) so it can make it upstream (if that’s your goal). In particular the worst possible case is a table with no

Re: Getting partition min/max timestamp

2018-01-13 Thread Jonathan Haddad
Do you need to support TTLs? That might be a bit of an issue. On Sat, Jan 13, 2018 at 12:41 PM Arthur Kushka wrote: > Hi folks, > > Currently, I am working on a custom CQL operator that should return the max > timestamp for some partition. > > I don't think that scanning of