Keep the original document base table, but then the query table should have the PK as last_modified, docId, with last_modified descending, so that a query can get the n most recently modified documents.
Yes, you still need to manually delete the old entry for the document in the query table if duplicates are a problem for you. Yeah, a TTL would be good if you don't care about documents modified a month or a week ago. -- Jack Krupansky On Tue, Jul 21, 2015 at 11:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote: > Hi Robert, > > What about modelling it as a time serie? > > CREATE TABLE document ( > docId UUID, > doc TEXT, > last_modified TIMESTAMP > PRIMARY KEY(docId, last_modified) > ) WITH CLUSTERING ORDER BY (last_modified DESC); > > This way, you the lastest modification will always be the first record in > the row, therefore accessing it should be as easy as: > > SELECT * FROM document WHERE docId == <the docId> LIMIT 1; > > And, if you experience diskspace issues due to very long rows, then you > can always expire old ones using TTL or on a batch job. Tombstones will > never be a problem in this case as, due to the specified clustering order, > the latest modification will always be first record in the row. > > Hope it helps. > > Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso> > > On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote: > >> Data structures that have a recently-modified access pattern seem to be a >> poor fit for Cassandra. I’m wondering if any of you smart guys can provide >> suggestions. >> >> For the sake of discussion, lets assume I have the following tables: >> >> CREATE TABLE document ( >> docId UUID, >> doc TEXT, >> last_modified TIMEUUID, >> PRIMARY KEY ((docid)) >> ) >> >> CREATE TABLE doc_by_last_modified ( >> date TEXT, >> last_modified TIMEUUID, >> docId UUID, >> PRIMARY KEY ((date), last_modified) >> ) >> >> When I update a document, I retrieve its last_modified time, delete the >> current record from doc_by_last_modified, and add a new one. Unfortunately, >> if you’d like each document to appear at most once in the >> doc_by_last_modified table, then this doesn’t work so well. >> >> Documents can get into the doc_by_last_modified table multiple times if >> there is concurrent access, or if there is a consistency issue. >> >> Any thoughts out there on how to efficiently provide recently-modified >> access to a table? This problem exists for many types of data structures, >> not just recently-modified. Any ordered data structure that can be >> dynamically reordered suffers from the same problems. As I’ve been doing >> schema design, this pattern keeps recurring. A nice way to address this >> problem has lots of applications. >> >> Thanks in advance for your thoughts >> >> Robert >> >> >