Maybe you could explain in more detail what you mean by recently modified documents, since that is precisely what I thought I suggested with descending ordering.
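
To make "descending ordering" concrete, here is a minimal sketch. It is an assumption for illustration, not necessarily the exact schema proposed earlier in the thread. It reuses the doc_by_last_modified table from the original post, adds a descending clustering order, and queries one date bucket; the date value is a hypothetical placeholder.

```cql
-- Sketch only: the doc_by_last_modified table from the original post,
-- with newest-first clustering added (the clustering order is the assumption here).
CREATE TABLE doc_by_last_modified (
    date TEXT,                -- partition bucket, e.g. one partition per day
    last_modified TIMEUUID,   -- clustering column, sorted by its embedded time
    docId UUID,
    PRIMARY KEY ((date), last_modified)
) WITH CLUSTERING ORDER BY (last_modified DESC);

-- The ten most recent modifications recorded in a single (hypothetical) day bucket.
SELECT docId, last_modified
FROM doc_by_last_modified
WHERE date = '2015-07-23'
LIMIT 10;
```

Each query reads a single date partition, so covering a window that spans several days would take one query per bucket. The sketch also does nothing about a document appearing more than once.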

-- Jack Krupansky

On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille <rwi...@fold3.com> wrote:

> Neither Carlos’ suggestion nor yours provided a way to query
> recently-modified documents.
>
> His updated suggestion provides a way to get recently-modified
> documents, but not ordered.
>
> On Jul 22, 2015, at 4:19 PM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
> "No way to query recently-modified documents."
>
> I don't follow why you say that. I mean, that was the point of the data
> model suggestion I proposed. Maybe you could clarify.
>
> I also wanted to mention that the new materialized view feature of
> Cassandra 3.0 might handle this use case, including taking care of the
> delete, automatically.
>
> -- Jack Krupansky
>
> On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille <rwi...@fold3.com> wrote:
>
>> The time series doesn’t provide the access pattern I’m looking for. No
>> way to query recently-modified documents.
>>
>> On Jul 21, 2015, at 9:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>
>> Hi Robert,
>>
>> What about modelling it as a time series?
>>
>> CREATE TABLE document (
>>   docId UUID,
>>   doc TEXT,
>>   last_modified TIMESTAMP,
>>   PRIMARY KEY (docId, last_modified)
>> ) WITH CLUSTERING ORDER BY (last_modified DESC);
>>
>> This way, the latest modification will always be the first record
>> in the row, so accessing it should be as easy as:
>>
>> SELECT * FROM document WHERE docId = <the docId> LIMIT 1;
>>
>> And, if you experience disk space issues due to very long rows, then you
>> can always expire old ones using TTL or in a batch job. Tombstones will
>> never be a problem in this case as, due to the specified clustering order,
>> the latest modification will always be the first record in the row.
>>
>> Hope it helps.
>>
>> Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote:
>>
>>> Data structures that have a recently-modified access pattern seem to be
>>> a poor fit for Cassandra. I’m wondering if any of you smart guys can
>>> provide suggestions.
>>>
>>> For the sake of discussion, let’s assume I have the following tables:
>>>
>>> CREATE TABLE document (
>>>   docId UUID,
>>>   doc TEXT,
>>>   last_modified TIMEUUID,
>>>   PRIMARY KEY ((docId))
>>> );
>>>
>>> CREATE TABLE doc_by_last_modified (
>>>   date TEXT,
>>>   last_modified TIMEUUID,
>>>   docId UUID,
>>>   PRIMARY KEY ((date), last_modified)
>>> );
>>>
>>> When I update a document, I retrieve its last_modified time, delete the
>>> current record from doc_by_last_modified, and add a new one. Unfortunately,
>>> if you’d like each document to appear at most once in the
>>> doc_by_last_modified table, then this doesn’t work so well.
>>>
>>> Documents can get into the doc_by_last_modified table multiple times if
>>> there is concurrent access, or if there is a consistency issue.
>>>
>>> Any thoughts out there on how to efficiently provide recently-modified
>>> access to a table? This problem exists for many types of data structures,
>>> not just recently-modified. Any ordered data structure that can be
>>> dynamically reordered suffers from the same problems. As I’ve been doing
>>> schema design, this pattern keeps recurring. A nice way to address this
>>> problem has lots of applications.
>>>
>>> Thanks in advance for your thoughts
>>>
>>> Robert