Keep the original document base table, but then the query table should have
the PK as last_modified, docId, with last_modified descending, so that a
query can get the n most recently modified documents.

Yes, you still need to manually delete the old entry for the document in
the query table if duplicates are a problem for you.

Yeah, a TTL would be good if you don't care about documents modified a
month or a week ago.

-- Jack Krupansky

On Tue, Jul 21, 2015 at 11:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote:

> Hi Robert,
>
> What about modelling it as a time serie?
>
> CREATE TABLE document (
>   docId UUID,
>   doc TEXT,
>   last_modified TIMESTAMP
>   PRIMARY KEY(docId, last_modified)
> ) WITH CLUSTERING ORDER BY (last_modified DESC);
>
> This way, you the lastest modification will always be the first record in
> the row, therefore accessing it should be as easy as:
>
> SELECT * FROM document WHERE docId == <the docId> LIMIT 1;
>
> And, if you experience diskspace issues due to very long rows, then you
> can always expire old ones using TTL or on a batch job. Tombstones will
> never be a problem in this case as, due to the specified clustering order,
> the latest modification will always be first record in the row.
>
> Hope it helps.
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote:
>
>> Data structures that have a recently-modified access pattern seem to be a
>> poor fit for Cassandra. I’m wondering if any of you smart guys can provide
>> suggestions.
>>
>> For the sake of discussion, lets assume I have the following tables:
>>
>> CREATE TABLE document (
>>         docId UUID,
>>         doc TEXT,
>>         last_modified TIMEUUID,
>>         PRIMARY KEY ((docid))
>> )
>>
>> CREATE TABLE doc_by_last_modified (
>>         date TEXT,
>>         last_modified TIMEUUID,
>>         docId UUID,
>>         PRIMARY KEY ((date), last_modified)
>> )
>>
>> When I update a document, I retrieve its last_modified time, delete the
>> current record from doc_by_last_modified, and add a new one. Unfortunately,
>> if you’d like each document to appear at most once in the
>> doc_by_last_modified table, then this doesn’t work so well.
>>
>> Documents can get into the doc_by_last_modified table multiple times if
>> there is concurrent access, or if there is a consistency issue.
>>
>> Any thoughts out there on how to efficiently provide recently-modified
>> access to a table? This problem exists for many types of data structures,
>> not just recently-modified. Any ordered data structure that can be
>> dynamically reordered suffers from the same problems. As I’ve been doing
>> schema design, this pattern keeps recurring. A nice way to address this
>> problem has lots of applications.
>>
>> Thanks in advance for your thoughts
>>
>> Robert
>>
>>
>

Reply via email to