Maybe you could explain in more detail what you mean by recently modified
documents, since that is precisely what I thought I suggested with
descending ordering.

-- Jack Krupansky

On Thu, Jul 23, 2015 at 3:40 PM, Robert Wille <rwi...@fold3.com> wrote:

>  Neither Carlos’ suggestion nor yours provided a way to query
> recently-modified documents.
>
>  His updated suggestion provides a way to get recently-modified
> documents, but not ordered.
>
>  On Jul 22, 2015, at 4:19 PM, Jack Krupansky <jack.krupan...@gmail.com>
> wrote:
>
>  "No way to query recently-modified documents."
>
>  I don't follow why you say that. I mean, that was the point of the data
> model suggestion I proposed. Maybe you could clarify.
>
>  I also wanted to mention that the new materialized view feature of
> Cassandra 3.0 might handle this use case, including taking care of the
> delete, automatically.
>
>
>  -- Jack Krupansky
>
> On Tue, Jul 21, 2015 at 12:37 PM, Robert Wille <rwi...@fold3.com> wrote:
>
>> The time series doesn’t provide the access pattern I’m looking for. No
>> way to query recently-modified documents.
>>
>>  On Jul 21, 2015, at 9:13 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
>>
>>  Hi Robert,
>>
>>  What about modelling it as a time series?
>>
>>  CREATE TABLE document (
>>   docId UUID,
>>   doc TEXT,
>>   last_modified TIMESTAMP,
>>   PRIMARY KEY(docId, last_modified)
>> ) WITH CLUSTERING ORDER BY (last_modified DESC);
>>
>>  This way, the latest modification will always be the first record
>> in the row, therefore accessing it should be as easy as:
>>
>>  SELECT * FROM document WHERE docId = <the docId> LIMIT 1;
>>
>>  And, if you experience disk space issues due to very long rows, then you
>> can always expire old ones using TTL or in a batch job. Tombstones will
>> never be a problem in this case as, due to the specified clustering order,
>> the latest modification will always be the first record in the row.
>>
>>  Hope it helps.
>>
>>  Carlos Alonso | Software Engineer | @calonso
>> <https://twitter.com/calonso>
>>
>> On 21 July 2015 at 05:59, Robert Wille <rwi...@fold3.com> wrote:
>>
>>> Data structures that have a recently-modified access pattern seem to be
>>> a poor fit for Cassandra. I’m wondering if any of you smart guys can
>>> provide suggestions.
>>>
>>> For the sake of discussion, let's assume I have the following tables:
>>>
>>> CREATE TABLE document (
>>>         docId UUID,
>>>         doc TEXT,
>>>         last_modified TIMEUUID,
>>>         PRIMARY KEY ((docId))
>>> );
>>>
>>> CREATE TABLE doc_by_last_modified (
>>>         date TEXT,
>>>         last_modified TIMEUUID,
>>>         docId UUID,
>>>         PRIMARY KEY ((date), last_modified)
>>> );
>>>
>>> When I update a document, I retrieve its last_modified time, delete the
>>> current record from doc_by_last_modified, and add a new one. Unfortunately,
>>> if you’d like each document to appear at most once in the
>>> doc_by_last_modified table, then this doesn’t work so well.
>>>
>>> Documents can get into the doc_by_last_modified table multiple times if
>>> there is concurrent access, or if there is a consistency issue.
>>>
>>> Any thoughts out there on how to efficiently provide recently-modified
>>> access to a table? This problem exists for many types of data structures,
>>> not just recently-modified. Any ordered data structure that can be
>>> dynamically reordered suffers from the same problems. As I’ve been doing
>>> schema design, this pattern keeps recurring. A nice way to address this
>>> problem has lots of applications.
>>>
>>> Thanks in advance for your thoughts
>>>
>>> Robert
>>>
>>>
>>
>>
>
>
