Re: Implement ThreadIdGuessingAlgorithm for the distributed module

[email protected] Tue, 20 Jul 2021 04:39:52 -0700

Hello Quan,

On 20/07/2021 17:24, Quan tran hong wrote:
> [...]
>
> SELECT threadId FROM threadtable WHERE username = 'quan' AND baseSubject =
> 'baseSubject1' AND mimeMessageId IN ('MimeMessageID2', 'MimeMessageID3')
> LIMIT 1 ALLOW FILTERING;

ALLOW FILTERING should not be used: it results in a full scan and is
thus a performance disaster.

If you need it, your table structure does not match your query, and you
likely should rework the CREATE TABLE statement.
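
As a rough sketch of what such a rework could look like: the table name,
column types and key layout below are assumptions for illustration, not the
schema from the elided part of the mail. The idea is that the columns used
in the WHERE clause form the partition key, and that baseSubject is returned
so it can be compared on the application side.

```cql
-- Hypothetical table: name, types and key layout are assumptions.
CREATE TABLE thread_lookup (
    username      text,
    mimeMessageId text,
    messageId     timeuuid,
    threadId      timeuuid,
    baseSubject   text,
    PRIMARY KEY ((username, mimeMessageId), messageId)
);

-- Both partition key columns are restricted, so ALLOW FILTERING is not needed.
-- baseSubject is selected rather than filtered on; the caller compares it.
SELECT threadId, baseSubject
FROM thread_lookup
WHERE username = 'quan'
  AND mimeMessageId IN ('MimeMessageID2', 'MimeMessageID3');
```

Running both statements in cqlsh against a local Cassandra is a quick way to
check that the query is accepted without ALLOW FILTERING.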
>
> => This new message should have this threadId.
>
> New unrelated message
>
> Assume that we do a query for a new unrelated message.
>
> SELECT threadId FROM threadtable WHERE username = 'quan' AND baseSubject =
> 'unrelatedBaseSubject' AND mimeMessageId IN ('MimeMessageID2',
> 'MimeMessageID3') LIMIT 1 ALLOW FILTERING;
>
> => This new message should have a new threadId.
>
> Insert new message data
>
> After having a threadId, we need to insert new message data into the thread
> table.
>
> INSERT INTO ThreadTable (messageId, threadId, username, mimeMessageId,
> baseSubject) VALUES (now(), 02294fe1-e941-11eb-a8ee-77de5498f1fa, 'quan',
> 'MimeMessageID2', 'baseSubject1');
>
> INSERT INTO ThreadTable (messageId, threadId, username, mimeMessageId,
> baseSubject) VALUES (now(), 02294fe1-e941-11eb-a8ee-77de5498f1fa, 'quan',
> 'MimeMessageID3', 'baseSubject1');
>
> Conclusion
>
> I think this data model supports the queries the guessing algorithm
> needs, but there is probably still room for improvement.

Which CQL statement would we use to delete data from this table?
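
To illustrate why this matters: a CQL DELETE has to restrict the partition
key, so the key layout decides what the deleting code must know. A minimal
sketch, assuming a hypothetical table thread_lookup partitioned by
(username, mimeMessageId) with messageId as clustering column (none of this
is the schema proposed in the mail, and the timeuuid is only a placeholder):

```cql
-- Hypothetical table and key layout; the timeuuid below is a placeholder value.
DELETE FROM thread_lookup
WHERE username = 'quan'
  AND mimeMessageId = 'MimeMessageID2'
  AND messageId = 02294fe1-e941-11eb-a8ee-77de5498f1fa;
```

With such a layout, deleting a message requires knowing its username and
every mimeMessageId it was stored under, which the deleting code may not
have at hand.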

>
>
> Best Regards,
>
> Quan
>
>
>
>
>
> On Mon, 19 Jul 2021 at 18:23, [email protected] <
> [email protected]> wrote:
>
>> Hello Quan,
>>
>> On 19/07/2021 17:59, Quan tran hong wrote:
>>> Hi,
>>> I am starting to implement ThreadIdGuessingAlgorithm for the distributed
>>> module. Because this is a breaking change and I am also new to Cassandra,
>>> I would like to discuss with you how to do this.
>> As long as we only introduce a new table, there is no reason for this to
>> be a breaking change, but getting the format right will make our lives
>> easier down the line.
>>> For those who have not followed this work, please have a look at the
>>> JMAP Threads specs [1] and my work related to this [2].
>>>
>>> So here are my ideas on how to do this:
>>> - Add a Cassandra table holding the inputs needed by the threadId
>>> guessing algorithm. Maybe a table like:
>>>  CREATE TABLE ThreadRelatedTable (
>>> threadId       timeuuid,
>>> messageId      timeuuid,
>>> mimeMessageIds     SET<text>,
>>> subject       text,
>>> PRIMARY KEY (mimeMessageIds, subject)
>>> );
>>> - Whenever we guess the threadId for a new message, we query this table
>>> to find a related threadId (if there is one) or decide that the new
>>> message should get a new threadId.
>>> - Whenever we save a new message, we save its thread-related data to
>>> this table.
>>>
>>> This is my first idea. Please share your thoughts on it.
>> Collections are an advanced data modeling tool that should be used with
>> caution. I am not sure using one in a PRIMARY KEY is a good idea, and I
>> am not sure it does what you want (the full partition key must be
>> specified for Cassandra to know which node holds the data).
>>
>> Also, once you have found the messages related to a thread, you want to
>> validate that the subject matches. This can be done on the application
>> side (James), which avoids a complicated data model.
>>
>> I encourage you to validate your data model using a Cassandra instance
>> in Docker, executing CQL commands locally with the cqlsh tool to simulate
>> the queries you wish to run, and to learn about your data model before
>> even starting to implement it. IMO sharing the CQL commands for creating
>> the table, inserting data into it, and retrieving data from it would be a
>> great follow-up to this email.
>>
>> How would you populate the data of that table?
>>
>> Best regards,
>>
>> Benoit
>>> Best regards,
>>>
>>> Quan
>>>
>>> [1] https://jmap.io/spec-mail.html#threads
>>> [2] https://issues.apache.org/jira/browse/JAMES-3516
>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
>>
>>

