Hi Benoit, Following the approval of the table's design, I will start to implement *threadtable *and start with define the table and its DAO layer. I will come back for the deletion part feature finish the guessing threadid feature.
Best regards, Quan Vào Th 4, 21 thg 7, 2021 vào lúc 18:08 btell...@linagora.com < btell...@linagora.com> đã viết: > > On 21/07/2021 18:06, Quan tran hong wrote: > > Hi Benoit, > > Thanks for your great reviews. > > > >> Or we can just do several selects, one for each mimeMessageIds.I > > personaly likely prefer this option. > > > > This will make the upper logic layer more complicated. But I agree with > > your suggestion cause I would prefer balance and performance also. > > > >> Remember to do the delete of ThreadTable before threadtable_lookup, > > otherwise if you delete the pointer before the data you might end up in > > a case where the actual data is never deleted. > > > > Sure. Thanks for your reminder. > > > >> When/How do we call it? > > StoringThreadIdGuessingAlgorithm maybe? > Not really, this would only be called when appending messages. > > Likely this would need to be hooked in DeleteMessageListener. > > > > Cheers, > > Quan > > > > Vào Th 4, 21 thg 7, 2021 vào lúc 17:03 btell...@apache.org < > > btell...@apache.org> đã viết: > > > >> Hello quan, > >> > >> Nice proposal! I think the Cassandra data model you propose is evolved > >> enough so that we start implementing it. > >> > >> Some comments inlined... > >> > >> On 21/07/2021 15:55, Quan tran hong wrote: > >>> Hi Benoit, > >>> I did have another try on this. Please have a look. > >>> CREATE threadtable > >>> > >>> CREATE TABLE ThreadTable (messageId timeuuid, threadId timeuuid, > username > >>> text, mimeMessageId text, baseSubject text, PRIMARY KEY((username, > >>> mimemessageid), messageid)); > >>> > >>> => Partition key: (username, mimemessageid), clustering key: messageid. > >>> > >>> [...] > >>> > >>> SELECT basesubject, threadId FROM threadtable WHERE username = 'quan' > >> AND > >>> mimeMessageId IN ('MimeMessageID2', 'MimeMessageID3'); > >> This looks way better to me. > >> > >> IN usage is PRIMARY KEY is discouraged as it leads to coordination > >> across partitions. > >> > >> Read more for instance in > >> > >> > https://stackoverflow.com/questions/55604857/cassandra-query-performance-using-in-clause-for-one-portion-of-the-composite-pa > >> > >> Either we should move "mimeMessageId" to the clustering key (but all the > >> messages of a user, including their subject would end up in a single > >> partition, which could be quite large... for instance 1 million messages > >> x size of the mime message ids and subject could be too much, as > >> partitions are recommended to be 100MBs at most). > >> > >> Or we can just do several selects, one for each mimeMessageIds.I > >> personaly likely prefer this option. > >>> [...] > >>> > >>> CREATE threadtable_lookup for deletion purpose > >>> > >>> Supposed when we delete a message, we would need to delete that > message’s > >>> thread-related data in the threadtable also. I guess we just need > >> messageId > >>> to delete that message. > >>> > >>> We would need to have another table similar to the threadtable but > looked > >>> up by messageId to get the needed params for deletion query. > >>> > >>> CREATE TABLE ThreadTable_lookup (messageId timeuuid, username text, > >>> mimeMessageIds set<text>, PRIMARY KEY(messageid)); > >> The set should be frozen. We will never add or remove data in it, so we > >> do not need a CRDT (Commutative Replicated Data Type). > >> > >> Forbidding adding - removing individual elements avoids lots of issues, > >> and lead to a more compact storage. > >> > >>> To ease testing, I create a table with messageId’s type is int instead. > >>> > >>> We will insert the data as same as the original table. > >>> > >>> insert into ThreadTable_lookup (messageId, username, > mimeMessageIds) > >>> values (1, 'quan', {'MimeMessageID1', 'MimeMessageID2', > >> 'MimeMessageID3'}); > >>> [...] > >>> > >>> > >>> SELECT * FROM threadtable_lookup; > >>> > >>> > >>> Now we will do a query selection by messageId to get the needed > username, > >>> mimeMessageIds for original threadtable deletion. > >>> > >>> Supposed we want to delete message 4. > >>> > >>> SELECT username, mimemessageids FROM threadtable_lookup WHERE messageid > >> = 4; > >>> > >>> Then we will use these results to do a deletion query on threadtable. > >>> > >> Remember to do the delete of ThreadTable before threadtable_lookup, > >> otherwise if you delete the pointer before the data you might end up in > >> a case where the actual data is never deleted. > >> > >> The algorithm that you propose looks good. > >> > >> When/How do we call it? > >> > >> > >> Cheers, > >> > >> Benoit > >>> The data of messageId 4 deleted. > >>> > >>> > >>> > >>> Best regards, > >>> > >>> Quan > >>> > >>> > >>> Vào Th 3, 20 thg 7, 2021 vào lúc 18:39 btell...@apache.org < > >>> btell...@apache.org> đã viết: > >>> > >>>> Hello Quan, > >>>> > >>>> On 20/07/2021 17:24, Quan tran hong wrote: > >>>>> [...] > >>>>> > >>>>> SELECT threadId FROM threadtable WHERE username = 'quan' AND > >> baseSubject > >>>> = > >>>>> 'baseSubject1' AND mimeMessageId IN ('MimeMessageID2', > >> 'MimeMessageID3') > >>>>> LIMIT 1 ALLOW FILTERING; > >>>> ALLOW FILTERING should not be used as it will result in a full scan > and > >>>> is thus a performance disaster. > >>>> > >>>> If you need it, this means you do not have the right table structure > and > >>>> likely should rework the CREATE TABLE statement. > >>>>> => This new message should have this threadId. > >>>>> New unrelated message > >>>>> > >>>>> Assume that we do a query for a new unrelated message. > >>>>> > >>>>> SELECT threadId FROM threadtable WHERE username = 'quan' AND > >> baseSubject > >>>> = > >>>>> 'unrelatedBaseSubject' AND mimeMessageId IN ('MimeMessageID2', > >>>>> 'MimeMessageID3') LIMIT 1 ALLOW FILTERING; > >>>>> > >>>>> => This new message should have a new threadId. > >>>>> Insert new message data > >>>>> > >>>>> After having a threadId, we need to insert new message data into the > >>>> thread > >>>>> table. > >>>>> > >>>>> insert into ThreadTable (messageId, threadId, username, > mimeMessageId, > >>>>> baseSubject) values (now(), 02294fe1-e941-11eb-a8ee-77de5498f1fa, > >> 'quan', > >>>>> 'MimeMessageID2', 'baseSubject1'); > >>>>> > >>>>> insert into ThreadTable (messageId, threadId, username, > mimeMessageId, > >>>>> baseSubject) values (now(), 02294fe1-e941-11eb-a8ee-77de5498f1fa, > >> 'quan', > >>>>> 'MimeMessageID3', 'baseSubject1'); > >>>>> Conclusion > >>>>> > >>>>> I think this data model complies with the needed request for the > >> guessing > >>>>> algorithm problem, but it looks like still maybe there is room for > >>>>> improvement. > >>>> What Cassandra request do we use to delete the data in there? > >>>> > >>>>> Best Regards, > >>>>> > >>>>> Quan > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> Vào Th 2, 19 thg 7, 2021 vào lúc 18:23 btell...@apache.org < > >>>>> btell...@apache.org> đã viết: > >>>>> > >>>>>> Hello Quan, > >>>>>> > >>>>>> On 19/07/2021 17:59, Quan tran hong wrote: > >>>>>>> Hi, > >>>>>>> I am starting to implement ThreadIdGuessingAlgorithm for the > >>>> distributed > >>>>>>> module. Because this is a breaking change and I am new to Cassandra > >>>> also, > >>>>>>> therefore I want to have some discussion with you about how to do > >> this. > >>>>>> As long as we introduce a new table there is no reason that it > creates > >>>>>> breaking change, but getting the format right will ease our life > down > >>>>>> the line. > >>>>>>> For the ones who did not catch up with this work, please have a > look > >> at > >>>>>>> JMAP Threads specs [1] and my work related to this [2]. > >>>>>>> > >>>>>>> So my ideas on how to do this: > >>>>>>> - Add a needed inputs Cassandra Table for guessing threadId > >> algorithm. > >>>>>>> Maybe a table likes: > >>>>>>> CREATE TABLE ThreadRelatedTable ( > >>>>>>> threadId timeuuid, > >>>>>>> messageId timeuuid, > >>>>>>> mimeMessageIds SET<text>, > >>>>>>> subject text, > >>>>>>> PRIMARY KEY (mimeMessageIds, subject) > >>>>>>> ); > >>>>>>> - Whenever we guess threadId for a new message, we access this > table > >>>> and > >>>>>> do > >>>>>>> the matching query to get related threadId(if there is) or decide > new > >>>>>>> message should have a new threadId. > >>>>>>> - Whenever we save a new message, we save the thread-related data > to > >>>> this > >>>>>>> table. > >>>>>>> > >>>>>>> This is my first come-up idea. Please express your thoughts about > >> this. > >>>>>> Collections are an advanced data modeling tool, that should be used > >> with > >>>>>> caution. I am not sure using it in a PRIMARY KEY is a good idea. I > am > >>>>>> not sure that does what you want (the full primary key should be > >>>>>> specified to know which node hold the data. > >>>>>> > >>>>>> Also, once you found the message related to a thread you want to > >>>>>> validate that the subject matches. This can be done on application > >> side > >>>>>> (James), and avoids complicated data model. > >>>>>> > >>>>>> I encourage you to validate your data model using a Cassandra in > >> docker > >>>>>> and executing CQL commands locally with CQLSH tool to simulate the > >>>>>> queries you whish to do, and learn about your data model before even > >>>>>> starting to implement it. IMO sharing CQL commands for creating the > >>>>>> table, inserting data in it, and retrieving data from it would be a > >>>>>> great follow up to this email. > >>>>>> > >>>>>> How would you populate the data of that table? > >>>>>> > >>>>>> Best regards, > >>>>>> > >>>>>> Benoit > >>>>>>> Best regards, > >>>>>>> > >>>>>>> Quan > >>>>>>> > >>>>>>> [1] https://jmap.io/spec-mail.html#threads > >>>>>>> [2] https://issues.apache.org/jira/browse/JAMES-3516 > >>>>>>> > >>>>>> > --------------------------------------------------------------------- > >>>>>> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > >>>>>> For additional commands, e-mail: server-dev-h...@james.apache.org > >>>>>> > >>>>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > >>>> For additional commands, e-mail: server-dev-h...@james.apache.org > >>>> > >>>> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > >> For additional commands, e-mail: server-dev-h...@james.apache.org > >> > >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org > For additional commands, e-mail: server-dev-h...@james.apache.org > >