Re: Looking for design ideas
Steve,

Does a document have a different URL when it is in a personal DB?

I suspect the easiest solution is to use just one index. Add a field containing an integer that identifies the personal DB, and set it to zero for public documents. Call it DBid. When the user starts editing a document, update the doc to change this field and its URL. The query then contains the user ID, and you boost on this field. Or something like that.

Cheers,
Rick

In reply to Steven White, March 18, 2018 11:13:49 AM EDT.

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com
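Rick's single-index idea can be sketched as a set of Solr request parameters. This is a minimal sketch, assuming a schema with an integer `DBid` field (0 for public documents, otherwise the owner's numeric user ID) and the edismax query parser. The function name, the boost weight of 10, and the field values are illustrative assumptions, not a tested configuration.

```python
from urllib.parse import urlencode

PUBLIC_DBID = 0  # DBid value for public documents, as Rick suggests


def single_index_params(user_text: str, user_dbid: int, boost: float = 10.0) -> dict:
    """Build Solr query parameters for one shared index (hypothetical helper).

    Assumes each document has an integer field DBid: 0 for public documents,
    or the owning user's ID for documents in that user's personal DB.
    """
    return {
        "q": user_text,
        "defType": "edismax",
        # Filter: only public docs plus this user's personal docs are visible.
        "fq": f"DBid:({PUBLIC_DBID} OR {user_dbid})",
        # Boost query (edismax bq): rank the user's own personal docs higher.
        "bq": f"DBid:{user_dbid}^{boost}",
    }


if __name__ == "__main__":
    params = single_index_params("quarterly report", 42)
    # Query string to append to a core's /select URL; the core name is up to you.
    print(urlencode(params))
```

The boost only ranks personal hits higher. Hiding the public copy of a checked-out document still relies on updating that document's DBid when editing starts, as Rick describes; without a `qf` parameter, edismax searches the core's default field.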
Re: Looking for design ideas
I've worked on something similar: a data set of 100M documents with thousands of users.

Ranking is relative within each index. For example, what ranks #1, #2, #3 is only #1, #2, #3 within that index. Your challenge will be in the user interface result display: how to merge the results so that relevant results are shown before non-relevant ones. There are numerous ways to merge (you could even retrieve, merge, re-index, and retrieve from that), but computing power aside, that's not efficient.

You could consider the two indexes not as public and private but as metadata (data indexed only, not stored) and data (indexed and stored values). This way you'll get your ranking without having to compromise. Once you have your doc IDs, you can retrieve the documents from a read-only Solr cluster holding the data index, or from a scalable persistent store (Cassandra, Mongo, etc.) that would scale much better than Solr itself for thousands if not millions of users (please, let's not start a debate about this).

This way your users would get relevant results and fast access to the index, and the data would be protected, provided you filter on the doc owner ID with an OR query: owner ID = the user's ID OR owner ID = 'public'. What you lose by not getting the document data from the initial query, you can retrieve asynchronously, or maybe "join" with another collection, which I've not done but I know is possible.

You may also want to consider the CQRS pattern for doc check-in / check-out actions to keep indexing and query time scalable. It may be more work, but it's more scalable. Go big or go home. ;)

Hope it helps.

--
Rahul Singh
rahul.si...@anant.us
Anant Corporation

In reply to Steven White, March 18, 2018, 11:14 AM -0400.
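Because scores are only comparable within one index, one way to merge two result lists is to use rank positions instead of raw scores. The sketch below uses reciprocal rank fusion (RRF), a technique swapped in here for illustration; neither reply names it. It also applies requirement #3 by dropping public hits whose IDs are in the user's "Personal" DB. The function name, the doc IDs, and the damping constant k=60 are assumptions.

```python
from collections import defaultdict


def rrf_merge(personal_hits, public_hits, personal_ids, k=60):
    """Merge two ranked lists of doc IDs with reciprocal rank fusion (hypothetical helper).

    personal_hits / public_hits: doc IDs in the order each index ranked them.
    personal_ids: every doc ID in the user's "Personal" DB; public copies of
    these are dropped so the personal copy always wins (requirement #3).
    k: RRF damping constant; 60 is a commonly used value, assumed here.
    """
    # Drop public copies of checked-out documents before fusing.
    filtered_public = [d for d in public_hits if d not in personal_ids]

    scores = defaultdict(float)
    for ranked in (personal_hits, filtered_public):
        # Only the position within each list matters, never the raw Solr score.
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)

    # Highest fused score first; on ties, personal docs first, then by doc ID.
    return sorted(scores, key=lambda d: (-scores[d], d not in personal_ids, d))


if __name__ == "__main__":
    merged = rrf_merge(["doc7", "doc3"], ["doc3", "doc9", "doc1"], {"doc3", "doc7"})
    print(merged)  # ['doc7', 'doc9', 'doc3', 'doc1']
```

The public copy of doc3 is discarded, so doc3 appears once, ranked by its position in the personal results. Score normalization or a personal-first policy would be equally valid choices; RRF just avoids comparing scores from different indexes.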
Looking for design ideas
Hi everyone,

I have a design problem that I'm not sure how best to solve, so I figured I'd share it here and see what ideas others may have.

I have a DB that holds documents (over 1 million and growing). This is known as the "Public" DB; it holds documents visible to all of my end users.

My application lets users "check out" one or more documents at a time from this "Public" DB, edit them, and "check them in" back into the "Public" DB. When a document is checked out, it goes into a "Personal" DB for that user (and the document in the "Public" DB is flagged as such to alert other users). The owner of the checked-out document in the "Personal" DB can make changes to it and save it back into the "Personal" DB as often as he wants to. Sometimes the document lives in the "Personal" DB for a few minutes before it is checked back into the "Public" DB, and sometimes it can live in the "Personal" DB for 1 day or 1 month. When a document is saved into the "Personal" DB, only the owner of that document can see it.

Currently there are 100 users, but this will grow to at least 500 or maybe even 1000.

I'm looking for a solution that enables full-text search on those documents, in both the "Public" and "Personal" DBs, so that:

1) Documents in the "Public" DB are searchable by all users. This is the easy part.

2) Documents in each user's "Personal" DB are searchable by the owner of that "Personal" DB. This is easy too.

3) A user can search both the "Public" and "Personal" DBs at any time, but if a document is in the "Personal" DB, we will not search it in the "Public" DB; i.e., whatever is in the "Personal" DB takes precedence over what's in the "Public" DB.

Item #3 is important and is what I'm trying to solve. The goal is to give the user hits on the documents they are editing (in their "Personal" DB) instead of the copies in the "Public" DB.

The way I'm thinking of solving this problem is to create 2 Solr indexes (do we call those "cores"?):

1) The "Public" DB is indexed into the "Public" Solr index.
2) The "Personal" DB is indexed into the "Personal" Solr index, with a field indicating the owner of each document.

With these 2 indexes, I can send the user's search syntax to both indexes, but for the "Public" index I will also send a list of IDs (the documents in the user's "Personal" DB) to exclude from the result set. This way, a user searches both the "Public" and "Personal" DBs such that documents in the "Personal" DB are included in the search and excluded from the "Public" DB.

Did I make sense? If so, is this doable? Will ranking be affected, given that I'm searching 2 indexes?

Let me know what issues I might be overlooking with this solution.

Thanks,

Steve
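Steve's two-core plan boils down to two request parameter sets: one for the "Personal" core restricted to the owner, and one for the "Public" core with a negative filter on the checked-out IDs. This is a minimal sketch, assuming both cores use `id` as the unique key and the personal core stores the user name in a field called `owner`; both field names and the helper names are hypothetical.

```python
def quote_term(term: str) -> str:
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def two_core_params(user_text, owner, personal_ids):
    """Build request params for the "Personal" and "Public" cores (hypothetical helper).

    Assumed schema: both cores use `id` as the unique key, and the
    personal core has an `owner` field holding the user name.
    """
    # Personal core: only this user's checked-out documents.
    personal = {"q": user_text, "fq": "owner:" + quote_term(owner)}
    public = {"q": user_text}
    if personal_ids:
        # Pure negative filter: exclude public copies of checked-out docs.
        ids = " OR ".join(quote_term(i) for i in sorted(personal_ids))
        public["fq"] = f"-id:({ids})"
    return personal, public


if __name__ == "__main__":
    personal, public = two_core_params("budget", "steve", {"d2", "d1"})
    print(personal["fq"])  # owner:"steve"
    print(public["fq"])    # -id:("d1" OR "d2")
```

Each core still computes its scores independently, so the two result lists need a merge step before display. A user with many checked-out documents produces a long OR list, which can run into Solr's `maxBooleanClauses` limit.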