I’ve worked on something similar: the data set was 100M documents with thousands of users. The ranking is relative within each index, e.g. what is #1, #2, #3 is only 1, 2, 3 in that index.
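
Since ranks only mean something inside the index that produced them, one simple way to combine two result lists is by rank position rather than by raw score. Here is a minimal sketch, assuming each index hands back an ordered list of doc IDs; the function name and the round-robin strategy are only my illustration, not something Solr does for you, and it only de-duplicates against the personal hits it is given.

```python
from itertools import zip_longest


def interleave_by_rank(personal_ids, public_ids):
    """Round-robin merge of two ranked ID lists; personal copies win on duplicates."""
    personal = set(personal_ids)
    # Drop public hits that also appear among the user's personal hits.
    public_only = [doc_id for doc_id in public_ids if doc_id not in personal]

    merged = []
    seen = set()
    # Take rank 1 from each list, then rank 2 from each list, and so on.
    for personal_hit, public_hit in zip_longest(personal_ids, public_only):
        for doc_id in (personal_hit, public_hit):
            if doc_id is not None and doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


print(interleave_by_rank(["d7", "d2"], ["d2", "d9", "d4"]))
# ['d7', 'd9', 'd2', 'd4']
```

Whether interleaving gives a result order users consider relevant is a judgment call; it just avoids comparing scores that were computed against different indexes.
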

Your challenge will be in the user interface result display: how to merge results so that relevant results are shown before non-relevant ones. There are numerous ways to merge (you could even retrieve, merge, index, and retrieve from that), but computing power aside, that’s not efficient.

You could consider the two indexes not as public and private but as a metadata index (fields indexed only, not stored) and a data index (indexed and stored values). This way you’ll get your ranking without having to compromise. Once you have your doc IDs, you can retrieve the documents from a data index / read-only Solr cluster or from a scalable persistent store (Cassandra, Mongo, etc.) that would scale way better than Solr itself for thousands if not millions of users (please let’s not start a debate about this).

This way your users would have relevant results and fast access to the index, and the data would be protected, provided you filter on the doc owner ID as an “or” query: the user’s own ID or doc owner ID = ‘public’. The document data that you no longer get from the initial query you can retrieve asynchronously, or maybe “join” with another collection, which I’ve not done but I know it’s possible.

You may also want to consider the CQRS pattern for the doc check-in / check-out actions to keep indexing and query time scalable. It may be more work but it’s more scalable. Go big or go home. ;)

Hope it helps

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 18, 2018, 11:14 AM -0400, Steven White <swhite4...@gmail.com>, wrote:
> Hi everyone,
>
> I have a design problem that I'm not sure how to solve best so I figured I'd
> share it here and see what ideas others may have.
>
> I have a DB that holds documents (over 1 million and growing). This is
> known as the "Public" DB that holds documents visible to all of my end
> users.
>
> My application lets users "check-out" one or more documents at a time off
> this "Public" DB, edit them and "check-in" back into the "Public" DB.
> When a document is checked-out, it goes into a "Personal" DB for that user (and
> the document in the "Public" DB is flagged as such to alert other users.)
> The owner of this checked-out document in the "Personal" DB can make
> changes to the document and save it back into the "Personal" DB as often as
> he wants to. Sometimes the document lives in the "Personal" DB for a few
> minutes before it is checked-in back into the "Public" DB and sometimes it
> can live in the "Personal" DB for 1 day or 1 month. When a document is
> saved into the "Personal" DB, only the owner of that document can see it.
>
> Currently there are 100 users but this will grow to at least 500 or maybe
> even 1000.
>
> I'm looking at a solution on how to enable a full text search on those
> documents, both in the "Public" and "Personal" DB so that:
>
> 1) Documents in the "Public" DB are searchable by all users. This is the
> easy part.
>
> 2) Documents in the "Personal" DB of each user are searchable by the owner
> of that "Personal" DB. This is easy too.
>
> 3) A user can search both the "Public" and "Personal" DB at anytime but if
> a document is in the "Personal" DB, we will not search it in the "Public" --
> i.e.: whatever is in "Personal" DB takes over what's in the "Public" DB.
>
> Item #3 is important and is what I'm trying to solve. The goal is to give
> hits to the user on documents that they are editing (in their "Personal"
> DB) instead of that in the "Public".
>
> The way I'm thinking to solve this problem is to create 2 Solr indexes (do
> we call those "cores"?):
>
> 1) The "Public" DB is indexed into the "Public" Solr index.
>
> 2) The "Personal" DB is indexed into the "Personal" Solr index with a field
> indicating the owner of that document.
>
> With the above 2 indexes, I can now send the user's search syntax to both
> indexes but for the "Public", I will also send a list of IDs (those
> documents in the user's "Personal" DB) to exclude from the result set.
> This way, I let a user search both the "Public" and "Personal" DB as such
> the documents in the "Personal" DB are included in the search and are
> excluded from the "Public" DB.
>
> Did I make sense? If so, is this doable? Will ranking be affected given
> that I'm searching 2 indexes?
>
> Let me know what issues I might be overlooking with this solution.
>
> Thanks
>
> Steve
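
For what it's worth, the two-query approach described in the quoted message could be sketched roughly like this. It only builds the request parameters: `q` and `fq` are standard Solr query parameters, but the field names `owner` and `id`, the helper name, and the sample values are assumptions made up for illustration, and the sketch does not escape special characters in the values.

```python
def build_queries(user_query, user_id, checked_out_ids):
    """Return (personal_params, public_params) as dicts of Solr request parameters."""
    # Personal index: restrict hits to documents owned by this user.
    # "owner" is a hypothetical field name.
    personal = {"q": user_query, "fq": f'owner:"{user_id}"'}

    # Public index: exclude the documents this user currently has checked out.
    # "id" is assumed to be the unique key field.
    public = {"q": user_query}
    if checked_out_ids:
        excluded = " ".join(f'"{doc_id}"' for doc_id in checked_out_ids)
        public["fq"] = f"*:* -id:({excluded})"
    return personal, public


personal_params, public_params = build_queries("contract renewal", "alice", ["12", "57"])
print(personal_params)
print(public_params)
```

A document that stays checked out for a month is fine here, but a user with a very long list of checked-out IDs may run into Solr's limit on boolean clauses, so the exclusion list is worth keeping an eye on.
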