Re: Looking for design ideas

2018-03-18 Thread Rick Leir
Steve
Does a document have a different URL when it is in a personal DB? 

I suspect the easiest solution is to use just one index.

You can have a field containing an integer identifying the personal DB. For 
public, set this to zero. Call it DBid. Update the doc to change this and the 
URL when the user starts editing.

Then the query contains the user ID, and you boost on this field, or something 
like that.
Cheers -- Rick
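
One possible reading of the single-index idea above, as a minimal sketch. It
only builds the request parameters and does not talk to Solr. The field name
DBid comes from the message; the edismax parser, the boost value, and the extra
fq filter (which keeps other users' personal copies out of the results) are
assumptions added for illustration.

```python
def build_single_index_params(user_query: str, user_db_id: int, boost: float = 5.0) -> dict:
    """Build Solr request parameters for one shared index.

    DBid is 0 for public documents and a positive integer for a
    user's personal DB (the convention suggested in the message).
    """
    if user_db_id <= 0:
        raise ValueError("personal DB ids are assumed to be positive; 0 means public")
    return {
        "q": user_query,
        "defType": "edismax",
        # Assumption: restrict results to public docs plus this user's own docs.
        "fq": f"DBid:(0 OR {user_db_id})",
        # Boost query: documents in the user's personal DB rank higher.
        "bq": f"DBid:{user_db_id}^{boost}",
    }


params = build_single_index_params("design ideas", user_db_id=42)
```

The resulting dict can be passed as query parameters to a core's /select
handler with whatever HTTP client is already in use.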



-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Looking for design ideas

2018-03-18 Thread Rahul Singh
I’ve worked on something similar: the data set was 100m documents with thousands 
of users. The ranking is relative within each index, e.g. what is #1, #2, #3 is 
only 1, 2, 3 in that index.
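
A minimal sketch of why scores from two separate indexes are not directly
comparable. It assumes BM25 scoring (the default similarity in recent Solr
releases) and shows only the IDF part of the score; the document counts are
made up for illustration.

```python
import math


def bm25_idf(total_docs: int, docs_with_term: int) -> float:
    """IDF component of BM25 as computed by Lucene's BM25 similarity."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))


# Made-up numbers: the same term in a large public index and a tiny personal one.
public_idf = bm25_idf(total_docs=1_000_000, docs_with_term=1_000)
personal_idf = bm25_idf(total_docs=5, docs_with_term=3)

# The same term gets a very different weight in each index, so a raw score
# from one index cannot be ranked against a raw score from the other.
print(f"public idf={public_idf:.3f}, personal idf={personal_idf:.3f}")
```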

Your challenge will be in the user interface result display: how to merge results 
in a way that the relevant results are shown before non-relevant results.

There are numerous ways to merge (you could even retrieve, merge, index, and 
retrieve from that), but computing power aside, that’s not efficient.

You could consider the two indexes not as public and private but as metadata 
(data indexed only, not stored) and data (indexed and stored values). This way 
you’ll get your ranking without having to compromise. Once you have your doc 
IDs, you can retrieve from a data index / read-only Solr cluster or a scalable 
persistent store (Cassandra, Mongo, etc.) that would scale way better than 
Solr itself for thousands if not millions of users (please let’s not start a 
debate about this).

This way your users would have relevant results and fast access to the index, 
and the data would be protected if you filter by the doc owner ID as an “or” query 
in addition to doc owner ID = ‘public’. What you lose by not getting the 
document data from the initial query you can retrieve asynchronously or maybe 
“join” with another collection, which I’ve not done but I know it’s possible.
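
The owner filter described above could look roughly like this. It is a sketch
that only builds request parameters; the field name owner_id, the literal value
"public", and the choice to return only id and score are assumptions, not
something taken from a real schema.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_metadata_query(user_query: str, owner_id: str) -> dict:
    """Parameters for the metadata (indexed-only) collection."""
    owner_filter = f"owner_id:({quote_term('public')} OR {quote_term(owner_id)})"
    return {
        "q": user_query,
        # Only public docs and docs owned by this user can match.
        "fq": owner_filter,
        # Return ids and scores only; document bodies are fetched elsewhere.
        "fl": "id,score",
    }


metadata_params = build_metadata_query("design ideas", "steve")
```

The ids that come back would then be looked up in the data index or the
persistent store as a second step.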

You may also want to consider the CQRS pattern for doc check-in / check-out 
actions to keep the indexing / query time scalable. It may be more work but it’s 
more scalable. Go big or go home. ;)

Hope it helps

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation



Looking for design ideas

2018-03-18 Thread Steven White
Hi everyone,

I have a design problem that I'm not sure how best to solve, so I figured I'd
share it here and see what ideas others may have.

I have a DB that holds documents (over 1 million and growing).  This is
known as the "Public" DB that holds documents visible to all of my end
users.

My application lets users "check-out" one or more documents at a time off
this "Public" DB, edit them and "check-in" back into the "Public" DB.  When
a document is checked-out, it goes into a "Personal" DB for that user (and
the document in the "Public" DB is flagged as such to alert other users.)
The owner of this checked-out document in the "Personal" DB can make
changes to the document and save it back into the "Personal" DB as often as
he wants to.  Sometimes the document lives in the "Personal" DB for a few
minutes before it is checked-in back into the "Public" DB and sometimes it
can live in the "Personal" DB for 1 day or 1 month.  When a document is
saved into the "Personal" DB, only the owner of that document can see it.

Currently there are 100 users but this will grow to at least 500 or maybe
even 1000.

I'm looking at a solution on how to enable a full text search on those
documents, both in the "Public" and "Personal" DB so that:

1) Documents in the "Public" DB are searchable by all users.  This is the
easy part.

2) Documents in the "Personal" DB of each user are searchable by the owner
of that "Personal" DB.  This is easy too.

3) A user can search both the "Public" and "Personal" DB at any time but if
a document is in the "Personal" DB, we will not search it in the "Public" --
i.e.: whatever is in "Personal" DB takes over what's in the "Public" DB.

Item #3 is important and is what I'm trying to solve.  The goal is to give
hits to the user on documents that they are editing (in their "Personal"
DB) instead of those in the "Public" DB.

The way I'm thinking to solve this problem is to create 2 Solr indexes (do
we call those "cores"?):

1) The "Public" DB is indexed into the "Public" Solr index.

2) The "Personal" DB is indexed into the "Personal" Solr index with a field
indicating the owner of that document.

With the above 2 indexes, I can now send the user's search syntax to both
indexes but for the "Public", I will also send a list of IDs (those
documents in the user's "Personal" DB) to exclude from the result set.
This way, I let a user search both the "Public" and "Personal" DB such that
the documents in the "Personal" DB are included in the search and are
excluded from the "Public" DB.
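
A minimal sketch of the two requests described above, building parameters only
and not contacting Solr. The field names id and owner are hypothetical, and the
list of checked-out IDs is assumed to come from the application's own
"Personal" DB.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_two_index_queries(user_query: str, owner: str, checked_out_ids: list) -> tuple:
    """Return (public_params, personal_params) for the two indexes."""
    public_params = {"q": user_query}
    if checked_out_ids:
        id_list = " OR ".join(quote_term(doc_id) for doc_id in checked_out_ids)
        # Exclude documents the user currently has checked out.
        public_params["fq"] = f"*:* -id:({id_list})"
    # The personal index is restricted to this owner's documents.
    personal_params = {"q": user_query, "fq": f"owner:{quote_term(owner)}"}
    return public_params, personal_params


public_params, personal_params = build_two_index_queries(
    "design ideas", "steve", ["doc-1", "doc-7"]
)
```

One thing to watch with this approach: the exclusion list grows with the
number of checked-out documents, so a user with many documents checked out
produces a long filter on every search.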

Did I make sense?  If so, is this doable?  Will ranking be affected given
that I'm searching 2 indexes?

Let me know what issues I might be overlooking with this solution.

Thanks

Steve