Architecture decisions with Solr

2011-02-09 Thread Greg Georges
Hello all,

I am looking into an enterprise search solution for our architecture, and I am 
very pleased to see all the features Solr provides. We need a highly scalable 
application for multiple clients. It will serve many users, each of whom will 
have a client account, and each client will have anywhere from zero to 
thousands of documents to index.

After discussion, we are considering going multicore, with one index per client 
account, on the grounds that a separate index for each client gives us 
security. Is this the best approach? How feasible is it to create indexes 
dynamically when a client account is created? Or is it better to go the 
faceted search route? Thanks for your help.

Greg


Re: Architecture decisions with Solr

2011-02-09 Thread Darren Govoni
What about standing up a VM (a search appliance that you would build) for
each client?
If there's no data sharing across clients, then using the same Solr
server/index doesn't seem necessary.

Solr will easily meet your needs though; it's the best there is.


RE: Architecture decisions with Solr

2011-02-09 Thread Greg Georges
From what I understand about multicore, each index is independent of the 
others, right? Or would one index have access to the data in another? My 
requirement is as you describe: a client has access only to search data based 
on his or her own documents, and no client has access to another client's 
index.

Greg


Re: Architecture decisions with Solr

2011-02-09 Thread Glen Newton
> This application will be built to serve many users

If this means that you have thousands of users, thousands of VMs and/or
thousands of cores are not going to scale.

Have an ID in the index for each user, and filter using it.
Then they can see only their own documents.

This assumes that you are building an app through which they
authenticate and which talks to Solr
(i.e. all requests are filtered using their ID).

-Glen
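
A minimal sketch of the filtering Glen describes, assuming the application sits between the user and Solr and adds the restriction itself. The field name `client_id`, the base URL and the helper name are hypothetical; `q`, `fq` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_select_url(solr_base, user_query, client_id, id_field="client_id"):
    """Return a Solr /select URL restricted to one client's documents.

    The client ID must come from the authenticated session on the server,
    never from anything the end user can edit.
    """
    # Quote the ID as a phrase and escape backslashes and double quotes,
    # so the value cannot break out of the filter clause.
    escaped = client_id.replace("\\", "\\\\").replace('"', '\\"')
    params = [
        ("q", user_query),                   # what the user typed
        ("fq", f'{id_field}:"{escaped}"'),   # restriction added by the app
        ("wt", "json"),
    ]
    return f"{solr_base.rstrip('/')}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "invoice", "acct-42"))
```

Putting the ID in `fq` rather than in `q` keeps it out of relevance scoring and lets Solr cache the per-client filter separately from the user's query.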


Re: Architecture decisions with Solr

2011-02-09 Thread Sujit Pal
Another option (assuming a user can be granted access to a certain class
of documents, and more than one user may have access to the same
documents) would be to store the access filter (as an OR query of content
types) in an external cache (perhaps a database, or an external cache that
the database changes are published to periodically), and then apply this
access filter as a facet on the base query.

-sujit
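
One way to read Sujit's suggestion, as a rough sketch: look up the content types a user may see and turn them into a single OR clause that the application attaches to every query. The dictionary below is a hypothetical stand-in for the external cache, and the field name `content_type` is an assumption.

```python
# Hypothetical stand-in for the external cache (database, memcached, ...).
ACCESS_CACHE = {
    "alice": ["contract", "invoice"],
    "bob": ["invoice"],
}


def access_filter(user, cache=ACCESS_CACHE, field="content_type"):
    """Build a filter clause such as content_type:("contract" OR "invoice")."""
    allowed = cache.get(user)
    if not allowed:
        # No grants on record: refuse instead of running an unfiltered query.
        raise PermissionError(f"no access filter stored for {user!r}")
    quoted = []
    for content_type in sorted(set(allowed)):
        # Escape backslashes and quotes, then wrap the value as a phrase.
        escaped = content_type.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"


if __name__ == "__main__":
    print(access_filter("alice"))
```

The resulting string would be sent as an `fq` parameter alongside the user's own query, so the access rule is applied once per request regardless of what the user searches for.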


Re: Architecture decisions with Solr

2011-02-09 Thread Adam Estrada
I tried the multi-core route and found it too complicated and cumbersome to 
maintain. That is just from my own personal testing. It was suggested that 
each user have their own ID in a single index that you can query against 
accordingly. In the example schema.xml I believe there is a field type called 
textTight or something like that, which is meant for SKU numbers. Give each 
user their own GUID or MD5 hash and add that as part of all your queries. That 
way, only their data is returned. It would be the equivalent of something like 
this...

SELECT * FROM mytable WHERE userid = '3F2504E04F8911D39A0C0305E82C3301' AND 

Grant Ingersoll gave a presentation at the Lucene Revolution conference that 
demonstrated that you can build a query to be as easy or as complicated as any 
SQL statement. Maybe he can share that PPT?

Adam
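
As an illustration of the per-user key Adam mentions, here is a small sketch that derives an MD5-based identifier and pairs it with the user's search terms, roughly mirroring the `WHERE userid = ...` clause above. The field name `userid` comes from the SQL example; the function names and the choice to hash the account name are assumptions.

```python
import hashlib


def user_key(account_name):
    """Derive a stable 32-character hex key for an account.

    MD5 is used here only as an opaque identifier, not as a security measure.
    """
    return hashlib.md5(account_name.encode("utf-8")).hexdigest().upper()


def solr_params(user_query, account_name, field="userid"):
    """Return request parameters that limit results to one user's documents."""
    # The key contains only hex digits, so it needs no query-syntax escaping.
    return {"q": user_query, "fq": f"{field}:{user_key(account_name)}"}


if __name__ == "__main__":
    print(solr_params("annual report", "greg"))
```

The same key would have to be stored in the `userid` field of every document at index time for the filter to match.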
