Re: Using Multiple Cores for Multiple Users

2010-11-10 Thread Jan Høydahl / Cominvent
Hi,

If your index is supposed to handle only public information, i.e. public RSS 
feeds, then I don't see a need for multiple cores.

I would probably try to handle this on the query side only. Imagine this 
scenario:

User A registers RSS-X and RSS-Y (the application starts pulling and indexing 
these feeds)
User B registers RSS-Z (the application starts pulling feed Z)
User C registers RSS-X and RSS-Z (the application does nothing, as these are 
already being indexed)
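
The bookkeeping in this scenario can be sketched roughly as follows. This is only an illustration: the `FeedRegistry` class and its method names are hypothetical and not part of Solr or of the application discussed here. It assumes the application keeps an in-memory map from feed ID to subscribers.

```python
from collections import defaultdict


class FeedRegistry:
    """Tracks which users subscribe to which feeds (hypothetical helper)."""

    def __init__(self):
        # feed ID -> set of user IDs subscribed to that feed
        self.subscribers = defaultdict(set)

    def register(self, user, feed_ids):
        """Record subscriptions and return the feeds that are new to the index."""
        newly_indexed = []
        for feed_id in feed_ids:
            if not self.subscribers[feed_id]:
                # First subscriber: the application should start pulling this feed.
                newly_indexed.append(feed_id)
            self.subscribers[feed_id].add(user)
        return newly_indexed

    def feeds_for(self, user):
        """Return the sorted feed IDs a user is subscribed to."""
        return sorted(f for f, users in self.subscribers.items() if user in users)


registry = FeedRegistry()
started_a = registry.register("A", ["RSS-X", "RSS-Y"])
started_b = registry.register("B", ["RSS-Z"])
started_c = registry.register("C", ["RSS-X", "RSS-Z"])
print(started_a, started_b, started_c)
```

User C's registration returns an empty list, because both feeds are already being indexed on behalf of users A and B.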

When searching, add a filter to each user's queries. Solr will handle MANY 
terms in such a filter, and it is not likely that a human user subscribes to 
more than, say, a few hundred feeds.

So for user C, the query would look like
.../solr/select?q=foo bar&fq=feedID:(RSS-X OR RSS-Z)
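
A minimal sketch of building such a request on the application side, using only the Python standard library. The base URL and the `build_select_url` helper are assumptions for illustration. The `feedID` field name comes from the query above. Feed IDs containing spaces or query-syntax characters would need escaping, which this sketch does not do.

```python
from urllib.parse import urlencode


def build_select_url(base_url, user_query, feed_ids):
    """Return a Solr select URL restricted to the given feed IDs via fq."""
    # Build a filter such as feedID:(RSS-X OR RSS-Z)
    feed_filter = "feedID:(" + " OR ".join(feed_ids) + ")"
    params = [("q", user_query), ("fq", feed_filter)]
    # urlencode percent-encodes the parameter values for use in a URL
    return base_url + "/select?" + urlencode(params)


url = build_select_url("http://localhost:8983/solr", "foo bar", ["RSS-X", "RSS-Z"])
print(url)
```

Keeping the restriction in `fq` rather than in `q` means it does not take part in scoring, and Solr can cache the filter separately from the main query.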

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

 



Re: Using Multiple Cores for Multiple Users

2010-11-10 Thread Shalin Shekhar Mangar
On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
estrada.adam.gro...@gmail.com wrote:

 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...


I think it is customary for me to mention the techniques described on the
LotsOfCores wiki page for these kinds of questions. The patches are mostly
useless at this point, but if you are looking for a per-user solution, you
will need most of the tricks mentioned on the wiki page.

http://wiki.apache.org/solr/LotsOfCores

-- 
Regards,
Shalin Shekhar Mangar.


Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
All,

I have a web application that requires the user to register and then login
to gain access to the site. Pretty standard stuff...Now I would like to know
what the best approach would be to implement a customized search
experience for each user. Would this mean creating a separate core per user?
I think that this is not possible without restarting Solr after each core is
added to the multi-core xml file, right?

My use case is this...User A would like to index 5 RSS feeds and User B
would like to index 5 completely different RSS feeds and he is not
interested at all in what User A is interested in. This means that they
would have to be separate index cores, right?

What is the best approach for this kind of thing?

Thanks in advance,
Adam


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Markus Jelsma
Hi,

 All,
 
 I have a web application that requires the user to register and then login
 to gain access to the site. Pretty standard stuff...Now I would like to
 know what the best approach would be to implement a customized search
 experience for each user. Would this mean creating a separate core per
 user? I think that this is not possible without restarting Solr after each
 core is added to the multi-core xml file, right?

No, you can dynamically manage cores and parts of their configuration. 
Sometimes you must reindex after a change; the same is true for reloading 
cores. Check the wiki on this one [1].
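
As a rough illustration of managing cores without a restart, the sketch below only builds a CoreAdmin CREATE request URL. It does not send it. The base URL, core name, and instance directory are hypothetical, and it assumes the instance directory with its configuration already exists on the Solr server.

```python
from urllib.parse import urlencode


def core_create_url(solr_base, core_name, instance_dir):
    """Build a CoreAdmin request URL that asks Solr to create a core."""
    params = {
        "action": "CREATE",           # CoreAdmin action
        "name": core_name,            # name of the new core
        "instanceDir": instance_dir,  # existing directory holding conf/ for the core
    }
    return solr_base + "/admin/cores?" + urlencode(params)


create_url = core_create_url("http://localhost:8983/solr", "user_a", "cores/user_a")
print(create_url)
```

An application would issue an HTTP GET against this URL; the wiki page at [1] lists the other actions and parameters.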

 
 My use case is this...User A would like to index 5 RSS feeds and User B
 would like to index 5 completely different RSS feeds and he is not
 interested at all in what User A is interested in. This means that they
 would have to be separate index cores, right?

If you view the documents within an RSS feed as separate documents, you can 
assign a user ID to those documents, creating a multi-user index with RSS 
documents per user, or group, or whatever.

Having a core per user isn't a good idea if you have many users. It takes up 
additional memory and disk space, doesn't share caches, etc. There is also 
more maintenance, and you need some support scripts to dynamically create new 
cores - Solr currently doesn't create a new core directory structure.

But reindexing a very large index takes up a lot more time and resources, and 
relevancy might be an issue depending on the RSS feeds' contents.

 
 What is the best approach for this kind of thing?

I'd usually store the feeds in a single index, and shard if it's too many for 
a single server with your specifications, unless the demands are too specific.

 
 Thanks in advance,
 Adam

[1]: http://wiki.apache.org/solr/CoreAdmin

Cheers


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
I'm willing to bet a lot that the standard approach is to use a server-side 
language to customize the queries for the user . . . on the same core/set of 
cores.

The only reasons that my limited experience suggests for a 'core per user' are 
privacy and performance. Unless you have a very small set of users, I would 
think managing cores for LOTS of users would be a pain: create one (takes 
time), replicate to it (takes MORE time), use it, and destroy it after the 
session expires (requires a garbage collection program running pretty often, 
with LOTS more time/CPU resources taken up).

I am happy to be corrected on any of this.

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a 
better idea to learn from others’ mistakes, so you do not have to make them 
yourself. 
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.






RE: Using Multiple Cores for Multiple Users

2010-11-09 Thread Jonathan Rochkind
If storing in a single index (possibly sharded if you need it), you can simply 
include a Solr field that specifies the user ID of the saved thing. On the 
client side, in your application, simply ensure that there is an fq parameter 
limiting to the current user, if you want to limit to the current user's stuff. 
Relevancy ranking should work just as if you had 'separate cores'; there is no 
relevancy issue. 
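
A small sketch of what this could look like at index time and at query time. The field names `userID` and `title` and both helper functions are assumptions for illustration. The XML follows Solr's `<add><doc><field>` update message format.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, user_id, title):
    """Build a Solr XML update message for one document tagged with its owner."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("userID", user_id), ("title", title)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def user_filter(user_id):
    """Return the fq parameter that limits a search to one user's documents."""
    return {"fq": "userID:" + user_id}


message = build_add_message("item-1", "userA", "Monkeys in the news")
print(message)
print(user_filter("userA"))
```

The application would attach the `fq` value to every search it sends on behalf of the logged-in user, so the restriction never depends on what the user types.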

It IS true that when your index gets very large, commits will start taking 
longer, which can be a problem. I don't mean commits will take longer just 
because there is more stuff to commit -- the larger the index, the longer an 
update to a single document will take to commit. 

In general, I suspect that having dozens or hundreds (or thousands!) of cores 
is not going to scale well; it is not going to make good use of your cpu/ram/hd 
resources. That is not really the intended use case of multiple cores. 

However, you are probably going to run into some issues with the single index 
approach too. In general, how to deal with multi-tenancy in Solr is an 
oft-asked question for which there doesn't seem to be any 'just works and does 
everything for you without needing to think about it' solution in Solr, judging 
from past threads. I am not a Solr developer or expert. 




Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Adam Estrada
Thanks a lot for all the tips, guys! I think that we may explore both
options just to see what happens. I'm sure that scalability will be a huge
mess with the core-per-user scenario. I like the idea of creating a user ID
field and agree that it's probably the best approach. We'll see...I will be
sure to let the list know what I find! Please don't stop posting your
comments everyone ;-) My inquiring mind wants to know...

Adam




Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
There is a standard problem with this: relevance is determined from
all of the words in a field of all documents, not just the documents
that match the query. That is, when user A searches for 'monkeys' and
one of his feeds has a document with this word, but someone else is a
zoophile, 'monkeys' will be a common word in the index. This will skew
the relevance computation for user A.

You could have a separate text field for each user. This might work
better, but you can't use field norms (they take up space for all
documents).
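
To put rough numbers on the skew, here is a sketch using the inverse document frequency formula from Lucene's classic default similarity, 1 + ln(numDocs / (docFreq + 1)). The document counts are invented purely for illustration.

```python
import math


def idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic default similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical counts: user A's own feeds hold 1,000 documents,
# and 'monkeys' appears in just one of them.
idf_private = idf(doc_freq=1, num_docs=1000)

# In the shared index of 100,000 documents, another user's feeds
# put 'monkeys' into 20,000 documents.
idf_shared = idf(doc_freq=20000, num_docs=100000)

print(round(idf_private, 3), round(idf_shared, 3))
```

The term counts as rare, and so is weighted heavily, within user A's own documents, but it is weighted much lower in the shared index, even though the filter hides everyone else's documents from user A.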

Lance






-- 
Lance Norskog
goks...@gmail.com


Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
hm, relevance is before filtering, probably during indexing?
 Dennis Gearon 






Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Lance Norskog
Relevance is TF/DF. TF is the term frequency: the number of times the term
appears in the document. DF is the document frequency: the number of
documents in the index that contain the term.

There is no quick calculation of total frequency for terms only in
these documents. Facets do this, and they're very very slow.
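
A toy illustration of the two counts, and of why a per-user count is not free. The three documents, the owner names, and the helper functions are all made up for this sketch. It uses plain whitespace tokenization, not Solr's analysis chain.

```python
from collections import Counter

# document ID -> (owner, text); invented sample data
docs = {
    "a1": ("userA", "monkeys eat bananas"),
    "b1": ("userB", "monkeys monkeys monkeys"),
    "b2": ("userB", "more monkeys here"),
}


def term_frequency(term, text):
    """Number of times the term occurs in one document."""
    return Counter(text.split())[term]


def document_frequency(term, owner=None):
    """Number of documents containing the term, optionally for one owner only."""
    return sum(
        1
        for doc_owner, text in docs.values()
        if (owner is None or doc_owner == owner) and term in text.split()
    )


tf_b1 = term_frequency("monkeys", docs["b1"][1])
df_all = document_frequency("monkeys")
df_user_a = document_frequency("monkeys", owner="userA")
print(tf_b1, df_all, df_user_a)
```

The index-wide count is the kind of statistic an index keeps per term, while the per-owner count here has to visit each document, which is the cost being pointed at above.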

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon gear...@sbcglobal.net wrote:
 hm, relevance is before filtering, probably during indexing?
  Dennis Gearon


 Signature Warning
 
 It is always a good idea to learn from your own mistakes. It is usually a 
 better
 idea to learn from others’ mistakes, so you do not have to make them yourself.
 from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


 EARTH has a Right To Life,
 otherwise we all die.



 - Original Message 
 From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 9, 2010 7:07:45 PM
 Subject: Re: Using Multiple Cores for Multiple Users

 There is a standard problem with this: relevance is determined from
 all of the words in a field of all documents, not just the documents
 that match the query. That is, when user A searches for 'monkeys' and
 one of his feeds has a document with this word, but someone else is a
 zoophile, 'monkeys' will be a common word in the index. This will skew
 the relevance computation for user A.

 You could have a separate text field for each user. This might work
 better- but you can't use field norms (they take up space for all
 documents).

 Lance

 On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'seperate
 cores', there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, i suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well, it is not going to make good use of your
 cpu/ram/hd resources.   Not really the intended use case of multiple cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question that there doesn't seem to be any just works and does
 everything for you without needing to think about it solution for in solr.
 Judging from past thread. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then log in
  to gain access to the site. Pretty standard stuff... Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts of their configuration.
 Sometimes you must reindex after a change; the same is true for reloading
 cores. Check the wiki on this one [1].
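 As a rough illustration of dynamic core management, the sketch below builds
 a CoreAdmin CREATE request URL in Python. The host, core name, and instance
 directory are made-up values, and the helper name is hypothetical. It
 assumes the CoreAdmin handler is reachable at admin/cores, which depends on
 how solr.xml is configured in your installation.

```python
from urllib.parse import urlencode


def core_create_url(solr_url, core_name, instance_dir):
    """Build a CoreAdmin CREATE request URL (values are illustrative)."""
    params = {
        "action": "CREATE",
        "name": core_name,
        # Directory holding the new core's conf/ and data/ folders.
        "instanceDir": instance_dir,
    }
    return f"{solr_url}/admin/cores?{urlencode(params)}"


create_url = core_create_url("http://localhost:8983/solr", "user_a", "cores/user_a")
```

 Issuing an HTTP GET against such a URL asks the running Solr instance to
 register the core without a restart; the instance directory and its
 configuration files still have to exist beforehand.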

 
  My use case is this...User A would like to index 5 RSS feeds and User B
  would like to index 5 completely different RSS feeds and he is not
  interested at all in what User A is interested in. This means that they
  would have to be separate index cores, right?

 If you view the items within an RSS feed as separate documents, you can
 assign a user ID to those documents, creating a multi-user index with RSS
 documents per user, or per group, or whatever.
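 One way to picture this is the XML update message the indexing application
 would post for each feed item. The sketch below is only an illustration:
 the field names id, user_id, and title are assumptions about the schema,
 and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def feed_item_to_solr_xml(doc_id, user_id, title):
    """Serialize one RSS item as a Solr <add><doc> update message."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    fields = (("id", doc_id), ("user_id", user_id), ("title", title))
    for name, value in fields:
        # Each <field name="..."> carries one value of the document.
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


xml_message = feed_item_to_solr_xml("rss-x-1", "userA", "Hello")
```

 A group ID, or a feed ID shared by several subscribers, could be stored in
 the same way instead of a per-user ID.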

 Having a core per user isn't a good idea if you have many users.  It takes
 up additional memory and disk space, doesn't share caches etc.  There is
 also more maintenance and you need some support scripts to dynamically
 create new cores - Solr currently doesn't create a new core

Re: Using Multiple Cores for Multiple Users

2010-11-09 Thread Dennis Gearon
So, if my other filter/selection criteria get some set of the whole index that
goes, say, from 50% relevance to 60% relevance, the set still gets ordered by
relevance and then each item in the returned set is still based on its
relevance relative to the set, right? That would only be a problem if there
was some minimal relevance desired, right?

 Dennis Gearon


Signature Warning

It is always a good idea to learn from your own mistakes. It is usually a
better idea to learn from others’ mistakes, so you do not have to make them
yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'


EARTH has a Right To Life,
otherwise we all die.



----- Original Message -----
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, November 9, 2010 8:00:09 PM
Subject: Re: Using Multiple Cores for Multiple Users

Relevance is TF/DF. TF is the term frequency, the number of times the term
appears in the document. DF is the document frequency, the number of
documents in the whole index that contain the term.

There is no quick calculation for total frequency for terms only in
these documents. Facets do this, and they're very very slow.

On Tue, Nov 9, 2010 at 7:50 PM, Dennis Gearon gear...@sbcglobal.net wrote:
 hm, relevance is before filtering, probably during indexing?
  Dennis Gearon





 ----- Original Message -----
 From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tue, November 9, 2010 7:07:45 PM
 Subject: Re: Using Multiple Cores for Multiple Users

 There is a standard problem with this: relevance is determined from
 all of the words in a field of all documents, not just the documents
 that match the query. That is, when user A searches for 'monkeys' and
 one of his feeds has a document with this word, but someone else is a
 zoophile, 'monkeys' will be a common word in the index. This will skew
 the relevance computation for user A.
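 To make the skew concrete, here is a toy calculation in Python. It assumes
 the classic Lucene-style inverse document frequency,
 idf = 1 + ln(numDocs / (docFreq + 1)); check the Similarity class in your
 version before relying on the exact formula. All document counts are
 invented for illustration.

```python
import math


def idf(num_docs, doc_freq):
    """Classic Lucene-style inverse document frequency (assumed formula)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers: user A's own feeds hold 100 documents and only one
# of them mentions 'monkeys'.
idf_user_only = idf(num_docs=100, doc_freq=1)

# In the shared index, other users' feeds make the word common:
# 2000 of 10000 documents contain it.
idf_shared = idf(num_docs=10000, doc_freq=2000)

# The term carries less weight in the shared index than it would in an
# index containing only user A's documents.
skewed = idf_shared < idf_user_only
```

 A filter query does not change this: the document frequency is still
 counted over the whole index, not over the filtered subset.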

 You could have a separate text field for each user. This might work
 better, but you can't use field norms (they take up space for all
 documents).

 Lance

 On Tue, Nov 9, 2010 at 6:00 PM, Adam Estrada
 estrada.adam.gro...@gmail.com wrote:
 Thanks a lot for all the tips, guys! I think that we may explore both
 options just to see what happens. I'm sure that scalability will be a huge
 mess with the core-per-user scenario. I like the idea of creating a user ID
 field and agree that it's probably the best approach. We'll see...I will be
 sure to let the list know what I find! Please don't stop posting your
 comments everyone ;-) My inquiring mind wants to know...

 Adam

 On Tue, Nov 9, 2010 at 7:34 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 If storing in a single index (possibly sharded if you need it), you can
 simply include a solr field that specifies the user ID of the saved thing.
 On the client side, in your application, simply ensure that there is an fq
 parameter limiting to the current user, if you want to limit to the current
 user's stuff.  Relevancy ranking should work just as if you had 'separate
 cores'; there is no relevancy issue.

 It IS true that when your index gets very large, commits will start taking
 longer, which can be a problem. I don't mean commits will take longer just
 because there is more stuff to commit -- the larger the index, the longer an
 update to a single document will take to commit.

 In general, I suspect that having dozens or hundreds (or thousands!) of
 cores is not going to scale well; it is not going to make good use of your
 cpu/ram/hd resources.  It is not really the intended use case of multiple
 cores.

 However, you are probably going to run into some issues with the single
 index approach too. In general, how to deal with multi-tenancy in Solr is
 an oft-asked question for which there doesn't seem to be any "just works
 and does everything for you without needing to think about it" solution in
 Solr, judging from past threads. I am not a Solr developer or expert.

 
 From: Markus Jelsma [markus.jel...@openindex.io]
 Sent: Tuesday, November 09, 2010 6:57 PM
 To: solr-user@lucene.apache.org
 Cc: Adam Estrada
 Subject: Re: Using Multiple Cores for Multiple Users

 Hi,

  All,
 
  I have a web application that requires the user to register and then log in
  to gain access to the site. Pretty standard stuff... Now I would like to
  know what the best approach would be to implement a customized search
  experience for each user. Would this mean creating a separate core per
  user? I think that this is not possible without restarting Solr after each
  core is added to the multi-core xml file, right?

 No, you can dynamically manage cores and parts