Re: Re[2]: multiple indices

2007-09-17 Thread Matt Kangas
Jack, the JNDI-enabling jarfiles now ship as part of the main .zip  
distribution. There is no need for a separate JettyPlus download as  
of Jetty 6.


I used Jetty 6.1.3
(http://dist.codehaus.org/jetty/jetty-6.1.x/jetty-6.1.3.zip) at the
time, and I am using only these jarfiles from the main distribution.
I stripped out everything else that seemed unnecessary for running Solr.


lib/jetty-6.1.3.jar
lib/jetty-util-6.1.3.jar
lib/jsp-2.1/ant-1.6.5.jar
lib/jsp-2.1/core-3.1.1.jar
lib/jsp-2.1/jsp-2.1.jar
lib/jsp-2.1/jsp-api-2.1.jar
lib/naming/jetty-naming-6.1.3.jar
lib/plus/jetty-plus-6.1.3.jar
lib/servlet-api-2.5-6.1.3.jar

--Matt

On Sep 13, 2007, at 11:44 AM, Jack L wrote:


Thanks Matt, I'll give it a try! So this requires JettyPlus?

--
Best regards,
Jack

Wednesday, September 12, 2007, 5:14:32 AM, you wrote:


Jack, I've posted a complete recipe for running two Solr indices
within one Jetty 6 container:



http://wiki.apache.org/solr/SolrJetty



Scroll down to the part that says:

(7/2007 MattKangas) The recipe above didn't work for me with Jetty
6.1.3.

...

I'm glossing over a lot of details, so attached is a tarball with a
known-good configuration that runs two Solr instances inside one
Jetty container. I'm using Solr 1.2.0 and Jetty 6.1.3 respectively.





Hope this helps,
--matt



On Sep 11, 2007, at 11:52 AM, Jack L wrote:



I was going through some old emails on this topic. Rafael Rossini
figured out how to run multiple indices on single instance of jetty
but it has to be jetty plus. I guess jetty doesn't allow this? I
suppose I can add additional jars and make it work but I haven't
tried that. It'll always be much safer/simpler/less playing around
if a feature is available out of box.

I'm mentioning this again because I really think it's a desirable
feature, especially because each JVM uses a lot of memory and
sometimes it's not possible to start a new jetty for each index due
to memory limitation.

I understand I can use a type field and mix doc types but this is not
ideal for two reasons:

1. it's easier to maintain separate indices. I can just wipe out all
the files and re-post an individual index. Much less posting work to
do as opposed to re-posting all docs. Or I can move one index to
another partition, or even to another server to run separately in
order to scale up. It'll be a problem (although solvable by deleting
and re-posting) with a mixed index.

2. my understanding is that mixed index means larger index files and
slower performance

JettyPlus's download links seem to be broken so I wasn't able to
check its download size. If not too big, maybe JettyPlus is an option?
If not, there should be a way to have this feature implemented on
solr side? Maybe by prefixing the REST URLs with index names...

--
Thanks,
Jack




--
Matt Kangas / [EMAIL PROTECTED]





--
Matt Kangas / [EMAIL PROTECTED]




Re[2]: multiple indices

2007-09-13 Thread Jack L
Thanks Matt, I'll give it a try! So this requires JettyPlus?

-- 
Best regards,
Jack

Wednesday, September 12, 2007, 5:14:32 AM, you wrote:

 Jack, I've posted a complete recipe for running two Solr indices  
 within one Jetty 6 container:

 http://wiki.apache.org/solr/SolrJetty

 Scroll down to the part that says:
 (7/2007 MattKangas) The recipe above didn't work for me with Jetty
 6.1.3.

 ...

 I'm glossing over a lot of details, so attached is a tarball with a
 known-good configuration that runs two Solr instances inside one  
 Jetty container. I'm using Solr 1.2.0 and Jetty 6.1.3 respectively.



 Hope this helps,
 --matt

 On Sep 11, 2007, at 11:52 AM, Jack L wrote:

 I was going through some old emails on this topic. Rafael Rossini
 figured out how to run multiple indices on single instance of jetty
 but it has to be jetty plus. I guess jetty doesn't allow this? I
 suppose I can add
 additional jars and make it work but I haven't tried that. It'll
 always be much safer/simpler/less playing around if a feature is
 available out of box.

 I'm mentioning this again because I really think it's a desirable  
 feature,
 especially because each JVM uses a lot of memory and sometimes it's
 not possible to start a new jetty for each index due to memory
 limitation.

 I understand I can use a type field and mix doc types but this is not
 ideal for two reasons:

 1. it's easier to maintain separate indices. I can just wipe out all
 the files and re-post an individual index. Much less posting work to
 do as opposed to re-posting all docs. Or I can move one index to
 another partition, or even to another server to run separately in
 order to scale up. It'll be a problem (although solvable by deleting
 and re-posting) with a mixed index.

 2. my understanding is that mixed index means larger index files and
 slower performance

 JettyPlus's download links seem to be broken so I wasn't able to check
 its download size. If not too big, maybe JettyPlus is an option?
 If not, there should be a way to have this feature implemented on solr
 side? Maybe by prefixing the REST URLs with index names...

 -- 
 Thanks,
 Jack


 --
 Matt Kangas / [EMAIL PROTECTED]




multiple indices

2007-09-11 Thread Jack L
I was going through some old emails on this topic. Rafael Rossini figured
out how to run multiple indices on single instance of jetty but it has to
be jetty plus. I guess jetty doesn't allow this? I suppose I can add
additional jars and make it work but I haven't tried that. It'll
always be much safer/simpler/less playing around if a feature is
available out of box.

I'm mentioning this again because I really think it's a desirable feature,
especially because each JVM uses a lot of memory and sometimes it's
not possible to start a new jetty for each index due to memory
limitation.

I understand I can use a type field and mix doc types but this is not
ideal for two reasons:

1. it's easier to maintain separate indices. I can just wipe out all
the files and re-post an individual index. Much less posting work to
do as opposed to re-posting all docs. Or I can move one index to
another partition, or even to another server to run separately in
order to scale up. It'll be a problem (although solvable by deleting
and re-posting) with a mixed index.

2. my understanding is that mixed index means larger index files and
slower performance

JettyPlus's download links seem to be broken so I wasn't able to check
its download size. If not too big, maybe JettyPlus is an option?
If not, there should be a way to have this feature implemented on solr
side? Maybe by prefixing the REST URLs with index names...

-- 
Thanks,
Jack



Re: multiple indices

2007-09-11 Thread Mike Klaas

On 11-Sep-07, at 8:52 AM, Jack L wrote:

I was going through some old emails on this topic. Rafael Rossini
figured out how to run multiple indices on single instance of jetty
but it has to be jetty plus. I guess jetty doesn't allow this? I
suppose I can add
additional jars and make it work but I haven't tried that. It'll
always be much safer/simpler/less playing around if a feature is
available out of box.


The example that comes with Solr is meant to be a starting point for
users.  It is a relatively functional and well-commented example, its
config files are pretty much the canonical documentation for solr
config, and many people can modify it for their own production use,
but it is still just an example application.

By the time people want to do expert-level activities with Solr  
(multi-index falls into that category), they should be able to  
configure their own servlet container, whether it be jetty plus,  
tomcat, resin, etc.



1. it's easier to maintain separate indices. I can just wipe out all
the files and re-post an individual index. Much less posting work to
do as opposed to re-posting all docs. Or I can move one index to
another partition, or even to another server to run separately in
order to scale up. It'll be a problem (although solvable by deleting
and re-posting) with a mixed index.



2. my understanding is that mixed index means larger index files and
slower performance


Both of these are true, but do not typically have to be decided at
minute zero when developing a project with solr.  I recently split our
main index into two separate solr installations in a single jettyplus
container and it was less than a day's work (most of it was tweaking
interface code on our side, not the solr config itself).



JettyPlus's download links seem to be broken so I wasn't able to check
its download size. If not too big, maybe JettyPlus is an option?
If not, there should be a way to have this feature implemented on solr
side? Maybe by prefixing the REST URLs with index names...


There just might be something like that in 1.3...

-Mike


Re[2]: multiple indices

2007-09-11 Thread Jack L
Hello Mike,

 but it is still just an example application.

I think this is a very modest statement. I'd like to say both solr
(including the example) and jetty are production level software.
I suppose many users, like me, will just take it and make minimum
modification of the configs and use it on a production server.
That's exactly what I did and it has been working great.

If multiple indices feature is available out of the box, it'll become
another great feature that's easily accessible for entry level users,
like me :)

 There just might be something like that in 1.3...

I'm looking forward to it!

Jack



RE: multiple indices

2007-09-11 Thread George Aroush
  I was going through some old emails on this topic. Rafael Rossini
  figured out how to run multiple indices on single instance of jetty
  but it has to be jetty plus. I guess jetty doesn't allow this? I
  suppose I can add additional jars and make it work but I haven't tried
  that. It'll always be much safer/simpler/less playing around if a
  feature is available out of box.

 The example that comes with Solr is meant to be a starting
 point for users.  It is a relatively functional and
 well-commented example, and its config files are pretty much
 the canonical documentation for solr config, and many
 people can modify it for their own production use,
 but it is still just an example application.

 By the time people want to do expert-level activities with
 Solr (multi-index falls into that category), they should be
 able to configure their own servlet container, whether it be
 jetty plus, tomcat, resin, etc.

Does this mean Solr 1.2 supports MultiSearcher?

-- George



Re: multiple indices

2007-09-11 Thread Mike Klaas

On 11-Sep-07, at 3:32 PM, George Aroush wrote:



The example that comes with Solr is meant to be a starting
point for users.  It is a relatively functional and
well-commented example, and its config files are pretty much
the canonical documentation for solr config, and many
people can modify it for their own production use,
but it is still just an example application.

By the time people want to do expert-level activities with
Solr (multi-index falls into that category), they should be
able to configure their own servlet container, whether it be
jetty plus, tomcat, resin, etc.


Does this mean Solr 1.2 supports MultiSearcher?


No, I'm purely talking about housing two different indices on the  
same machine with Solr.  They may be related, they may not be.


Currently (1.2), the options are:

1. multiple processes/servlet containers/jvms
2. multiple instances of solr webapps within a single
container/process/jvm


In the future, (1.3 or farther down the line), another option will be:

3. multiple indices within a single solr webapp, added/removed on the  
fly.
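
As a rough illustration of option 2 on Tomcat (one of the containers mentioned earlier in the thread), each Solr webapp can be given its own context file that sets the solr/home JNDI entry. This is only a sketch: the file location, the /opt/solr paths and the solr1 name are assumptions for illustration, not something taken from this thread.

```xml
<!-- Hypothetical file: $CATALINA_HOME/conf/Catalina/localhost/solr1.xml -->
<!-- Tomcat derives the context path (/solr1) from the file name. -->
<Context docBase="/opt/solr/solr.war" crossContext="true">
  <!-- JNDI entry Solr looks up to find this instance's conf/ and data/ dirs. -->
  <!-- /opt/solr/solr1 is an assumed path; use your own solr home. -->
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr1" override="true"/>
</Context>
```

A second file (say solr2.xml) with a different value for solr/home would give a second, independent index in the same JVM.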


-Mike


Re: multiple indices

2007-06-27 Thread Rafael Rossini

I have 3 different instances of solr on jetty 6.1.13, but you need
jetty plus. My etc/jetty.xml looks like this:

<Call name="addLifeCycle">
  <Arg>
    <New class="org.mortbay.jetty.webapp.WebAppContext">
      <Arg><Ref id="Contexts"/></Arg>
      <Arg><SystemProperty name="jetty.home" default="."/>/webapps/solr1</Arg>
      <Arg>/solr1</Arg>
      <Set name="ConfigurationClasses"><Ref id="plusConfig"/></Set>
      <Set name="defaultsDescriptor"><SystemProperty name="jetty.home" default="."/>/etc/webdefault.xml</Set>
      <New id="solr_home" class="org.mortbay.jetty.plus.naming.EnvEntry">
        <Arg>solr/home</Arg>
        <Arg type="java.lang.String"><SystemProperty name="jetty.home" default="."/>override this value</Arg>
      </New>
    </New>
  </Arg>
</Call>
<Call name="addLifeCycle">
  <Arg>
    <New class="org.mortbay.jetty.webapp.WebAppContext">
      <Arg><Ref id="Contexts"/></Arg>
      <Arg><SystemProperty name="jetty.home" default="."/>/webapps/solr2</Arg>
      <Arg>/solr2</Arg>
      <Set name="ConfigurationClasses"><Ref id="plusConfig"/></Set>
      <Set name="defaultsDescriptor"><SystemProperty name="jetty.home" default="."/>/etc/webdefault.xml</Set>
      <New id="solr_home" class="org.mortbay.jetty.plus.naming.EnvEntry">
        <Arg>solr/home</Arg>
        <Arg type="java.lang.String"><SystemProperty name="jetty.home" default="."/>override this value</Arg>
      </New>
    </New>
  </Arg>
</Call>


Then, in webapps/solr1/WEB-INF, you need a jetty-env.xml like this:

<?xml version="1.0"?>
<!DOCTYPE Configure PUBLIC "-//Mort Bay Consulting//DTD Configure//EN"
  "http://jetty.mortbay.org/configure.dtd">

<Configure class="org.mortbay.jetty.webapp.WebAppContext">

  <!-- Add an override for a global EnvEntry -->
  <New id="solr_home" class="org.mortbay.jetty.plus.naming.EnvEntry">
    <Arg>solr/home</Arg>
    <Arg type="java.lang.String"><SystemProperty name="jetty.home" default="."/>/solr1</Arg>
  </New>

</Configure>



Hope it helps



On 6/26/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:


Hm, that JNDI again... this makes it sound like SOLR-215 is completely
superfluous?
I have not configured Jetty this way yet, but I do see some docs on
http://wiki.apache.org/solr/SolrJetty .  Interestingly, the configs look a
lot different than what's described on
http://docs.codehaus.org/display/JETTY/JNDI .  I also remember Jetty Plus
from a while back, but now I cannot find any information about Jetty Plus
6.*, only 5 - http://jetty.mortbay.org/jetty5/plus/index.html .

Otis



----- Original Message -----
From: Chris Hostetter [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, June 26, 2007 8:10:46 PM
Subject: Re: multiple indices


:   I have multiple applications (blogs/forums/video/etc) - each of these
: is independent (no need to perform queries on multiple indices).

:   Would it be best to use multiple instances of SOLR/JVM - one for each
: index or use a solution where only one JVM instance is running (maybe
: solr-215?)?


you don't actually need multiple JVM instances to run multiple Solr
instances ... you can configure your ServletContainer to run the solr.war
in multiple contexts each of which has a different solrconfig.xml and
schema.xml (using JNDI) ... that way you get most of the benefits of
isolated instances but can also take advantage of a single large heap
and common connection management.




-Hoss







Re: multiple indices

2007-06-27 Thread Chris Hostetter

: Hm, that JNDI again... this makes it sound like SOLR-215 is completely
: superfluous?

No ... i still haven't had a chance to review the patch, but Henri makes
some great arguments for the WHY of the patch in the issue
description...

 Multiple cores:
 Deployment issues within some organizations where IT will resist
 deploying multiple web applications.
 Seamless schema update where you can create a new core and switch to
 it without starting/stopping servers.
 Embedding Solr in your own application (instead of 'raw' Lucene) and
 functionally need to segregate schemas & collections.

(there are some other arguments i'm not sure i buy into, but these seem
very justified)


-Hoss



multiple indices

2007-06-26 Thread michael ravits
dear solrs - thanks for all your help.
   
  I have multiple applications (blogs/forums/video/etc) - each of these is 
independent (no need to perform queries on multiple indices).
  Would it be best to use multiple instances of SOLR/JVM - one for each index 
or use a solution where only one JVM instance is running (maybe solr-215?)?
   
  Considering mostly performance and load on the servers.

   

Re: multiple indices

2007-06-26 Thread Otis Gospodnetic
I would use SOLR-215 instead of running multiple instances on the same box.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share

----- Original Message -----
From: michael ravits [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, June 26, 2007 3:57:12 PM
Subject: multiple indices

dear solrs - thanks for all your help.
   
  I have multiple applications (blogs/forums/video/etc) - each of these is 
independent (no need to perform queries on multiple indices).
  Would it be best to use multiple instances of SOLR/JVM - one for each index 
or use a solution where only one JVM instance is running (maybe solr-215?)?
   
  Considering mostly performance and load on the servers.

   




Re: Searching multiple indices (solr newbie)

2007-01-12 Thread Erik Hatcher


On Jan 8, 2007, at 3:13 AM, Chris Hostetter wrote:
:  with a single schema -- but dynamicFields are used to store category
:  specific fields, so that if you are doing a category specific search,
:  category specific filters can be offered to you...
:
:  http://shopper.cnet.com/4144-6501_9-0-1.html?query=canon
:
: Could you elaborate a bit more about how the front-end and back-end
: work to communicate the category specific filters?
:
: Showing *all* facets from the start can be unnecessarily too much for
: a user to navigate, so I'm interested in how to develop a system that
: can adjust the facets shown based on context a bit more elegantly.

I'm not sure i understand the question.  Our Facet lists (and
per facet constraint lists) are specific to each category -- once a
specific category has been specified by the application server, a custom
request handler parses the list of facets/constraints from a metadata
document for that category, computes the intersection for each constraint
query and returns *all* of the information to the application -- it
normally shows only the first 3 facets not yet constrained, and for each
of those facets it shows the labels for the best constraints, where the
definition of best depends on the data type of the facet -- by default
it's the ones with the largest counts, or in natural order for
numeric ranges, but there are overrides for extremely popular constraints.

there are a lot of optimizations that could be done in the Solr plugin to
only compute the counts for facets/constraints we know we want to display
-- but i specifically made it compute everything so the frontend could make
whatever choices it wanted about displaying facets without needing to
change the Solr plugin.  (it's on the list as a potential optimization to
move some of that logic into the request handler, but there haven't been
any complaints about the performance)

does that answer your question?


sorry for the delayed reply.  that answers the question perfectly,
thanks!


Erik



Re: Searching multiple indices (solr newbie)

2007-01-09 Thread Mekin Maheshwari

: http://cnet.search.com/search?chkpt=astg.cnet.fd.search.cnet&q=canon&tag=srch

I just so happen to have a bit of insight into how that page works, and
while it's true that it queries multiple indexes with different schemas,
it makes no attempts to merge the results -- the product results come
from one index, the features come from another, the downloads come
from a third, the blogs come from a fourth, etc...  but all of the
products are from a single index, even if the products are in different
categories (ie: cameras vs printers) they are still kept in a single index
with a single schema -- but dynamicFields are used to store category
specific fields, so that if you are doing a category specific search,
category specific filters can be offered to you...



This is very interesting.

I was living with a few assumptions that don't hold true in the solr
world (things like updating indexes is not easy).

Is there any cost to using dynamicFields?
Any disadvantage?
Do dynamic fields have norms turned on by default?
Can I boost on them at query time?

I might end up exploding the size of the index.


In general I felt that smaller indexes with different requirements
might be more flexible than 1 large index (Would a  3G index
be considered large?), e.g. for backing up the index, deploying a fresh
index, etc. But Solr does address most of these.

The assumption could be baseless now & I should probably consider
having 1 index for all categories.

Thanks,
mekin


Re: Searching multiple indices (solr newbie)

2007-01-09 Thread Yonik Seeley

On 1/9/07, Mekin Maheshwari [EMAIL PROTECTED] wrote:

Is there any cost to using dynamicFields ?


Very little.  A dynamicField is created on demand during indexing or
searching, but the SchemaField created only has 4 pointers in it.

dynamicField is a Solr concept, not a Lucene one... so the only other
impacts are due to having many fields:
1) some info about each field is kept in memory, but not too much
2) a possibly larger term index since field1:val1 and field2:val1 are
separate terms
3) merging segments goes through the fields in a linear fashion
4) norms!

If norms are omitted, you could have thousands of fields without much impact.


Any disadvantage ?
Do dynamic fields have norms turned on by default ?


Like any other field in this regard... omitNorms needs to be set to
true on the fieldtype or the field.
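
For illustration, the two places omitNorms can be set in schema.xml might look like the fragment below. The attr_* pattern is a hypothetical name chosen for this sketch; the string type mirrors the kind of declaration found in the example schema, but check it against your own schema rather than treating this as the canonical config.

```xml
<!-- Fragment for the types section of schema.xml: -->
<!-- norms are switched off for every field of this type. -->
<fieldType name="string" class="solr.StrField" omitNorms="true"/>

<!-- Fragment for the fields section of schema.xml: -->
<!-- "attr_*" is a hypothetical pattern for category specific fields; -->
<!-- here omitNorms is set on the field itself. -->
<dynamicField name="attr_*" type="string" indexed="true" stored="true"
              omitNorms="true"/>
```

Note that a dynamicField pattern takes a single wildcard at the start or the end of the name, so something like attr_*_s would not work.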


Can I boost on them at query time ?


Yes, norms are only related to index-time boosts and length normalization.
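
As a sketch of a query-time boost that works regardless of norms, a DisMax handler in solrconfig.xml can weight fields through its qf parameter. This assumes the solr.DisMaxRequestHandler class from the Solr 1.x era; the handler name and both field names (name, attr_color) are made up for the example.

```xml
<!-- Fragment for solrconfig.xml. Handler and field names are hypothetical. -->
<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- Query-time boosts: a match in "name" counts four times as much -->
    <!-- as a match in the dynamic field "attr_color". -->
    <str name="qf">name^2.0 attr_color^0.5</str>
  </lst>
</requestHandler>
```

The same idea works in plain query syntax, e.g. attr_color:red^2, without any handler configuration.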


I might end up exploding the size of the index.


In general I felt that smaller indexes with different requirements
might be more flexible than 1 large index (Would a  3G index
considered large ?). eg. backing up the index, deploying a fresh
index, etc. But Solr does address most of these.


That's medium sized.  Shouldn't be too big for a single index if
that's the route you want to take.


The assumption could be baseless now  I should probably consider
having 1 index for all categories.


-Yonik


Re: Searching multiple indices (solr newbie)

2007-01-09 Thread J.J. Larrea
+2 cents:

At 2:43 PM +0530 1/9/07, Mekin Maheshwari wrote:
In general I felt that smaller indexes with different requirements
might be more flexible than 1 large index (Would a  3G index
considered large ?). eg. backing up the index, deploying a fresh
index, etc. But Solr does address most of these.

3Gb indexes are not at all unreasonable -- I have a Lucene-based (soon-to-be 
SOLR-based) app which uses 5 indexes, the biggest of which is 3.8Gb.  The 
combined index is 6.7Gb.

The assumption could be baseless now  I should probably consider
having 1 index for all categories.

An important thing to note is that Lucene does not store information in a grid 
as do RDBMSs, it only stores the fields which are explicitly defined for each 
Document. So if some class of Documents has a set of class-specific fields, 
there is no storage penalty for the non-class Documents which don't have them.  
And Lucene's querying mechanism is very efficient at dealing with sparse values 
in the index so the query-time penalty is slight.

As Hoss pointed out, SOLR's wildcard-field specification makes it very simple
to take advantage of Lucene's sparse storage: SOLR will tell Lucene to index
and/or store any field matching one of the wildcard patterns, and the Request
Handlers will allow * as a field name, which returns all stored fields in the
resulting documents.
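
To make the sparse-storage point concrete, here is a sketch of an update message with two documents from different categories sharing one index. All field names and values are hypothetical, and it assumes the schema declares id and category plus dynamicField patterns matching *_i and *_s.

```xml
<add>
  <!-- A camera: only camera specific fields are present. -->
  <doc>
    <field name="id">cam-1</field>
    <field name="category">cameras</field>
    <field name="megapixels_i">10</field>
  </doc>
  <!-- A printer: it carries no megapixels_i field at all, so nothing -->
  <!-- is stored for that field on this document. -->
  <doc>
    <field name="id">prn-1</field>
    <field name="category">printers</field>
    <field name="paper_size_s">A4</field>
  </doc>
</add>
```

Each document only pays for the fields it actually contains.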

So while there may still be some issues needing to be worked out with a single 
index in your specific case, it is probably much simpler than integrating hits 
from multiple indexes.

- J.J.


Re: Searching multiple indices (solr newbie)

2007-01-08 Thread Chris Hostetter

:  with a single schema -- but dynamicFields are used to store category
:  specific fields, so that if you are doing a category specific search,
:  category specific filters can be offered to you...
: 
:  http://shopper.cnet.com/4144-6501_9-0-1.html?query=canon
:
: Could you elaborate a bit more about how the front-end and back-end
: work to communicate the category specific filters?
:
: Showing *all* facets from the start can be unnecessarily too much for
: a user to navigate, so I'm interested in how to develop a system that
: can adjust the facets shown based on context a bit more elegantly.

I'm not sure i understand the question.  Our Facet lists (and
per facet constraint lists) are specific to each category -- once a
specific category has been specified by the application server, a custom
request handler parses the list of facets/constraints from a metadata
document for that category, computes the intersection for each constraint
query and returns *all* of the information to the application -- it
normally shows only the first 3 facets not yet constrained, and for each
of those facets it shows the labels for the best constraints, where the
definition of best depends on the data type of the facet -- by default
it's the ones with the largest counts, or in natural order for
numeric ranges, but there are overrides for extremely popular constraints.

there are a lot of optimizations that could be done in the Solr plugin to
only compute the counts for facets/constraints we know we want to display
-- but i specifically made it compute everything so the frontend could make
whatever choices it wanted about displaying facets without needing to
change the Solr plugin.  (it's on the list as a potential optimization to
move some of that logic into the request handler, but there haven't been
any complaints about the performance)

does that answer your question?


-Hoss



Re: Searching multiple indices (solr newbie)

2007-01-07 Thread Erik Hatcher


On Jan 5, 2007, at 1:59 AM, Chris Hostetter wrote:



: The issue is best described with an example:
: search for canon - matches multiple categories, which will have very
: different schemas
: http://cnet.search.com/search?chkpt=astg.cnet.fd.search.cnet&q=canon&tag=srch


I just so happen to have a bit of insight into how that page works, and
while it's true that it queries multiple indexes with different schemas,
it makes no attempts to merge the results -- the product results come
from one index, the features come from another, the downloads come
from a third, the blogs come from a fourth, etc...  but all of the
products are from a single index, even if the products are in different
categories (ie: cameras vs printers) they are still kept in a single index
with a single schema -- but dynamicFields are used to store category
specific fields, so that if you are doing a category specific search,
category specific filters can be offered to you...

http://shopper.cnet.com/4144-6501_9-0-1.html?query=canon


Could you elaborate a bit more about how the front-end and back-end  
work to communicate the category specific filters?


Showing *all* facets from the start can be unnecessarily too much for  
a user to navigate, so I'm interested in how to develop a system that  
can adjust the facets shown based on context a bit more elegantly.


Thanks,
Erik



Re: Searching multiple indices (solr newbie)

2007-01-04 Thread Mekin Maheshwari

Thanks Chris.


http://wiki.apache.org/solr/FederatedSearch


That's useful & I might be getting close to that size soon.


The issue is best described with an example:
search for canon - matches multiple categories, which will have very
different schemas
http://cnet.search.com/search?chkpt=astg.cnet.fd.search.cnet&q=canon&tag=srch

While this might vary with applications, 2 options come to my mind:

1. Have 1 index that has ALL categories of products. Possibly 1
generic index with all searchable text, and separate indices when you
know the category the user is looking for. But with this, ranking
products well becomes very difficult.


2. Have separate indices for each category of products. Query them &
merge the results from them. Merging across indices is a difficult
issue. I can draw some learning from the federated search, but this
would more likely be business logic.



Thanks,
mekin



On 1/4/07, Chris Hostetter [EMAIL PROTECTED] wrote:


Mekin: Yonik has done some brainstorming on ways of supporting Federated
searching across multiple instances of Solr - but the main motivation
there is to deal with homogeneous indexes which are too big to fit on a
single host efficiently...

http://wiki.apache.org/solr/FederatedSearch

..if you've got separate schemas for each of your indexes, then you have
to query them in different ways, so how could you meaningfully merge the
scores?




-Hoss





--
My company - http://ugenie.com
My Blog - http://mekin.livejournal.com/
My LinkedIn - http://www.linkedin.com/in/mekin


Re: Searching multiple indices (solr newbie)

2007-01-04 Thread Chris Hostetter

: The issue is best described with an example:
: search for canon - matches multiple categories, which will have very
: different schemas
: http://cnet.search.com/search?chkpt=astg.cnet.fd.search.cnet&q=canon&tag=srch

I just so happen to have a bit of insight into how that page works, and
while it's true that it queries multiple indexes with different schemas,
it makes no attempts to merge the results -- the product results come
from one index, the features come from another, the downloads come
from a third, the blogs come from a fourth, etc...  but all of the
products are from a single index, even if the products are in different
categories (ie: cameras vs printers) they are still kept in a single index
with a single schema -- but dynamicFields are used to store category
specific fields, so that if you are doing a category specific search,
category specific filters can be offered to you...

http://shopper.cnet.com/4144-6501_9-0-1.html?query=canon

: 1. Have 1 index that has ALL categories of products. Possibly 1
: generic index with all searchable text, and separate indices when you
: know the category user is looking for. But with this ranking products
: well becomes very difficult.

as i said, you don't really need the category specific indexes ... but why
do you think this approach makes ranking products well difficult ?


-Hoss



Re: Multiple indices

2006-03-22 Thread Chris Hostetter

: norms, term vectors, field caches, etc.  It almost sounds like one
: would want multiple instances of Solr in the same app container.  But
: if that's the case, you aren't saving much over just having multiple
: app servers.

I didn't say anything yesterday because i pretty much agreed with yonik
... but it occurred to me this morning that if you have a finite amount of
ram on a box (which we all do), 'R' gigs, and you want to run a variable
number of solr indexes, 'N', the only way to do this currently is to run
N appserver instances and explicitly configure the JVM for each to use at
most R/N Gigs max heap ... which would not only make adding a new index a
pain (you have to reconfigure every appserver instance) but it also means
that you are managing your memory instead of letting java do it.

if it was possible to run several indexes in one appserver, then you
wouldn't have to worry about this (but if you wanted to worry about it,
you could run some or all of your indexes in their own appserver
instances)

perhaps one way to go would be if solr paid attention to what webapp name
it was being run in, and used that to determine the directory it looked in
for its configs/index? ... so most people would just install solr.war
and it would look for everything in ./solr/  -- but if i want multiple
indexes i copy solr.war as solr-people.war, solr-places.war, and
solr-things.war, and now i've got 3 webapps running solr, and 3 separate
directories containing 3 separate indexes and configs... ./solr-people/,
./solr-places/, ./solr-things/, etc...




-Hoss



Re: Multiple indices

2006-03-21 Thread Yonik Seeley
On 3/21/06, Grant Ingersoll [EMAIL PROTECTED] wrote:
 I was wondering if it is possible to have one SOLR instance host multiple 
 indices?  Otherwise, I would need to deploy a separate WAR for every SOLR 
 instance I want, correct?

It's not currently possible.  A fair amount would have to change to
support that I think...
- index id in all update commands (or a change in the URL path)
- index id in all query commands (or a change in the URL path)

Would a single index searcher go over multiple unrelated indices?  I
think not, since that would tend to cause problems for things like
norms, term vectors, field caches, etc.  It almost sounds like one
would want multiple instances of Solr in the same app container.  But
if that's the case, you aren't saving much over just having multiple
app servers.

-Yonik