How to setup search engine for B2B web app

2010-04-25 Thread Bill Paetzke
*Given:*

   - 1 database per client (business customer)
   - 5000 clients
   - Clients have between 2 and 2000 users (avg is ~100 users/client)
   - 100k to 10 million records per database
   - Users need to search those records often (it's the best way to navigate
   their data)

*The Question:*

How would you set up Solr (or Lucene) search so that each client can only
search within its own database?

How would you set up the index(es)?
Where do you store the index(es)?
Would you need to add a filter to all search queries?
If a client cancelled, how would you delete their (part of the) index? (this
may be trivial--not sure yet)

I asked this question on StackOverflow.com:
http://stackoverflow.com/questions/2707055/how-to-setup-lucene-search-for-a-b2b-web-app
I would prefer it if you answered there. Thanks.


Re: How to setup search engine for B2B web app

2010-04-25 Thread Shalin Shekhar Mangar
Hi Bill,

On Sun, Apr 25, 2010 at 12:23 PM, Bill Paetzke billpaet...@gmail.com wrote:

 *Given:*

   - 1 database per client (business customer)
   - 5000 clients
   - Clients have between 2 and 2000 users (avg is ~100 users/client)
   - 100k to 10 million records per database
   - Users need to search those records often (it's the best way to navigate
   their data)

 *The Question:*

 How would you set up Solr (or Lucene) search so that each client can only
 search within its own database?

 How would you set up the index(es)?


I'd look at setting up a separate core for each client. You may need to set up
slaves as well, depending on search traffic.


 Where do you store the index(es)?


Setting up 5K cores on one box will not work, so you will need to partition
the clients across multiple boxes, each hosting a subset of the cores.


 Would you need to add a filter to all search queries?


Nope, but you will need to send the query to the correct host (perhaps a
mapping DB will help).
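Roughly, that client-to-host mapping could be a lookup the app tier consults before issuing each query; a minimal sketch (client IDs, host names, and URL layout are all invented for illustration):

```python
# Map each client to the box hosting its core. In practice this would live
# in a small DB table or config service rather than a literal dict.
CLIENT_HOSTS = {
    "client0001": "solr-box-1",
    "client0002": "solr-box-1",
    "client4999": "solr-box-42",
}

def core_url(client_id: str) -> str:
    """Build the per-client core URL that this client's queries go to."""
    host = CLIENT_HOSTS[client_id]
    return "http://%s:8983/solr/%s/select" % (host, client_id)
```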


 If a client cancelled, how would you delete their (part of the) index?
 (this
 may be trivial--not sure yet)


With different cores for each client, this'd be pretty easy.
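For what it's worth, per-client core creation and removal can be scripted against the CoreAdmin API; a rough sketch (host, port, core name, and instanceDir are hypothetical):

```text
# create a core when a client signs up
http://solr-box-7:8983/solr/admin/cores?action=CREATE&name=client1234&instanceDir=clients/client1234

# unload the core when the client cancels; the index directory
# can then be removed from disk
http://solr-box-7:8983/solr/admin/cores?action=UNLOAD&core=client1234
```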

-- 
Regards,
Shalin Shekhar Mangar.


Re: DIH: inner select fails when outer entity is null/empty

2010-04-25 Thread Noble Paul നോബിള്‍ नोब्ळ्
do an onError=skip on the inner entity
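i.e. something along these lines in data-config.xml (entity, table, and column names here are made up for illustration):

```xml
<entity name="parent" query="SELECT id, category FROM parent_table">
  <!-- skip this row if the inner query fails,
       e.g. because ${parent.category} is null -->
  <entity name="child" onError="skip"
          query="SELECT label FROM child_table WHERE category = '${parent.category}'">
    <field column="label" name="label"/>
  </entity>
</entity>
```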

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 Hello,

 Here is a newbie DataImportHandler question:

 Currently, I have entities nested within entities.  There are some
 situations where a column value from the outer entity is null, and when I try
 to use it in the inner entity, the null just gets replaced with an
 empty string.  That in turn causes the SQL query in the inner entity to
 fail.

 This seems like a common problem, but I couldn't find any solutions or 
 mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq )

 What is the best practice to avoid or convert null values to something safer? 
  Would
 this be done via a Transformer or is there a better mechanism for this?

 I think the problem I'm describing is similar to what was described here:  
 http://search-lucene.com/m/cjlhtFkG6m
 ... except I don't have the luxury of rewriting the SQL selects.

 Thanks,
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/





-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


local vs cloud

2010-04-25 Thread Dennis Gearon
I'm working on an app that could grow faster and bigger than I can scale
local resources for, at least on certain dates and for other reasons.

So I'd like to run it on a machine at a dedicated host, or even in a virtual
machine at a host.

If the load goes up past a certain point, queries would be sent to the cloud.

Is this practical? Does anyone have experience with this?

This is, in case anyone is wondering, a search-engine app based on
Solr/Lucene.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


Re: [spAm] Solr does not honor facet.mincount and field.facet.mincount

2010-04-25 Thread Chris Hostetter
: REQUEST: 
: 
http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9
: 
: RESPONSE: 
...
: <lst name="params"> 
...
: <str name="facet.minCount">9</str> 

...the REQUEST url you listed says facet.mincount, but the response from 
Solr disagrees.  According to it, you actually had a capital C in 
facet.minCount ... Solr params are case sensitive, so Solr is completely 
ignoring facet.minCount.


As for why you don't get any values for the Instrument facet -- 
understanding that requires you to tell us more about the field/fieldType 
for Instrument.


-Hoss
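For reference, with the same host and fields as the original request, the all-lowercase form is the one Solr will honor:

```text
http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9
```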



Re: performance of million documents search

2010-04-25 Thread Erick Erickson
NGrams might help here, search the SOLR list for NGram
and I think you'll find that this subject has been discussed
several times...

HTH
Erick
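As a starting point, a schema.xml fieldType along these lines indexes ngrams so that an infix lookup like B68 becomes a plain term query instead of a slow wildcard (the field type name and gram sizes are illustrative, and the extra terms will grow the index):

```xml
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <!-- treat the whole part number as one token, then emit its substrings -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index all 2- to 5-character substrings, e.g. LB681 -> b68, lb68, ... -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

You would then query the ngram-analyzed copy of the field with a plain term (e.g. b68) rather than *B68*; substrings longer than maxGramSize would need larger gram sizes.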

On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote:

 Hi,

 I have about 2 million documents in my index.  I want to search them by a
 string field.  Every document has this field, e.g. 'LB681'.

 The field is a dynamicField whose type is string.  So, in solr/admin, I
 search using   PartNo_s:L*   (meaning: starts with L),

 and I get results from the 2 million documents in less than 300 ms.  But when
 I use   PartNo_s:*B68*   (meaning: contains B68),

 it takes more than 2000 ms.  That is too slow for me.

 Does anyone know how I can get results faster?


 thank you very much



hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread Dennis Gearon
I'm working on an app that could grow faster and bigger than I can scale
local resources for, at least on certain dates and for other reasons.

So I'd like to run it on a machine at a dedicated host, or even in a virtual
machine at a host.

If the load goes up past a certain point, queries would be sent to the cloud.

Is this practical? Does anyone have experience with this?

This is, in case anyone is wondering, a search-engine app based on
Solr/Lucene.

Dennis Gearon

Signature Warning

EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


RE: Howto build a function query using the 'query' function

2010-04-25 Thread Villemos, Gert
If the 'query' function returned a count, yes. But my problem is exactly that: as
far as I can see from the description of the 'query' function, it does NOT return
the count but the score of the search.
 
So my question is:
 
How can I write a 'query' function that returns a count, not a score?
 
Cheers,
Gert.
 
 



From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Sun 4/25/2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Howto build a function query using the 'query' function



Villemos, Gert wrote:
 I want to build a function expression for a dismax request handler 'bf'
 field, to boost the documents if it is referenced by other documents.
 I.e. the more often a document is referenced, the higher the boost.

 

 Something like

 
 <bf>linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1)</bf>

 

 Intended to mean;

 if count is 0, then the boost is 0*0.01+1 = 1

 if count is 1, then the boost is 1*0.01+1 = 1.01

 If count is 100, then the boost is 100*0.01 + 1 = 2

 

 However the query function
 (http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able
 to return the score of the query results, not the count of results.

  
Probably I'm missing something, but doesn't just using the
linear function meet your needs? i.e.

linear(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 0.01,1)

Koji

--
http://www.rondhuit.com/en/
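If the reference count can be computed at index time and stored in a numeric field (say ref_count, a hypothetical field), the boost Gert describes reduces to a function over that field, which avoids the query()-returns-a-score problem entirely:

```text
bf=linear(ref_count, 0.01, 1)

ref_count = 0    ->    0*0.01 + 1 = 1
ref_count = 1    ->    1*0.01 + 1 = 1.01
ref_count = 100  ->  100*0.01 + 1 = 2
```

The trade-off is that the count must be maintained at index time (re-indexing a document when its reference count changes).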









Re: DIH: inner select fails when outer entity is null/empty

2010-04-25 Thread Otis Gospodnetic
Hi,

Thanks for this tip, Paul.  But what if this is not an error?  Is this the sort
of thing transformers should be used for?
 Thanks,
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
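One transformer-based option is DIH's ScriptTransformer, which can rewrite the row before the inner entity's SQL runs; an untested sketch (the column name and sentinel value are made up):

```xml
<dataConfig>
  <script><![CDATA[
    // replace a null/empty category with a sentinel the inner SQL can handle
    function fixNulls(row) {
      var v = row.get('category');
      if (v == null || v == '') row.put('category', 'UNKNOWN');
      return row;
    }
  ]]></script>
  <!-- ... dataSource and document elements as before ... -->
  <entity name="parent" transformer="script:fixNulls"
          query="SELECT id, category FROM parent_table">
    <!-- inner entity now always sees a non-empty ${parent.category} -->
  </entity>
</dataConfig>
```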



- Original Message 
 From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Sun, April 25, 2010 9:16:22 AM
 Subject: Re: DIH: inner select fails when outer entity is null/empty
 
 do an onError=skip on the inner entity
 
 On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
  Hello,
 
  Here is a newbie DataImportHandler question:
 
  Currently, I have entities nested within entities.  There are some
  situations where a column value from the outer entity is null, and when I
  try to use it in the inner entity, the null just gets replaced with an
  empty string.  That in turn causes the SQL query in the inner entity to
  fail.
 
  This seems like a common problem, but I couldn't find any solutions or
  mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq )
 
  What is the best practice to avoid or convert null values to something
  safer?  Would this be done via a Transformer or is there a better
  mechanism for this?
 
  I think the problem I'm describing is similar to what was described here:
  http://search-lucene.com/m/cjlhtFkG6m
  ... except I don't have the luxury of rewriting the SQL selects.
 
  Thanks,
  Otis
  
  Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
  Lucene ecosystem search :: http://search-lucene.com/
 
 -- 
 -
 Noble Paul | Systems Architect | AOL | http://aol.com


Re: hybrid approach to using cloud servers for Solr/Lucene

2010-04-25 Thread findbestopensource
Hello Dennis

 If the load goes up, then queries are sent to the cloud at a certain point.

My advice is to load-balance between local and cloud.  Your local system
seems capable, since it is a dedicated host. Another option is to do the
indexing locally and sync the index to the cloud, so the cloud is used only
for search.

Hope it helps.

Regards
Aditya
www.findbestopensource.com
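The spill-over idea can be sketched as a tiny router in front of the two Solr endpoints; the URLs and threshold below are invented for illustration, and a real deployment would measure load rather than take it as a parameter:

```python
# Route each query to the local Solr unless the local box is already
# handling too many concurrent requests, then spill over to the cloud.
LOCAL_URL = "http://local-solr:8983/solr"   # hypothetical endpoints
CLOUD_URL = "http://cloud-solr:8983/solr"
MAX_LOCAL_INFLIGHT = 50                     # tune to the dedicated host

def pick_endpoint(local_inflight: int) -> str:
    """Return the base URL the next query should be sent to."""
    if local_inflight < MAX_LOCAL_INFLIGHT:
        return LOCAL_URL
    return CLOUD_URL
```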


On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon gear...@sbcglobal.net wrote:

 I'm working on an app that could grow much faster and bigger than I could
 scale local resources, at least on certain dates and for other reasons.

 So I'd like to run a local machine in a dedicated host or even virtual
 machine at a host.

 If the load goes up, then queries are sent to the cloud at a certain point.

 Is this practical, anyone have experience in this?

 This is obviously a search engine app based on solr/lucene if someone is
 wondering.

 Dennis Gearon

 Signature Warning
 
 EARTH has a Right To Life,
  otherwise we all die.

 Read 'Hot, Flat, and Crowded'
 Laugh at http://www.yert.com/film.php



Re: performance of million documents search

2010-04-25 Thread weiqi wang
Hi Erick,

It's very useful. Thank you very much.

2010/4/26 Erick Erickson erickerick...@gmail.com

 NGrams might help here, search the SOLR list for NGram
 and I think you'll find that this subject has been discussed
 several times...

 HTH
 Erick

 On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote:

  Hi,
 
  I have about 2 million documents in my index.  I want to search them by a
  string field.  Every document has this field, e.g. 'LB681'.
 
  The field is a dynamicField whose type is string.  So, in solr/admin, I
  search using   PartNo_s:L*   (meaning: starts with L),
 
  and I get results from the 2 million documents in less than 300 ms.  But
  when I use   PartNo_s:*B68*   (meaning: contains B68),
 
  it takes more than 2000 ms.  That is too slow for me.
 
  Does anyone know how I can get results faster?
 
 
  thank you very much