How to setup search engine for B2B web app
*Given:*

- 1 database per client (business customer)
- 5,000 clients
- Clients have between 2 and 2,000 users (avg is ~100 users/client)
- 100k to 10 million records per database
- Users need to search those records often (it's the best way to navigate their data)

*The Question:*

How would you set up Solr (or Lucene) search so that each client can only search within its database?

- How would you set up the index(es)?
- Where do you store the index(es)?
- Would you need to add a filter to all search queries?
- If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)

I asked this question on StackOverflow: http://stackoverflow.com/questions/2707055/how-to-setup-lucene-search-for-a-b2b-web-app. I would like it better if you answered there.

Thanks.
Re: How to setup search engine for B2B web app
Hi Bill,

On Sun, Apr 25, 2010 at 12:23 PM, Bill Paetzke billpaet...@gmail.com wrote:

> *Given:*
>
> - 1 database per client (business customer)
> - 5,000 clients
> - Clients have between 2 and 2,000 users (avg is ~100 users/client)
> - 100k to 10 million records per database
> - Users need to search those records often (it's the best way to navigate their data)
>
> *The Question:*
>
> How would you set up Solr (or Lucene) search so that each client can only search within its database?
>
> How would you set up the index(es)?

I'd look at setting up multiple cores, one for each client. You may need to set up slaves as well, depending on search traffic.

> Where do you store the index(es)?

Setting up 5K cores on one box will not work, so you will need to partition the clients across multiple boxes, each hosting a subset of the cores.

> Would you need to add a filter to all search queries?

Nope, but you will need to send the query to the correct host (perhaps a mapping DB will help).

> If a client cancelled, how would you delete their (part of the) index? (this may be trivial--not sure yet)

With a different core for each client, this'd be pretty easy.

--
Regards,
Shalin Shekhar Mangar.
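The per-client-core layout Shalin describes could be sketched in solr.xml roughly like this (client names and paths are invented for illustration; this assumes the multicore support available in Solr 1.3/1.4):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per client; each instanceDir holds that client's conf/ and index -->
    <core name="client-acme"   instanceDir="clients/acme"/>
    <core name="client-globex" instanceDir="clients/globex"/>
  </cores>
</solr>
```

When a client cancels, its core can then be unloaded through the CoreAdmin handler (e.g. a GET to /solr/admin/cores?action=UNLOAD&core=client-acme); note that in this era of Solr, UNLOAD does not delete the index files itself, so the instanceDir still has to be removed from disk separately.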
Re: DIH: inner select fails when outter entity is null/empty
Do an onError="skip" on the inner entity.

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

> Hello,
>
> Here is a newbie DataImportHandler question. Currently, I have entities within entities. There are some situations where a column value from the outer entity is null, and when I try to use it in the inner entity, the null just gets replaced with an empty string. That in turn causes the SQL query in the inner entity to fail.
>
> This seems like a common problem, but I couldn't find any solutions or mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ).
>
> What is the best practice to avoid or convert null values to something safer? Would this be done via a Transformer, or is there a better mechanism for this?
>
> I think the problem I'm describing is similar to what was described here: http://search-lucene.com/m/cjlhtFkG6m ... except I don't have the luxury of rewriting the SQL selects.
>
> Thanks,
> Otis
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

--
Noble Paul | Systems Architect | AOL | http://aol.com
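Paul's suggestion would look roughly like this in data-config.xml (the table and column names here are invented for the sketch):

```xml
<entity name="item" query="select id, category_id from item">
  <!-- onError="skip" tells DIH to skip this row instead of aborting the
       whole import when the inner query fails, e.g. because
       ${item.category_id} came through as an empty string -->
  <entity name="category" onError="skip"
          query="select name from category where id = '${item.category_id}'"/>
</entity>
```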
Re: [spAm] Solr does not honor facet.mincount and field.facet.mincount
: REQUEST:
: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&rows=0&start=0&indent=on&facet=true&facet.field=Instrument&facet.field=Location&facet.mincount=9
:
: RESPONSE:
...
: <lst name="params">
...
: <str name="facet.minCount">9</str>

...the REQUEST URL you listed says facet.mincount, but the response from Solr disagrees. According to it, you actually had a capital C in facet.minCount. Solr params are case sensitive, so Solr is completely ignoring facet.minCount.

As for why you don't get any values for the Instrument facet -- understanding that requires you to tell us more about the field/fieldType for Instrument.

-Hoss
Re: performance of million documents search
NGrams might help here. Search the SOLR list for NGram and I think you'll find that this subject has been discussed several times...

HTH
Erick

On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote:

> Hi,
>
> I have about 2 million documents in my index. I want to search them by a string field. Every document has this field, with values such as 'LB681'. The field is a dynamic field whose type is string.
>
> So, in solr/admin, when I search using PartNo_s:L* (which means "starts with L"), I can get the result from 2 million documents in less than 300 ms. But when I search using PartNo_s:*B68* (which means "contains B68"), it takes more than 2000 ms. That is too slow for me.
>
> Does anyone know how I can get the results faster?
>
> Thank you very much
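One way to act on Erick's suggestion is to index the part number into a separate field analyzed with n-grams, so that an infix search like *B68* becomes an ordinary term lookup instead of a slow leading-wildcard scan. A sketch of such a fieldType for schema.xml (the name and gram sizes are illustrative, not a recommendation):

```xml
<fieldType name="partno_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit every 2- to 5-character substring of the part number -->
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="5"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

A query against a field of this type (e.g. PartNo_ngram:b68) would then match LB681 without any wildcard, at the cost of a larger index.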
hybrid approach to using cloud servers for Solr/Lucene
I'm working on an app that could grow much faster and bigger than I could scale local resources, at least on certain dates and for other reasons. So I'd like to run a local machine in a dedicated host, or even a virtual machine at a host. If the load goes up, then at a certain point queries are sent to the cloud.

Is this practical? Does anyone have experience with this? This is obviously a search engine app based on Solr/Lucene, if someone is wondering.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
RE: Howto build a function query using the 'query' function
If the 'query' function returned a count, yes. But my problem is exactly that: as far as I can see from the description of the 'query' function, it does NOT return the count but the score of the search.

So my question is: how can I write a 'query' function that returns a count, not a score?

Cheers,
Gert.

From: Koji Sekiguchi [mailto:k...@r.email.ne.jp]
Sent: Sun 4/25/2010 2:15 AM
To: solr-user@lucene.apache.org
Subject: Re: Howto build a function query using the 'query' function

Villemos, Gert wrote:

> I want to build a function expression for a dismax request handler 'bf' field, to boost documents that are referenced by other documents. I.e. the more often a document is referenced, the higher the boost. Something like:
>
> <bf>linear(query(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 1), 0.01, 1)</bf>
>
> Intended to mean:
> - if count is 0, then the boost is 0*0.01 + 1 = 1
> - if count is 1, then the boost is 1*0.01 + 1 = 1.01
> - if count is 100, then the boost is 100*0.01 + 1 = 2
>
> However, the query function (http://wiki.apache.org/solr/FunctionQuery#query) seems to only be able to return the score of the query results, not the count of results.

Probably I'm missing something, but doesn't just using the linear function meet your needs? i.e.

linear(myQueryReturningACountOfHowOftenThisDocumentIsReferenced, 0.01, 1)

Koji

--
http://www.rondhuit.com/en/
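If the reference count can be computed at index time and stored in its own field, Koji's linear() suggestion reduces to a plain field reference. A sketch of the dismax handler configuration, assuming a hypothetical reference_count field populated during indexing:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- boost = reference_count * 0.01 + 1 -->
    <str name="bf">linear(reference_count,0.01,1)</str>
  </lst>
</requestHandler>
```

The trade-off is that the count is only as fresh as the last reindex of the referenced document.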
Re: DIH: inner select fails when outter entity is null/empty
Hi,

Thanks for this tip, Paul. But what if this is not an error? Is this what transformers should be used for, somehow?

Thanks,
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

----- Original Message -----
From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@gmail.com
To: solr-user@lucene.apache.org
Sent: Sun, April 25, 2010 9:16:22 AM
Subject: Re: DIH: inner select fails when outter entity is null/empty

Do an onError="skip" on the inner entity.

On Fri, Apr 23, 2010 at 3:56 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

> Hello,
>
> Here is a newbie DataImportHandler question. Currently, I have entities within entities. There are some situations where a column value from the outer entity is null, and when I try to use it in the inner entity, the null just gets replaced with an empty string. That in turn causes the SQL query in the inner entity to fail.
>
> This seems like a common problem, but I couldn't find any solutions or mention in the FAQ ( http://wiki.apache.org/solr/DataImportHandlerFaq ).
>
> What is the best practice to avoid or convert null values to something safer? Would this be done via a Transformer, or is there a better mechanism for this?
>
> I think the problem I'm describing is similar to what was described here: http://search-lucene.com/m/cjlhtFkG6m ... except I don't have the luxury of rewriting the SQL selects.
>
> Thanks,
> Otis

--
Noble Paul | Systems Architect | AOL | http://aol.com
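If skipping the row is not acceptable, one transformer-based approach is a ScriptTransformer on the outer entity that substitutes a safe sentinel for the null before the inner query runs. A sketch (the column name and sentinel value are invented; ScriptTransformer needs Java 6 for the built-in JavaScript engine):

```xml
<dataConfig>
  <script><![CDATA[
    function fillDefault(row) {
      // replace a null/empty category_id with a sentinel that matches no real row
      var v = row.get('category_id');
      if (v == null || v == '') row.put('category_id', '-1');
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" transformer="script:fillDefault"
            query="select id, category_id from item">
      <entity name="category"
              query="select name from category where id = '${item.category_id}'"/>
    </entity>
  </document>
</dataConfig>
```

The inner query then runs against the sentinel and simply returns no rows, instead of failing on malformed SQL.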
Re: hybrid approach to using cloud servers for Solr/Lucene
Hello Dennis,

> If the load goes up, then queries are sent to the cloud at a certain point.

My advice is to do load balancing between local and cloud. Your local system seems to be capable, as it is a dedicated host. Another option is to do the indexing locally and sync the index with the cloud, so the cloud is only used for search.

Hope it helps.

Regards,
Aditya
www.findbestopensource.com

On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon gear...@sbcglobal.net wrote:

> I'm working on an app that could grow much faster and bigger than I could scale local resources, at least on certain dates and for other reasons. So I'd like to run a local machine in a dedicated host, or even a virtual machine at a host. If the load goes up, then at a certain point queries are sent to the cloud.
>
> Is this practical? Does anyone have experience with this? This is obviously a search engine app based on Solr/Lucene, if someone is wondering.
>
> Dennis Gearon
Re: performance of million documents search
Hi Erick,

It's very useful. Thank you very much.

2010/4/26 Erick Erickson erickerick...@gmail.com:

> NGrams might help here. Search the SOLR list for NGram and I think you'll find that this subject has been discussed several times...
>
> HTH
> Erick
>
> On Sat, Apr 24, 2010 at 9:26 PM, weiqi wang weiqi...@gmail.com wrote:
>
>> Hi,
>>
>> I have about 2 million documents in my index. I want to search them by a string field. Every document has this field, with values such as 'LB681'. The field is a dynamic field whose type is string.
>>
>> So, in solr/admin, when I search using PartNo_s:L* (which means "starts with L"), I can get the result from 2 million documents in less than 300 ms. But when I search using PartNo_s:*B68* (which means "contains B68"), it takes more than 2000 ms. That is too slow for me.
>>
>> Does anyone know how I can get the results faster?
>>
>> Thank you very much