different indexes for multitenant approach
Hi, I want to implement a different index strategy where we keep indexes with respect to each tenant and maintain them separately. First level of category -- company name. Second level of category -- company name + fields to be indexed. Then further categories -- groups of different company names based on some heuristic (hashing), if it grows further. I want to do this in the same Solr instance. Is it possible? Thanks Naveen
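The third-level grouping heuristic described above could be sketched as follows: a stable hash that maps each company name to one of a fixed number of index groups. The function and the core-naming scheme are illustrative, not part of Solr.

```python
import hashlib

def core_for_tenant(company: str, num_groups: int = 16) -> str:
    """Map a tenant (company name) to one of num_groups index groups.
    md5 is used only as a stable, well-distributed hash, not for security."""
    digest = hashlib.md5(company.strip().lower().encode("utf-8")).hexdigest()
    return "tenant_group_%d" % (int(digest, 16) % num_groups)
```

Because the hash is stable, the same company always routes to the same group, so indexing and querying code agree on which core a tenant's documents live in.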
Re: how to make getJson parameter dynamic
lee carroll: Sorry for this. I did this because I was not getting any response. Anyway, thanks for letting me know; I have now found the solution to the above problem :) Now I am facing a very strange problem related to jQuery, can you please help me out?

$(document).ready(function(){
  $("#c2").click(function(){
    var q = getquerystring();
    $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&q=" + q + "&json.wrf=?", function(result){
      $.each(result.response.docs, function(i, item){
        alert(result.response.docs);
        alert(item.UID_PK);
      });
    });
  });
});

When I use $("#c2").click(function(){...}) it never enters $.getJSON(), and when I remove the click handler the code runs fine. Why is that? Please explain, because I want to get data from a text box on its onclick event and then display the response. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3018732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to display search results of solr in to other application.
$.getJSON("http://[server]:[port]/solr/select/?jsoncallback=?", {q: queryString, version: "2.2", start: 0, rows: 10, indent: "on", "json.wrf": "callbackFunctionToDoSomethingWithOurData", wt: "json", fl: "field1"}); Would you please explain what queryString and "json.wrf": "callbackFunctionToDoSomethingWithOurData" are? And what if I want to change my query string each time? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: query routing with shards
Hi Otis, I merely followed on the gmail's suggestion to include other people into the recipients list, Yonik was the first one :) I won't do it next time. Thanks for a rapid reply. The reason for doing this query routing is that we abstract the distributed SOLR from the client code for security reasons (that is, we don't want to expose the entire shard farm to the world, but only the frontend SOLR) and for better decoupling. Is it possible to implement a plugin to SOLR that would map queries to shards? We have other choices too, they'll take quite some time, that's why I decided to quickly ask, if I was missing something from the SOLR main components design and configuration. Dmitry On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to this list, too) It sounds like you have the knowledge of which query maps to which shard. If so, why not control/change the value of shards param in the request to your front-end Solr (aka distributed request dispatcher) within your app, which is the one calling Solr? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Thu, June 2, 2011 7:00:53 AM Subject: query routing with shards Hello all, We have currently several pretty fat logically isolated shards with the same schema / solrconfig (indices are separate). We currently have one single front end SOLR (1.4) for the client code calls. Since a client code query usually hits only one shard, we are considering making a smart routing of queries to the shards they map to. Can you please give some pointers as to what would be an optimal way to achieve such a routing inside the front end solr? Is there a way to configure mapping inside the solrconfig? Thanks. -- Regards, Dmitry Kan -- Regards, Dmitry Kan
Re: How to display search results of solr in to other application.
Hi Romi, In my view, you first need to understand how AJAX with jQuery works, then JSON, and then JSONP (if you are fetching from a different domain). query here is the dynamic query which you will be hitting Solr with (it could be simple text, or a more advanced query string): http://wiki.apache.org/solr/CommonQueryParameters Callback is the name of a method which you define; after the response arrives, this method is called (the callback mechanism) with the response from Solr (in JSON format), and in it you show or analyze the response as per your business need. Thanks Naveen On Fri, Jun 3, 2011 at 12:00 PM, Romi romijain3...@gmail.com wrote: $.getJSON("http://[server]:[port]/solr/select/?jsoncallback=?", {q: queryString, version: "2.2", start: 0, rows: 10, indent: "on", "json.wrf": "callbackFunctionToDoSomethingWithOurData", wt: "json", fl: "field1"}); would you please explain what are queryString and json.wrf: callbackFunctionToDoSomethingWithOurData. and what if i want to change my query string each time. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-display-search-results-of-solr-in-to-other-application-tp3014101p3018740.html Sent from the Solr - User mailing list archive at Nabble.com.
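To make the query dynamic, it is safer to URL-encode the parameters than to concatenate strings by hand. A rough Python sketch of the URL the thread is building (host, core and field names are placeholders; note the JSONP '?' placeholder must stay unencoded so jQuery can substitute its callback name):

```python
from urllib.parse import urlencode

def solr_select_url(base, query, rows=10, fields=None):
    """Build a Solr /select URL with properly encoded parameters."""
    params = {"q": query, "wt": "json", "rows": rows}
    if fields:
        params["fl"] = ",".join(fields)
    # jQuery replaces a literal '=?' with the JSONP callback name,
    # so json.wrf is appended unencoded rather than via urlencode().
    return base + "/select/?" + urlencode(params) + "&json.wrf=?"

url = solr_select_url("http://localhost:8983/solr/db", "orange juice", fields=["UID_PK"])
```

This keeps user input (spaces, reserved characters) from breaking the request, which hand-built string concatenation does not.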
Re: different indexes for multitenant approach
Maybe you need the multi-core feature of Solr: you can have a single Solr instance with separate configurations and indexes. http://wiki.apache.org/solr/CoreAdmin On Fri, Jun 3, 2011 at 12:04 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi I want to implement different index strategy where we want to keep indexes with respect to each tennant and we want to maintain indexes separately ... first level of category -- company name second level of category - company name + fields to be indexed then further categories - group of different company name based on some heuristic (hashing) (if it grows furhter) i want to do in the same solr instance. can it be possible ? Thanks Naveen -- Chandan Tamrakar * *
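For reference, the multi-core layout might be wired up with a solr.xml like the sketch below (core names and paths are illustrative, not from this thread):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one core per tenant, each with its own conf/ and data/ directories -->
    <core name="company_a" instanceDir="cores/company_a"/>
    <core name="company_b" instanceDir="cores/company_b"/>
  </cores>
</solr>
```

New cores can then be created at runtime through the CoreAdmin handler (e.g. a CREATE action against /admin/cores).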
Getting query fields in a custom SearchHandler
Hi all, I wrote my own SearchHandler and therefore overrode the handleRequestBody method. This method takes two input parameters: SolrQueryRequest and SolrQueryResponse objects. The thing I'd like to do is to get the query fields that are used in my request. Of course I can use req.getParams().get("q"), but it returns the complete query (which can be very complicated). I'd like to have a simple map of field:value. Is there a way to get it? Or do I have to write my own parser for the q parameter? Thanks in advance, Marc.
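There is no ready-made field-to-value map on the request; as a starting point, simple field:value pairs can be pulled out with a regex before falling back to a real query parser for anything complex. A hedged sketch (it deliberately ignores boolean grouping, ranges and nesting):

```python
import re

# Matches simple field:value pairs; the value may be a bare token or "quoted".
_PAIR = re.compile(r'(\w+):("([^"]*)"|\S+)')

def simple_field_map(q: str) -> dict:
    """Extract top-level field:value pairs from a Lucene-style query string."""
    out = {}
    for field, raw, quoted in _PAIR.findall(q):
        out[field] = quoted if quoted else raw
    return out
```

Anything beyond this (nested clauses, ranges, boosts) really does call for walking the parsed Query object instead of string matching.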
How to search camel case words using CJKTokenizer
Hi all, I'm using the CJKTokenizerFactory tokenizer to handle text which contains both Japanese and alphabetic words. However, I noticed that CJKTokenizerFactory converts the alphabet to lowercase, so I cannot use the WordDelimiterFilterFactory filter with the splitOnCaseChange property for camel-case words. I changed to NGramTokenizerFactory (2-gram), but it only parses the first 1024 characters. Because of that, I cannot use NGramTokenizerFactory either. I tried the following two settings and both of them seem to work fine, but I don't know if these are good or not, or if there are some other better solutions. 1) <tokenizer class="solr.CJKTokenizerFactory"/> <filter class="solr.NGramFilterFactory" maxGramSize="2" minGramSize="2"/> 2) <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.NGramFilterFactory" maxGramSize="1" minGramSize="1"/> If anyone can give me any advice, it would be nice. Thank you. Tiffany -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-search-camel-case-words-using-CJKTokenizer-tp3018853p3018853.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query problem in Solr
@ Pravesh: It's 2 separate cores, not 2 indexes. Sorry for that. @ Erick: Yes, I've seen this suggestion and it seems to be the only possible solution. I'll look into it. Thanks for your answers guys! Kurt On Wed, Jun 1, 2011 at 4:24 PM, Erick Erickson erickerick...@gmail.com wrote: If I read this correctly, one approach is to specify an increment gap in a multiValued field, then search for phrases with a slop less than that increment gap, i.e. positionIncrementGap="100" in your definition, and search for "apple orange"~99. If this is gibberish, please post some examples and we'll try something else. Best Erick On Wed, Jun 1, 2011 at 4:21 AM, Kurt Sultana kurtanat...@gmail.com wrote: Hi all, We're using Solr to search on a Shop index and a Product index. Currently a Shop has a field `shop_keyword` which also contains the keywords of the products assigned to it. The shop keywords are separated by a space. Consequently, if there is a product which has the keyword apple and another which has orange, a search for shops having `Apple AND Orange` would return the shop for these products. However, this is incorrect, since we want a search for shops having `Apple AND Orange` to return shop(s) having products with both apple and orange as keywords. We tried solving this problem by making shop keywords multi-valued and assigning the keywords of every product of the shop as a new value in shop keywords. However, as was confirmed in another post http://markmail.org/thread/xce4qyzs5367yplo#query:+page:1+mid:76eerw5yqev2aanu+state:results , Solr does not support "all words must match in the same value" of a multi-valued field. (Hope I explained myself well.) How can we go about this? Ideally, we shouldn't change our search infrastructure dramatically. Thanks! Krt_Malta
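Erick's suggestion, expressed in schema terms, would look something like the sketch below (the attribute is positionIncrementGap; field name and type are illustrative):

```xml
<!-- each value in the multiValued field is separated by a position gap of 100 -->
<field name="shop_keyword" type="text" indexed="true" stored="true"
       multiValued="true" positionIncrementGap="100"/>
```

With a gap of 100, a phrase query whose slop is below the gap, e.g. "apple orange"~99, can only match when both terms occur within the same value, never across two different values of the field.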
Return stemmed word
Hi, We have stemming in our Solr search and we need to retrieve the word/phrase after stemming. That is if I search for oranges, through stemming a search for orange is carried out. If I turn on debugQuery I would be able to see this, however we'd like to access it through the result if possible. Basically, we need this, because we pass the searched word as a parameter to a 3rd party application which highlights the word in an online PDF reader. Currently, if a user searches for oranges and a document contains orange, then the PDF wouldn't highlight anything since it tries to highlight oranges not orange. Thanks all in advance, Kurt
Re: Strategy -- Frequent updates in our application
You can use DataImportHandler for your full/incremental indexing. Now, near-real-time indexing requirements can vary per business (I mean, the acceptable delay could be 5, 10, 15, or 30 minutes). It also depends on how much volume will be indexed incrementally. BTW, are you running a Master+Slave SOLR setup? -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting
BTW, why are you sorting on this field? You could also index/store this field twice: first with its original value, and second encoded to some unique code/hash; index that and sort on it. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting algorithm
Hi Tomás, Thanks, that makes a lot of sense, and your math is sound. It is working well. An if() function would be great, and it seems it's coming soon. Richard -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-algorithm-tp3014549p3019077.html Sent from the Solr - User mailing list archive at Nabble.com.
Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value
Hi, in Solr 4.x (trunk version of mid May) I have noticed a null pointer exception if I activate debugging (debug=true) and use a wildcard to filter by facet value, e.g. if I have a price field ...debug=true&facet.field=price&fq=price[500+TO+*] I get SEVERE: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538) at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:77) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at 
java.lang.Thread.run(Thread.java:662) Caused by: java.lang.NullPointerException at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535) This used to work in Solr 1.4 and I was wondering if it's a bug or a new feature and if there is a trick to get this working again? Best regards, Stefan
Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value
Stefan, I guess there is a colon missing? fq=price:[500+TO+*] should do the trick. Regards Stefan On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises moi...@shoptimax.de wrote: Hi, in Solr 4.x (trunk version of mid May) I have noticed a null pointer exception if I activate debugging (debug=true) and use a wildcard to filter by facet value, e.g. if I have a price field ...debug=true&facet.field=price&fq=price[500+TO+*] I get SEVERE: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538) ... Caused by: java.lang.NullPointerException at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402) at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535) This used to work in Solr 1.4 and I was wondering if it's a bug or a new feature and if there is a trick to get this working again? Best regards, Stefan
Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value
Hi Stefan, sorry, actually there is a colon, I just forgot it in my example... so the exception also appears for fq=price:[500+TO+*]. But only if debug=true... and normal price values work, e.g. fq=price:[500+TO+999]. Thanks, Stefan On 03.06.2011 11:46, Stefan Matheis wrote: Stefan, i guess there is a colon missing? fq=price:[500+TO+*] should do the trick Regards Stefan ... -- With best regards from Nürnberg, Stefan Moises *** Stefan Moises Senior Software Developer shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
php library for extractrequest handler
Hi, we want to post some files (rtf, doc, etc.) to the Solr server using PHP. One way is to post using curl; is there any client like the Java client (Solr Cell)? URLs would also help. Thanks Naveen
Re: Return stemmed word
Hi Kurt, I think this is a bit more tricky than that. For example, if a user searches for oranges, the stemmer may return orang which is not an existing word. So getting stemmed words might/will not work for your highlighting purpose. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Return-stemmed-word-tp3018880p3019180.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: php library for extractrequest handler
On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi We want to post to solr server with some of the files (rtf,doc,etc) using php .. one way is to post using curl I do not normally use PHP, and have not tried it myself. However, there is a PHP extension for Solr: http://wiki.apache.org/solr/SolPHP http://php.net/manual/en/book.solr.php Regards, Gora
Re: how to update database record after indexing
Hey Erick, I wrote a separate process as you suggested, and accomplished the task. Thanks a lot Vishal Parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-update-database-record-after-indexing-tp2874171p3019217.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: how to do offline adding/updating index
Thanks to all, I got it done by using multicore. vishal parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-do-offline-adding-updating-index-tp2923035p3019219.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: how to concatenate two nodes of xml with xpathentityprocessor
Thanks kbootz, your suggestion works fine. vishal parekh -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-concatenate-two-nodes-of-xml-with-xpathentityprocessor-tp2861260p3019223.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ and Range Faceting
Hi Jamie, I don't know why range facets didn't make it into SolrJ. But I've recently opened an issue for this: https://issues.apache.org/jira/browse/SOLR-2523 I hope this will be committed soon. Check the patch out and see if you like it. Martijn On 2 June 2011 18:22, Jamie Johnson jej2...@gmail.com wrote: Currently the range and date faceting in SolrJ acts a bit differently than I would expect. Specifically, range facets aren't parsed at all and date facets end up generating filterQueries which don't have the range, just the lower bound. Is there a reason why SolrJ doesn't support these? I have written some things on my end to handle these and generate filterQueries for date ranges of the form dateTime:[start TO end] and I have a function (which I copied from the date faceting) which parses the range facets, but would prefer not to have to maintain these myself. Is there a plan to implement these? Also is there a plan to update FacetField to not have end be a date, perhaps making it a String like start so we can support date and range queries? -- Met vriendelijke groet, Martijn van Groningen
[Visualizations] from Query Results
Dear Solr experts, I am curious to learn what visualization tools are out there to help me visualize my query results. I am not talking about a language-specific client per se, but something more like Carrot2, which breaks clusters into a knowledge tree and an expandable pie chart. Sorry if those aren't the correct names for those tools ;-) Anyway, what else is out there like Carrot2 http://project.carrot2.org/ to help me visualize Solr query results? Thanks for your input, Adam
Re: Strategy -- Frequent updates in our application
Hi Pravesh, We don't have that setup right now. We are thinking of doing it: for writes we are going to have one instance, and for reads we are going to have another. Do you have another design in mind? Kindly share. Thanks Naveen On Fri, Jun 3, 2011 at 2:50 PM, pravesh suyalprav...@yahoo.com wrote: You can use DataImportHandler for your full/incremental indexing. Now NRT indexing could vary as per business requirements (i mean delay cud be 5-mins ,10-mins,15-mins,OR, 30-mins). Then it also depends on how much volume will be indexed incrementally. BTW, r u having Master+Slave SOLR setup? -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019040.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: php library for extractrequest handler
Yes, that is the one I used and it is working fine. Thanks. Naveen On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty g...@mimirtech.com wrote: On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi We want to post to solr server with some of the files (rtf,doc,etc) using php .. one way is to post using curl Do not normally use PHP, and have not tried it myself. However, there is a PHP extension for Solr: http://wiki.apache.org/solr/SolPHP http://php.net/manual/en/book.solr.php Regards, Gora
Re: Strategy -- Frequent updates in our application
Hi Naveen: Solr with RankingAlgorithm supports NRT. The performance is about 262 docs/sec. You can get more information about the performance and NRT from here: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search You can download Solr with RankingAlgorithm from here: http://solr-ra.tgels.com Regards, - Nagendra Nagarajayya http://solr-ra.tgels.com On 6/2/2011 8:29 PM, Naveen Gupta wrote: Hi, We have an application where every 10 mins we index users' docs repositories, and eventually, if some thread is added in a particular discussion, we need to index the thread again (please note we are not doing blind indexing each time; we have various rules to filter out which thread is new and thus a candidate for indexing, plus new ones which have arrived). So we are doing updates for each user's docs repository. The performance so far is not looking very good. In the future we are going to get hits in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy to tune Solr so that it can index the data in real time. And what about NRT, is it fine to apply in this kind of scenario? I read that Solr NRT performance is not very good, but I am not going to believe it, since Solr is one of the best open-source projects, so this problem will likely be sorted out in the near future. But if any benchmark exists, kindly share it with me; we would like to analyze it against our requirements. Is there any way to add incremental indexes, as we generally find in other search engines like Endeca? I don't know much detail about Solr, since I am a newbie, so can you please tell me if there are settings which can keep track of incremental indexing? Thanks Naveen
Solr Indexing Patterns
What is the best practice method to index the following in Solr: I'm attempting to use Solr for a book store site. Each book will have a price, but on occasion this will be discounted. The discounted price exists for a defined time period, but there may be many discount periods. Each discount will have a brief synopsis, start and end time. A subset of the desired output would be as follows:

"response": {
  "numFound": 1,
  "start": 0,
  "docs": [
    {
      "name": "The Book",
      "price": "$9.99",
      "discounts": [
        {
          "price": "$3.00",
          "synopsis": "thanksgiving special",
          "starts": "11-24-2011",
          "ends": "11-25-2011"
        },
        {
          "price": "$4.00",
          "synopsis": "Canadian thanksgiving special",
          "starts": "10-10-2011",
          "ends": "10-11-2011"
        }
      ]
    }
  ]
}

A requirement is to be able to search for just discounted publications. I think I could use date faceting for this (return publications that are within a discount window). When a discount search is performed, no publications that are not currently discounted will be returned. My questions are: - Does Solr support this type of sub-document? In the above example the discounts are the sub-documents. I know Solr is not a relational DB, but I would like to store and index the above representation in a single document if possible. - What is the best method to approach the above? I can see in many examples the authors tend to denormalize to solve similar problems. This suggests that for each discount I am required to duplicate the book data or form a document association (http://stackoverflow.com/questions/2689399/solr-associations). Which method would you advise? It would be nice if Solr could return a response structured as above. Much Thanks
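Since Solr of this era has no sub-documents, the denormalization mentioned above can be sketched like this: one Solr document per (book, discount) pair, with the book fields repeated. Field names are illustrative, not from any schema in this thread.

```python
def denormalize(book):
    """Flatten a book with nested discounts into one Solr doc per discount.
    A book with no discounts yields a single doc flagged is_discounted=False."""
    base = {"name": book["name"], "price": book["price"]}
    docs = []
    for d in book.get("discounts") or [None]:
        doc = dict(base)
        if d is None:
            doc["is_discounted"] = False
        else:
            doc["is_discounted"] = True
            doc["discount_price"] = d["price"]
            doc["discount_synopsis"] = d["synopsis"]
            doc["discount_starts"] = d["starts"]
            doc["discount_ends"] = d["ends"]
        docs.append(doc)
    return docs
```

A query for currently discounted books then filters on is_discounted:true plus a range check on discount_starts/discount_ends, at the cost of duplicating the book fields in each row.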
Re: Strategy -- Frequent updates in our application
You can go ahead with the Master/Slave setup provided by SOLR. Its trivial to setup and you also get SOLR's operational scripts for index synch'ing b/w Master-to-Slave(s), OR the Java based replication feature. There is no need to re-invent other architecture :) -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html Sent from the Solr - User mailing list archive at Nabble.com.
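The Java-based replication Pravesh mentions is configured through the ReplicationHandler in solrconfig.xml; a minimal master/slave sketch (host name and poll interval are placeholders):

```xml
<!-- master solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- slave solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

Writes go to the master; slaves poll it and serve reads, which matches the one-writer/one-reader split discussed in this thread.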
Solr performance tuning - disk i/o?
Hello, I'm trying to move a VuFind installation from an ailing physical server into a virtualized environment, and I'm running into performance problems. VuFind is a Solr 1.4.1-based application with fairly large and complex records (many stored fields, many words per record). My particular installation contains about a million records in the index, with a total index size around 6GB. The virtual environment has more RAM and better CPUs than the old physical box, and I am satisfied that my Java environment is well-tuned. My index is optimized. Searches that hit the cache respond very well. The problem is that non-cached searches are very slow - the more keywords I add, the slower they get, to the point of taking 6-12 seconds to come back with results on a quiet box and well over a minute under stress testing. (The old box still took a while for equivalent searches, but it was about twice as fast as the new one). My gut feeling is that disk access reading the index is the bottleneck here, but I know little about the specifics of Solr's internals, so it's entirely possible that my gut is wrong. Outside testing does show that the the virtual environment's disk performance is not as good as the old physical server, especially when multiple processes are trying to access the same file simultaneously. So, two basic questions: 1.)Would you agree that I'm dealing with a disk bottleneck, or are there some other factors I should be considering? Any good diagnostics I should be looking at? 2.)If the problem is disk access, is there anything I can tune on the Solr side to alleviate the problems? Thanks, Demian
Re: how to make getJson parameter dynamic
Romi: Please review: http://wiki.apache.org/solr/UsingMailingLists This is the Solr forum. jQuery questions are best directed at a jQuery-specific forum. Best Erick On Fri, Jun 3, 2011 at 2:27 AM, Romi romijain3...@gmail.com wrote: lee carroll: Sorry for this. i did this because i was not getting any response. anyway thanks for letting me know and now i found the solution of the above problem :) now i am facing a very strange problem related to jquery can you please help me out. $(document).ready(function(){ $(#c2).click(function(){ var q=getquerystring() ; $.getJSON(http://192.168.1.9:8983/solr/db/select/?wt=jsonq=+q+json.wrf=?;, function(result){ $.each(result.response.docs, function(i,item){ alert(result.response.docs); alert(item.UID_PK); }); }); }); }); when i use $(#c2).click(function() then it does not enter in $.getJSON() and when i remove $(#c2).click(function() from the code it run fine. Why is so please explain. because i want to get data from a text box on onclickevent and then display response. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-make-getJson-parameter-dynamic-tp3014941p3018732.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value
Hmmm, I just tried it on a trunk from a couple of days ago and it doesn't error out. Could you re-try with a new build? Thanks Erick On Fri, Jun 3, 2011 at 5:51 AM, Stefan Moises moi...@shoptimax.de wrote: Hi Stefan, sorry, actually there is a colon, I just forgot it in my example... so the exception also appears for fq=price:[500+TO+*] But only if debug=true... and normal price values work, e.g. fq=price:[500+TO+999] Thanks, Stefan On 03.06.2011 11:46, Stefan Matheis wrote: Stefan, i guess there is a colon missing? fq=price:[500+TO+*] should do the trick Regards Stefan On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises moi...@shoptimax.de wrote: Hi, in Solr 4.x (trunk version of mid May) I have noticed a null pointer exception if I activate debugging (debug=true) and use a wildcard to filter by facet value, e.g. if I have a price field: ...&debug=true&facet.field=price&fq=price[500+TO+*] I get

SEVERE: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538)
    at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:77)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402)
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535)

This used to work in Solr 1.4 and I was wondering if it's a bug or a new feature and if there is a trick to get this working again? Best regards, Stefan .

--
Best regards from Nuremberg,
Stefan Moises
***
Stefan Moises
Senior Software Developer
shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck
Tel.: 0911/25566-25
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***
Re: [Visualizations] from Query Results
I'm not quite sure what you mean by visualization here. Do you want to see the query parse tree? The results list in something other than XML (see the /browse functionality if so)? How documents are ranked? Visualization is another overloaded word <G>... Best Erick On Fri, Jun 3, 2011 at 7:13 AM, Adam Estrada estrada.adam.gro...@gmail.com wrote: Dear Solr experts, I am curious to learn what visualization tools are out there to help me visualize my query results. I am not talking about a language specific client per se but something more like Carrot2 which breaks clusters in to their knowledge tree and expandable pie chart. Sorry if those aren't the correct names for those tools ;-) Anyway, what else is out there like Carrot2 http://project.carrot2.org/ to help me visualize Solr query results? Thanks for your input, Adam
Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value
Hi Erick, sure, thanks for looking into it! I'll let you know if it's working for me there, too... (I'm using edismax btw., but I've also tested with standard and got the exception) Stefan On 03.06.2011 15:22, Erick Erickson wrote: Hmmm, I just tried it on a trunk from a couple of days ago and it doesn't error out. Could you re-try with a new build? Thanks Erick On Fri, Jun 3, 2011 at 5:51 AM, Stefan Moises moi...@shoptimax.de wrote: Hi Stefan, sorry, actually there is a colon, I just forgot it in my example... so the exception also appears for fq=price:[500+TO+*] But only if debug=true... and normal price values work, e.g. fq=price:[500+TO+999] Thanks, Stefan On 03.06.2011 11:46, Stefan Matheis wrote: Stefan, i guess there is a colon missing? fq=price:[500+TO+*] should do the trick Regards Stefan On Fri, Jun 3, 2011 at 11:42 AM, Stefan Moises moi...@shoptimax.de wrote: Hi, in Solr 4.x (trunk version of mid May) I have noticed a null pointer exception if I activate debugging (debug=true) and use a wildcard to filter by facet value, e.g. if I have a price field: ...&debug=true&facet.field=price&fq=price[500+TO+*] I get

SEVERE: java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538)
    at org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:77)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402)
    at org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535)

This used to work in Solr 1.4 and I was wondering if it's a bug or a new feature and if there is a trick to get this working again? Best regards, Stefan .

--
Best regards from Nuremberg,
Stefan Moises
***
Stefan Moises
Senior Software Developer
shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck
Tel.: 0911/25566-25
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***

.

--
Best regards from Nuremberg,
Stefan Moises
***
Stefan Moises
Senior Software Developer
shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck
Tel.: 0911/25566-25
Fax: 0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***
Re: Strategy -- Frequent updates in our application
Do be careful how often you pull down indexes on your slaves. A too-short polling interval can lead to some problems. Start with, say, 5 minutes and ensure that your autowarm time (see your logs) is less than your polling interval. Best Erick On Fri, Jun 3, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote: You can go ahead with the Master/Slave setup provided by SOLR. Its trivial to setup and you also get SOLR's operational scripts for index synch'ing b/w Master-to-Slave(s), OR the Java based replication feature. There is no need to re-invent other architecture :) -- View this message in context: http://lucene.472066.n3.nabble.com/Strategy-Frequent-updates-in-our-application-tp3018386p3019475.html Sent from the Solr - User mailing list archive at Nabble.com.
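[Editorial note] To make the polling advice concrete, a sketch of what the slave side of Java-based replication might look like in solrconfig.xml. The master host and core path are invented; check the exact syntax against the Solr replication documentation for your version:

```xml
<!-- Slave replication config (sketch): poll the master every 5 minutes,
     which should comfortably exceed the autowarm time seen in the logs. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/core0/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```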
Re: Solr performance tuning - disk i/o?
Demian,

* You can run iostat or vmstat and see if there is disk IO during your slow queries and compare that to disk IO (if any) with your fast/cached queries
* You can make sure you warm up your index well after the first and any new searcher, so that OS and Solr caches are warmed up
* You can look at the Solr Stats page to make sure your caches are utilized well and adjust their settings if they are not.
* ...

Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Demian Katz demian.k...@villanova.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Fri, June 3, 2011 8:44:33 AM Subject: Solr performance tuning - disk i/o? Hello, I'm trying to move a VuFind installation from an ailing physical server into a virtualized environment, and I'm running into performance problems. VuFind is a Solr 1.4.1-based application with fairly large and complex records (many stored fields, many words per record). My particular installation contains about a million records in the index, with a total index size around 6GB. The virtual environment has more RAM and better CPUs than the old physical box, and I am satisfied that my Java environment is well-tuned. My index is optimized. Searches that hit the cache respond very well. The problem is that non-cached searches are very slow - the more keywords I add, the slower they get, to the point of taking 6-12 seconds to come back with results on a quiet box and well over a minute under stress testing. (The old box still took a while for equivalent searches, but it was about twice as fast as the new one). My gut feeling is that disk access reading the index is the bottleneck here, but I know little about the specifics of Solr's internals, so it's entirely possible that my gut is wrong. Outside testing does show that the virtual environment's disk performance is not as good as the old physical server, especially when multiple processes are trying to access the same file simultaneously. So, two basic questions: 1.) Would you agree that I'm dealing with a disk bottleneck, or are there some other factors I should be considering? Any good diagnostics I should be looking at? 2.) If the problem is disk access, is there anything I can tune on the Solr side to alleviate the problems? Thanks, Demian
Re: Solr performance tuning - disk i/o?
This doesn't seem right. Here's a couple of things to try:

1> attach debugQuery=on to your long-running queries. The QTime returned is the time taken to search, NOT including the time to load the docs. That'll help pinpoint whether the problem is the search itself, or assembling the documents.
2> Are you autowarming? If so, be sure it's actually done before querying.
3> Measure queries after the first few, particularly if you're sorting or faceting.
4> What are your JVM settings? How much memory do you have?
5> is enableLazyFieldLoading set to true in your solrconfig.xml?
6> How many docs are you returning?

There's more, but that'll do for a start. Let us know if you gather more data and it's still slow. Best Erick On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz demian.k...@villanova.edu wrote: Hello, I'm trying to move a VuFind installation from an ailing physical server into a virtualized environment, and I'm running into performance problems. VuFind is a Solr 1.4.1-based application with fairly large and complex records (many stored fields, many words per record). My particular installation contains about a million records in the index, with a total index size around 6GB. The virtual environment has more RAM and better CPUs than the old physical box, and I am satisfied that my Java environment is well-tuned. My index is optimized. Searches that hit the cache respond very well. The problem is that non-cached searches are very slow - the more keywords I add, the slower they get, to the point of taking 6-12 seconds to come back with results on a quiet box and well over a minute under stress testing. (The old box still took a while for equivalent searches, but it was about twice as fast as the new one). My gut feeling is that disk access reading the index is the bottleneck here, but I know little about the specifics of Solr's internals, so it's entirely possible that my gut is wrong. Outside testing does show that the virtual environment's disk performance is not as good as the old physical server, especially when multiple processes are trying to access the same file simultaneously. So, two basic questions: 1.) Would you agree that I'm dealing with a disk bottleneck, or are there some other factors I should be considering? Any good diagnostics I should be looking at? 2.) If the problem is disk access, is there anything I can tune on the Solr side to alleviate the problems? Thanks, Demian
Re: [Visualizations] from Query Results
Hi Adam, Try this: http://lmgtfy.com/?q=search%20results%20visualizations In practice I find that visualizations are cool and attractive looking, but often text is more useful because it's more direct. But there is room for graphical representation of search results, sure. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Adam Estrada estrada.adam.gro...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 7:13:39 AM Subject: [Visualizations] from Query Results Dear Solr experts, I am curious to learn what visualization tools are out there to help me visualize my query results. I am not talking about a language specific client per se but something more like Carrot2 which breaks clusters in to their knowledge tree and expandable pie chart. Sorry if those aren't the correct names for those tools ;-) Anyway, what else is out there like Carrot2 http://project.carrot2.org/ to help me visualize Solr query results? Thanks for your input, Adam
Re: query routing with shards
Hi Dmitry, Yes, you could also implement your own custom SearchComponent. In this component you could grab the query param, examine the query value, and based on that add the shards URL param with the appropriate value, so that when the regular QueryComponent grabs stuff from the request, it has the correct shard in there already. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 2:47:00 AM Subject: Re: query routing with shards Hi Otis, I merely followed on the gmail's suggestion to include other people into the recipients list, Yonik was the first one :) I won't do it next time. Thanks for a rapid reply. The reason for doing this query routing is that we abstract the distributed SOLR from the client code for security reasons (that is, we don't want to expose the entire shard farm to the world, but only the frontend SOLR) and for better decoupling. Is it possible to implement a plugin to SOLR that would map queries to shards? We have other choices too, they'll take quite some time, that's why I decided to quickly ask, if I was missing something from the SOLR main components design and configuration. Dmitry On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to this list, too) It sounds like you have the knowledge of which query maps to which shard. If so, why not control/change the value of the shards param in the request to your front-end Solr (aka distributed request dispatcher) within your app, which is the one calling Solr?
Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Thu, June 2, 2011 7:00:53 AM Subject: query routing with shards Hello all, We have currently several pretty fat logically isolated shards with the same schema / solrconfig (indices are separate). We currently have one single front end SOLR (1.4) for the client code calls. Since a client code query usually hits only one shard, we are considering making a smart routing of queries to the shards they map to. Can you please give some pointers as to what would be an optimal way to achieve such a routing inside the front end solr? Is there a way to configure mapping inside the solrconfig? Thanks. -- Regards, Dmitry Kan -- Regards, Dmitry Kan
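[Editorial note] The SearchComponent Otis describes would be written in Java inside Solr; purely to illustrate the mapping idea, here is a small sketch (shown as JavaScript, the other scripting language in this digest) of routing a query to a shard and injecting the shards parameter. The tenant-prefix convention and host names are invented for this sketch:

```javascript
// Illustrative only: map a query to the shard that owns its data, then
// attach the "shards" parameter before the request reaches QueryComponent.
var SHARD_MAP = {
  tenantA: "shard1.example.com:8983/solr",
  tenantB: "shard2.example.com:8983/solr"
};

function routeQuery(params) {
  // Assumed convention: queries start with "tenant:<name>".
  var m = /^tenant:(\w+)/.exec(params.q || "");
  var shard = m && SHARD_MAP[m[1]];
  if (shard) {
    params.shards = shard; // the distributed search then hits only this shard
  }
  return params;
}
```

A query that matches no known tenant is left untouched, so the front-end Solr falls back to its default (local) search.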
Re: java.io.IOException: The specified network name is no longer available
Hi, I'm guessing your index is on some sort of network drive that got detached? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Gaurav Shingala gaurav.shing...@hotmail.com To: Apache SolrUser solr-user@lucene.apache.org Sent: Fri, June 3, 2011 1:52:42 AM Subject: java.io.IOException: The specified network name is no longer available Hi, I am using solr 1.4.1 and at the time of updating the index I am getting the following error:

2011-06-03 05:54:06,943 ERROR [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4) java.io.IOException: The specified network name is no longer available
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
    at org.apache.lucene.store.SimpleFSDirectory$SimpleFSIndexInput.readInternal(SimpleFSDirectory.java:132)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:78)
    at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
    at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
    at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)
    at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1103)
    at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:981)
    at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:320)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:640)
    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:545)
    at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:581)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:903)
    at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
    at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
    at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
    at org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
    at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
    at org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:877)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:654)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:951)
    at java.lang.Thread.run(Thread.java:619)
2011-06-03 05:54:06,943 INFO [org.apache.solr.core.SolrCore] (http-10.38.33.146-8080-4)
Re: Better to have lots of smaller cores or one really big core?
Thanks Erick for the response. So my data structure is the same, i.e. they all use the same schema. Though I think it makes sense for us to somehow break apart the data, for example by the date it was indexed. I'm just trying to get a feel for how large we should aim to keep those (by day, by week, by month, etc...). So it sounds like we should aim to keep them at a size that one solr server can host to avoid serving multiple cores. One question, there is no real difference (other than configuration) from a server hosting its own index vs. it hosting one core, is there? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3019686.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr performance tuning - disk i/o?
Thanks to you and Otis for the suggestions! Some more information: - Based on the Solr stats page, my caches seem to be working pretty well (few or no evictions, hit rates in the 75-80% range). - VuFind is actually doing two Solr queries per search (one initial search followed by a supplemental spell check search -- I believe this is necessary because VuFind has two separate spelling indexes, one for shingled terms and one for single words). That is probably exaggerating the problem, though based on searches with debugQuery on, it looks like it's always the initial search (rather than the supplemental spelling search) that's consuming the bulk of the time. - enableLazyFieldLoading is set to true. - I'm retrieving 20 documents per page. - My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log -Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5 It appears that a large portion of my problem had to do with autowarming, a topic that I've never had a strong grasp on, though perhaps I'm finally learning (any recommended primer links would be welcome!). I did have some autowarming settings in solrconfig.xml (an arbitrary search for a bunch of random keywords in the newSearcher and firstSearcher events, plus autowarmCount settings on all of my caches). 
However, when I looked at the debugQuery output, I noticed that a huge amount of time was being wasted loading facets on the first search after restarting Solr, so I changed my newSearcher and firstSearcher events to this:

<arr name="queries">
  <lst>
    <str name="q">*:*</str>
    <str name="start">0</str>
    <str name="rows">10</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">collection</str>
    <str name="facet.field">format</str>
    <str name="facet.field">publishDate</str>
    <str name="facet.field">callnumber-first</str>
    <str name="facet.field">topic_facet</str>
    <str name="facet.field">authorStr</str>
    <str name="facet.field">language</str>
    <str name="facet.field">genre_facet</str>
    <str name="facet.field">era_facet</str>
    <str name="facet.field">geographic_facet</str>
  </lst>
</arr>

Overall performance has now increased dramatically, and now the biggest bottleneck in the debug output seems to be the shingle spell checking! Any other suggestions are welcome, since I suspect there's still room to squeeze more performance out of the system, and I'm still not sure I'm making the most of autowarming... but this seems like a big step in the right direction. Thanks again for the help! - Demian -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, June 03, 2011 9:41 AM To: solr-user@lucene.apache.org Subject: Re: Solr performance tuning - disk i/o? This doesn't seem right. Here's a couple of things to try:

1> attach debugQuery=on to your long-running queries. The QTime returned is the time taken to search, NOT including the time to load the docs. That'll help pinpoint whether the problem is the search itself, or assembling the documents.
2> Are you autowarming? If so, be sure it's actually done before querying.
3> Measure queries after the first few, particularly if you're sorting or faceting.
4> What are your JVM settings? How much memory do you have?
5> is enableLazyFieldLoading set to true in your solrconfig.xml?
6> How many docs are you returning?

There's more, but that'll do for a start. Let us know if you gather more data and it's still slow. Best Erick On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz demian.k...@villanova.edu wrote: Hello, I'm trying to move a VuFind installation from an ailing physical server into a virtualized environment, and I'm running into performance problems. VuFind is a Solr 1.4.1-based application with fairly large and complex records (many stored fields, many words per record). My particular installation contains about a million records in the index, with a total index size around 6GB. The virtual environment has more RAM and better CPUs than the old physical box, and I am satisfied that my Java environment is well-tuned. My index is optimized. Searches that hit the cache respond very well. The problem is that non-cached searches are very slow - the more keywords I add, the slower they get, to the point of taking 6-12 seconds to come back with results on a quiet box and well over a minute under stress testing. (The old box still took a while for equivalent searches, but it was about twice as fast as the new one). My gut feeling is that disk access reading the index is the bottleneck here, but I know little about the specifics of Solr's internals, so it's entirely possible that my gut is wrong. Outside testing does show that the virtual environment's disk performance is not as good as the old physical server, especially when
fq null pointer exception
I am noticing something strange with our recent upgrade to Solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status; the values are Enabled, Disabled, or ''. When I facet on that field I get:

Enabled 4409565
Disabled 29185
 112

The issue is when I do a filter query. This query works: select/?q=*:*&fq=Status:Enabled But when I run this query I get a NPE: select/?q=*:*&fq=Status:Disabled Here is part of the stack trace:

Problem accessing /solr/global_accounts/select/. Reason: null
java.lang.NullPointerException
    at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
    at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
    at org.apache.solr.schema.StrField.write(StrField.java:49)
    at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
    at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
    at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
    at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
    at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
    at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
    at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
    at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
    ...

Thanks, Dan
Solr Performance
Hi, We migrated to Solr a few days back, but after going live we have noticed a performance drop, especially when we do a delta index, which we are executing every hour with around 100,000 records. We have a multi-core Solr server running on a Linux machine, with 4GB given to the JVM; it's not possible for me to upgrade the RAM or give more memory to Solr currently. So I was considering the option of running a master-slave config; I have another Windows machine with 4GB RAM available on the same network. I have two questions regarding this:

- Is this the right path to take?
- How can I do this with minimum downtime, given the fact that our index is huge?

Can someone point me in the right direction for this? Thanks and Regards, Rohit
Re: Sorting
Because when browsing through legislation, people want to browse in the same order as it is actually printed in the hard copy volumes. It did work by using a copyfield to a lowercase field. On Fri, Jun 3, 2011 at 2:29 AM, pravesh suyalprav...@yahoo.com wrote: BTW, why r u sorting on this field? You could also index store this field twice. First, in its original value, and then second, by encoding to some unique code/hash and index it and sort on that. -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-tp3017285p3019055.html Sent from the Solr - User mailing list archive at Nabble.com.
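[Editorial note] The copyField approach described above might look like this in schema.xml (field and type names are illustrative; the poster's actual field names are not shown in the thread):

```xml
<!-- Sketch: sort on a lowercased, untokenized copy of the field while
     displaying the original value. -->
<fieldType name="string_lc" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_sort" type="string_lc" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

Sorting on title_sort (sort=title_sort asc) then gives case-insensitive ordering that matches the printed volumes, while the original title field is still returned for display.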
Re: Hitting the URI limit, how to get around this?
So here's what I'm seeing: I'm running Solr 3.1. I'm running a Java client that executes an HttpGet (I tried HttpPost) with a large shard list. If I remove a few shards from my current list it returns fine; when I use my full shard list I get a HTTP/1.1 400 Bad Request. If I execute it in Firefox with a few shards removed it returns fine; with the full shard list I get a blank screen returned immediately. My URI works at around 7800 characters but adding one more shard to it blows up. Any ideas? I've tried using SolrJ rather than HttpGet before but ran into similar issues with even fewer shards. See http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html My shards are added dynamically; every few hours I am adding new shards or cores into the cluster, so I cannot have a shard list in the config files unless I can somehow update them while the system is running. -- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020185.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Hitting the URI limit, how to get around this?
It sounds like you're hitting the max URL length (8K is a common default) for the HTTP web server that you're using to run Solr. All of the web servers I know about let you bump this limit up via configuration settings. -- Ken On Jun 3, 2011, at 9:27am, JohnRodey wrote: [quoted message snipped] -- Ken Krugler +1 530-210-6378 http://bixolabs.com custom data mining solutions
Re: Solr Performance
Rohit: Yes, run indexing on one machine (master) and searches on the other (slave), and set up replication between them. Don't optimize your index, and warm up the searcher and caches on the slaves. No downtime. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ----- From: Rohit ro...@in-rev.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 11:49:28 AM Subject: Solr Performance [quoted message snipped]
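For reference, the master-slave setup Otis describes is configured through the ReplicationHandler in solrconfig.xml (built in since Solr 1.4). A minimal sketch with placeholder host and core names:

```xml
<!-- On the master (indexing) box -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On the slave (search) box; masterUrl host/core are placeholders -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/core0/replication</str>
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>
```

The slave polls the master and pulls only changed index files, so searches keep running against the old index until the new one is warmed and swapped in.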
Re: Solr performance tuning - disk i/o?
Right, if you facet results, then your warmup queries should include those facets. The same with sorting. If you sort on fields A and B, then include warmup queries that sort on A and B. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Demian Katz demian.k...@villanova.edu To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Fri, June 3, 2011 11:21:52 AM Subject: RE: Solr performance tuning - disk i/o? Thanks to you and Otis for the suggestions! Some more information: - Based on the Solr stats page, my caches seem to be working pretty well (few or no evictions, hit rates in the 75-80% range). - VuFind is actually doing two Solr queries per search (one initial search followed by a supplemental spell check search -- I believe this is necessary because VuFind has two separate spelling indexes, one for shingled terms and one for single words). That is probably exaggerating the problem, though based on searches with debugQuery on, it looks like it's always the initial search (rather than the supplemental spelling search) that's consuming the bulk of the time. - enableLazyFieldLoading is set to true. - I'm retrieving 20 documents per page. - My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log -Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5 It appears that a large portion of my problem had to do with autowarming, a topic that I've never had a strong grasp on, though perhaps I'm finally learning (any recommended primer links would be welcome!). I did have some autowarming settings in solrconfig.xml (an arbitrary search for a bunch of random keywords in the newSearcher and firstSearcher events, plus autowarmCount settings on all of my caches). 
However, when I looked at the debugQuery output, I noticed that a huge amount of time was being wasted loading facets on the first search after restarting Solr, so I changed my newSearcher and firstSearcher events to this:

<arr name="queries">
  <lst>
    <str name="q">*:*</str>
    <str name="start">0</str>
    <str name="rows">10</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">collection</str>
    <str name="facet.field">format</str>
    <str name="facet.field">publishDate</str>
    <str name="facet.field">callnumber-first</str>
    <str name="facet.field">topic_facet</str>
    <str name="facet.field">authorStr</str>
    <str name="facet.field">language</str>
    <str name="facet.field">genre_facet</str>
    <str name="facet.field">era_facet</str>
    <str name="facet.field">geographic_facet</str>
  </lst>
</arr>

Overall performance has now increased dramatically, and now the biggest bottleneck in the debug output seems to be the shingle spell checking! Any other suggestions are welcome, since I suspect there's still room to squeeze more performance out of the system, and I'm still not sure I'm making the most of autowarming... but this seems like a big step in the right direction. Thanks again for the help! - Demian -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, June 03, 2011 9:41 AM To: solr-user@lucene.apache.org Subject: Re: Solr performance tuning - disk i/o? This doesn't seem right. Here are a couple of things to try:
1) Attach debugQuery=on to your long-running queries. The QTime returned is the time taken to search, NOT including the time to load the docs. That'll help pinpoint whether the problem is the search itself or assembling the documents.
2) Are you autowarming? If so, be sure it's actually done before querying.
3) Measure queries after the first few, particularly if you're sorting or faceting.
4) What are your JVM settings? How much memory do you have?
5) Is enableLazyFieldLoading set to true in your solrconfig.xml?
6) How many docs are you returning?
There's more, but that'll do for a start. Let us know if you gather more data and it's still slow. Best, Erick On Fri, Jun 3, 2011 at 8:44 AM, Demian Katz demian.k...@villanova.edu wrote: Hello, I'm trying to move a VuFind installation from an ailing physical server into a virtualized environment, and I'm running into performance problems. VuFind is a Solr 1.4.1-based application with fairly large and complex records (many stored fields, many words per record). My particular installation contains about a million records in the index, with a total index size around 6GB. The virtual environment has more RAM and better CPUs than the old physical box, and I am satisfied that my Java environment is well-tuned. My index is optimized. Searches that hit
Re: fq null pointer exception
Dan, does the problem go away if you get rid of those 112 documents with empty Status, or replace their empty status value with, say, Unknown? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ----- From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 11:46:46 AM Subject: fq null pointer exception I am noticing something strange with our recent upgrade to Solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status; the values are Enabled, Disabled, or ''. When I facet on that field I get: Enabled 4409565, Disabled 29185, (empty) 112. The issue is when I do a filter query. This query works: select/?q=*:*&fq=Status:Enabled But when I run this query I get an NPE: select/?q=*:*&fq=Status:Disabled Here is part of the stack trace: Problem accessing /solr/global_accounts/select/. Reason: null
java.lang.NullPointerException
at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828)
at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686)
at org.apache.solr.schema.StrField.write(StrField.java:49)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:125)
at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369)
at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545)
at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482)
at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519)
at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582)
at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131)
at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35)
... Thanks, Dan
Re: query routing with shards
Hi Otis, Thanks! This sounds promising. This custom implementation, will it hurt in any way the stability of the front end SOLR? After implementing it, can I run some tests to verify the stability / performance? Dmitry On Fri, Jun 3, 2011 at 4:49 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Dmitry, Yes, you could also implement your own custom SearchComponent. In this component you could grab the query param, examine the query value, and based on that add the shards URL param with appropriate value, so that when the regular QueryComponent grabs stuff from the request, it has the correct shard in there already. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 2:47:00 AM Subject: Re: query routing with shards Hi Otis, I merely followed on the gmail's suggestion to include other people into the recipients list, Yonik was the first one :) I won't do it next time. Thanks for a rapid reply. The reason for doing this query routing is that we abstract the distributed SOLR from the client code for security reasons (that is, we don't want to expose the entire shard farm to the world, but only the frontend SOLR) and for better decoupling. Is it possible to implement a plugin to SOLR that would map queries to shards? We have other choices too, they'll take quite some time, that's why I decided to quickly ask, if I was missing something from the SOLR main components design and configuration. Dmitry On Fri, Jun 3, 2011 at 8:25 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Dmitry (you may not want to additionally copy Yonik, he's subscribed to this list, too) It sounds like you have the knowledge of which query maps to which shard. 
If so, why not control/change the value of shards param in the request to your front-end Solr (aka distributed request dispatcher) within your app, which is the one calling Solr? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org; yo...@lucidimagination.com Sent: Thu, June 2, 2011 7:00:53 AM Subject: query routing with shards Hello all, We have currently several pretty fat logically isolated shards with the same schema / solrconfig (indices are separate). We currently have one single front end SOLR (1.4) for the client code calls. Since a client code query usually hits only one shard, we are considering making a smart routing of queries to the shards they map to. Can you please give some pointers as to what would be an optimal way to achieve such a routing inside the front end solr? Is there a way to configure mapping inside the solrconfig? Thanks. -- Regards, Dmitry Kan -- Regards, Dmitry Kan -- Regards, Dmitry Kan
Re: query routing with shards
Nah, if you can quickly figure out which shard a given query maps to, then all this component needs to do is stick the appropriate shards param value in the request and let the request pass through to the other SearchComponents in the chain, including QueryComponent, which will know what to do with the shards param. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ----- From: Dmitry Kan dmitry@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 12:56:15 PM Subject: Re: query routing with shards [quoted thread snipped]
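Not from the thread itself, but the query-to-shard mapping Otis mentions can be as simple as a prefix lookup with a fan-out fallback. A stdlib-only sketch of that decision logic (the custom SearchComponent would then just set the returned value as the shards request param; prefixes and host names below are made up):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShardRouter {
    // Hypothetical mapping from a query prefix to the shard that owns it.
    private final Map<String, String> routes = new LinkedHashMap<>();
    private final String allShards; // fallback: fan out to every shard

    public ShardRouter(String allShards) {
        this.allShards = allShards;
    }

    public void addRoute(String queryPrefix, String shard) {
        routes.put(queryPrefix, shard);
    }

    /** Returns the value to use for the 'shards' request param for this query. */
    public String shardsFor(String q) {
        for (Map.Entry<String, String> e : routes.entrySet()) {
            if (q.startsWith(e.getKey())) {
                return e.getValue(); // query is owned by exactly one shard
            }
        }
        return allShards; // unknown query: distribute to all shards
    }
}
```

Since the routing runs before QueryComponent, unmapped queries degrade gracefully to an ordinary distributed search instead of failing.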
Re: [Visualizations] from Query Results
Otis and Erick, Believe it or not, I did Google this and didn't come up with anything all that useful. I was at the Lucene Revolution conference last year and saw some prezos that had some sort of graphical representation of the query results. The one from Basic Tech especially caught my attention because it simply showed a graph of hits over time. I can do that using jQuery or Raphael as he suggested. I have also been playing with the Carrot2 visualization tools which are pretty cool too which is why I pointed them out in my original email. I was just curious to see if there were any speciality type projects out there like Carrot2 that folks in the Solr community are using. Adam On Fri, Jun 3, 2011 at 9:42 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Adam, Try this: http://lmgtfy.com/?q=search%20results%20visualizations In practice I find that visualizations are cool and attractive looking, but often text is more useful because it's more direct. But there is room for graphical representation of search results, sure. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Adam Estrada estrada.adam.gro...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 7:13:39 AM Subject: [Visualizations] from Query Results Dear Solr experts, I am curious to learn what visualization tools are out there to help me visualize my query results. I am not talking about a language specific client per se but something more like Carrot2 which breaks clusters in to their knowledge tree and expandable pie chart. Sorry if those aren't the correct names for those tools ;-) Anyway, what else is out there like Carrot2 http://project.carrot2.org/ to help me visualize Solr query results? Thanks for your input, Adam
Feature: skipping caches and info about cache use
Hi, Is it just me, or would others like things like: * The ability to tell Solr (by passing some URL param?) to skip one or more of its caches and get data from the index * An additional attribute in the Solr response that shows whether the query came from the cache or not * Maybe something else along these lines? Or maybe some of this is already there and I just don't know about it? :) Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: query routing with shards
Got it, I can quickly figure the shard out. Thanks a lot, Otis! Dmitry On Fri, Jun 3, 2011 at 8:00 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: [quoted thread snipped] -- Regards, Dmitry Kan
RE: Hitting the URI limit, how to get around this?
It sounds like you need to increase the HTTP header size. In Tomcat the default is 4096 bytes, and to change it you need to add maxHttpHeaderSize="value" to the connector definition in server.xml. Colin. -----Original Message----- From: Ken Krugler [mailto:kkrugler_li...@transpac.com] Sent: Friday, June 03, 2011 12:39 PM To: solr-user@lucene.apache.org Subject: Re: Hitting the URI limit, how to get around this? [quoted message snipped]
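For the archives, the change Colin describes goes on the HTTP connector element in Tomcat's server.xml. A sketch; the port and other attributes should stay whatever your install already uses:

```xml
<!-- server.xml: raise the header limit so long shards= URLs fit (64KB here) -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           maxHttpHeaderSize="65536"
           redirectPort="8443"/>
```

Restart Tomcat after the change; the limit applies to the whole request line plus headers, which is why a ~7800-character URI was tripping a smaller default.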
Re: fq null pointer exception
Otis, I just deleted the documents and committed, and I still get that error. Thanks, Dan On 6/3/11 9:43 AM, Otis Gospodnetic wrote: [quoted thread snipped]
Re: Strategy -- Frequent updates in our application
On Jun 2, 2011, at 8:29 PM, Naveen Gupta wrote: and what about NRT, is it fine to apply in this case of scenario? Is NRT really what's wanted here? I'm asking the experts, as I have a situation not too different from the original post's. It appears to me (from the docs) that NRT makes a difference in the lag between a document being added and it being available in searches. But the original post really sounds to me like a concern over documents-added-per-second. Does the RankingAlgorithm form of NRT improve the docs-added-per-second performance? My add-to-view limits aren't really threatened by Solr performance today; something like 30 seconds is just fine. But I am feeling close enough to the documents-per-second boundary that I'm pondering measures like master/slave. If NRT only improves add-to-view lag, I'm not overly interested, but if it can improve add throughput, I'm all over it ;-) -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Hitting the URI limit, how to get around this?
Hi, Why not use HTTP POST? Dmitry On Fri, Jun 3, 2011 at 8:27 PM, Colin Bennett cbenn...@job.com wrote: [quoted thread snipped] -- Regards, Dmitry Kan
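To make the POST suggestion concrete: Solr's /select accepts the same parameters as a form-encoded POST body, which moves the long shards list out of the URL entirely. A sketch with placeholder hosts; it only builds and prints the command rather than contacting a server:

```shell
# Hypothetical shard list; sent as a POST body it is not subject to
# URL/header length limits the way a GET query string is.
SHARDS="shard1.example.com:8983/solr,shard2.example.com:8983/solr"
QUERY="q=*:*&wt=json&rows=10&shards=${SHARDS}"

# The actual call would be:
#   curl -d "$QUERY" "http://localhost:8983/solr/select"
echo "curl -d \"$QUERY\" http://localhost:8983/solr/select"
```

Note that POSTed parameters can still hit a server-side form-size limit (e.g. Jetty's maxFormContentSize), but that default is far larger than typical URL limits.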
How to know how many documents are indexed? Anything more elegant than parsing numFound?
$ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml
$ xmlstarlet sel -t -v "//@numFound" resp.xml
-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: Hitting the URI limit, how to get around this?
Yep, that was my issue. And like Ken said, on Tomcat I set maxHttpHeaderSize=65536. -- View this message in context: http://lucene.472066.n3.nabble.com/Hitting-the-URI-limit-how-to-get-around-this-tp3017837p3020774.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?
: How to know how many documents are indexed? Anything more elegant than : parsing numFound? $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml $ xmlstarlet sel -t -v "//@numFound" resp.xml
solr/admin/stats.jsp is actually XML too and contains numDocs and maxDoc info. I think you can get numDocs with JMX too: http://wiki.apache.org/solr/SolrJmx
Getting payloads in Highlighter
Hi all, I need to highlight searched words in the original text (XML) of a document. So I'm trying to develop a new highlighter which uses the default highlighter to highlight some fields, then retrieves the original text file/document (external or internal storage) and puts the highlighted parts into it. I'm using an additional field for the field offsets for each field in each document. To store the offsets (and perhaps other info) I'm using payloads. (I cannot wait for the future DocValues.) Now my question: what is the fastest way to retrieve payloads (TermPositions?) for a given document, a given field, and a given term? If other methods exist to do that, I'm open :) Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3020885.html Sent from the Solr - User mailing list archive at Nabble.com.
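Not an answer from the thread, but with the Lucene 3.x API bundled with Solr 3.1, per-document payloads are reachable through IndexReader.termPositions(). A sketch only (method names taken from the 3.x javadoc; it needs lucene-core on the classpath and is shown here untested, with placeholder field/term names):

```java
// Sketch: fetch the payload of the first occurrence of a term in one document,
// using the Lucene 3.x TermPositions API. Not a drop-in Solr component.
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

public class PayloadLookup {
    /** Returns the payload bytes for the first position of (field, text) in docId, or null. */
    public static byte[] firstPayload(IndexReader reader, int docId,
                                      String field, String text) throws IOException {
        TermPositions tp = reader.termPositions(new Term(field, text));
        try {
            // skipTo() is the fast path to a single document's postings
            if (tp.skipTo(docId) && tp.doc() == docId) {
                tp.nextPosition(); // must advance before reading the payload
                if (tp.isPayloadAvailable()) {
                    return tp.getPayload(new byte[tp.getPayloadLength()], 0);
                }
            }
            return null;
        } finally {
            tp.close();
        }
    }
}
```

To read payloads at every position of the term in the document, loop tp.freq() times over nextPosition() instead of reading only the first one.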
Re: fq null pointer exception
And what happens if you add fl=<your id field> here? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ ----- Original Message ----- From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 1:38:33 PM Subject: Re: fq null pointer exception [quoted thread snipped]
Re: Strategy -- Frequent updates in our application
Yes, when people talk about NRT search they refer to 'add to view lag'. In a typical Solr master-slave setup this is dominated by waiting for replication, doing the replication, and then warming up. If your problem is indexing speed then that's a separate story that I think you'll find answers to on http://search-lucene.com/ or if you can't find them we can repeat :) Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Jack Repenning jrepenn...@collab.net To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 2:10:27 PM Subject: Re: Strategy -- Frequent updates in our application On Jun 2, 2011, at 8:29 PM, Naveen Gupta wrote: and what about NRT, is it fine to apply in this case of scenario Is NRT really what's wanted here? I'm asking the experts, as I have a situation not too different from the b.p. It appears to me (from the dox) that NRT makes a difference in the lag between a document being added and it being available in searches. But the BP really sounds to me like a concern over documents-added-per-second. Does the RankingAlgorithm form of NRT improve the docs-added-per-second performance? My add-to-view limits aren't really threatened by Solr performance today; something like 30 seconds is just fine. But I am feeling close enough to the documents-per-second boundary that I'm pondering measures like master/slave. If NRT only improves add-to-view lag, I'm not overly interested, but if it can improve add throughput, I'm all over it ;-) -==- Jack Repenning Technologist Codesion Business Unit CollabNet, Inc. 8000 Marina Boulevard, Suite 600 Brisbane, California 94005 office: +1 650.228.2562 twitter: http://twitter.com/jrep
Re: Getting payloads in Highlighter
To clarify a bit more, I took a look at this function: public TermPositions termPositions() throws IOException -- description copied from class IndexReader: Returns an unpositioned TermPositions enumerator. But it returns an unpositioned enumerator; is there a way to get a TermPositions directly positioned on a document, a field and a term? Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3020922.html Sent from the Solr - User mailing list archive at Nabble.com.
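[Editorial note: an untested sketch of one way to do this with the Lucene 3.x API. IndexReader.termPositions(Term) returns an enumerator restricted to a single (field, term) pair, and skipTo(int) positions it on (or after) a given document; the payload can then be read per position. The field and term names below are made-up, and error handling is omitted:]

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermPositions;

// Sketch: read the payload for one (doc, field, term) triple.
// "offsets" and "title" are hypothetical names.
TermPositions tp = reader.termPositions(new Term("offsets", "title"));
try {
    // skipTo positions on the first doc >= docId; check we landed on our doc
    if (tp.skipTo(docId) && tp.doc() == docId) {
        tp.nextPosition();                  // advance to the first position in this doc
        if (tp.isPayloadAvailable()) {
            byte[] payload = new byte[tp.getPayloadLength()];
            tp.getPayload(payload, 0);      // payload bytes for this position
        }
    }
} finally {
    tp.close();
}
```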
Re: Getting payloads in Highlighter
I need to highlight searched words in the original text (xml) of a document. Why don't you remove xml tags in an analyzer? You can highlight xml by doing so.
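[Editorial note: for reference, stripping the markup at analysis time might look like the following hypothetical fieldType, using Solr's HTMLStripCharFilterFactory (which also handles XML-like tags); the type name and filter chain below are illustrative, not from the thread:]

```xml
<fieldType name="text_striptags" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- removes HTML/XML markup before tokenization -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```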
Re: How to know how many documents are indexed? Anything more elegant than parsing numFound?
$ curl --fail "http://192.168.34.51:8080/solr/admin/stats.jsp" > resp.xml $ xmlstarlet sel -t -v "//@numDocs" resp.xml *Extra content at the end of the document* On Fri, Jun 3, 2011 at 8:56 PM, Ahmet Arslan iori...@yahoo.com wrote: : How to know how many documents are indexed? Anything more elegant than : parsing numFound? $ curl "http://192.168.34.51:8080/solr/select?q=*%3A*&rows=0" > resp.xml $ xmlstarlet sel -t -v "//@numFound" resp.xml solr/admin/stats.jsp is actually xml too and contains numDocs and maxDoc info. I think you can get numDocs with jmx too. http://wiki.apache.org/solr/SolrJmx -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: fq null pointer exception
It returned results when I added the fl param. Strange... wonder what is going on there. Thanks, Dan On 6/3/11 12:17 PM, Otis Gospodnetic wrote: And what happens if you add fl=your id field here? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 1:38:33 PM Subject: Re: fq null pointer exception Otis, I just deleted the documents and committed and I still get that error. Thanks, Dan On 6/3/11 9:43 AM, Otis Gospodnetic wrote: Dan, does the problem go away if you get rid of those 112 documents with empty Status or replace their empty status value with, say, Unknown? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 11:46:46 AM Subject: fq null pointer exception I am noticing something strange with our recent upgrade to solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status; the values are Enabled, Disabled, or ''. When I facet on that field I get Enabled 4409565, Disabled 29185, and 112 with the empty value. The issue is when I do a filter query. This query works: select/?q=*:*&fq=Status:Enabled But when I run this one I get a NPE: select/?q=*:*&fq=Status:Disabled Here is part of the stack trace: Problem accessing /solr/global_accounts/select/.
Reason: null java.lang.NullPointerException at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828) at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686) at org.apache.solr.schema.StrField.write(StrField.java:49) at org.apache.solr.schema.SchemaField.write(SchemaField.java:125) at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369) at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545) at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482) at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) ... Thanks, Dan
Re: Solr performance tuning - disk i/o?
Quick impressions: Faceting is usually best done on fields that don't have lots of unique values, for three reasons: 1) It's questionable how much use it is to the user to have a gazillion facets. In the case of a unique field per document, in fact, it's useless. 2) Resource requirements go up as a function of the number of unique terms. This is true for faceting and sorting. 3) Warmup times grow the more terms have to be read into memory. Glancing at your warmup stuff, things like publishDate, authorStr and maybe callnumber-first are questionable. publishDate depends on how coarse the resolution is. If it's by day, that's not really much use. authorStr... How many authors have more than one publication? Would this be better served by some kind of autosuggest rather than facets? callnumber-first... I don't really know, but if it's unique per document it's probably not something the user would find useful as a facet. The admin page will help you determine the number of unique terms per field, which may guide you whether or not to continue to facet on these fields. As Otis said, doing a sort on the fields during warmup will also help. Watch your polling interval for any slaves in relation to the warmup times. If your polling interval is shorter than the warmup times, you run the risk of runaway warmups. As you've figured out, measuring responses to the first few queries doesn't always measure what you really need <G>. I don't have the pages handy, but autowarming is a good topic to understand, so you might spend some time tracking it down. Best Erick On Fri, Jun 3, 2011 at 11:21 AM, Demian Katz demian.k...@villanova.edu wrote: Thanks to you and Otis for the suggestions! Some more information: - Based on the Solr stats page, my caches seem to be working pretty well (few or no evictions, hit rates in the 75-80% range).
- VuFind is actually doing two Solr queries per search (one initial search followed by a supplemental spell check search -- I believe this is necessary because VuFind has two separate spelling indexes, one for shingled terms and one for single words). That is probably exaggerating the problem, though based on searches with debugQuery on, it looks like it's always the initial search (rather than the supplemental spelling search) that's consuming the bulk of the time. - enableLazyFieldLoading is set to true. - I'm retrieving 20 documents per page. - My JVM settings: -server -Xloggc:/usr/local/vufind/solr/jetty/logs/gc.log -Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:NewRatio=5 It appears that a large portion of my problem had to do with autowarming, a topic that I've never had a strong grasp on, though perhaps I'm finally learning (any recommended primer links would be welcome!). I did have some autowarming settings in solrconfig.xml (an arbitrary search for a bunch of random keywords in the newSearcher and firstSearcher events, plus autowarmCount settings on all of my caches). However, when I looked at the debugQuery output, I noticed that a huge amount of time was being wasted loading facets on the first search after restarting Solr, so I changed my newSearcher and firstSearcher events to this:

<arr name="queries">
  <lst>
    <str name="q">*:*</str>
    <str name="start">0</str>
    <str name="rows">10</str>
    <str name="facet">true</str>
    <str name="facet.mincount">1</str>
    <str name="facet.field">collection</str>
    <str name="facet.field">format</str>
    <str name="facet.field">publishDate</str>
    <str name="facet.field">callnumber-first</str>
    <str name="facet.field">topic_facet</str>
    <str name="facet.field">authorStr</str>
    <str name="facet.field">language</str>
    <str name="facet.field">genre_facet</str>
    <str name="facet.field">era_facet</str>
    <str name="facet.field">geographic_facet</str>
  </lst>
</arr>

Overall performance has now increased dramatically, and now the biggest bottleneck in the debug output seems to be the shingle spell checking!
Any other suggestions are welcome, since I suspect there's still room to squeeze more performance out of the system, and I'm still not sure I'm making the most of autowarming... but this seems like a big step in the right direction. Thanks again for the help! - Demian -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, June 03, 2011 9:41 AM To: solr-user@lucene.apache.org Subject: Re: Solr performance tuning - disk i/o? This doesn't seem right. Here's a couple of things to try: 1) Attach debugQuery=on to your long-running queries. The QTime returned is the time taken to search, NOT including the time to load the docs. That'll help pinpoint whether the problem is the search itself, or assembling the documents. 2) Are you autowarming? If so, be sure it's actually done before querying. 3) Measure queries after the first few, particularly if
Re: Better to have lots of smaller cores or one really big core?
Nope, cores are just a self-contained index, really. What is the point of breaking them up? If you have some kind of rolling currency (i.e. you only want to keep the last N days/weeks/months) then you can always delete-by-query to age-out the relevant docs. You'll be able to fit more on one server if it's in a single core, but what the ratio is I'm not sure. My take would be go for the simplest, which would be a single core (index) for administrative purposes if for no other reason, but that may well just be personal preference... Best Erick On Fri, Jun 3, 2011 at 10:10 AM, JohnRodey timothydd...@yahoo.com wrote: Thanks Erick for the response. So my data structure is the same, i.e. they all use the same schema. Though I think it makes sense for us to somehow break apart the data, for example by the date it was indexed. I'm just trying to get a feel for how large we should aim to keep those (by day, by week, by month, etc...). So it sounds like we should aim to keep them at a size that one solr server can host to avoid serving multiple cores. One question, there is no real difference (other than configuration) from a server hosting its own index vs. it hosting one core, is there? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Better-to-have-lots-of-smaller-cores-or-one-really-big-core-tp3017973p3019686.html Sent from the Solr - User mailing list archive at Nabble.com.
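[Editorial note: the age-out Erick mentions is just a delete-by-query posted to the update handler; a minimal sketch, where the date field name is hypothetical:]

```xml
<delete>
  <query>indexed_date:[* TO NOW-30DAYS]</query>
</delete>
```

Posting this (followed by a commit) removes everything indexed more than 30 days ago, using Solr's date-math syntax.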
Re: Getting payloads in Highlighter
The original document is not indexed. Currently it is just stored, and could be stored in a filesystem or a database in the future. The different parts of a document are indexed in multiple different fields with some different analyzers (stemming, multiple languages, regex, ...). So, I don't think your solution can be applied, but if I'm wrong, could you please explain how? Thanks, Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-payloads-in-Highlighter-tp3020885p3021383.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Feature: skipping caches and info about cache use
Why, I'm just wondering? For a case where you know the next query would not be possible to be already in the cache because it is so different from the norm? Just for timing information for instrumentation used for tuning (ie so you can compare cached response times vs non-cached response times)? -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, June 03, 2011 10:02 AM To: solr-user@lucene.apache.org Subject: Feature: skipping caches and info about cache use Hi, Is it just me, or would others like things like: * The ability to tell Solr (by passing some URL param?) to skip one or more of its caches and get data from the index * An additional attrib in the Solr response that shows whether the query came from the cache or not * Maybe something else along these lines? Or maybe some of this is already there and I just don't know about it? :) Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: Feature: skipping caches and info about cache use
On Fri, Jun 3, 2011 at 1:02 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Is it just me, or would others like things like: * The ability to tell Solr (by passing some URL param?) to skip one or more of its caches and get data from the index Yeah, we've needed this for a long time, and I believe there's a JIRA issue open for it. It really needs to be on a per query basis though... so a localParam that has cache=true/false would be ideal. -Yonik http://www.lucidimagination.com
Re: fq null pointer exception
Dan, this doesn't really have anything to do with your filter on the Status field except that it causes different documents to be selected. The root cause is a schema mismatch with your index. A string field (or so the schema is saying it's a string field) is returning null for a value, which is impossible (null values aren't stored... they are simply missing). This can happen when the field is actually stored as binary (as is the case for numeric fields). So my guess is that a field that was previously a numeric field is now declared to be of type string by the current schema. You can try varying the fl parameter to see what field is causing the issue, or try Luke or the Luke request handler for a lower-level view of the index. -Yonik http://www.lucidimagination.com On Fri, Jun 3, 2011 at 11:46 AM, dan whelan d...@adicio.com wrote: I am noticing something strange with our recent upgrade to solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status; the values are Enabled, Disabled, or ''. When I facet on that field I get Enabled 4409565, Disabled 29185, and 112 with the empty value. The issue is when I do a filter query. This query works: select/?q=*:*&fq=Status:Enabled But when I run this one I get a NPE: select/?q=*:*&fq=Status:Disabled Here is part of the stack trace: Problem accessing /solr/global_accounts/select/.
Reason: null java.lang.NullPointerException at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828) at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686) at org.apache.solr.schema.StrField.write(StrField.java:49) at org.apache.solr.schema.SchemaField.write(SchemaField.java:125) at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369) at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545) at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482) at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) ... Thanks, Dan
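[Editorial note: varying the fl parameter, as Yonik suggests, can be done by hand or as a simple binary search over the field list. A rough sketch follows; the Predicate is a stand-in for actually issuing q=*:*&fq=Status:Disabled&fl=... against Solr and reporting whether the response succeeded, and it assumes exactly one bad field:]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class FlBisect {
    // Narrow down which stored field breaks the response writer by
    // halving the fl list until one field remains.
    public static String findBrokenField(List<String> fields,
                                         Predicate<List<String>> queryWorks) {
        List<String> candidates = new ArrayList<>(fields);
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<String> firstHalf = candidates.subList(0, mid);
            if (queryWorks.test(firstHalf)) {
                // first half is clean, so the bad field is in the second half
                candidates = new ArrayList<>(candidates.subList(mid, candidates.size()));
            } else {
                candidates = new ArrayList<>(firstHalf);
            }
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "Status", "old_price_f", "name");
        // stub: pretend "old_price_f" is the field whose stored value no longer
        // matches its declared schema type (name is hypothetical)
        Predicate<List<String>> stub = fl -> !fl.contains("old_price_f");
        System.out.println(findBrokenField(fields, stub)); // prints old_price_f
    }
}
```

Each probe query costs one request, so this finds the offending field in about log2(N) requests instead of N.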
Re: fq null pointer exception
Right, so now try adding different fields and see which one breaks it again. Then you know which field is a problem and you can dig deeper around that field. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 4:34:40 PM Subject: Re: fq null pointer exception It returned results when I added the fl param. Strange... wonder what is going on there. Thanks, Dan On 6/3/11 12:17 PM, Otis Gospodnetic wrote: And what happens if you add fl=your id field here? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 1:38:33 PM Subject: Re: fq null pointer exception Otis, I just deleted the documents and committed and I still get that error. Thanks, Dan On 6/3/11 9:43 AM, Otis Gospodnetic wrote: Dan, does the problem go away if you get rid of those 112 documents with empty Status or replace their empty status value with, say, Unknown? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: dan whelan d...@adicio.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 11:46:46 AM Subject: fq null pointer exception I am noticing something strange with our recent upgrade to solr 3.1 and want to see if anyone has experienced anything similar. I have a solr.StrField field named Status; the values are Enabled, Disabled, or ''. When I facet on that field I get Enabled 4409565, Disabled 29185, and 112 with the empty value. The issue is when I do a filter query. This query works: select/?q=*:*&fq=Status:Enabled But when I run this one I get a NPE: select/?q=*:*&fq=Status:Disabled Here is part of the stack trace: Problem accessing /solr/global_accounts/select/.
Reason: null java.lang.NullPointerException at org.apache.solr.response.XMLWriter.writePrim(XMLWriter.java:828) at org.apache.solr.response.XMLWriter.writeStr(XMLWriter.java:686) at org.apache.solr.schema.StrField.write(StrField.java:49) at org.apache.solr.schema.SchemaField.write(SchemaField.java:125) at org.apache.solr.response.XMLWriter.writeDoc(XMLWriter.java:369) at org.apache.solr.response.XMLWriter$3.writeDocs(XMLWriter.java:545) at org.apache.solr.response.XMLWriter.writeDocuments(XMLWriter.java:482) at org.apache.solr.response.XMLWriter.writeDocList(XMLWriter.java:519) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:582) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) ... Thanks, Dan
Re: Feature: skipping caches and info about cache use
Robert, Mainly so that you can tell how fast the search itself is when query or documents or filters are not cached. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Robert Petersen rober...@buy.com To: solr-user@lucene.apache.org Sent: Fri, June 3, 2011 5:58:43 PM Subject: RE: Feature: skipping caches and info about cache use Why, I'm just wondering? For a case where you know the next query would not be possible to be already in the cache because it is so different from the norm? Just for timing information for instrumentation used for tuning (ie so you can compare cached response times vs non-cached response times)? -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, June 03, 2011 10:02 AM To: solr-user@lucene.apache.org Subject: Feature: skipping caches and info about cache use Hi, Is it just me, or would others like things like: * The ability to tell Solr (by passing some URL param?) to skip one or more of its caches and get data from the index * An additional attrib in the Solr response that shows whether the query came from the cache or not * Maybe something else along these lines? Or maybe some of this is already there and I just don't know about it? :) Thanks, Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
Re: How to disable QueryElevationComponent
Romi, If you don't have a unique ID field, you can always create a UUID - see http://search-lucene.com/?q=uuidfc_type=javadoc If you don't want to use QEC, remove it from the list of components in solrconfig.xml Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Romi romijain3...@gmail.com To: solr-user@lucene.apache.org Sent: Fri, May 27, 2011 5:36:22 AM Subject: How to disable QueryElevationComponent Hi, in my indexed document i do not want a uniqueKey field, but when i do not give any uniqueKey in schema.xml then it shows an exception org.apache.solr.common.SolrException: QueryElevationComponent requires the schema to have a uniqueKeyField. it means QueryElevationComponent requires a uniqueKey field.then how can i disable this QueryEvelationComponent. please reply. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-disable-QueryElevationComponent-tp2992195p2992195.html Sent from the Solr - User mailing list archive at Nabble.com.
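[Editorial note: a schema.xml sketch of the UUID approach Otis mentions -- Solr's UUIDField with default="NEW" generates the value at index time; the field name here is illustrative:]

```xml
<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
<field name="uid" type="uuid" indexed="true" stored="true" default="NEW"/>
<uniqueKey>uid</uniqueKey>
```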
Re: Nutch Crawl error
Roger, wrong list. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Roger Shah rs...@caci.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Thu, May 26, 2011 3:06:15 PM Subject: Nutch Crawl error I ran the command bin/nutch crawl urls -dir crawl -depth 3 > crawl.log When I viewed crawl.log I found some errors such as: Can't retrieve Tika parser for mime-type application/x-shockwave-flash, and some other similar messages for other types such as application/xml, etc. Do I need to download Tika for these errors to go away? Where can I download Tika so that it can work with Nutch? If there are instructions to install Tika to work with Nutch please send them to me. Thanks, Roger
found a bug in query parser upgrading from 1.4.1 to 3.1
Greetings all, I found a bug today while trying to upgrade from 1.4.1 to 3.1. In 1.4.1 I was able to insert this doc:

<?xml version="1.0" encoding="UTF-8"?>
<add><doc>
  <field name="id">User 14914457</field>
  <field name="type">User</field>
  <field name="city_s">San Francisco</field>
  <field name="name_text">jtoy</field>
  <field name="login_text">jtoy</field>
  <field name="description_text">life hacker</field>
  <field name="scores:rails_f">0.05</field>
</doc></add>

And then I can run the query: http://localhost:8983/solr/select?q=life&qf=description_text&defType=dismax&sort=scores:rails_f+desc and I will get results. If I insert the same document into solr 3.1 and run the same query I get the error: Problem accessing /solr/select. Reason: undefined field scores. For some reason, solr has cut off the field name from the colon forward, so scores:rails_f becomes scores. I can see in the lucene index that the data for scores:rails_f is in the document. For that reason I believe the bug is in solr and not in lucene. Jason Toy socmetrics http://socmetrics.com @jtoy