Re: Boosting by facets with standard query
What you indicated here is for a different purpose, is it not? I already do something similar with my 'q'. For example, a sample query logged in 'catalina.out' looks like

    webapp=/search path=/select params={rows=15&start=0&q=(+(content:umts)+OR+(title:umts)^2+OR+(urltext:umts)^2)}

when the search term is umts. I am looking for this term umts in the fields (a) content, (b) title (boosted by a factor of 2) and (c) urltext (boosted by a factor of 2). So the presence of the term umts in title or url is weighed more than its presence in the regular content. So far so good.

Now, I have other fields as well, like document type, file type etc. that serve as facets to telescope down. Among the above set of search results, I want to boost a specific document type 'white_papers' and a specific file type 'pdf'. By boosting I mean that these white_paper pdf documents should float to the top of the heap in the search results, if such documents are present in the search results at all. So would I simply add the following to the above q?

    q=(+(content:umts)+OR+(title:umts)^2+OR+(urltext:umts)^2)+AND+(doctype:white_papers)^2+AND+(filetype:pdf)^2

But wouldn't the above give 0 results if there are no white_papers pdfs (because of the AND)? If I use OR, then the meaning of the query is lost altogether. What we need is for the white_papers pdfs to be boosted, but if and only if such documents are valid results for the search term in question. How would I write my above 'q' to accomplish that?

Thanks - ashok

Shalin Shekhar Mangar wrote:

On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote:

I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest of them that do not belong to those facets? I use the standard query format. Thank you

I'm not sure what you mean by boosting by facet. Do you mean that you want to boost documents which match a term query? If yes, you can use your_field_name:value^2.0 in the q parameter.
--
Regards,
Shalin Shekhar Mangar.
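One common way to express "boost but don't require" in the standard Lucene query syntax (a sketch, not taken from this thread) is to make the type clauses optional rather than ANDed: optional (SHOULD) clauses add to the score of documents that match them without excluding documents that don't. Using the field names from the question:

    q=+((content:umts) OR (title:umts)^2 OR (urltext:umts)^2) (doctype:white_papers)^2 (filetype:pdf)^2

The leading '+' keeps the term match mandatory, while the doctype and filetype clauses only contribute score when they happen to match.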
RE: OutofMemory on Highlighting
I tried hl.maxAnalyzedChars=500 but still have the same issue. I get OOM even with a row size of only 20.

-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Thursday, April 16, 2009 9:56 PM
To: solr-user@lucene.apache.org
Subject: Re: OutofMemory on Highlighting

Hi,

Have you tried: http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: Gargate, Siddharth sgarg...@ptc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 6:33:46 AM
Subject: OutofMemory on Highlighting

Hi,
I am analyzing the memory usage of my Solr setup. I am testing with 500 text documents of 2 MB each. I have defined a field for displaying the teasers and I store 1 MB of text in it. I am testing with just 128 MB maxHeap (I know I should increase it, but I am deliberately testing the worst case). If I search for all 500 documents with a row size of 500 and highlighting disabled, it works fine. But if I enable highlighting I get an OutOfMemoryError. It looks like the stored fields of all the matched results are read into memory. How do I avoid this memory consumption?

Thanks,
Siddharth
Re: Boosting by facets with standard query
On Fri, Apr 17, 2009 at 11:32 AM, ashokc ash...@qualcomm.com wrote:

What we need is for the white_papers pdfs to be boosted, but if and only if such documents are valid results for the search term in question. How would I write my above 'q' to accomplish that?

Thanks for explaining in detail. Basically, all you want to do is sort the results in the following order:

1. White papers
2. PDFs
3. Others

or maybe #1 and #2 are equivalent and can be intermingled. The easiest way to do this is to index a new field whose values, when sorted, give you the desired order. Then you can simply sort on that field first and on score second.
--
Regards,
Shalin Shekhar Mangar.
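A sketch of what that could look like (the field name doc_priority is made up): index an integer field holding 1 for white papers, 2 for PDFs and 3 for everything else, then sort on it ahead of score:

    sort=doc_priority asc, score desc

Documents with the lowest priority value float to the top, and relevance ordering is preserved within each group.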
Re: DataImport, remove doc when marked as deleted
I have now :-) Thanks, I missed that in the wiki.

Ruben

On Apr 16, 2009, at 7:10 PM, Noble Paul നോബിള് नोब्ळ् wrote:

did you try the deletedPkQuery?

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com wrote:

Hi, I am new to Solr, but have been using Lucene for a while. I am trying to rewrite some old Lucene indexing code using the JDBC DataImport in Solr. My problem: I have entities that can be marked in the db as deleted. These I don't want to index, and that's no problem when doing a full-import. When doing a delta-import my deltaQuery will catch entities that have been marked as deleted since the last index, but how do I get it to delete those from the index? I tried making the deltaImportQuery so that it doesn't return the entity if it is deleted, but that didn't help... Any ideas?

Thanks
Ruben
--
--Noble Paul
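For reference, deletedPkQuery is an entity attribute in data-config.xml that returns the primary keys of rows to remove from the index during a delta-import. A minimal sketch, assuming a table named item with a deleted flag (table and column names are hypothetical):

    <entity name="item" pk="id"
            query="SELECT id, title FROM item WHERE deleted = 0"
            deltaQuery="SELECT id FROM item WHERE deleted = 0 AND last_modified &gt; '${dataimporter.last_index_time}'"
            deletedPkQuery="SELECT id FROM item WHERE deleted = 1 AND last_modified &gt; '${dataimporter.last_index_time}'">
        ...
    </entity>

DIH runs the deletedPkQuery during a delta-import and issues deletes for the returned ids.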
Re: Faceted Search
If you are querying using an HTTP request you can add these two parameters:

facet=true
facet.field=field_for_faceting

and optionally this one to set the maximum number of facet values returned:

facet.limit=facet_limit

I don't know if it's what you need...

On Fri, Apr 17, 2009 at 6:17 AM, Sajith Weerakoon saji...@zone24x7.com wrote:

Hi all, can someone tell me how to implement a faceted search?

Thanks,
Regards,
Sajith Vimukthi Weerakoon.
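Put together, a faceted request looks like this (the host, query and field name are placeholders):

    http://localhost:8983/solr/select?q=umts&facet=true&facet.field=doctype&facet.limit=10

The response then carries a facet_counts section listing the top 10 values of doctype along with their document counts.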
Re: Authentication Error
It is fixed in the trunk.

On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah allahbaks...@gmail.com wrote:

Thanks Noble.
Regards, Allahbaksh

2009/4/16 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com

On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah allahbaks...@gmail.com wrote:

Hi, I have followed the procedure given on this blog to set up Solr. Below is my code. I am trying to index the data but I am not able to connect to the server and am getting an authentication error.

HttpClient client = new HttpClient();
client.getState().setCredentials(new AuthScope("localhost", 80, AuthScope.ANY_SCHEME), new UsernamePasswordCredentials("admin", "admin"));

Can you please let me know what the problem may be?

The other problem I am facing is with load balancing:

SolrServer lbHttpSolrServer = new LBHttpSolrServer("http://localhost:8080/solr", "http://localhost:8983/solr");

The problem is that if the first server is down then I get an error. If I swap the servers in the constructor, giving the port 8983 server first and 8080 second, it works fine. The problem is: if only the last server that is set is active and the rest are down, then Solr throws an exception and the search is not performed.

I shall write a testcase and let you know.

Regards, Allahbaksh
--
--Noble Paul
--
Allahbaksh Mohammedali Asadullah, Software Engineering Technology Labs, Infosys Technolgies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927. Fax: 91-80-28520362 | Mobile: 91-9845505322.
--
--Noble Paul
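One thing worth noting when authenticating with commons-httpclient 3.x: by default the client only sends credentials after receiving a 401 challenge, which some setups never issue. A sketch of wiring preemptive authentication into SolrJ (the URL and credentials are placeholders):

    HttpClient client = new HttpClient();
    client.getState().setCredentials(
        new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
        new UsernamePasswordCredentials("admin", "admin"));
    // send credentials with the first request instead of waiting for a 401
    client.getParams().setAuthenticationPreemptive(true);
    SolrServer server = new CommonsHttpSolrServer("http://localhost:80/solr", client);

CommonsHttpSolrServer accepts a preconfigured HttpClient, so the same client instance can carry both the credentials and any other connection settings.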
Using Lucene MultiFieldQueryParser with SOLR
Hello, I am searching for a way to use the Lucene MultiFieldQueryParser in my Solr installation. Is there a chance to change the solrQueryParser? In my old Lucene setting I used to combine many different types of QueryParser in my query... Or is there a way to get MultiFieldQueryParser functionality in Solr?

Greets -Ralf-
Re: Sorting performance + replication of index between cores
Hi Christophe, did you find a way to fix your problem? Because even with replication we will have this problem: lots of updates mean clearing the cache and managing that. I have the same issue. I'm just wondering whether I should avoid taking servers offline during updates? How did you fix that?

Thanks, sunny

christophe-2 wrote:

Hi, after fully reloading my index, using another field than a Date does not help that much. Using a warmup query avoids having the first request slow, but:

- Frequent commits mean that the Searcher is reloaded frequently and, as the warmup takes time, the clients must wait.
- Having warmup slows down the index process (I guess this is because after a commit, the Searchers are recreated)

So I'm considering, as suggested, having two instances: one for indexing and one for searching. I was wondering if there are simple ways to replicate the index in a single Solr server running two cores? Any such config already tested? I guess that the standard replication based on rsync can be simplified a lot in this case, as the two indexes are on the same server.

Thanks
Christophe

Beniamin Janicki wrote:

: so you can send your updates anytime you want, and as long as you only
: commit every 5 minutes (or commit on a master as often as you want, but
: only run snappuller/snapinstaller on your slaves every 5 minutes) your
: results will be at most 5 minutes + warming time stale.

This is what I do as well (commits are done once per 5 minutes). I've got a master-slave configuration. The master has all caches turned off (commented out in solrconfig.xml) and only 2 maxWarmingSearchers. The index size is 5GB, Xmx=1GB, and committing takes around 10 secs (on the default configuration with warming it took from 30 mins up to 2 hours). Slave caches are configured with autowarmCount=0 and maxWarmingSearchers=1, and I have new data 1 second after the snapshot is done. I haven't noticed any huge delays while serving search requests. Try to use those values - maybe they'll help in your case too.

Ben Janicki

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 22 October 2008 04:56
To: solr-user@lucene.apache.org
Subject: Re: Sorting performance

: The problem is that I will have hundreds of users doing queries, and a
: continuous flow of documents coming in.
: So a delay in warming up a cache could be acceptable if I do it a few times
: per day. But not on a too regular basis (right now, the first query that loads
: the cache takes 150s).
:
: However: I'm not sure why it looks not to be a good idea to update the caches

you can refresh the caches automatically after updating. The newSearcher event is fired whenever a searcher is opened (but before it's used by clients), so you can configure warming queries for it -- it doesn't have to be done manually (or by the first user to use that reader). So you can send your updates anytime you want, and as long as you only commit every 5 minutes (or commit on a master as often as you want, but only run snappuller/snapinstaller on your slaves every 5 minutes) your results will be at most 5 minutes + warming time stale.

-Hoss
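For reference, the newSearcher hook Hoss mentions is configured in solrconfig.xml; a minimal sketch (the warming query and sort field are placeholders -- use whatever queries populate the caches you care about):

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
      </arr>
    </listener>

The listed queries run against every newly opened searcher before it is exposed to clients, so the first real request does not pay the cache-loading cost.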
Re: Using Lucene MultiFieldQueryParser with SOLR
I think there's no search handler that uses MultiFieldQueryParser in Solr. But check the DisMaxRequestHandler; it will probably do the job. You can specify all the fields you want to search in, and it will build the query using boolean queries. It also includes many more features: http://wiki.apache.org/solr/DisMaxRequestHandler

Kraus, Ralf | pixelhouse GmbH wrote:

Hello, I am searching for a way to use the Lucene MultiFieldQueryParser in my Solr installation. Is there a chance to change the solrQueryParser? In my old Lucene setting I used to combine many different types of QueryParser in my query... Or is there a way to get MultiFieldQueryParser functionality in Solr?

Greets -Ralf-
Re: Using Lucene MultiFieldQueryParser with SOLR
Marc Sturlese schrieb:

Think there's no search handler that uses MultiFieldQueryParser in Solr. But check the DisMaxRequestHandler; it will probably do the job. You can specify all the fields you want to search in, and it will build the query using boolean queries. It also includes many more features: http://wiki.apache.org/solr/DisMaxRequestHandler

Is there a chance to combine RequestHandlers? I need to use some additional normal boolean and integer queries!

Greets -Ralf-
Re: Using Lucene MultiFieldQueryParser with SOLR
Marc Sturlese schrieb:

Think there's no search handler that uses MultiFieldQueryParser in Solr. But check the DisMaxRequestHandler; it will probably do the job. You can specify all the fields you want to search in, and it will build the query using boolean queries. It also includes many more features: http://wiki.apache.org/solr/DisMaxRequestHandler

THX A LOT! You really made my day!

Greets -Ralf-
Re: Authentication Error
Hi Noble. Thank you very much. I will download the latest Solr nightly build.

Please note this other problem, which I think is a bug. I am trying out the load balancing feature in Solr 1.4 using LBHttpSolrServer. In my setup I have three Solr servers: A, B and C (specified in that order). Now, if I bring down the first two servers, A and B, then it throws an exception; it does not try server C, even though server C is still active. In short, if only the last server specified in the constructor is active, I get an exception and the query does not get fired. Is it a bug, or what may be the exact problem?

Regards, Allahbaksh

2009/4/17 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com

It is fixed in the trunk.

On Thu, Apr 16, 2009 at 10:47 PM, Allahbaksh Asadullah allahbaks...@gmail.com wrote:

Thanks Noble.
Regards, Allahbaksh

2009/4/16 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com

On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah allahbaks...@gmail.com wrote:

Hi, I have followed the procedure given on this blog to set up Solr. Below is my code. I am trying to index the data but I am not able to connect to the server and am getting an authentication error.

HttpClient client = new HttpClient();
client.getState().setCredentials(new AuthScope("localhost", 80, AuthScope.ANY_SCHEME), new UsernamePasswordCredentials("admin", "admin"));

Can you please let me know what the problem may be?

The other problem I am facing is with load balancing:

SolrServer lbHttpSolrServer = new LBHttpSolrServer("http://localhost:8080/solr", "http://localhost:8983/solr");

The problem is that if the first server is down then I get an error. If I swap the servers in the constructor, giving the port 8983 server first and 8080 second, it works fine. The problem is: if only the last server that is set is active and the rest are down, then Solr throws an exception and the search is not performed.

I shall write a testcase and let you know.

Regards, Allahbaksh
--
--Noble Paul
--
Allahbaksh Mohammedali Asadullah, Software Engineering Technology Labs, Infosys Technolgies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927. Fax: 91-80-28520362 | Mobile: 91-9845505322.
--
Allahbaksh Mohammedali Asadullah, Software Engineering Technology Labs, Infosys Technolgies Limited, Electronic City, Hosur Road, Bangalore 560 100, India. (Board: 91-80-28520261 | Extn: 73927 | Direct: 41173927. Fax: 91-80-28520362 | Mobile: 91-9845505322.
Re: Using Lucene MultiFieldQueryParser with SOLR
Well, dismax has a q.alt parameter where you can specify a query in Lucene syntax. The q parameter must be empty for q.alt to be used:

http://.../select?q=&q.alt=phone_number:1234567

This would search in the field phone_number independently of what fields you have configured in the dismax handler. Another way would be to configure several request handlers (one with dismax and one standard for the fields you want, for example). You can tell Solr which one to use in the request URL. Don't know if this is what you need...

Kraus, Ralf | pixelhouse GmbH wrote:

Marc Sturlese schrieb:

Think there's no search handler that uses MultiFieldQueryParser in Solr. But check the DisMaxRequestHandler; it will probably do the job. You can specify all the fields you want to search in, and it will build the query using boolean queries. It also includes many more features: http://wiki.apache.org/solr/DisMaxRequestHandler

Is there a chance to combine RequestHandlers? I need to use some additional normal boolean and integer queries!

Greets -Ralf-
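A sketch of the two-handler setup in solrconfig.xml (handler names and qf fields are placeholders; in Solr 1.4 both can be solr.SearchHandler instances differing only in defType and defaults):

    <requestHandler name="dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">title^2 content urltext</str>
      </lst>
    </requestHandler>
    <requestHandler name="standard" class="solr.SearchHandler" default="true"/>

A request to /select?qt=dismax&q=... then goes through the dismax parser, while /select?q=phone_number:1234567 uses the standard Lucene syntax.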
Re: Using Lucene MultiFieldQueryParser with SOLR
Marc Sturlese schrieb:

Well, dismax has a q.alt parameter where you can specify a query in Lucene syntax. The q parameter must be empty for q.alt to be used:

http://.../select?q=&q.alt=phone_number:1234567

This would search in the field phone_number independently of what fields you have configured in the dismax handler.

Now I use the fq parameter in combination with q.alt... It works fine so far :-) The fq parameter sets my additional query parameters :-)

Greets -Ralf-
Re: Using Lucene MultiFieldQueryParser with SOLR
Marc Sturlese schrieb:

The only problem I found with q.alt is that it doesn't allow highlighting (or at least it didn't show it for me). If you find out how to do it, let me know.

I use highlighting only with the normal query! My q.alt is *:* But it's really sad that dismax doesn't support wildcards :-(

Greets -Ralf-
Re: Using Lucene MultiFieldQueryParser with SOLR
The only problem I found with q.alt is that it doesn't allow highlighting (or at least it didn't show it for me). If you find out how to do it, let me know. Thanks!

Kraus, Ralf | pixelhouse GmbH wrote:

Marc Sturlese schrieb:

Well, dismax has a q.alt parameter where you can specify a query in Lucene syntax. The q parameter must be empty for q.alt to be used:

http://.../select?q=&q.alt=phone_number:1234567

This would search in the field phone_number independently of what fields you have configured in the dismax handler.

Now I use the fq parameter in combination with q.alt... It works fine so far :-) The fq parameter sets my additional query parameters :-)

Greets -Ralf-
EventListeners of DIH
Hey there, I have seen the new feature of EventListeners in DIH in trunk:

<dataConfig>
  <document onImportStart="com.FooStart" onImportEnd="com.FooEnd">
    ...
  </document>
</dataConfig>

Are these events called at the beginning and end of the whole indexing process, or at the beginning and end of indexing each single document? My idea is to update a field of a row of a MySQL table every time a doc is indexed. Is this possible, or should I save all doc ids and do the update of the table rows using onImportEnd?

Thanks in advance!
Re: EventListeners of DIH
These are for the beginning and end of the whole indexing process.

On Fri, Apr 17, 2009 at 7:38 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

Hey there, I have seen the new feature of EventListeners in DIH in trunk. Are these events called at the beginning and end of the whole indexing process, or at the beginning and end of indexing each single document? My idea is to update a field of a row of a MySQL table every time a doc is indexed. Is this possible, or should I save all doc ids and do the update of the table rows using onImportEnd?

Thanks in advance!
--
--Noble Paul
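A minimal sketch of such a listener, assuming the DIH EventListener interface from trunk at the time (the class name and the body of onEvent are hypothetical):

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EventListener;

    public class FooEnd implements EventListener {
      public void onEvent(Context ctx) {
        // invoked once, when the whole import finishes;
        // update the MySQL tracking rows for all indexed ids here
      }
    }

Since the listener fires once per import rather than per document, per-document bookkeeping has to be collected during the import (e.g. in a Transformer) and flushed here.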
Re: Customizing solr with my lucene
Hey Erik, I also checked the index using Luke, and the index shows that the terms are indexed as they should have been. So that implies that something is wrong with the querying only, and the results are not getting retrieved. (As I said earlier, even the parsed query is the way it should be according to the changes I have made to Lucene.) Do you have any ideas why this could be happening?

One more thing... I tried to query the Solr index using Luke... but still no results... maybe the index is not stored correctly... could it be changes in the Lucene API? Should I revert to an older version of Solr?
Re: Garbage Collectors
I would also include the -XX:+HeapDumpOnOutOfMemoryError option to get a heap dump when the JVM runs out of heap space.

On Thu, Apr 16, 2009 at 9:43 PM, Bryan Talbot btal...@aeriagames.com wrote:

If you're using Java 5 or 6, jmap is a useful tool for tracking down memory leaks: http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html

jmap -histo:live <pid>

will print a histogram of all live objects in the heap. Start at the top and work your way down until you find something suspicious -- the trick is in knowing what is suspicious, of course.

-Bryan

On Apr 16, 2009, at 3:40 PM, David Baker wrote:

Otis Gospodnetic wrote:

Personally, I'd start from scratch: -Xmx -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: David Baker dav...@mate1inc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 3:33:18 PM
Subject: Garbage Collectors

I have an issue with garbage collection on our Solr servers. We have an issue where the old generation never gets cleaned up on one of our servers. This server has a little over 2 million records which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel one seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is: what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.

Thanks for the reply. Yes, Solr is the only app running under this Tomcat server. I will remove -server and the other options except the heap allocation options and see how it performs. Any suggestions on how to go about finding out why objects are not being cleaned up if these changes don't work?
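Putting the suggestions together, a minimal set of flags could look like this (heap sizes and the dump path are placeholders to adapt):

    java -Xms512m -Xmx4096m \
         -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp \
         -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr ...

The resulting .hprof file can then be loaded into jhat or a profiler to see which objects dominate the heap at the moment of the OutOfMemoryError.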
Re: CollapseFilter with the latest Solr in trunk
We are currently trying to do the same thing. With the patch unaltered we can use fq as long as collapsing is turned on. If we just send a normal document-level query with an fq parameter it blows up. Additionally, it does not appear that the collapse.facet option works at all.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562

From: climbingrose climbingr...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Fri, 17 Apr 2009 16:53:00 +1000
To: solr-user solr-user@lucene.apache.org
Subject: CollapseFilter with the latest Solr in trunk

Hi all, has anyone tried to use CollapseFilter with the latest version of Solr in trunk? It looks like Solr 1.4 doesn't allow calling setFilterList() and setFilter() on one instance of QueryCommand. I modified the code in QueryCommand to allow this:

public QueryCommand setFilterList(Query f) {
  // if( filter != null ) {
  //   throw new IllegalArgumentException("Either filter or filterList may be set in the QueryCommand, but not both.");
  // }
  filterList = null;
  if (f != null) {
    filterList = new ArrayList<Query>(2);
    filterList.add(f);
  }
  return this;
}

However, I still have a problem which prevents query filters from working when used in conjunction with CollapseFilter. In other words, query filters don't seem to have any effect on the result set when CollapseFilter is used. The other problem is related to OpenBitSet:

java.lang.ArrayIndexOutOfBoundsException: 2183
at org.apache.lucene.util.OpenBitSet.fastSet(OpenBitSet.java:242)
at org.apache.solr.search.CollapseFilter.addDoc(CollapseFilter.java:202)
at org.apache.solr.search.CollapseFilter.adjacentCollapse(CollapseFilter.java:161)
at org.apache.solr.search.CollapseFilter.<init>(CollapseFilter.java:141)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:217)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
at java.lang.Thread.run(Thread.java:619)

I think CollapseFilter is rather an important function in Solr that gets used quite frequently. Does anyone have a solution for this?
--
Regards,
Cuong Hoang
WordDelimiterFilterFactory removes words when options set to 0
In trying to understand the various options of WordDelimiterFilterFactory, I tried setting all options to 0. This seems to prevent a number of words from being output at all. In particular, can't and 99dxl don't get output, nor do any words containing hyphens. Is this the correct behavior? Here is the Solr analyzer output:

org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position: 1 2 3 4 5 6 7 8 9
term text: ca-55 99_3_a9 55-67 powerShot ca999x15foo-bar can't joe's 99dxl

org.apache.solr.analysis.WordDelimiterFilterFactory {splitOnCaseChange=0, generateNumberParts=0, catenateWords=0, generateWordParts=0, catenateAll=0, catenateNumbers=0}
term position: 1 5
term text: powerShot joe
term type: word word
source start,end: 20,29 53,56

Here is the schema:

<fieldtype name="mbooksOcrXPatLike" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="0" generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

Tom
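For what it's worth, this matches how WordDelimiterFilter behaves: every token containing an intra-word delimiter (hyphen, apostrophe, underscore, letter/digit boundary) is split into subword parts, and the generate/catenate options control which of those parts are emitted. With generateWordParts=0, generateNumberParts=0 and all catenate options 0, nothing is emitted for such tokens, so can't, 99dxl and the hyphenated words disappear. A sketch of the minimal change that keeps them (attribute values only; the rest of the field type stays the same):

    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnCaseChange="0" catenateWords="0"
            catenateNumbers="0" catenateAll="0"/>

With those two options on, can't becomes can + t, 99dxl becomes 99 + dxl, and ca-55 becomes ca + 55.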
Re: python response handler treats unschema'd fields differently
Seems like we could handle this in two ways... leave out the field if it's not defined in the schema, or include it and write it out as a string. I think either would probably be more useful than throwing an error (which isn't really a request error but rather a schema/indexing error). Thoughts?

-Yonik
http://www.lucidimagination.com

On Fri, Apr 17, 2009 at 4:36 PM, Brian Whitman br...@echonest.com wrote:

I have a Solr index where we removed a field from the schema but it still had some documents with that field in them. Queries using the standard response handler had no problem, but the wt=python handler would break on any query (with fl=* or asking for that field directly) with:

SolrHTTPException: HTTP code=400, reason=undefined_field_oldfield

I fixed it by putting that field back in the schema. One related weirdness is that fl=oldfield would cause the exception but not fl=othernonschemafield -- that is, it would only break on field names that were not in the schema but were in the documents. I know this is undefined-behavior territory, but it was still weird that the standard response writer does not do this: if you give a nonexistent field name to fl with wt=standard, either one that is in documents or one that is not, it happily performs the query, just skipping the fields that are not in the schema.
Re: Hierarchical Faceting Field Type
: level one#
: level one#level two#
: level one#level two#level three#
:
: Trying to find the right combination of field type and query to get the
: desired results. Saw some previous posts about hierarchical facets which helped
: in generating the right query, but having an issue using the built-in text
: field, which ignores our delimiter, and the string field, which prevents us
: from doing a starts-with search. Does anyone have any insight into the field
: declaration?

Use TextField, with a PatternTokenizer.

BTW: if this isn't a thread you've already seen, it's handy to know about: http://www.nabble.com/Hierarchical-Faceting-to20090898.html#a20176326

-Hoss
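A sketch of such a field type, assuming the '#' delimiter from the question (the type name is made up):

    <fieldType name="hierarchyFacet" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.PatternTokenizerFactory" pattern="#"/>
      </analyzer>
    </fieldType>

PatternTokenizerFactory splits the incoming value on the regex, so level one#level two# indexes the tokens level one and level two, and queries against the individual levels become possible.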
Re: SNMP monitoring
: How would I set up SNMP monitoring of my Solr server? I've done some
: searching of the wiki and Google and have come up with a blank. Any
: pointers?

It depends on what you want to monitor. If you just want to know how the JVM is running, this should be fairly easy... if you want to be able to get Solr-specific stats/data, your best bet is probably to look into ways to access JMX MBeans via SNMP (there seem to be some tools out there to do things like this):

http://blogs.sun.com/jmxetc/entry/jmx_vs_snmp
http://www.google.co.uk/search?hl=en&q=jmx+snmp

-Hoss
Re: Seattle / PNW Hadoop + Lucene User Group?
OK, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seattle! On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford
Re: Garbage Collectors
The only thing that comes to mind is running Solr under a profiler (e.g. YourKit) and figuring out which objects are not getting cleaned up and who's holding references to them.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: David Baker dav...@mate1inc.com
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 6:40:31 PM
Subject: Re: Garbage Collectors

Otis Gospodnetic wrote:

Personally, I'd start from scratch: -Xmx -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message
From: David Baker
To: solr-user@lucene.apache.org
Sent: Thursday, April 16, 2009 3:33:18 PM
Subject: Garbage Collectors

I have an issue with garbage collection on our Solr servers. We have an issue where the old generation never gets cleaned up on one of our servers. This server has a little over 2 million records which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel one seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is: what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.

Thanks for the reply. Yes, Solr is the only app running under this Tomcat server. I will remove -server and the other options except the heap allocation options and see how it performs. Any suggestions on how to go about finding out why objects are not being cleaned up if these changes don't work?
Re: dual of method - CommonsHttpSolrServer(url) to close and destroy underlying httpclient connection
httpClient.getHttpConnectionManager().closeIdleConnections(0); // a timeout of 0 closes all currently idle connections

--Noble

On Sat, Apr 18, 2009 at 1:31 AM, Rakesh Sinha rakesh.use...@gmail.com wrote:

When we instantiate a CommonsHttpSolrServer, we use the following method:

CommonsHttpSolrServer server = new CommonsHttpSolrServer(this.endPoint);

How do we do a 'kill all' of all the underlying HttpClient connections? server.getHttpClient() returns an HttpClient reference, but I am trying to figure out the right method to close all currently active HttpClient connections.
--
--Noble Paul
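If the goal is to tear everything down rather than just reap idle connections, one option (a sketch, not from this thread) is to own the connection manager yourself, since commons-httpclient 3.x lets you shut it down explicitly:

    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    CommonsHttpSolrServer server = new CommonsHttpSolrServer(endPoint, new HttpClient(mgr));
    // ... use the server ...
    mgr.shutdown(); // closes all connections and releases the manager's resources

MultiThreadedHttpConnectionManager.shutdown() exists for exactly this purpose; endPoint here stands in for the URL used in the question.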