Re: DataImporter : Java heap space
On Thu, Apr 16, 2009 at 10:31 AM, Mani Kumar manikumarchau...@gmail.com wrote: Aah, Bryan you got it ... Thanks! Noble: so I can hope that it'll be fixed soon :) thank you for fixing it ... please lemme know when it's done. This is fixed in trunk. The next nightly build should have this fix. -- Regards, Shalin Shekhar Mangar.
truncating indexed docs
Is it possible to truncate large documents once they are indexed? (Can this be done without re-indexing?) Regards, CI
Re: Using CSV for indexing ... Remote Streaming disabled
Any help on this? Could this error be because of something else (not a remote streaming issue)? Thanks.

On Wed, Apr 15, 2009 at 10:04 AM, vivek sar vivex...@gmail.com wrote: Hi, I'm trying out CSV indexing (Solr 1.4, 03/29 nightly) following the wiki (http://wiki.apache.org/solr/UpdateCSV). I've updated solrconfig.xml to have these lines:

  <requestDispatcher handleSelect="true">
    <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
    ...
  </requestDispatcher>

  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />

When I try to upload the CSV,

  curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv'

I get the following response:

  HTTP Status 400 - Remote Streaming is disabled.
  type: Status report
  message: Remote Streaming is disabled.
  description: The request sent by the client was syntactically incorrect (Remote Streaming is disabled.).
  Apache Tomcat/6.0.18

Why is it complaining about remote streaming if it's already enabled? Is there anything I'm missing? Thanks, -vivek
Invalid_Date_String on posting XML to the index
Hi all, I'm encountering a problem when I try to add records with a date field to the index. The records I'm adding have very little date precision: some only have year and month, others only have a year. I'm trying to get around this by using a text pattern factory to modify the field before indexing. This seems to work fine if the class is solr.TextField, and a date will be converted from e.g. 1953 to 1953-01-01T00:00:00.000Z and then inserted into the index. However, if I want to have the field as an actual date field (for doing range searches etc.) I get the following error when I post the XML file:

SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953

The corresponding stack trace from the Solr server is:

Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
        at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
        at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
        at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
        at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
        at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
        at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

My schema.xml file looks something like this:
  <fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all"/>
    </analyzer>
  </fieldType>
  ...
  <field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>
  ...

My thinking is that Solr is trying to add the field directly as '1953' before doing the text factory stuff and is therefore not in the right format for indexing. Does that sound like a reasonable assumption, and am I missing something which is causing it to go wrong? Can anyone help please? I was originally storing the date in YYMMDD format as a text field and searching with wildcards, but that strikes me as somewhat inefficient. I could go back to doing that if necessary, but I'd rather do it the right way if I can. Many thanks for your help. Mark

PS. Apologies if this message comes through twice - I sent it yesterday afternoon but it hasn't turned up on the mailing list yet, so I'm trying again.

-- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Invalid_Date_String on posting XML to the index
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk wrote: My thinking is that Solr is trying to add the field directly as '1953' before doing the text factory stuff and is therefore not in the right format for indexing. Does that sound like a reasonable assumption and am I missing something which is causing it to go wrong? Can anyone help please? That is correct. You'll need to do the date creation in your own code so that you send a well-formed date to Solr. -- Regards, Shalin Shekhar Mangar.
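A minimal sketch of the client-side normalization suggested above, padding partial dates out to the full ISO-8601 form DateField expects before the document is posted; the input formats (yyyy, yyyy.MM, yyyy.MM.dd) are taken from the original post, everything else is illustrative:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class DateNormalizer {
      private static final Pattern Y = Pattern.compile("^(\\d{4})$");
      private static final Pattern YM = Pattern.compile("^(\\d{4})\\.(\\d{2})$");
      private static final Pattern YMD = Pattern.compile("^(\\d{4})\\.(\\d{2})\\.(\\d{2})$");

      /** Returns a Solr-parsable date, defaulting a missing month/day to 01. */
      public static String normalize(String raw) {
          Matcher m = Y.matcher(raw);
          if (m.matches()) return m.group(1) + "-01-01T00:00:00Z";
          m = YM.matcher(raw);
          if (m.matches()) return m.group(1) + "-" + m.group(2) + "-01T00:00:00Z";
          m = YMD.matcher(raw);
          if (m.matches()) return m.group(1) + "-" + m.group(2) + "-" + m.group(3) + "T00:00:00Z";
          throw new IllegalArgumentException("Unrecognized date: " + raw);
      }
  }

The output of normalize() would go into the DateRecorded field in place of the raw value.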
Re: Invalid_Date_String on posting XML to the index
On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote: On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan mark.al...@ed.ac.uk wrote: My thinking is that Solr is trying to add the field directly as '1953' before doing the text factory stuff and is therefore not in the right format for indexing. Does that sound like a reasonable assumption and am I missing something which is causing it to go wrong? Can anyone help please? That is correct. You'll need to do the date creation in your own code so that you send a well-formed date to Solr. Hi, thanks for your prompt reply. I'm a bit confused though - the only way to do this is a two-step process? I have to write code to munge the XML into another document which is exactly the same except for the format of the Date field, and then import that second file? Isn't that the whole purpose of having an analyzer with the solr.PatternReplaceFilterFactory filters? What's odd is that the pattern replacement works if I store the field as text but not as a date. Are you sure this isn't a bug? Mark -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: Invalid_Date_String on posting XML to the index
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan mark.al...@ed.ac.uk wrote: Hi, thanks for your prompt reply. I'm a bit confused though - the only way to do this is a two-step process? I have to write code to munge the XML into another document which is exactly the same except for the format of the Date field, and then import that second file? Isn't that the whole purpose of having an analyzer with the solr.PatternReplaceFilterFactory filters? What's odd is that the pattern replacement works if I store the field as text but not as a date. Are you sure this isn't a bug? Analyzers are applied only for the indexed value but not the stored value. A value which is added to DateField is converted to the same internal format (for both indexing and storing purposes) and then added to the index. The DateField#toInternal method is the one which is attempting to parse the string into a date and failing when the field is created. There is another option. You could create a class which extends DateField and overrides toInternal(String) to do the conversion. You can specify this class in the schema.xml instead of DateField. -- Regards, Shalin Shekhar Mangar.
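A hedged sketch of that subclass idea; the class name is hypothetical, and the accepted input formats are assumed from the earlier post:

  import org.apache.solr.schema.DateField;

  public class LenientDateField extends DateField {
      @Override
      public String toInternal(String val) {
          // pad partial dates out to full ISO-8601 before DateField parses them
          return super.toInternal(normalize(val));
      }

      private String normalize(String raw) {
          if (raw.matches("^\\d{4}$"))                                       // e.g. 1953
              return raw + "-01-01T00:00:00Z";
          if (raw.matches("^\\d{4}\\.\\d{2}$"))                              // e.g. 1953.06
              return raw.replace('.', '-') + "-01T00:00:00Z";
          if (raw.matches("^\\d{4}\\.\\d{2}\\.\\d{2}$"))                     // e.g. 1953.06.12
              return raw.replace('.', '-') + "T00:00:00Z";
          return raw; // already well-formed; let DateField validate it
      }
  }

The schema would then reference it with class="com.example.LenientDateField" (a hypothetical package) instead of class="solr.DateField", and the analyzer block becomes unnecessary.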
OutOfMemory on Highlighting
Hi, I am analyzing the memory usage for my Solr setup. I am testing with 500 text documents of 2 MB each. I have defined a field for displaying the teasers and storing 1 MB of text in it. I am testing with just 128 MB max heap (I know I should be increasing it, but I'm just testing the worst-case scenario). If I search for all 500 documents with row size 500 and highlighting disabled, it works fine. But if I enable highlighting I get an OutOfMemoryError. It looks like the stored fields for all the matched results are read into memory. How do I avoid this memory consumption? Thanks, Siddharth
Re: using multisearcher
Thanks Hoss. I haven't had time to try it yet, but that is exactly the kind of help I was looking for. Brent

Chris Hostetter wrote: : As for the second part, I was thinking of trying to replace the standard : SolrIndexSearcher with one that employs a MultiSearcher. But I'm not very : familiar with the workings of Solr, especially with respect to the caching : that goes on. I thought that maybe people who are more familiar with it might : have some tips on how to go about it. Or perhaps there are reasons that make : this a bad idea.

If your indexes are all local, then using a MultiReader would be simpler than trying to shoehorn MultiSearcher-type logic into SolrIndexSearcher. https://issues.apache.org/jira/browse/SOLR-243 -Hoss

-- Brent Palmer Widernet.org University of Iowa 319-335-2200
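For reference, a minimal sketch of the MultiReader approach for two local indexes, written against the Lucene 2.4-era API; the paths are hypothetical:

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.MultiReader;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.store.FSDirectory;

  public class MultiIndexSearch {
      public static void main(String[] args) throws Exception {
          IndexReader r1 = IndexReader.open(FSDirectory.getDirectory("/data/index1"));
          IndexReader r2 = IndexReader.open(FSDirectory.getDirectory("/data/index2"));
          // one logical reader over both indexes; a searcher built on it sees a
          // single merged view, which is what makes it easier to fit behind
          // SolrIndexSearcher than a MultiSearcher would be
          IndexReader merged = new MultiReader(new IndexReader[] { r1, r2 });
          IndexSearcher searcher = new IndexSearcher(merged);
          // ... run queries against 'searcher' ...
          searcher.close();
          merged.close(); // also closes the sub-readers by default
      }
  }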
Stored Document encoding
I'm using the DataImportHandler and my database is in latin1. When I retrieve documents that I have indexed in Solr, they seem to have been converted to UTF-8. Is that normal? Is it possible to store in latin1 in Solr?
DataImport, remove doc when marked as deleted
Hi, I am new to Solr but have been using Lucene for a while. I am trying to rewrite some old Lucene indexing code using the JDBC DataImport in Solr. My problem: I have entities that can be marked in the db as deleted. I don't want to index these, and that's no problem when doing a full-import. When doing a delta-import, my deltaQuery will catch entities that have been marked as deleted since the last index, but how do I get it to delete those from the index? I tried making the deltaImportQuery so that it doesn't return the entity if it's deleted, but that didn't help... Any ideas? Thanks Ruben
Re: solr 1.3 + tomcat 5.5
No, there is no such file there. How can I configure more detailed error reporting for this message?

2009/4/15 Shalin Shekhar Mangar shalinman...@gmail.com: From the log it seems like there is a solr.xml inside var/lib/tomcat5/webapps/ which Tomcat is trying to deploy and failing. Very strange. You should remove that file and see if that fixes it.

On Tue, Apr 14, 2009 at 11:35 PM, andrysha nihuhoid nihuh...@gmail.com wrote: Hi, got a problem setting up Solr + Tomcat (Tomcat 5.5 + Apache Solr 1.3.0 + CentOS 5.3). I'm not familiar with Java at all, so sorry if it's a dumb question. Here is what I did: placed solr.war in the webapps folder, changed solr home to /etc/solr, copied the contents of the Solr distribution's example folder to /etc/solr. Tomcat starts successfully and I can even access the admin interface, but the following errors appear in catalina.out every 10 seconds:

Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor var#lib#tomcat5#webapps#solr.xml
Apr 14, 2009 1:30:14 PM org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor etc#solr#.xml
(the same two lines repeat every 10 seconds)

Googled about 3 hours. Tried to set write permissions for all on /etc, /etc/solr, /var/lib/tomcat5/webapps; tried to create an empty file named solr.xml in /etc and /etc/solr; tried to copy solrconfig.xml to /etc/ and /etc/solr. -- Regards, Shalin Shekhar Mangar.
Re: OutOfMemory on Highlighting
Hi, Have you tried: http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Gargate, Siddharth sgarg...@ptc.com To: solr-user@lucene.apache.org Sent: Thursday, April 16, 2009 6:33:46 AM Subject: OutOfMemory on Highlighting

Hi, I am analyzing the memory usage for my Solr setup. I am testing with 500 text documents of 2 MB each. I have defined a field for displaying the teasers and storing 1 MB of text in it. I am testing with just 128 MB max heap (I know I should be increasing it, but I'm just testing the worst-case scenario). If I search for all 500 documents with row size 500 and highlighting disabled, it works fine. But if I enable highlighting I get an OutOfMemoryError. It looks like the stored fields for all the matched results are read into memory. How do I avoid this memory consumption? Thanks, Siddharth
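One relevant knob on that wiki page is hl.maxAnalyzedChars, which caps how much of each stored field the highlighter analyzes per document; a hypothetical request against the setup described above (field name and values are illustrative):

  http://localhost:8983/solr/select?q=test&hl=true&hl.fl=teaser&hl.maxAnalyzedChars=51200

Lowering it trades away highlighting of matches deep inside large documents for a smaller memory footprint.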
Re: Using CSV for indexing ... Remote Streaming disabled
Hi, Are you absolutely sure you are changing the correct config file? What is the 20090414_1 part in your URL? The name of the core? Be sure to change ITS config (you can get to it from the Solr Admin page) and to restart Solr. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 16, 2009 3:34:42 AM Subject: Re: Using CSV for indexing ... Remote Streaming disabled

Any help on this? Could this error be because of something else (not a remote streaming issue)? Thanks.

On Wed, Apr 15, 2009 at 10:04 AM, vivek sar wrote: Hi, I'm trying out CSV indexing (Solr 1.4, 03/29 nightly) following the wiki (http://wiki.apache.org/solr/UpdateCSV). I've updated solrconfig.xml to have these lines:

  <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="20480" />
  ...
  <requestHandler name="/update/csv" class="solr.CSVRequestHandler" startup="lazy" />

When I try to upload the CSV,

  curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c&stream.file=/Users/opal/temp/afterchat/data/csv/1239759267339.csv'

I get the following response:

  HTTP Status 400 - Remote Streaming is disabled.
  type: Status report
  message: Remote Streaming is disabled.
  description: The request sent by the client was syntactically incorrect (Remote Streaming is disabled.).
  Apache Tomcat/6.0.18

Why is it complaining about remote streaming if it's already enabled? Is there anything I'm missing? Thanks, -vivek
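As an aside, a hedged workaround if remote streaming must stay disabled: the UpdateCSV wiki also describes POSTing the file body directly, along the lines of

  curl 'http://localhost:8080/solr/20090414_1/update/csv?commit=true&separator=%09&escape=%5c' --data-binary @/Users/opal/temp/afterchat/data/csv/1239759267339.csv -H 'Content-type:text/plain; charset=utf-8'

so the server never has to open a local file path on the client's behalf.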
Re: truncating indexed docs
Hi, No, you typically truncate them (i.e. index only the first N terms) while indexing, using the maxFieldLength setting in solrconfig.xml. You can, however, limit how many characters (or bytes?) to copy when using the copyField functionality. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: CIF Search cifsea...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 16, 2009 2:44:22 AM Subject: truncating indexed docs

Is it possible to truncate large documents once they are indexed? (Can this be done without re-indexing?) Regards, CI
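A hedged sketch of both knobs mentioned above; field names and sizes are hypothetical. In solrconfig.xml, cap how many tokens get indexed per field:

  <maxFieldLength>10000</maxFieldLength>

and in schema.xml, cap how many characters are copied into a stored teaser field:

  <copyField source="body" dest="teaser" maxChars="5000"/>

Both apply at index time, which is why already-indexed documents would still need re-indexing to shrink.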
Re: Question on StreamingUpdateSolrServer
Hi, Lots of little things to look at here. You should do lsof as root, and it looks like you aren't doing that. You should double-check Tomcat's maxThreads param in server.xml. You should give Jetty a try. I don't think you said anything about looking at the container's or Solr's logs and finding errors. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Wednesday, April 15, 2009 7:28:57 PM Subject: Re: Question on StreamingUpdateSolrServer

Thanks Otis. I did increase the number of file descriptors to 22K, but I still get this problem. I've noticed the following so far:

1) As soon as I get to around 1140 index segments (this is the total over multiple cores) I start seeing this problem.
2) When the problem starts, occasionally the index request (solrserver.commit) also fails with the following error: java.net.SocketException: Connection reset
3) Whenever the commit fails, I'm able to access Solr in the browser (http://ets11.co.com/solr). If the commit is successful and ongoing, I get a blank page in Firefox. Even telnet to 8080 fails with Connection closed by foreign host.

It does seem like there is some resource issue, as it happens only once we reach a breaking point (too many index segment files) - lsof at this point usually shows around 1400, but my ulimit is much higher than that. I already use the compound format for index files. I can also run optimize occasionally (though not preferred, as it blocks the whole index cycle for a long time). I do want to find out what resource limitation is causing this; it has to do with the Indexer committing records when there are a large number of segment files. Any other ideas? Thanks, -vivek

On Wed, Apr 15, 2009 at 3:10 PM, Otis Gospodnetic wrote: One more thing. I don't think this was mentioned, but you can: - optimize your indices - use the compound index format. That will lower the number of open file handles. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: vivek sar To: solr-user@lucene.apache.org Sent: Friday, April 10, 2009 5:59:37 PM Subject: Re: Question on StreamingUpdateSolrServer

I also noticed that the Solr app has over 6000 file handles open - lsof | grep solr | wc -l - shows 6455. I've 10 cores (using multi-core) managed by the same Solr instance. As soon as I start up Tomcat, the open file count goes up to 6400. A few questions:

1) Why is Solr holding on to all the segments from all the cores - is it because of the auto-warmer?
2) How can I reduce the open file count?
3) Is there a way to stop the auto-warmer?
4) Could this be related to Tomcat returning a blank page for every request?

Any ideas? Thanks, -vivek

On Fri, Apr 10, 2009 at 1:48 PM, vivek sar wrote: Hi, I was using CommonsHttpSolrServer for indexing, but having two threads writing (10K batches) at the same time was throwing, ProtocolException: Unbuffered entity enclosing request can not be repeated. I switched to StreamingUpdateSolrServer (using addBeans) and I don't see the problem anymore. The speed is very fast - getting around 25k/sec (single thread), but I'm facing another problem. When the indexer using StreamingUpdateSolrServer is running, I'm not able to send any URL request from the browser to the Solr web app. I just get a blank page. I can't even get to the admin interface. I'm also not able to shut down the Tomcat running the Solr webapp when the Indexer is running. I have to first stop the Indexer app and then stop Tomcat.
I don't have this problem when using CommonsHttpSolrServer. Here is how I'm creating it: server = new StreamingUpdateSolrServer(url, 1000, 3); I simply call server.addBeans(...) on it. Is there anything else I need to do to make use of StreamingUpdateSolrServer? Why does Tomcat become unresponsive when the Indexer using StreamingUpdateSolrServer is running (though indexing happens fine)? Thanks, -vivek
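A minimal self-contained sketch of indexing through StreamingUpdateSolrServer (Solr 1.4 SolrJ); the URL, field names, and batch size are illustrative and assume matching schema fields:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class StreamingIndexer {
      public static void main(String[] args) throws Exception {
          // queue size 1000, 3 background threads draining the queue
          SolrServer server = new StreamingUpdateSolrServer(
                  "http://localhost:8080/solr", 1000, 3);
          List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
          for (int i = 0; i < 10000; i++) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", "doc-" + i);
              doc.addField("name", "example document " + i);
              batch.add(doc);
          }
          server.add(batch);   // adds are queued and streamed in the background
          server.commit();     // make the documents searchable
      }
  }

The addBeans(...) call used in the thread above is the annotated-bean equivalent of add(...).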
Re: httpclient.ProtocolException using Solrj
I don't think you gain anything on the Solr end of things by using multiple threads if you are already using StreamingUpdateSolrServer. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 9, 2009 5:47:11 PM Subject: Re: httpclient.ProtocolException using Solrj

Here is what I'm doing: SolrServer server = new StreamingUpdateSolrServer(url, 1000, 5); server.addBeans(dataList); // where dataList is a List with 10K elements. I run two threads, each using the same server object, and each calls server.addBeans(...). I'm able to get 50K/sec inserted using that, but the commit after that (after 100K records) takes 70 sec - which messes up the average time. There are two problems here:

1) Once in a while I get a connection reset error,

Caused by: java.net.SocketException: Connection reset
        at java.net.SocketInputStream.read(SocketInputStream.java:168)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)

Note: if I use CommonsHttpSolrServer I get the buffer error.

2) The commit takes way too long for every 100K (I may commit more often if this cannot be improved).

I'm trying to fix the error problem, which happens only if I run two threads both calling addBeans (10K at a time). One thread works fine. I'm not sure how I can use the MultiThreadedConnectionManager to create a StreamingUpdateSolrServer, and whether it would help? Thanks, -vivek

2009/4/9 Noble Paul നോബിള് नोब्ळ् : using a single request is the fastest http://wiki.apache.org/solr/Solrj#head-2046bbaba3759b6efd0e33e93f5502038c01ac65 I could index at the rate of 10,000 docs/sec using this and BinaryRequestWriter.

On Thu, Apr 9, 2009 at 10:36 PM, vivek sar wrote: I'm inserting 10K in a batch (using the addBeans method). I read somewhere in the wiki that it's better to use the same instance of SolrServer for better performance. Would MultiThreadedConnectionManager help? How do I use it? I also wanted to know how I can use EmbeddedSolrServer - does my app need to be running in the same JVM with the Solr webapp? Thanks, -vivek

2009/4/9 Noble Paul നോബിള് नोब्ळ् : how many documents are you inserting? Maybe you can create multiple instances of CommonsHttpSolrServer and upload in parallel.

On Thu, Apr 9, 2009 at 11:58 AM, vivek sar wrote: Thanks Shalin and Paul. I'm not using MultipartRequest. I do share the same SolrServer between two threads. I'm not using MultiThreadedHttpConnectionManager. I'm simply using CommonsHttpSolrServer to create the SolrServer. I've also tried StreamingUpdateSolrServer, which works much faster, but does throw a connection reset exception once in a while. Do I need to use MultiThreadedHttpConnectionManager? I couldn't find anything on it on the wiki. I was also thinking of using EmbeddedSolrServer - in what case would I be able to use it? Does my application and the Solr web app need to run in the same JVM for this to work? How would I use the EmbeddedSolrServer? Thanks, -vivek

On Wed, Apr 8, 2009 at 10:46 PM, Shalin Shekhar Mangar wrote: Vivek, do you share the same SolrServer instance between your two threads? If so, are you using the MultiThreadedHttpConnectionManager when creating the HttpClient instance?

On Wed, Apr 8, 2009 at 10:13 PM, vivek sar wrote: With a single thread everything works fine. Two threads are fine too for a while, and then all of a sudden the problem starts happening. I tried indexing using REST services as well (instead of Solrj), but with that too I get the following error after a while:

2009-04-08 10:04:08,126 ERROR [indexerThreadPool-5] Indexer - indexData() - Failed to index
java.net.SocketException: Broken pipe
        at java.net.SocketOutputStream.socketWrite0(Native Method)
        at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
        at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
        at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
        at org.apache.commons.httpclient.methods.StringRequestEntity.writeRequest(StringRequestEntity.java:145)
        at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:499)
        at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at ...
Re: Field Collapsing Patch
I know of a company that used it, but then determined it was this component that was slowing down their search. They might have modified it some, too, I don't recall now. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Matthew Runo mr...@zappos.com To: solr-user@lucene.apache.org Sent: Wednesday, April 8, 2009 1:18:08 PM Subject: Field Collapsing Patch Hello folks - Is anyone using the Field Collapsing patch from SOLR-236 (https://issues.apache.org/jira/browse/SOLR-236) in their production environment? We're considering using it, but wanted to ensure it was at a point where it could be used before spending a lot of time on it. Any thoughts on the patch / issue? Any reasons not to use it? Thanks for your time! Matthew Runo Software Engineer, Zappos.com mr...@zappos.com - 702-943-7833
Re: hardware requirements for solr
Roman, This depends on multiple factors - amount of data, type of data/analysis, query rate and query complexity, etc. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Roman Dissertori r.dissert...@ecom.at To: solr-user@lucene.apache.org Sent: Wednesday, April 8, 2009 9:39:07 AM Subject: hardware requirements for solr

Hello all, I want to use Solr only (for performance reasons) on a single server. What would be the minimum hardware requirements, and what OS would you suggest for that? Thanks and regards, Roman Dissertori
Re: Solr Search Error
Hi, I'm using Solr 1.4 (03/29 nightly build) and when searching on a large index (40G) I get the same exception as in this thread:

HTTP Status 500 - 13724 java.lang.ArrayIndexOutOfBoundsException: 13724
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
        at org.apache.lucene.search.IndexSearcher.doSearch(IndexSearcher.java:262)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
        at org.apache.lucene.search.Searcher.search(Searcher.java:126)
        at org.apache.lucene.search.Searcher.search(Searcher.java:105)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1072)
        at ...

The search URL is: http://think2.co.com:8080/solr/20090415_1/select/?q=japan&version=2.2&start=0&rows=10&indent=on It would have millions of records matching this term, but I guess that shouldn't throw this exception. I saw a similar JIRA for an ArrayIndexOutOfBoundsException, https://issues.apache.org/jira/browse/SOLR-450 (it's not the same though). I also see someone reported this same problem back in 2007, so I'm not sure whether it's a real bug or some configuration issue: http://www.nabble.com/ArrayIndexOutOfBoundsException-on-TermScorer-td11750899.html#a11750899 Any ideas? Thanks, -vivek

On Fri, Mar 27, 2009 at 10:11 AM, Narayanan, Karthikeyan karthikeyan.naraya...@gs.com wrote: Hi Otis, Thanks for the recommendation. Will try with the latest nightly build. I did a couple of full data imports and got this error a few times while searching. Thanks. Karthik

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Friday, March 27, 2009 12:57 PM To: solr-user@lucene.apache.org Subject: Re: Solr Search Error

Hi Karthik, First thing I'd do is get the latest Solr nightly build. If that doesn't fix things, I'd grab the latest Lucene nightly build and use it to replace the Lucene jars that are in your version of Solr. If that doesn't work, I'd email the ML with a bit more info about the type of search that causes this (e.g. Do all searches cause this or only some? What do those that trigger this error look like or have in common?) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Narayanan, Karthikeyan karthikeyan.naraya...@gs.com To: solr-user@lucene.apache.org Sent: Friday, March 27, 2009 11:42:12 AM Subject: Solr Search Error

Hi All, I am intermittently getting this exception when I do a search. What could be the reason?

Caused by: org.apache.solr.common.SolrException: 11938 java.lang.ArrayIndexOutOfBoundsException: 11938
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:74)
        at org.apache.lucene.search.TermScorer.score(TermScorer.java:61)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:137)
        at org.apache.lucene.search.Searcher.search(Searcher.java:126)
        at org.apache.lucene.search.Searcher.search(Searcher.java:105)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:169)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:210)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:174)
        at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:151)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:870)
        at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
        at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint...
Re: Question on StreamingUpdateSolrServer
On Wed, Apr 15, 2009 at 7:28 PM, vivek sar vivex...@gmail.com wrote: lsof at this point usually shows at 1400, but my ulimit is much higher than that. Could you be hitting a kernel limit? cat /proc/sys/fs/file-max cat /proc/sys/fs/file-nr http://www.netadmintools.com/art295.html -Yonik http://www.lucidimagination.com
Authentication Error
Hi, I have followed the procedure given on this blog to set up Solr. Below is my code. I am trying to index the data but I am not able to connect to the server and am getting an authentication error.

  HttpClient client = new HttpClient();
  client.getState().setCredentials(
      new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
      new UsernamePasswordCredentials("admin", "admin"));

Can you please let me know what may be the problem? The other problem which I am facing is using load balancing:

  SolrServer lbHttpSolrServer = new LBHttpSolrServer(
      "http://localhost:8080/solr", "http://localhost:8983/solr");

If the first server is down then I get an error. If I swap the servers in the constructor, giving the port 8983 server first and 8080 second, it works fine. The problem is that if only the last server set is active and the rest are down, Solr throws an exception and the search is not performed. Regards, Allahbaksh
Re: DataImport, remove doc when marked as deleted
did you try the deletedPkQuery?

On Thu, Apr 16, 2009 at 7:49 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Hi, I am new to Solr but have been using Lucene for a while. I am trying to rewrite some old Lucene indexing code using the JDBC DataImport in Solr. My problem: I have entities that can be marked in the db as deleted. I don't want to index these, and that's no problem when doing a full-import. When doing a delta-import, my deltaQuery will catch entities that have been marked as deleted since the last index, but how do I get it to delete those from the index? I tried making the deltaImportQuery so that it doesn't return the entity if it's deleted, but that didn't help... Any ideas? Thanks Ruben -- --Noble Paul
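For reference, a hedged sketch of what that looks like in a DIH data-config; table, column, and flag names are hypothetical:

  <entity name="item" pk="id"
          query="SELECT id, title FROM item WHERE deleted = 0"
          deltaQuery="SELECT id FROM item
                      WHERE updated &gt; '${dataimporter.last_index_time}' AND deleted = 0"
          deletedPkQuery="SELECT id FROM item
                          WHERE deleted = 1 AND updated &gt; '${dataimporter.last_index_time}'"/>

During a delta-import, rows returned by deletedPkQuery are removed from the index by primary key.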
Re: Authentication Error
On Thu, Apr 16, 2009 at 10:34 PM, Allahbaksh Asadullah allahbaks...@gmail.com wrote: Hi, I have followed the procedure given on this blog to set up Solr. Below is my code. I am trying to index the data but I am not able to connect to the server and am getting an authentication error.

  HttpClient client = new HttpClient();
  client.getState().setCredentials(
      new AuthScope("localhost", 80, AuthScope.ANY_SCHEME),
      new UsernamePasswordCredentials("admin", "admin"));

Can you please let me know what may be the problem? The other problem which I am facing is using load balancing:

  SolrServer lbHttpSolrServer = new LBHttpSolrServer(
      "http://localhost:8080/solr", "http://localhost:8983/solr");

If the first server is down then I get an error. If I swap the servers in the constructor, giving the port 8983 server first and 8080 second, it works fine. The problem is that if only the last server set is active and the rest are down, Solr throws an exception and the search is not performed.

I shall write a testcase and let you know. Regards, Allahbaksh -- --Noble Paul
Re: Stored Document encoding
I guess strings are always stored by Lucene in UTF-8. BTW, as you pass the object in as a String, the original encoding is lost.

On Thu, Apr 16, 2009 at 7:37 PM, AlexxelA alexandre.boudrea...@canoe.ca wrote: I'm using the DataImportHandler and my database is in latin1. When I retrieve documents that I have indexed in Solr, they seem to have been converted to UTF-8. Is that normal? Is it possible to store in latin1 in Solr? -- --Noble Paul
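If the goal is only to make sure the latin1 data is decoded correctly on the way in, the JDBC driver can usually be told the encoding; a hypothetical MySQL dataSource for the DIH config (driver parameters vary by database):

  <dataSource driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
              user="solr" password="secret"/>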
Re: What is QTime a measure of?
Not sure if you got the answer - QTime represents the number of milliseconds it took Solr to execute a search. It does not include the time it takes to send back the response (that depends on its size, network speed...). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Andrew McCombe eupe...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, April 6, 2009 7:08:45 AM Subject: What is QTime a measure of?

Hi, Just started using Solr/Lucene and am getting to grips with it. Great product! What is QTime a measure of? Is it milliseconds, seconds? I tried a Google search but couldn't find anything definitive. Thanks in advance, Andrew McCombe
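For reference, QTime is reported in the responseHeader of every response; a typical XML response begins like this (values illustrative):

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">12</int>
    </lst>
    ...
  </response>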
Re: Phrase Query Issue
Let me second this. People ask for this pretty often. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erik Hatcher e...@ehatchersolutions.com To: solr-user@lucene.apache.org Sent: Saturday, April 4, 2009 8:33:46 PM Subject: Re: Phrase Query Issue On Apr 4, 2009, at 1:25 AM, dabboo wrote: Erik, Thanks a lot for your reply. I have made some changes in the solr code and now field clauses are working fine with dismax request. Not only this, wildcard characters are also working with dismax and q query parameter. If you want I can share modified code with you. That'd be good to share. Simply open a Solr JIRA issue with this enhancement request and post your code there. Test cases and documentation always appreciated too, but working code to start with is fine. Erik
Re: Spelling Component
Hi, It looks like your spellchecker index did get created (doesn't it get created automatically when Solr starts?), but it looks rather empty. :) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: Anoop Atre anoop.a...@mnsu.edu To: solr-user@lucene.apache.org Sent: Friday, April 3, 2009 3:46:10 PM Subject: Re: Spelling Component

Shalin, I think I did build the spellcheck index. I made the changes to solrconfig and schema, restarted, and passed spellcheck.build=true, which created the index.

ls -ltr ./spellchecker
-rw-r--r-- 1 XXX users 20 2009-04-03 13:23 segments.gen
-rw-r--r-- 1 XXX users 28 2009-04-03 13:23 segments_f

Hmm... how would I know if the word is in the index? As for the threshold, do you mean reduce the 0.7 entry in solrconfig? Thanks!

Shalin Shekhar Mangar wrote: On Sat, Apr 4, 2009 at 12:01 AM, Anoop Atre wrote: I still don't get any suggestions when I do /spellCheckCompRH?q=helultrashar&spellcheck=true&spellcheck.collate=true Did you build the spellcheck index? Try specifying a correct word which you know is in the index. See if the spellchecker returns it. If it does, then it might be that no suggestions are available or there are no suggestions above the configured threshold.
Garbage Collectors
I have an issue with garbage collection on our Solr servers: the old generation never gets cleaned up on one of them. This server has a little over 2 million records, which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel one seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is, what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.
Boosting by facets with standard query
I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest that do not belong to those facets? I use the standard query format. Thank you - ashok
Re: Garbage Collectors
Personally, I'd start from scratch: -Xmx, -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: David Baker dav...@mate1inc.com To: solr-user@lucene.apache.org Sent: Thursday, April 16, 2009 3:33:18 PM Subject: Garbage Collectors

I have an issue with garbage collection on our Solr servers: the old generation never gets cleaned up on one of them. This server has a little over 2 million records, which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel one seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is, what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.
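A hedged example of the stripped-down starting point suggested above, for a Tomcat setup via CATALINA_OPTS (the heap size is illustrative, not a recommendation):

  CATALINA_OPTS="-Xms2048m -Xmx2048m -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr"

i.e. just the heap bounds and the Solr home, adding collector flags back one at a time only after the leak itself is understood.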
Re: NPE creating EmbeddedSolrServer
This worked great. Thanks! The only catch is you have to (eventually) call CoreContainer.shutdown(), otherwise the app just hangs.

Alexandre Rafalovitch wrote: To reply to my own message. The following worked, starting from scratch (example):

  SolrConfig solrConfig = new SolrConfig(
      "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr",
      "solrconfig.xml", null);
  IndexSchema indexSchema = new IndexSchema(solrConfig, "schema.xml", null);
  CoreContainer container = new CoreContainer(
      new SolrResourceLoader(SolrResourceLoader.locateInstanceDir()));
  CoreDescriptor dcore = new CoreDescriptor(container, "",
      solrConfig.getResourceLoader().getInstanceDir());
  dcore.setConfigName(solrConfig.getResourceName());
  dcore.setSchemaName(indexSchema.getResourceName());
  SolrCore core = new SolrCore(null,
      "D:\\Projects\\FutureTerm\\apache-solr-1.3.0\\futureterm\\solr\\data",
      solrConfig, indexSchema, dcore);
  container.register("", core, false);
  SolrServer server = new EmbeddedSolrServer(container, "");

Not sure I get the magical sequence yet, but maybe it will save somebody else half a day. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ Research group: http://www.clt.mq.edu.au/Research/

On Tue, Mar 17, 2009 at 6:22 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, I am trying to create a basic single-core embedded Solr instance. I figured out how to set up a single-core instance and got (I believe) all the files in the right places. However, I am unable to run trivial code without an exception:
Re: Non-linear structure for search and index documents
: I need index/search words extracted from pdf files with coordinates and page : number, so I have this structure: :- index the document id :- a document has many pages :- a page has many words :- a word has geometry[w,h,x,y] (inside of page) : : Is this possible with solr? : If yes, how the best way to do that? Is using field collapsing?

it's possible, but Solr doesn't currently have any features that make it *easy*. the main things you have to ask yourself, before deciding what the best approach to this problem is, are: what do i want to be able to do with this data? if you need to search for docs where dog appears inside a certain x1,y1,x2,y2 box then you have to structure your index much differently than if you just need to find all docs containing dog and then, as part of your result, get the w,h,x,y coordinates for each instance of the word. The main Lucene feature that's probably going to be at the core of any work like this is Payloads ... but there's going to be a significant amount of java coding needed to take advantage of it in any of the ways i can think of that you might be wanting. -Hoss
Re: Dictionary lookup possibilities
: For instance, my dictionary holds the following terms: : 1 - a b c d : 2 - c d e : 3 - a b : 4 - a e f g h : : If I put the sentence [a b c d f g h] in as a query, I want to receive : dictionary items 1 (matching all words a b c d) and 3 (matching words a b) : as matches

this is a pretty hard problem in general ... in my mind i call it the longest matching sub-phrase problem, but i have no idea if it has a real name. the only solution i know of using Lucene is to construct a phrase query for each of the sub-phrases, giving a bigger query boost to the longer phrases ... but it might be possible to design a custom query impl for solving this problem. (i've never had an important enough use case to dedicate a significant amount of time to figuring it out) -Hoss
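A hedged sketch of the boosted sub-phrase approach Hoss describes, against the Lucene 2.4-era API; the field name and tokenization are assumed to be handled by the caller:

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.PhraseQuery;

  public class SubPhraseQueryBuilder {
      /** One SHOULD clause per contiguous sub-phrase of at least minLen words. */
      public static BooleanQuery build(String field, String[] words, int minLen) {
          BooleanQuery bq = new BooleanQuery();
          for (int start = 0; start < words.length; start++) {
              for (int end = start + minLen; end <= words.length; end++) {
                  PhraseQuery pq = new PhraseQuery();
                  for (int i = start; i < end; i++) {
                      pq.add(new Term(field, words[i]));
                  }
                  pq.setBoost(end - start); // longer sub-phrases outrank shorter ones
                  bq.add(pq, BooleanClause.Occur.SHOULD);
              }
          }
          return bq;
      }
  }

Note the clause count grows quadratically with sentence length, so BooleanQuery's maxClauseCount limit becomes a concern for long inputs.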
Seattle / PNW Hadoop + Lucene User Group?
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford
Re: special characters in Solr search query.
: the special characters but the issue is while the document which I am : going to index contains any of these special characters it is throwing : query parse exception. Can anyone give pointer over this? Thanks in

your question is kind of vague ... for instance: it seems like you are saying that you get query parse exceptions when you try to *index* a document containing one of these characters ... which would be really odd. can you give some specifics about what exactly it is you are doing? ... either the literal Solr URLs that you are GETing manually, or the code that you are using to talk to solr? -Hoss
Advice on moving from 1.3 to 1.4-dev or trunk?
Hello, I'm using Solr 1.3 with solr.py. We have a basic schema.xml, nothing custom or out of the ordinary. I need the following feature from http://svn.apache.org/repos/asf/lucene/solr/trunk/CHANGES.txt

SOLR-911: Add support for multi-select faceting by allowing filters to be tagged and facet commands to exclude certain filters. This patch also added the ability to change the output key for facets in the response, and optimized distributed faceting refinement by lowering parsing overhead and by making requests and responses smaller.

Since this requires 1.4, it looks like I have to upgrade (or roll my own solution providing this feature). I'm looking for a bit of advice. I have looked through the bugs here: http://issues.apache.org/jira/browse/SOLR/fixforversion/12313351

1. I would need to get the source for 1.4 and build it, right? No release yet, eh?
2. Anyone using 1.4 in production without issue; is this wise? Or should I wait?
3. Will I need to make changes to my schema.xml to support my current field set under 1.4?
4. Do I need to reindex all my data?

thanks gene
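For reference, the SOLR-911 syntax tags a filter and excludes it when computing a facet, so a selected facet value doesn't filter its own counts; a hypothetical request (field and value are illustrative):

  q=*:*&fq={!tag=dt}doctype:pdf&facet=true&facet.field={!ex=dt}doctype

The same local-params syntax renames the facet in the response with key, e.g. facet.field={!ex=dt key=allDocTypes}doctype.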
Re: Solr posts xml
: I installed Solr on tomcat 6 and whenever I click search it displays the xml : like I am editing it? : : is that normal?

I'm afraid i don't really understand your question ... if you mean you get an XML formatted response when you click the Search button on the admin screen, then yes -- that is normal. by default Solr returns results in XML. other output formats (like json, etc.) are supported, or you can use an XSLT to transform the XML to HTML or anything else you might want. -Hoss
Re: Garbage Collectors
If you're using Java 5 or 6, jmap is a useful tool for tracking down memory leaks. http://java.sun.com/javase/6/docs/technotes/tools/share/jmap.html jmap -histo:live pid will print a histogram of all live objects in the heap. Start at the top and work your way down until you find something suspicious -- the trick is in knowing what is suspicious, of course. -Bryan

On Apr 16, 2009, at 3:40 PM, David Baker wrote: Otis Gospodnetic wrote: Personally, I'd start from scratch: -Xmx, -Xms... -server is not even needed any more. If you are not using Java 1.6, I suggest you do. Next, I'd try to investigate why objects are not being cleaned up - this should not be happening in the first place. Is Solr the only webapp running? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

- Original Message From: David Baker dav...@mate1inc.com To: solr-user@lucene.apache.org Sent: Thursday, April 16, 2009 3:33:18 PM Subject: Garbage Collectors

I have an issue with garbage collection on our Solr servers: the old generation never gets cleaned up on one of them. This server has a little over 2 million records, which are updated every hour or so. I have tried the parallel GC and the concurrent GC. The parallel one seems more stable for us, but both end up running out of memory. I have increased the memory allocated to the servers, but this just seems to delay the problem. My question is, what are the suggested options for using the parallel GC? Currently we are using something of this nature:

-server -Xmx4096m -Xms512m -XX:+UseAdaptiveSizePolicy -XX:+UseParallelOldGC -XX:GCTimeRatio=19 -XX:NewSize=128m -XX:SurvivorRatio=2 -Dsolr.solr.home=/usr/local/solr-tomcat-fi/solr

I am new to Solr and GC tuning, so any advice is appreciated.

Thanks for the reply. Yes, Solr is the only app running under this Tomcat server. I will remove -server and the other options except the heap allocation options and see how it performs. Any suggestions on how to go about finding out why objects are not being cleaned up if these changes don't work?
Re: Boosting by facets with standard query
On Fri, Apr 17, 2009 at 1:03 AM, ashokc ash...@qualcomm.com wrote: I have a query that yields results binned in several facets. How can I boost the results that fall in certain facets over the rest that do not belong to those facets? I use the standard query format. Thank you

I'm not sure what you mean by boosting by facet. Do you mean that you want to boost documents which match a term query? If yes, you can use your_field_name:value^2.0 in the q parameter. -- Regards, Shalin Shekhar Mangar.
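For example, with hypothetical field and values, a standard query that requires one term and merely prefers a category:

  q=+connector category:datasheets^4.0

Documents in the boosted category rank higher without excluding the rest.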
Re: Dictionary lookup possibilities
On Fri, Apr 17, 2009 at 3:37 AM, Chris Hostetter hossman_luc...@fucit.org wrote: this is a pretty hard problem in general ... in my mind i call it the longest matching sub-phrase problem, but i have no idea if it has a real name. the only solution i know of using Lucene is to construct a phrase query for each of the sub-phrases, giving a bigger query boost to the longer phrases ... but it might be possible to design a custom query impl for solving this problem. There was an issue opened for something similar but there is no patch yet. https://issues.apache.org/jira/browse/SOLR-633 -- Regards, Shalin Shekhar Mangar.
Faceted Search
Hi all, Can one of you tell me how to implement faceted search? Thanks, Regards, Sajith Vimukthi Weerakoon.
The faceted Search
Hi all, I am developing a search tool which uses Solr as the key querying technique. At the moment I have a very stable version, and I need to enhance the application by introducing faceted search. I went through the documentation and made some modifications to my code, but I could not get it working. Is there a configuration involved? How can I implement faceted search? Thanks, Regards, Sajith Vimukthi Weerakoon.
Re: The faceted Search
On Fri, Apr 17, 2009 at 9:58 AM, Sajith Weerakoon saji...@zone24x7.com wrote: Hi all, I am developing a search tool which uses Solr as the key querying technique. At the moment I have a very stable version, and I need to enhance the application by introducing faceted search. I went through the documentation and made some modifications to my code, but I could not get it working. Is there a configuration involved? How can I implement faceted search?

What were the modifications that you did? What was not working? Also see: http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr http://wiki.apache.org/solr/SimpleFacetParameters -- Regards, Shalin Shekhar Mangar.
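For reference, faceting needs no special configuration in Solr 1.3+; it is switched on per request with parameters (field name hypothetical):

  http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=category&facet.mincount=1

The response then carries a facet_counts section listing each category value with its document count.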