Re: field collapse using 'adjacent' 'includeCollapsedDocs' + 'sort' query field
Hi Martijn, Thanks for your insight into collapsedDocs, and what I need to modify to get the functionality I want. Michael

Martijn v Groningen wrote: Hi Michael, What you are saying seems logical, but that is currently not the case with the collapsedDocs functionality. This functionality was built with computing aggregated statistics in mind and not really to have a separate collapse-group search result. Although the collapsed documents are collected in the order they appear in the search result (only if the collapse type is adjacent) they are not saved in the order they appear. If you really need to have the collapse-group search result in the order they were collapsed, you need to tweak the code. What you can do is change the CollapsedDocumentCollapseCollector class in the DocumentFieldsCollapseCollectorFactory.java source file. Currently the document ids are stored inside an OpenBitSet per collapse group. You can change that into an ArrayList<Integer>, for example. In this way the order in which the documents were collapsed is preserved. I think the downside of this change will be an increase in memory usage. OpenBitSet is memory-wise more efficient than an ArrayList of integers. I think that this will only be a real problem when the collapse groups become very large. I hope this answers your question. Martijn

2009/11/14 michael8 mich...@saracatech.com: Hi, This almost seems like a bug, but I can't be sure so I'm seeking confirmation. Basically I am building a site that presents search results in reverse chronological order. I am also leveraging the field collapse feature so that I can group results using 'adjacent' mode and have Solr return the collapsed results as well via 'includeCollapsedDocs'. My collapsing field is a custom grouping_id that I have specified. What I'm noticing is that my search results are coming back in the correct order by descending time (via the 'sort' param in the main query) as expected.
However, the results returned within the 'collapsedDocs' section via 'includeCollapsedDocs' are not in the same descending time order. My question is: shouldn't the collapsedDocs results also be in the same 'sort' order and key I have specified in the overall query, particularly since 'adjacent' mode is enabled, which would mean results that are adjacent in the sort order of the results? I'm using Solr 1.4.0 + the field collapse patch as of 10/27/2009. Thanks, Michael
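To make Martijn's suggestion concrete, here is a small self-contained sketch (using plain java.util.BitSet as a stand-in for Lucene's OpenBitSet; the class and data are illustrative, not the actual patch code) of why a bit set loses collapse order while an ArrayList keeps it:

```java
import java.util.*;

public class CollapseOrderDemo {
    // OpenBitSet-style storage: insertion order is lost; iterating the
    // set always yields doc ids in ascending order.
    static List<Integer> viaBitSet(int[] collectedDocIds) {
        BitSet bits = new BitSet();
        for (int id : collectedDocIds) bits.set(id);
        List<Integer> out = new ArrayList<>();
        for (int id = bits.nextSetBit(0); id >= 0; id = bits.nextSetBit(id + 1)) out.add(id);
        return out;
    }

    // ArrayList<Integer> storage: the order in which documents were
    // collapsed (i.e. the sort order of the search result) is preserved.
    static List<Integer> viaList(int[] collectedDocIds) {
        List<Integer> out = new ArrayList<>();
        for (int id : collectedDocIds) out.add(id);
        return out;
    }

    public static void main(String[] args) {
        int[] collected = {5, 2, 9}; // doc ids in collapse (sort) order
        System.out.println(viaBitSet(collected)); // [2, 5, 9] -- order lost
        System.out.println(viaList(collected));   // [5, 2, 9] -- order kept
    }
}
```

The memory trade-off Martijn mentions follows directly: a bit set costs one bit per candidate doc id, while the list costs a boxed Integer per collapsed document, which only hurts when collapse groups get very large.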
solr stops running periodically
We have 4 machines running Solr. On one of the machines, every 2-3 days Solr stops running. By that I mean that the java/tomcat process just disappears. If I look at the Catalina logs, I see normal log entries and then nothing. There are no shutdown messages like you would normally see if you sent a SIGTERM to the process. Obviously this is a problem. I'm new to Solr/Java so if there are more diagnostic things I can do I'd appreciate any tips/advice. Thanks in advance, Athir
Re: Spell check suggestion and correct way of implementation and some Questions
On Wed, Nov 4, 2009 at 12:31 AM, darniz rnizamud...@edmunds.com wrote: Thanks, I included buildOnCommit and buildOnOptimize as true and indexed some documents, and it automatically builds the dictionary. Are there any performance issues we should be aware of with this approach?

Well, it depends. Each commit/optimize will re-create the spell check index with those options. So it is best if you test it out with your index, queries and load. -- Regards, Shalin Shekhar Mangar.
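For reference, the options being discussed live in the spellchecker definition in solrconfig.xml; a minimal sketch (the spellchecker name and source field here are assumptions, not taken from darniz's actual setup):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- field the dictionary is built from (assumed name) -->
    <str name="field">spell</str>
    <!-- rebuild the dictionary automatically, as discussed above -->
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
  </lst>
</searchComponent>
```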
Newbie tips: migrating from mysql fulltext search / PHP integration
Hi, I am looking for alternatives to MySQL fulltext searches. The Lucene/Solr combo is one of my options and I'd like to gather as much information as I can before choosing, and even build a prototype. My current needs do not seem unusual:
- fast response time (currently some searches can take more than 11 sec)
- an API to add/update/delete documents in the collection
- a way to add synonyms or similar words for misspelled ones (ex. Sony = Soni)
- a way to define the relevance of results (ex. if I search for LCD, return products that belong to the LCD category, contain LCD in the product definition or are marked as a special offer)
I know that I may have to add external code, for example to take the results and apply some business logic to re-sort them, but I'd like to know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which I am considering buying), any tips for Solr usage.
[OT] Webinar on spatial search using Lucene and Solr
From Here to There, You Can Find it Anywhere: Building Local/Geo-Search with Apache Lucene and Solr

Join us for a free webinar hosted by TechTarget / TheServerSide.com, Wednesday, November 18th 2009, 10:00 AM PST / 1:00 PM EST. Click here to sign up: http://theserversidecom.bitpipe.com/detail/RES/1257457967_42.html&asrc=CL_PRM_Lucid_11_18_09_c&li=252934

With new advances in the flexibility and customizability of Apache Lucene/Solr open source search, the ubiquity of location-aware devices and vast amounts of spatial data, tremendous opportunities open up to deliver more powerful and effective geo-aware search results. We'll hear from Grant Ingersoll, co-founder of Lucid Imagination and chairman of the Apache Lucene PMC, in an in-depth technical workshop on the potential and application of the newly released Lucene and Solr geo-search functions. Grant will be joined by thought leaders Ryan McKinley, co-founder of Voyager GIS and Apache Lucene PMC member, and Sameer Maggon of AT&T Interactive, which manages and delivers online and mobile advertising products across AT&T's media platforms.

- Features and benefits of using spatial data in a search engine
- Representing and leveraging spatial data in Lucene to empower local search
- Spatial search in action: a peek at Voyager GIS, a tool to index and search geographic data
- How AT&T Interactive uses Solr/Lucene to power local search at YP.com

Click here to sign up: http://theserversidecom.bitpipe.com/detail/RES/1257457967_42.html&asrc=CL_PRM_Lucid_11_18_09_c&li=252934

About the presenters:

Grant Ingersoll, co-founder of Lucid Imagination, is a published expert in search and Natural Language Processing, with many articles published on Lucene, Solr, findability and relevance, and is a co-founder of the Apache Mahout machine learning project. Grant is the author of the forthcoming book Taming Text, from Manning Publications.
Ryan McKinley, co-founder of Voyager GIS, works with technology to help find, share, and distribute information. He has built many sites using Solr, including ludb.clui.org and www.digitalcommonwealth.org. He was a partner at Squid Labs and co-founded www.instructables.com. Ryan is a member of Lucid Imagination's Technical Advisory Board.

Sameer Maggon leads the Search Engineering Team at AT&T Interactive. He helped the company launch YP.com, which uses Solr underneath. Before joining AT&T Interactive, he worked at Siderean (http://www.siderean.com) on an enterprise search and navigation product that used Lucene, and was ultimately responsible for delivering the technology behind their new product. Sameer has been an active Lucene user since 2001.
Re: solr stops running periodically
Have you looked in other logs, like your syslogs? I've never seen Solr/Tomcat just disappear w/o so much as a blip. I'd think if a process just died from an error condition there would be some note of it somewhere. I'd try to find some other events taking place at that time which might give a hint.

On Nov 15, 2009, at 1:45 PM, athir nuaimi wrote: We have 4 machines running Solr. On one of the machines, every 2-3 days Solr stops running. By that I mean that the java/tomcat process just disappears. If I look at the Catalina logs, I see normal log entries and then nothing. There are no shutdown messages like you would normally see if you sent a SIGTERM to the process. Obviously this is a problem. I'm new to Solr/Java so if there are more diagnostic things I can do I'd appreciate any tips/advice. Thanks in advance, Athir
RE: Segment file not found error - after replicating
Yes. We have tried Solr 1.4 and so far it's been a great success. Still, I am investigating why Solr 1.3 gave an issue like before. Currently it seems to me that org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() is not able to figure out the correct segments file name. (Maybe an index replication issue -- leading to the index not being fully replicated.. but it's so hard to believe, as both master and slave have 100% the same data now!) Anyway.. I will keep on trying till I find something useful.. and will let you know. Thanks Madu

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, 11 November 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: Maduranga Kannangara mkannang...@infomedia.com.au To: solr-user@lucene.apache.org Sent: Tue, November 10, 2009 5:42:44 PM Subject: RE: Segment file not found error - after replicating Thanks Otis, I did the du -s for all three index directories as you said, right after replicating and when I find errors. All three gave me the exact same value. This time I found the error in a rather small index too (31MB). BTW, if I copy the segments_x file to what Solr is looking for, and restart the Solr web-app from the Tomcat manager, this resolves it. But it's just a workaround, never good enough for production deployments. My next plan is to do a remote debug to see what exactly is happening in the code. Any other things I should be looking at? Any help is really appreciated on this matter.
Thanks Madu -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 1:14 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating Madu, So are you saying that all slaves have the exact same index, and that index is exactly the same as the one on the master, yet only some of those slaves exhibit this error, while others do not? Mind listing the index directories of 1) master, 2) a slave without errors, 3) a slave with errors, and doing: du -s /path/to/index/on/master du -s /path/to/index/on/slave/without/errors du -s /path/to/index/on/slave/with/errors Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Mon, November 9, 2009 7:47:04 PM Subject: RE: Segment file not found error - after replicating Thanks Otis! Yes, I checked the index directories and they are 100% the same, both timestamp- and size-wise. Not all the slaves face this issue; I would say roughly 50% have this trouble. Logs do not have any errors either :-( Any other things I should do/look at? Cheers Madu

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, 10 November 2009 9:26 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It's hard to troubleshoot blindly like this, but have you tried manually comparing the contents of the index dir on the master and on the slave(s)? If they are out of sync, have you tried forcing replication to see if one of the subsequent replication attempts gets the dirs in sync? Do you have more than 1 slave, and do they all start having this problem at the same time? Any errors in the logs for any of the scripts involved in replication in 1.3?
Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: Maduranga Kannangara To: solr-user@lucene.apache.org Sent: Sun, November 8, 2009 10:30:44 PM Subject: Segment file not found error - after replicating Hi guys, We use Solr 1.3 for indexing large amounts of data (50G avg) in a Linux environment and use the replication scripts to make replicas that live in load-balancing slaves. The issue we face quite often (only on Linux servers) is that they tend not to find the segments file (segments_x etc.) after replication completes. As this has become quite common, we have started hitting a serious issue. Below is a stack trace, if that helps; any help on this matter is greatly appreciated. Nov 5, 2009 11:34:46 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created /admin/: org.apache.solr.handler.admin.AdminHandlers
RE: Segment file not found error - after replicating
Just found out the root cause:

* The segments.gen file does not get replicated to the slave all the time. For some reason, this small (20-byte) file lives in memory and does not get written to the master's hard disk. Therefore it is obviously not transferred to the slaves.

The solution was to shut down the master web app (it must be a clean shutdown, not a kill of Tomcat), then do the replication. Also, if the timestamp/size is not changed (the size won't change anyway!), rsync does not seem to copy this file over either. So forcing the copy in the replication scripts solved the problem. Thanks Otis and everyone for all your support! Madu

-Original Message- From: Maduranga Kannangara Sent: Monday, 16 November 2009 12:37 PM To: solr-user@lucene.apache.org Subject: RE: Segment file not found error - after replicating Yes. We have tried Solr 1.4 and so far it's been a great success. Still, I am investigating why Solr 1.3 gave an issue like before. Currently it seems to me that org.apache.lucene.index.SegmentInfos.FindSegmentsFile.run() is not able to figure out the correct segments file name. (Maybe an index replication issue -- leading to the index not being fully replicated.. but it's so hard to believe, as both master and slave have 100% the same data now!) Anyway.. I will keep on trying till I find something useful.. and will let you know. Thanks Madu

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Wednesday, 11 November 2009 10:03 AM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating It sounds like your index is not being fully replicated. I can't tell why, but I can suggest you try the new Solr 1.4 replication.
Re: Segment file not found error - after replicating
That's odd - that file is normally not used - it's a backup method to figure out the current generation in case it cannot be determined with a directory listing - it's basically for NFS.

Maduranga Kannangara wrote: Just found out the root cause: * The segments.gen file does not get replicated to the slave all the time. For some reason, this small (20-byte) file lives in memory and does not get written to the master's hard disk. Therefore it is obviously not transferred to the slaves. The solution was to shut down the master web app (it must be a clean shutdown, not a kill of Tomcat), then do the replication. Also, if the timestamp/size is not changed (the size won't change anyway!), rsync does not seem to copy this file over either. So forcing the copy in the replication scripts solved the problem. Thanks Otis and everyone for all your support! Madu
RE: Segment file not found error - after replicating
Yes, I believed so too. The logic in the aforementioned method calculates the generation number from the segment files available (genA) and from the segments.gen file content (genB). Whichever is larger is the generation number used to look up the segments file. When the file is not properly replicated (because it is not written to the hard disk, or not rsynced) and the generation number in segments.gen (genB) is larger than the file-based calculation (genA), we hit the aforesaid issue. Cheers Madu

-Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, 16 November 2009 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Segment file not found error - after replicating That's odd - that file is normally not used - it's a backup method to figure out the current generation in case it cannot be determined with a directory listing - it's basically for NFS.
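Madu's description of the generation lookup can be sketched roughly as follows (a simplified illustration, not Lucene's actual code; Lucene does encode the generation in base 36 in the segments_N file name, but the stale-generation value below is invented for the example):

```java
import java.util.*;

public class GenPickDemo {
    // Parse the generation from a segments_N file name
    // (Lucene encodes N in base 36, i.e. Character.MAX_RADIX).
    static long genFromFileName(String name) {
        return Long.parseLong(name.substring("segments_".length()), Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        // genA: highest generation visible via a directory listing on the slave
        List<String> listing = Arrays.asList("segments_9", "segments_a");
        long genA = listing.stream().mapToLong(GenPickDemo::genFromFileName).max().orElse(-1);

        // genB: generation recorded in a stale, not-fully-replicated segments.gen
        long genB = 12;

        // Whichever is larger wins -- here the stale segments.gen does, so the
        // reader looks for a segments_c file that was never copied to the slave.
        long used = Math.max(genA, genB);
        System.out.println("genA=" + genA + " genB=" + genB
                + " used=segments_" + Long.toString(used, Character.MAX_RADIX));
    }
}
```

This is why an un-replicated segments.gen can point the slave at a segments file that does not exist on disk, producing the "segment file not found" error above.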
Re: Newbie Solr questions
Take a look at the example schema - you can have dynamic fields that are used, based on wildcard matching of the field name, if a field doesn't match the name of an existing field. -Peter

On Sun, Nov 15, 2009 at 10:50 AM, yz5od2 woods5242-outdo...@yahoo.com wrote: Thanks for the reply: I follow the schema.xml concept, but what if my requirement is more dynamic in nature? I.e. I would like my developers to be able to annotate a POJO and submit it to the Solr server (embedded) to be indexed according to public properties OR annotations. Is that possible? If that is not possible, can I programmatically define documents and fields (and the field options) in straight Java? I.e. in the pseudo code below...

// this is made up but this is what I would like to be able to do
SolrDoc document = new SolrDoc();
SolrField field = new SolrField();
field.isIndexed = true;
field.isStored = true;
field.name = "myField";
field.value = myPOJO.getValue();
document.add(field);
solrServer.index(document);

On Nov 15, 2009, at 12:50 AM, Avlesh Singh wrote: a) Since Solr is built on top of Lucene, using SolrJ, can I still directly create custom documents, specify the field specifics etc. (indexed, stored etc.) and then map POJOs to those documents, similar to just using the straight Lucene API? b) I took a quick look at the SolrJ javadocs but did not see anything in there that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs? c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this? The answer to all your questions above is the magical file called schema.xml. For more, read here - http://wiki.apache.org/solr/SchemaXml. SolrJ is simply a Java client to access (read from and update) the Solr server.
d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server? As long as the Lucene index matches the definitions in your schema you can use the same index. The data, however, needs to be copied into a predictable location inside SOLR_HOME. Cheers, Avlesh

On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 woods5242-outdo...@yahoo.com wrote: Hi, I am new to Solr but fairly advanced with Lucene. In the past I have created custom Lucene search engines that indexed objects in a Java application, so my background is coming from this requirement. a) Since Solr is built on top of Lucene, using SolrJ, can I still directly create custom documents, specify the field specifics etc. (indexed, stored etc.) and then map POJOs to those documents, similar to just using the straight Lucene API? b) I took a quick look at the SolrJ javadocs but did not see anything in there that allowed me to customize whether a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs? c) The SolrJ beans package: by annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this? d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server? thanks! -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
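Peter's point about dynamic fields can be illustrated with a tiny sketch of the wildcard matching (a simplification of what Solr does with <dynamicField> declarations in schema.xml; the patterns and type names here are just examples, not anyone's actual schema):

```java
import java.util.*;

public class DynamicFieldDemo {
    // Hypothetical dynamicField patterns, as they might appear in schema.xml:
    //   <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
    //   <dynamicField name="*_i" type="int"    indexed="true" stored="true"/>
    static final Map<String, String> DYNAMIC_FIELDS = new LinkedHashMap<>();
    static {
        DYNAMIC_FIELDS.put("*_s", "string");
        DYNAMIC_FIELDS.put("*_i", "int");
    }

    // A field name not declared explicitly in the schema falls back to
    // the first wildcard pattern it matches.
    static String resolve(String fieldName) {
        for (Map.Entry<String, String> e : DYNAMIC_FIELDS.entrySet()) {
            String suffix = e.getKey().substring(1); // strip the leading '*'
            if (fieldName.endsWith(suffix)) return e.getValue();
        }
        throw new IllegalArgumentException("no field or pattern matches: " + fieldName);
    }

    public static void main(String[] args) {
        System.out.println(resolve("myField_s")); // string
        System.out.println(resolve("count_i"));   // int
    }
}
```

So while the field options themselves still come from schema.xml, code can effectively choose index/store behaviour at runtime by picking a field-name suffix, which is the usual workaround for the "dynamic in nature" requirement above.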
Re: Newbie Solr questions
OK, so what I am hearing is that there is no way to create custom documents/fields via the SolrJ client at runtime. Instead you have to use schema.xml ahead of time OR create a custom index via the Lucene APIs and then import the indexes into Solr for searching?

On Nov 15, 2009, at 9:16 PM, Peter Wolanin wrote: Take a look at the example schema - you can have dynamic fields that are used, based on wildcard matching of the field name, if a field doesn't match the name of an existing field. -Peter
The answer to all your questions above is the magical file called schema.xml. For more read here - http://wiki.apache.org/solr/SchemaXml . SolrJ is simply a Java client to access (read from and update) the Solr server. d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server? As long as the Lucene index matches the definitions in your schema you can use the same index. The data however needs to be copied into a predictable location inside SOLR_HOME. Cheers Avlesh On Sun, Nov 15, 2009 at 9:26 AM, yz5od2 woods5242-outdo...@yahoo.com wrote: Hi, I am new to Solr but fairly advanced with Lucene. In the past I have created custom Lucene search engines that indexed objects in a Java application, so my background is coming from this requirement. a) Since Solr is built on top of Lucene, using SolrJ, can I still directly create custom documents, specify the field specifics etc. (indexed, stored etc.) and then map POJOs to those documents, similar to just using the straight Lucene API? b) I took a quick look at the SolrJ javadocs but did not see anything in there that allowed me to customize if a field is stored, indexed, not indexed etc. How do I do that with SolrJ without having to go directly to the Lucene APIs? c) The SolrJ beans package. By annotating a POJO with @Field, how exactly does SolrJ treat that field? Indexed/stored, or just indexed? Is there any other way to control this? d) If I create a custom index outside of Solr using straight Lucene, is it easy to import a pre-existing Lucene index into a Solr server? thanks! -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia, Inc. peter.wola...@acquia.com
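To make the schema.xml answer concrete for questions a) and b): the per-field indexed/stored decisions that raw Lucene makes per Document live in field declarations like these (the field names are hypothetical):

```xml
<field name="myField" type="string" indexed="true" stored="true"/>
<!-- searchable but not retrievable: indexed only -->
<field name="bodyText" type="text" indexed="true" stored="false"/>
```

SolrJ (including the @Field bean annotation) only names the field and supplies the value; the server applies whatever flags the schema declares for that name.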
Re: solr stops running periodically
Look for the HotSpot dump files that Sun's Java leaves on disk when it dies. I think their names start with hs. Luckily, I don't have any of them handy to tell you the exact name pattern. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Grant Ingersoll gsing...@apache.org To: solr-user@lucene.apache.org Sent: Sun, November 15, 2009 8:15:47 PM Subject: Re: solr stops running periodically Have you looked in other logs, like your syslogs? I've never seen Solr/Tomcat just disappear w/o so much as a blip. I'd think if a process just died from an error condition there would be some note of it somewhere. I'd try to find some other events taking place at that time which might give a hint. On Nov 15, 2009, at 1:45 PM, athir nuaimi wrote: We have 4 machines running solr. On one of the machines, every 2-3 days solr stops running. By that I mean that the java/tomcat process just disappears. If I look at the catalina logs, I see normal log entries and then nothing. There are no shutdown messages like you would normally see if you sent a SIGTERM to the process. Obviously this is a problem. I'm new to solr/java so if there are more diagnostic things I can do I'd appreciate any tips/advice. thanks in advance Athir
Re: Newbie tips: migrating from mysql fulltext search / PHP integration
Hi, I'm not sure if you have a specific question there. But regarding the PHP integration part, I just learned PHP now has native Solr (1.3 and 1.4) support: http://twitter.com/otisg/status/5757184282 Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: mbneto mbn...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, November 15, 2009 4:56:15 PM Subject: Newbie tips: migrating from mysql fulltext search / PHP integration Hi, I am looking for alternatives to MySQL fulltext searches. The combo Lucene/Solr is one of my options and I'd like to gather as much information as I can before choosing and even build a prototype. My current needs do not seem unusual: - fast response time (currently some searches can take more than 11sec) - API to add/update/delete documents in the collection - way to add synonyms or similar words for misspelled ones (ex. Sony = Soni) - way to define relevance of results (ex. if I search for LCD, return products that belong to the LCD category, contain LCD in the product definition or are marked as special offer) I know that I may have to add external code, for example, to take the results and apply some business logic to re-sort the results, but I'd like to know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which I am considering buying), the tips for Solr usage.
Re: Is there a way to skip cache for a query
I don't think that is supported today. It might be useful, though (e.g. something I'd use with an external monitoring service, so that it doesn't always get fast results from the cache). Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Bertie Shen bertie.s...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, November 14, 2009 9:43:25 PM Subject: Is there a way to skip cache for a query Hey, I do not want to disable the cache completely by changing the setting in solrconfig.xml. I just want to sometimes skip the cache for a query for testing purposes. So is there a parameter like skipcache=true to specify in select/?q=hot&version=2.2&start=0&rows=10&skipcache=true to skip the cache for the query [hot]? skipcache could default to false. Thanks.
Re: converting over from sphinx
Something doesn't sound right here. Why do you need wildcards for queries in the first place? Are you finding that with stopword removal and stemming you are not matching some docs that you think should be matched? If so, we may be able to help if you provide a few examples. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Cory Ondrejka cory.ondre...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, November 14, 2009 12:57:56 PM Subject: converting over from sphinx I've been using Sphinx for full text search, but since I want to move my project over to Heroku, I need to switch to Solr. Everything's up and running using the acts_as_solr plugin, but I'm curious if I'm using Solr the right way. In particular, I'm doing phrase searching in a corpus of descriptions, such as I need help with a foo, where I have a bunch of foo: a foo is a subset of a bar often used to create briznatzes, etc. With Sphinx, I could convert I need help with a foo into *need* *help* *with* *foo* and get pretty nice matches. With Solr, my understanding is that you can only do wildcard matches on the suffix. In addition, stemming only happens on non-wildcard terms. So, my first thought would be to convert I need help with a foo into need need* help help* with with* foo foo*. Thanks in advance for any help. -- Cory Ondrejka cory.ondre...@gmail.com http://ondrejka.net/
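To make Otis's point concrete: with an analyzer that strips stopwords and stems at both index and query time, "I need help with a foo" reduces to roughly need / help / foo, and those stems match documents without any wildcards at all. A minimal analyzer sketch using stock Solr 1.x factories (the field type name is hypothetical):

```xml
<fieldType name="text_stem" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- drop "I", "with", "a", ... -->
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "helping" and "help" both index as the stem "help" -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same chain runs on queries and on documents, the wildcard-per-token trick from Sphinx becomes unnecessary.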
RE: Is there a way to skip cache for a query
See https://issues.apache.org/jira/browse/SOLR-1363 -- it's currently scheduled for 1.5. Jake -Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Sunday, November 15, 2009 11:17 PM To: solr-user@lucene.apache.org Subject: Re: Is there a way to skip cache for a query I don't think that is supported today. It might be useful, though (e.g. something I'd use with an external monitoring service, so that it doesn't always get fast results from the cache). Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Bertie Shen bertie.s...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, November 14, 2009 9:43:25 PM Subject: Is there a way to skip cache for a query Hey, I do not want to disable cache completely by changing the setting in solrconfig.xml. I just want to sometimes skip cache for a query for testing purpose. So is there a parameter like skipcache=true to specify in select/?q=hotversion=2.2start=0rows=10skipcache=true to skip cache for the query [hot]. skipcache can by default be false. Thanks.
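For readers finding this thread later: the closest shipped mechanism I know of is the cache local param on filter queries, which arrived in Solr releases after this exchange — so treat the syntax below as an illustration for a newer version, not a confirmed 1.4/1.5 feature (the inStock field is just the example-schema placeholder):

```text
select/?q=hot&fq={!cache=false}inStock:true&start=0&rows=10
```

This skips the filter cache for that one clause while leaving the solrconfig.xml cache settings untouched, which is what the original question was after.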
Re: Some guide about setting up local/geo search at solr
Nota bene: My understanding is the external versions of Local Lucene/Solr are eventually going to be deprecated in favour of what we have in contrib. Here's a stub page with a link to the spatial JIRA issue: http://wiki.apache.org/solr/SpatialSearch Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: Bertie Shen bertie.s...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, November 14, 2009 3:32:01 AM Subject: Some guide about setting up local/geo search at solr Hey, I spent some time figuring out how to set up local/geo/spatial search in Solr. I hope the following description can help, given the current status. 1) Download localsolr. I downloaded it from http://developer.k-int.com/m2snapshots/localsolr/localsolr/1.5/ and put the jar file (in my case, localsolr-1.5.jar) in your application's WEB-INF/lib directory on the application server. 2) Download locallucene. I downloaded it from http://sourceforge.net/projects/locallucene/ and put the jar file (in my case, locallucene.jar in the locallucene_r2.0/dist/ directory) in your application's WEB-INF/lib directory. I also needed to copy gt2-referencing-2.3.1.jar, geoapi-nogenerics-2.1-M2.jar, and jsr108-0.01.jar from the locallucene_r2.0/lib/ directory to WEB-INF/lib. Do not copy lucene-spatial-2.9.1.jar from the Lucene codebase. The namespace has been changed from com.pjaol.blah.blah.blah to org.apache.blah.blah. 3) Update your solrconfig.xml and schema.xml. I copied them from http://www.gissearch.com/localsolr. 4) Restart the application server and try a query /solr/select?qt=geo&lat=xx.xx&long=yy.yy&q=abc&radius=zz.
Re: exclude some fields from copying dynamic fields | schema.xml
Thanks for the response. Defining the field is not working :( Is there any other way to stop the copy task for a particular set of values? Thanks ~Vikrant Lance Norskog-2 wrote: There is no direct way. Let's say you have a nocopy_s and you do not want a copy nocopy_str_s. This might work: declare nocopy_str_s as a field and make it not indexed and not stored. I don't know if this will work. It requires two overrides to work: 1) that declaring a field name that matches a wildcard will override the default wildcard rule, and 2) that stored=false indexed=false works. On Fri, Nov 13, 2009 at 3:23 AM, Vicky_Dev vikrantv_shirbh...@yahoo.co.in wrote: Hi, we are using the following entry in schema.xml to make a copy of one type of dynamic field to another: <copyField source="*_s" dest="*_str_s"/> Is it possible to exclude some fields from copying? We are using Solr 1.3. ~Vikrant -- View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26335109.html Sent from the Solr - User mailing list archive at Nabble.com. -- Lance Norskog goks...@gmail.com -- View this message in context: http://old.nabble.com/exclude-some-fields-from-copying-dynamic-fields-%7C-schema.xml-tp26335109p26367099.html Sent from the Solr - User mailing list archive at Nabble.com.
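Lance's suggested workaround, spelled out as actual markup — the untested assumption he flags is that the explicit declaration shadows the wildcard target:

```xml
<!-- the wildcard copy stays in place for everything else -->
<copyField source="*_s" dest="*_str_s"/>
<!-- explicit, dead-end declaration intended to swallow the one copy you don't want -->
<field name="nocopy_str_s" type="string" indexed="false" stored="false"/>
```

Whether an explicitly declared field name really overrides a matching dynamicField pattern is the part that needs testing against your Solr 1.3 install.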
Re: javabin in .NET?
For a client the marshal() part is not important. unmarshal() is probably all you need. On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Original code is here: http://bit.ly/hkCbI I just started porting it here: http://bit.ly/37hiOs It needs: tests/debugging, porting NamedList, SolrDocument, SolrDocumentList. Thanks for any help! Cheers, Mauricio 2009/11/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com OK. Is there anyone trying it out? Where is this code? I can try to help. On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .NET, but IIRC they never really worked. Anyway it's not a big deal, it should be a straightforward job. Testing it thoroughly cross-platform is another thing though. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com The javabin format does not have many dependencies. It may have 3-4 classes and that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port Java to .NET? Then we could extract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess, being the best choice) to handle this response format? 
Thanks, Erik -- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: javabin in .NET?
start with a JavabinDecoder only so that the class is simple to start with. 2009/11/16 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: For a client the marshal() part is not important. unmarshal() is probably all you need. On Sun, Nov 15, 2009 at 12:36 AM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Original code is here: http://bit.ly/hkCbI I just started porting it here: http://bit.ly/37hiOs It needs: tests/debugging, porting NamedList, SolrDocument, SolrDocumentList. Thanks for any help! Cheers, Mauricio 2009/11/14 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com OK. Is there anyone trying it out? Where is this code? I can try to help. On Fri, Nov 13, 2009 at 8:10 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: I meant the standard IO libraries. They are different enough that the code has to be manually ported. There were some automated tools back when Microsoft introduced .NET, but IIRC they never really worked. Anyway it's not a big deal, it should be a straightforward job. Testing it thoroughly cross-platform is another thing though. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com The javabin format does not have many dependencies. It may have 3-4 classes and that is it. On Fri, Nov 13, 2009 at 6:05 PM, Mauricio Scheffer mauricioschef...@gmail.com wrote: Nope. It has to be manually ported. Not so much because of the language itself but because of differences in the libraries. 2009/11/13 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Is there any tool to directly port Java to .NET? Then we could extract out the client part of the javabin code and convert it. On Thu, Nov 12, 2009 at 9:56 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Has anyone looked into using the javabin response format from .NET (instead of SolrJ)? It's mainly a curiosity. How much better could performance/bandwidth/throughput be? How difficult would it be to implement some .NET code (C#, I'd guess, being the best choice) to handle this response format? 
Thanks, Erik -- Noble Paul | Principal Engineer | AOL | http://aol.com
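To get the port rolling with something concrete: javabin, like Lucene's index format, encodes sizes and counts as variable-length ints — 7 data bits per byte, high bit set meaning "another byte follows". Here is a self-contained Java sketch of the decode side (this mirrors the general vInt scheme only; it is not the full javabin tag dispatch, and the class name is made up):

```java
// Sketch of the variable-length int ("vInt") decoding that a javabin
// decoder needs early on: 7 data bits per byte, high bit = continuation.
public class VIntDecoder {

    // Decode a vInt starting at the given offset in the buffer.
    public static int readVInt(byte[] buf, int offset) {
        byte b = buf[offset++];
        int i = b & 0x7F;                      // low 7 bits of the value
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[offset++];                 // continuation byte
            i |= (b & 0x7F) << shift;          // next 7 bits, shifted up
        }
        return i;
    }

    public static void main(String[] args) {
        // 300 = 0b1_0010_1100 -> low 7 bits 0x2C with continuation bit = 0xAC, then 0x02
        System.out.println(readVInt(new byte[]{(byte) 0xAC, 0x02}, 0)); // prints 300
    }
}
```

A C# port of this is nearly line-for-line — byte, int, and shift semantics match — which is part of why the thread expects the overall port to be straightforward.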
Re: Solr date and string search problem
Hi Lance Norskog, Thanks for your reply. Let me first give the config file details. These are the fields I have defined:

<fieldType class="solr.TextField" name="alphaOnlySort" omitNorms="true" sortMissingLast="true">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType class="solr.TextField" name="alphaOnlySortFacet" omitNorms="true" sortMissingLast="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType class="solr.TextField" name="specialFacet" omitNorms="true" sortMissingLast="true">
  <analyzer type="query">
    <tokenizer class="solr.CommaTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldType>
</types>

<field indexed="true" multiValued="true" name="text" stored="false" type="text"/>
<field indexed="true" name="id" required="true" stored="true" type="string"/>
<field indexed="true" name="status" stored="true" type="alphaOnlySortFacet"/>
<field indexed="false" name="noofViews" stored="true" type="integer"/>
<field indexed="true" name="uploadedBy" stored="true" type="text"/>
<field indexed="true" name="uploadedOn" stored="true" type="date"/>
<field indexed="true" name="popularity" stored="true" type="float"/>
<field indexed="true" name="Plant" stored="true" type="specialFacet"/>
<field indexed="true" name="PlantSearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="Geography" stored="true" type="alphaOnlySortFacet"/>
<field indexed="true" name="GeographySearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="Region" stored="true" type="alphaOnlySortFacet"/>
<field indexed="true" name="RegionSearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="Country" stored="true" type="alphaOnlySortFacet"/>
<field indexed="true" name="CountrySearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="BusUnit" stored="true" type="specialFacet"/>
<field indexed="true" name="BusUnitSearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="BusinessFunction" stored="true" type="alphaOnlySortFacet"/>
<field indexed="true" name="BusinessFunctionSearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="Functionality" stored="true" type="alphaOnlySortFacet"/>
<field indexed="true" name="FunctionalitySearch" stored="true" type="alphaOnlySort"/>
<field indexed="true" name="Businessprocesses" stored="true" type="text"/>
<field indexed="true" name="UploadedDate" stored="true" type="date"/>

and this is my requestHandler configuration:

<requestHandler class="solr.DisMaxRequestHandler" name="dismaxRelAndPop">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="qf">PlantSearch^1 GeographySearch^1 RegionSearch^1 CountrySearch^1 BusUnitSearch^1 BusinessFunctionSearch^1 Businessprocesses^1 LifecycleStatus^1 ApplicationNature^1 UploadedDate^1</str>
    <str name="pf">PlantSearch^1 GeographySearch^1 RegionSearch^1 CountrySearch^1 BusUnitSearch^1 BusinessFunctionSearch^1 Businessprocesses^1 LifecycleStatus^1 ApplicationNature^1 UploadedDate^1</str>
    <str name="fl">*,score</str>
    <str name="bf">ord(popularity)^0.5 recip(rord(popularity),1,1000,1000)^0.3</str>
    <str name="q.alt">*:*</str>
    <str name="mm">10&lt;50%</str>
  </lst>
</requestHandler>

and this is the query that was fired:

facet.limit=-1&rows=10&start=0&facet=true&facet.mincount=1&facet.field=Geography&facet.field=Country&facet.field=Functionality&facet.field=BusinessFunction&facet.field=BusUnit&facet.field=Region&facet.field=PGServiceManager&facet.field=AppName&facet.field=Plant&facet.field=status&q=Behavior&facet.sort=true

I understand where the problem is happening, but I don't know how to resolve it. I have defined UploadedDate as a date field, and I have configured my request handler to search the UploadedDate field as well (UploadedDate^1), but every query that is fired gets converted to a date and throws an error.
If I remove UploadedDate from the request handler it works fine. So I don't know how to make string fields and date fields coexist in a request handler. According to the given query, Solr should search across all the fields and give me the results back. Is there a way to do that? Sorry for such a long response :) Thanks --- Ashok Lance Norskog-2 wrote: This line is the key: SEVERE: org.apache.solr.core.SolrException: Invalid Date String:'Behavior' at org.apache.solr.schema.DateField.toInternal(DateField.java:108) The string 'Behavior' is being parsed as a date, and fails. Your query is attempting to find this as a date. Please post your query.
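One way out of the Invalid Date String error in this thread (a sketch, not something proposed in the thread itself): keep qf/pf limited to text-analyzed fields so free-text terms are never handed to the date parser, and constrain dates separately with a filter query:

```xml
<!-- qf without the date field: free text only goes to text-analyzed fields -->
<str name="qf">PlantSearch^1 GeographySearch^1 RegionSearch^1 CountrySearch^1
  BusUnitSearch^1 BusinessFunctionSearch^1 Businessprocesses^1</str>
```

Date constraints then ride along as a separate parameter, e.g. fq=UploadedDate:[2009-01-01T00:00:00Z TO NOW], instead of competing with free text inside the dismax fields.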
Re: Newbie tips: migrating from mysql fulltext search / PHP integration
On Mon, Nov 16, 2009 at 12:34 AM, Mattmann, Chris A (388J) chris.a.mattm...@jpl.nasa.gov wrote: WOW, +1!! Great job, PHP! Cheers, Chris On 11/15/09 10:13 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, I'm not sure if you have a specific question there. But regarding the PHP integration part, I just learned PHP now has native Solr (1.3 and 1.4) support: http://twitter.com/otisg/status/5757184282 Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR - Original Message From: mbneto mbn...@gmail.com To: solr-user@lucene.apache.org Sent: Sun, November 15, 2009 4:56:15 PM Subject: Newbie tips: migrating from mysql fulltext search / PHP integration Hi, I am looking for alternatives to MySQL fulltext searches. The combo Lucene/Solr is one of my options and I'd like to gather as much information as I can before choosing and even build a prototype. My current needs do not seem unusual: - fast response time (currently some searches can take more than 11sec) - API to add/update/delete documents in the collection - way to add synonyms or similar words for misspelled ones (ex. Sony = Soni) - way to define relevance of results (ex. if I search for LCD, return products that belong to the LCD category, contain LCD in the product definition or are marked as special offer) I know that I may have to add external code, for example, to take the results and apply some business logic to re-sort the results, but I'd like to know, besides the wiki and the Solr 1.4 Enterprise Search Server book (which I am considering buying), the tips for Solr usage. ++ Chris Mattmann, Ph.D. 
Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ Hi, There is native support for Solr in PHP but currently you have to build it as a PECL extension. It is not bundled with the PHP source yet but it is downloadable from the PECL project homepage http://pecl.php.net/package/solr If you have PECL support built into your PHP installation you can install it by running the following command: pecl install solr-beta Some usage examples are available here http://us3.php.net/manual/en/solr.examples.php More details are available here http://www.php.net/manual/en/book.solr.php I use Solr with PHP 5.2 - In PHP, the SolrClient class has methods to add, update, delete and rollback changes to the index made since the last commit. - There are also built-in tools in Solr that allow you to analyze and modify the data before indexing it and when searching for it. - With Solr you can define synonyms (check the wiki for more details). - Solr also allows you to sort by score (relevance). - You can specify the fields that you want either as optional, required or prohibited. My last two points could take care of your last requirement. Solr is awesome and most of the searches I perform return sub-second response times. It's several hundredfold easier and more efficient than MySQL fulltext, believe me. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.