Phrase matching on a text field
Hi, I'm trying to figure out why phrase matching on a text field only works some of the time. I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT". The "FOR" seems to be causing a problem... The title field is indexed as both s_title and t_title (string and text, as defined in the demo schema), thus:

  <field name="title" type="string" indexed="false" stored="false" multiValued="false"/>
  <field name="s_title" type="string" indexed="true" stored="true" multiValued="false"/>
  <field name="t_title" type="text" indexed="true" stored="false" multiValued="false"/>
  <copyField source="title" dest="s_title"/>
  <copyField source="title" dest="t_title"/>

I can match the document with an exact query on the string:

  q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"

I can match the document with this phrase query on the text:

  q=t_title:"future directions"

which uses the parsedquery shown by debugQuery=true:

  <str name="rawquerystring">t_title:"future directions"</str>
  <str name="querystring">t_title:"future directions"</str>
  <str name="parsedquery">PhraseQuery(t_title:"futur direct")</str>
  <str name="parsedquery_toString">t_title:"futur direct"</str>

Similarly, I can match the document with this query:

  q=t_title:"integrated catchment"

which uses the parsedquery shown by debugQuery=true:

  <str name="rawquerystring">t_title:"integrated catchment"</str>
  <str name="querystring">t_title:"integrated catchment"</str>
  <str name="parsedquery">PhraseQuery(t_title:"integr catchment")</str>
  <str name="parsedquery_toString">t_title:"integr catchment"</str>

But I cannot match the document with the query:

  q=t_title:"future directions for integrated catchment"

which uses the phrase query shown by debugQuery=true:

  <str name="rawquerystring">t_title:"future directions for integrated catchment"</str>
  <str name="querystring">t_title:"future directions for integrated catchment"</str>
  <str name="parsedquery">PhraseQuery(t_title:"futur direct integr catchment")</str>
  <str name="parsedquery_toString">t_title:"futur direct integr catchment"</str>

Any wisdom gratefully accepted. Cheers, -- Phil

640K ought to be enough for anybody. -- Bill Gates, in 1981
What are the Unicode encodings supported by Solr?
Hi, I'd like to know about the different Unicode [/any other?] encodings supported by Solr for posting docs [through SolrJ in my case]. Is it just UTF-8 and UCN that are supported, or are other character encodings like NCR(decimal), NCR(hex) etc. supported as well? Now the problem is that while automating the crawling and indexing process for Solr, I found that for most pages the encoding is UTF-8 [in this case searching works fine], but for other pages the encoding is some other character encoding [like NCR(dec), NCR(hex), or maybe something else; I don't have much idea on this]. So when I fetch the page content through Java methods using InputStreamReaders, what I obtain after stripping various tags is raw text in some encoding not supported by Solr. So either I have to configure Solr to support these other encodings as well [only if that is possible], or convert whatever the raw text is to UTF-8 using some standard encoders [this solution seems better to me, provided I'm able to detect the encoding of the input]. I'd like to know if there are standard encoders available for this purpose [there must be, right? didn't google much]. Any advice on this is highly appreciated. An off-beat Q: In some of the pages I'm getting some \ufffd chars, which I think is some sort of unmappable [by Java?] character, right? Any idea on how to handle this? Just replacing it with a blank char will not do [this depends on the requirement, though]. Thanks, KK.
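[For the "standard encoders" part of the question, one common option - an assumption here, not something KK mentions - is ICU4J's CharsetDetector, which guesses the charset of raw bytes and decodes them to a Java String; SolrJ then sends the string over HTTP as UTF-8. A minimal sketch, assuming the icu4j jar is on the classpath:

  import com.ibm.icu.text.CharsetDetector;
  import com.ibm.icu.text.CharsetMatch;

  public class EncodingSniffer {
      // Guess the charset of undecoded page bytes and return decoded text.
      public static String toUnicode(byte[] rawBytes) {
          CharsetDetector detector = new CharsetDetector();
          detector.setText(rawBytes);             // feed the raw, undecoded bytes
          CharsetMatch match = detector.detect(); // best-guess charset match
          return match.getString();               // decode using the detected charset
      }
  }

One caveat: NCR(dec)/NCR(hex) (e.g. &#233; / &#xE9;) are HTML numeric character references inside the markup, not byte-level encodings, so they are handled by unescaping HTML entities during tag stripping rather than by charset conversion.]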
Dataimporthandler Timestamp Error ?
Hi, when I do a full import I get the following error:

  Caused by: java.sql.SQLException: Cannot convert value '0000-00-00 00:00:00' from column 10 to TIMESTAMP.
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
          at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1321)
          at com.mysql.jdbc.BufferRow.getTimestampFast(BufferRow.java:573)
          at com.mysql.jdbc.ResultSetImpl.getTimestampInternal(ResultSetImpl.java:6617)
          at com.mysql.jdbc.ResultSetImpl.getTimestamp(ResultSetImpl.java:5943)
          at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4901)
          at com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4951)
          at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.getARow(JdbcDataSource.java:220)
          ... 11 more
  Caused by: java.sql.SQLException: Value '[...@14f9f4a' can not be represented as java.sql.Timestamp
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1055)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:956)
          at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:926)
          at com.mysql.jdbc.ResultSetRow.getTimestampFast(ResultSetRow.java:1027)
          ... 17 more

But I thought the Timestamp is generated automatically and has nothing to do with my MySQL database? best regards, Sebastian
Re: Dataimporthandler Timestamp Error ?
you may need to change the MySQL connection parameters so that the driver does not throw an error for zero dates:

  jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull

On Thu, May 7, 2009 at 1:39 PM, gateway0 reiterwo...@yahoo.de wrote: Hi, when I do a full import I get the following error: Cannot convert value '0000-00-00 00:00:00' from column 10 to TIMESTAMP. [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
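[In DataImportHandler terms, that parameter goes on the JDBC URL of the data source in data-config.xml. A sketch - driver/user/password values are illustrative, not from the thread:

  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull"
              user="user" password="password"/>
]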
Re: Creating new QParserPlugin
Hi! I agree that Solr is difficult to extend in many cases. We just patch Solr, and I guess many other users patch it too. What I propose is to create some Solr community site (Solr incubator?) to publish patches, so the Solr core team could look there and choose patches to apply to the Solr codebase. I know that one can use Jira for that, but it's not convenient to use it in this way. On Thu, May 7, 2009 at 2:41 AM, KaktuChakarabati jimmoe...@gmail.com wrote: Hello everyone, I am trying to write a new QParserPlugin+QParser, one that will work similarly to how DisMax does, but will give me more control over the FunctionQuery-related part of the query processing (e.g. in regard to a specified bf parameter). Specifically, I want to be able to affect the way the queryNorm (and possibly other factors) interact with a pre-computed value I store in a static field (i.e. I compute an index-time score for a document that I wish to use in a bf as a ValueSource, without being affected by queryNorm or other such extraneous considerations). While trying this, I notice I run a lot into cases where some parts I try to override/inherit from are private to a Java package namespace, and this makes the whole thing very cumbersome. Examples of this are the DismaxQParser class, which is defined as a local class inside the DisMaxQParserPlugin.java file (I think this is bad practice - FunctionQParserPlugin/FunctionQParser do have their own separate files, and I think that is a good convention to follow generally). Another case is where I try to inherit from FunctionQParser and end up not being able to replicate some of the parse() logic, because it uses the QueryParsing.StrParser class, which is a static inner class and so is only accessible from the solr.search namespace. In short, many such cases seem to arise, and I think this poses a considerable limitation on the possibilities of extending Solr. If this resonates with more people here, I'd take this issue up with solr-dev. Otherwise, if some of you have notions about doing what I'm trying to do differently, I would be happy to hear them. Thanks, -Chak -- Andrew Klochkov
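[For reference, the plugin surface itself is small. A minimal hedged sketch of a custom parser plugin - this is not Chak's actual code; it simply delegates to another registered parser, which is where custom FunctionQuery/queryNorm handling would be hooked in:

  import org.apache.lucene.queryParser.ParseException;
  import org.apache.lucene.search.Query;
  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.common.util.NamedList;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.QParser;
  import org.apache.solr.search.QParserPlugin;

  public class MyQParserPlugin extends QParserPlugin {
      public void init(NamedList args) { }

      public QParser createParser(String qstr, SolrParams localParams,
                                  SolrParams params, SolrQueryRequest req) {
          return new QParser(qstr, localParams, params, req) {
              public Query parse() throws ParseException {
                  // Delegate to an existing parser, then adjust the resulting
                  // Query (e.g. wrap or rescale boosts) before returning it.
                  QParser delegate = QParser.getParser(qstr, "dismax", req);
                  return delegate.parse();
              }
          };
      }
  }

It would be registered in solrconfig.xml with <queryParser name="myparser" class="com.example.MyQParserPlugin"/> and selected per-request with defType=myparser (names illustrative).]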
Re: Dataimporthandler Timestamp Error ?
Awesome, thanks!!! I first thought it could be blob-field related. Have a nice day, Sebastian Noble Paul നോബിള് नोब्ळ्-2 wrote: you may need to change the MySQL connection parameters so that the driver does not throw an error for zero dates: jdbc:mysql://localhost/test?zeroDateTimeBehavior=convertToNull [...]
French and SpellingQueryConverter
Hi, I have tried to run the following code:

  package org.apache.solr.spelling;

  import org.apache.lucene.analysis.fr.FrenchAnalyzer;

  public class Test {
      public static void main(String[] args) {
          SpellingQueryConverter sqc = new SpellingQueryConverter();
          sqc.analyzer = new FrenchAnalyzer();
          System.out.println(sqc.convert("français"));
      }
  }

I would expect to get [(français,0,8,type=ALPHANUM)]. However, I get [(fran,0,4,type=ALPHANUM), (ais,5,8,type=ALPHANUM)]. Is there any issue with the support of special characters? Thanks, Jonathan
Re: no subject aka Replication Stall
We have not pushed the fix into production yet. However, I am wondering two things. 1. If the download takes more than 10 seconds (our replication can take up to 90 seconds), will that be an issue? 2. There are 3 patches; 2 have 2-line changes, 1 has a large amount. Do we need the latest 2 or just the latest 1? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Wed, 6 May 2009 10:05:49 +0530 To: solr-user@lucene.apache.org Subject: Re: no subject aka Replication Stall SOLR-1096
Re: Solr Plugins Simple Questions for the Simpleton
On May 6, 2009, at 3:25 PM, Jeff Newburn wrote:

  We are trying to implement a SearchComponent plugin. I have been looking at QueryElevationComponent trying to weed through what needs to be done. My basic desire is to get the results back and manipulate them, either by altering the actual results or the facets. Questions: 1. Do the components fire off in order or all individually? If so, how does one chain them together?

http://wiki.apache.org/solr/SearchComponent

  I apologize. This question was more looking for insight into how the requests are made. One more interesting question: what does each component get from the previous one? 2. Where are the actual documents returned (i.e. what object gets the return results)?

Look on the ResponseBuilder object.

  I looked into the javadoc for this class and the description is as follows: "This class is experimental and will be changing in the future." Are there any tips to point us in the right direction to use and manipulate this? Also, does this class get passed from component to component? 3. Is there any specific place I should manipulate the result set?

I've done it in the past right on the response docset/doclist, but I've seen others discourage this kind of thing b/c you might not know the downstream effects.

  So does the doc list get passed down the chain in the ResponseBuilder? 4. Can the individual documents be changed before returning to the client?

In what way?

  In a way that might manipulate what is returned. We have 2 potential avenues: 1. Change the document to remove some values out of a multivalued field, or 2. Change the facets returned. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
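[To make the ResponseBuilder handoff concrete: the same ResponseBuilder instance is passed to every component in the chain, so a component registered after QueryComponent sees whatever QueryComponent stored. A hedged sketch - class name is illustrative, and the exact SolrInfoMBean boilerplate and accessors vary by Solr version:

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.DocList;

  public class ResultTweakComponent extends SearchComponent {
      @Override
      public void prepare(ResponseBuilder rb) throws IOException {
          // prepare() runs for every component before any process() call
      }

      @Override
      public void process(ResponseBuilder rb) throws IOException {
          // QueryComponent has already run: the ranked page of doc ids is here
          DocList results = rb.getResults().docList;
          // ... build a modified DocList and assign it back to
          // rb.getResults().docList, or rewrite facet counts in the response
          // values before the response writer serializes them
      }

      @Override public String getDescription() { return "post-query result manipulation"; }
      @Override public String getSourceId()    { return "n/a"; }
      @Override public String getSource()      { return "n/a"; }
      @Override public String getVersion()     { return "1.0"; }
  }

It would be registered with <searchComponent name="tweak" class="..."/> and appended to a handler via <arr name="last-components"><str>tweak</str></arr>, per the SearchComponent wiki page linked above.]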
Re: Upgrading from 1.2.0 to 1.3.0
this isn't advice on how to upgrade, but if you/your-project have a bit of time to wait, 1.4 sounds like it's getting close to an official release... FYI. cheers, rob On Tue, May 5, 2009 at 1:05 PM, Francis Yakin fya...@liquid.com wrote: What's the best way to upgrade Solr from 1.2.0 to 1.3.0? We have the current index that our users search running on the 1.2.0 Solr version. We would like to upgrade it to 1.3.0. We have a Master/Slaves env. What's the best way to upgrade it without affecting the search? Do we need to do it on the master or slaves first? Thanks, Francis
Re: Solr autocompletion in rails
Thanks a lot for the information. But I am still a bit confused about the use of TermsComponent. Like, where exactly are we going to put this code in Solr? For example, I changed schema.xml to add the autocomplete feature. I read your blog too, it's very helpful. But still a little confused. :-(( Can you explain it a bit?

Matt Weber-2 wrote: You will probably want to use the new TermsComponent in Solr 1.4. See http://wiki.apache.org/solr/TermsComponent. I just recently wrote a blog post about using autocompletion with TermsComponent, a servlet, and jQuery. You can probably follow these instructions, but instead of writing a servlet you can write a rails handler parsing the json output directly. http://www.mattweber.org/2009/05/02/solr-autosuggest-with-termscomponent-and-jquery/. Thanks, Matt Weber

On May 4, 2009, at 9:39 AM, manisha_5 wrote: Hi, I am new to solr. I am using a solr server to index the data and make search in a Ruby on Rails project. I want to add an autocompletion feature. I tried with the xml patch in the schema.xml file of solr, but don't know how to test if the feature is working. Also I haven't been able to integrate the same in the Rails project that is using Solr. Can anyone please provide some help in this regard?? The patch of code in schema.xml is:

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="100" minGramSize="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z0-9])" replacement="" replace="all"/>
      <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
    </analyzer>
  </fieldType>
Re: When should I optimize/
Great... thanks for the response! 2009/5/7 Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com: it is wise to optimize the index once in a while (daily, maybe). But it depends on how many commits you do in a day. Every commit causes fragmentation of index files, and your search can become slow if you do not optimize. But optimizing constantly is not recommended, because it is time consuming and your replication (if it is a master/slave setup) can take longer. If you do a delete-all, then do an optimize anyway. On Wed, May 6, 2009 at 9:18 PM, Eric Sabourin eric.sabourin2...@gmail.com wrote: Is the optimize xml command something which is only required when I delete all the docs? Or should I also send the optimize command following other operations? Or daily? Thanks... Eric -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Eric Sent from Halifax, NS, Canada
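[For reference, the optimize command discussed here is just an XML message posted to the update handler; a sketch assuming the example server's default URL:

  curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary "<optimize/>"

A plain commit is the same call with <commit/> as the body.]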
Is it possible to writing solr result on disk from the server side?
Do you know if it's possible to write Solr results directly to a hard disk on the server side, rather than using an HTTP connection to transfer the results? While the query time is very fast for Solr, I want to do this because of the time taken to transfer the results between the client and the Solr server when you have a lot of rows. For instance, for 10,000 rows the query time could be 50 ms, but it takes 19 s to get the results from the server. As my client and server are on the same system, I could get the results faster directly from the hard disk (or better, a RAM disk). Is it possible to configure Solr for that? Regards,
Re: no subject aka Replication Stall
the patches have gone into the trunk. The latest patch should be the one if you wish to run a patched Solr. 10 secs readTimeout means that if there is no data coming from the other end for 10 secs, then the waiting thread returns, throwing an exception. It is not the total time taken to read the entire data. At least that is what I observed while testing. BTW, if the timeout occurs it resumes from the point where the failure happened. It retries 5 times before giving up. On Thu, May 7, 2009 at 7:32 PM, Jeff Newburn jnewb...@zappos.com wrote: We have not pushed the fix into production yet. However, I am wondering two things. [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Is it possible to writing solr result on disk from the server side?
did you consider using an EmbeddedSolrServer? On Thu, May 7, 2009 at 8:25 PM, arno13 arnaud.gaudi...@healthonnet.org wrote: Do you know if it's possible to writing solr results directly on a hard disk from server side and not to use an HTTP connection to transfer the results? [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
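[A minimal sketch of that suggestion - running Solr in-process via SolrJ's EmbeddedSolrServer, so results come back as Java objects with no HTTP hop. Assumes solr.solr.home points at a valid single-core Solr home (the empty core name targets that setup); the path is illustrative:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.core.CoreContainer;

  public class EmbeddedQuery {
      public static void main(String[] args) throws Exception {
          System.setProperty("solr.solr.home", "/path/to/solr/home");
          CoreContainer.Initializer initializer = new CoreContainer.Initializer();
          CoreContainer container = initializer.initialize();
          EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

          SolrQuery query = new SolrQuery("*:*");
          query.setRows(10000);                 // large page, no HTTP transfer cost
          QueryResponse rsp = server.query(query);
          System.out.println(rsp.getResults().getNumFound());

          container.shutdown();
      }
  }
]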
Re: no subject aka Replication Stall
Excellent! Thank you. I am going to start testing that. -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Reply-To: solr-user@lucene.apache.org Date: Thu, 7 May 2009 20:26:02 +0530 To: solr-user@lucene.apache.org Subject: Re: no subject aka Replication Stall the patches have gone into the trunk. The latest patch should be the one if you wish to run a patched Solr. [...]
Re: large index vs multicore
Hi, and sorry for slightly hijacking the thread. On Mar 26, 2009, at 2:54, Otis Gospodnetic wrote: Hi, Without knowing the details, I'd say keep it in the same index if the additional information shares some/enough fields with the main product data, and separately if it's sufficiently distinct (this also means 2 queries and manual merging/joining). Where would this manual merging/joining occur? At the client side, or inside Solr before returning the results? I was wondering what relevancy, sorting, etc. would become. -- Nicolas Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Manepalli, Kalyan kalyan.manepa...@orbitz.com To: solr-user@lucene.apache.org solr-user@lucene.apache.org Sent: Wednesday, March 25, 2009 5:46:40 PM Subject: large index vs multicore Hi All, In my project, I have one primary core containing all the basic information for a product. Now I need to add additional information which will be searched and displayed in conjunction with the product results. My question is: from a design and query-speed point of view, should I add a new core to handle the additional data, or should I add the data to the existing core? The data size is not very large, around 150,000 - 200,000 documents. Any insights into this will be helpful. Thanks, Kalyan Manepalli -- Nicolas Pastorino Consultant - Trainer - System Developer Phone : +33 (0)4.78.37.01.34 eZ Systems ( Western Europe ) | http://ez.no
RE: When should I optimize/
We do optimize once a day at 1am. Ching-hsien Wang, Manager Library and Archives System Support Branch Office of Chief Information Officer Smithsonian Institution 202-633-5581(office) 202-312-2874(fax) wan...@si.edu Visit us online: www.siris.si.edu -----Original Message----- From: Eric Sabourin [mailto:eric.sabourin2...@gmail.com] Sent: Thursday, May 07, 2009 10:52 AM To: solr-user@lucene.apache.org Subject: Re: When should I optimize/ Great... thanks for the response! [...]
Re: Phrase matching on a text field
The string fieldtype is not tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter, which doesn't happen with the string field type (no tokenizing). Have a look at the schema.xml in the example dir and look at the default configuration for both the text and string fieldtypes. The string fieldtype is not analyzed, whereas the text fieldtype has a number of different filters that take action. -Jay On Wed, May 6, 2009 at 11:09 PM, Phil Chadwick p.chadw...@internode.on.net wrote: Hi, I'm trying to figure out why phrase matching on a text field only works some of the time. I have a SOLR index containing a document titled "FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT". The "FOR" seems to be causing a problem... [...]
Re: French and SpellingQueryConverter
It seems to me that this is just the expected behavior of the FrenchAnalyzer using the FrenchStemmer. I'm not familiar with the French language, but in English words like running, runner, and runs are all stemmed down to run as intended. I don't know what other words in French would stem down to franc, but wouldn't this be what you would want? If not, maybe experiment with some of the other Analyzers to see if they give you what you need. -Jay On Thu, May 7, 2009 at 6:51 AM, Jonathan Mamou ma...@il.ibm.com wrote: Hi I have tried to run the following code package org.apache.solr.spelling; import org.apache.lucene.analysis.fr.FrenchAnalyzer; public class Test { public static void main (String args[]) { SpellingQueryConverter sqc = new SpellingQueryConverter(); sqc.analyzer = new FrenchAnalyzer(); System.out.println(sqc.convert(français)); }; }}; I would expect to get [(français,0,8,type=ALPHANUM)] However I get [(fran,0,4,type=ALPHANUM), (ais,5,8,type=ALPHANUM)] Is there any issue with the support of special characters? Thanks Jonathan
RE: What are the Unicode encodings supported by Solr?
Hi KK, On 5/7/2009 at 2:55 AM, KK wrote: In some of the pages I'm getting some \ufffd chars which I think is some sort of unmappable[by Java?] character, right?. Any idea on how to handle this? Just replacing with blank char will not do [this depends on the requirement, though]. From http://www.unicode.org/charts/PDF/UFFF0.pdf: FFFD: REPLACEMENT CHARACTER: used to replace an incoming character whose value is unknown or unrepresentable in Unicode. Also, from http://www.unicode.org/versions/Unicode5.1.0/: Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as replacing it with U+FFFD REPLACEMENT CHARACTER, to indicate the problem in the text. It is not recommended to simply delete noncharacter code points from such text, because of the potential security issues caused by deleting uninterpreted characters. (See conformance clause C7 in Section 3.2, Conformance Requirements, and Unicode Technical Report #36, Unicode Security Considerations.) So if you're seeing \ufffd in text, you (or someone before you in the processing chain) attempted to convert the text from some other encoding into Unicode, but the encoding conversion failed (no target Unicode character corresponding to the source character). This can happen when attempting to convert from an incorrectly identified source encoding. Steve
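[Steve's point in code form: a hedged sketch of strict decoding with the standard java.nio.charset API, so a wrongly guessed source encoding fails fast instead of silently yielding U+FFFD in the indexed text (class name is illustrative):

  import java.nio.ByteBuffer;
  import java.nio.charset.CharacterCodingException;
  import java.nio.charset.Charset;
  import java.nio.charset.CharsetDecoder;
  import java.nio.charset.CodingErrorAction;

  public class StrictDecode {
      // Decode bytes strictly: REPORT makes malformed or unmappable input
      // throw CharacterCodingException rather than substituting U+FFFD.
      public static String decodeStrict(byte[] bytes, String charsetName)
              throws CharacterCodingException {
          CharsetDecoder decoder = Charset.forName(charsetName).newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT);
          return decoder.decode(ByteBuffer.wrap(bytes)).toString();
      }
  }

Catching the exception is a cheap way to detect a mis-identified source encoding before the page reaches Solr.]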
Sorting by 'starts with'
I have an index of product names. I'd like to sort results so that entries starting with the user query come first. E.g. q=kitchen. Results would sort something like: 1. kitchen appliance 2. kitchenaid dishwasher 3. fridge for kitchen. It looks like using a query-based FunctionQuery comes close, but I don't know how to write a subquery that only matches if the value starts with the query string. Has anyone solved a similar need? Thanks, Wojtek
Solr spring application context error
I have configured Solr using Tomcat. Everything works fine. I overrode QParserPlugin and configured it. The overridden QParserPlugin has a dependency on another project, say project1, so I made a jar of that project and copied the jar to the solr/home lib dir. The project1 project is using Spring. It has a factory class which loads the beans. I am using this factory class in the QParserPlugin to get a bean. When I start my Tomcat, the factory class loads fine. But the problem is it's not loading the beans, and I am getting the exception: org.springframework.beans.factory.BeanDefinitionStoreException: IOException parsing XML document from class path resource [com/mypackage/applicationContext.xml]; nested exception is java.io.FileNotFoundException: class path resource [com/mypackage/applicationContext.xml] cannot be opened because it does not exist Do I need to do something else? Can anybody please help me. Thanks, Raju
Re: solrcofig.xml - need some info
This is resolved. I solved it by reading the SolrPlugins page on the Solr wiki. Thanks, Raju Raju444us wrote: Hi Hoss, If I extend SolrQueryParser and override the method getFieldQuery for some customization, can I configure my new query parser something like below?

  <requestHandler name="standard" class="solr.MynewParser" default="true">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <!--
      <int name="rows">10</int>
      <str name="fl">*</str>
      <str name="version">2.1</str>
      -->
    </lst>
  </requestHandler>

Do I need to place my new parser class in the solr/home/lib folder? Is this the right way to do this? Thanks, Raju hossman wrote: : I am pretty new to solr. I was wondering what is this mm attribute in : requestHandler in solrconfig.xml and how it works. Tried to search wiki : could not find it Hmmm... yeah, wiki search does mid-word matching doesn't it? the key thing to realize is that the requestHandler you were looking at when you saw that option was the DisMaxRequestHandler... http://wiki.apache.org/solr/DisMaxRequestHandler -Hoss
Re: Control segment size
Thanks Otis. I did set maxMergeDocs to 10M, but I still see a couple of index files over 30G, which does not match that max number of documents. Here are some numbers: 1) My total index size = 66GB 2) Number of total documents = 200M 3) 1M docs = 300MB 4) 10M docs should be roughly around 3-4GB. Under the index I see:

  -rw-r--r-- 1 dssearch staff 31771545312 May 6 14:15 _2tp.cfs
  -rw-r--r-- 1 dssearch staff 31932190573 May 7 08:13 _5ne.cfs
  -rw-r--r-- 1 dssearch staff   543118747 May 7 08:32 _5p2.cfs
  -rw-r--r-- 1 dssearch staff   543124452 May 7 08:53 _5qr.cfs
  -rw-r--r-- 1 dssearch staff   543100201 May 7 09:18 _5sg.cfs
  ..
  ..

As you can see, a couple of files are huge. Are those documents or index files? How can I control the file size so no single file grows beyond 10GB? Thanks, -vivek On Thu, Apr 23, 2009 at 10:26 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, You are looking for maxMergeDocs, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: vivek sar vivex...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, April 23, 2009 1:08:20 PM Subject: Control segment size Hi, Is there any configuration to control the segments' file size in Solr? Currently, I've an index (70G) with 80 segment files, and one of the files is 24G. We noticed that in some cases commit takes over 2 hours to complete (committing 50K records), whereas usually it finishes in 20 seconds. After further investigation it turns out the system was doing a lot of paging - the file system buffer was trying to write the big segment back to disk. I've got 20G memory on the system with 6G assigned to Solr (running 2 instances). It seems if I can keep the segment size to a max of 4-5GB I'll be OK. Is there any way to do so? I've got a merge factor of 100 - does that impact the size too? Why do different segments have different sizes? Thanks, -vivek
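[For reference, these knobs live in solrconfig.xml's index section; a hedged sketch with illustrative values. One likely explanation for the 30G files (an inference, not from the thread): maxMergeDocs only constrains segments produced by future merges, so segments that already grew large stay that way until the index is rebuilt or optimized:

  <mainIndex>
    <useCompoundFile>true</useCompoundFile>
    <mergeFactor>10</mergeFactor>          <!-- fewer segments merged at once than 100 -->
    <maxMergeDocs>10000000</maxMergeDocs>  <!-- cap docs per merged segment (~10M here) -->
  </mainIndex>
]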
Backups using Java-based Replication (forced snapshot)
On the page http://wiki.apache.org/solr/SolrReplication, it says the following: "Force a snapshot on master. This is useful to take periodic backups. command: http://master_host:port/solr/replication?command=snapshoot" This then puts the snapshot under the data directory. Perfectly reasonable thing to do. However, is it possible to have it take in a directory location and store the snapshot there? For instance, I may want to have it write to a specific directory that is being watched for backup data. Thanks, Grant -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: French and SpellingQueryConverter
Hi, It does not seem to be related to FrenchStemmer; the stemmer does not split a word into 2 words. I have checked with other words, and SpellingQueryConverter always splits words containing a special character. I think that the issue is in the SpellingQueryConverter class regex, Pattern.compile("(?:(?!(\\w+:|\\d+)))\\w+"). According to http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html, \w matches a word character: [a-zA-Z_0-9]. I think that special characters should also be added to the regex. Best regards, Jonathan

Jay Hill jayallenh...@gmail.com wrote: It seems to me that this is just the expected behavior of the FrenchAnalyzer using the FrenchStemmer. [...]
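[A small self-contained illustration of the character-class change Jonathan is suggesting (a hedged sketch, not the actual patch): Java's \w is ASCII-only, while the Unicode categories \p{L} (letters) and \p{M} (combining marks) keep accented words intact:

  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  public class TokenSplit {
      public static void main(String[] args) {
          // With \w+ the ç is excluded, so "français" splits into "fran" and "ais".
          // A Unicode-aware class keeps it as one token:
          Pattern unicodeWord = Pattern.compile("[\\p{L}\\p{M}_0-9]+");
          Matcher m = unicodeWord.matcher("français");
          while (m.find()) {
              System.out.println(m.group()); // prints: français
          }
      }
  }
]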
Autocommit blocking adds? AutoCommit Speedup?
Question 1: I see in DirectUpdateHandler2 that there is a read/write lock used between addDoc and commit. My mental model of the process was this: clients can add/update documents until the autocommit threshold is hit. At that point the commit tracker would schedule a background commit. The commit would run and NOT BLOCK subsequent adds. Clearly that's not happening, because when the autocommit background thread runs it takes the iwCommit lock, blocking anyone in addDoc trying to get the iwAccess lock. Is this just the way it is, or is it possible to configure Solr to process the pending documents in the background, queuing new documents in memory as before? Question 2: I ask this question because autocommits are taking a LONG time to complete, like 10-25 seconds. I have a 40M document index, many 10s of GBs. What can I do to speed this up? Thanks, Jim
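[For context, the autocommit threshold Jim refers to is configured in solrconfig.xml under the update handler; a sketch with illustrative values:

  <updateHandler class="solr.DirectUpdateHandler2">
    <autoCommit>
      <maxDocs>10000</maxDocs>  <!-- commit once this many docs are pending -->
      <maxTime>60000</maxTime>  <!-- or once the oldest pending doc is this many ms old -->
    </autoCommit>
  </updateHandler>
]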
Re: Autocommit blocking adds? AutoCommit Speedup?
On Thu, May 7, 2009 at 5:03 PM, Jim Murphy jim.mur...@pobox.com wrote: Question 1: I see in DirectUpdateHandler2 that there is a read/write lock used between addDoc and commit. [...] Background: in the past, you had to close the Lucene IndexWriter so all changes would be flushed to disk (and you could then open a new IndexReader to see the changes). You obviously can't be adding new documents while you're trying to close the writer - hence the locking. It was also the case that readers and writers had to be opened and closed in the right way to handle things like deletes (which had to go through the reader). This is no longer the case, and we should revisit the locking. I do think we should be able to continue indexing while doing a commit. -Yonik http://www.lucidimagination.com
preImportDeleteQuery
Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery="ItemId={select ItemId from table where status='delete'}" http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek
Re: Solr autocompletion in rails
First, your solrconfig.xml should have something similar to the following:

  <searchComponent name="termsComp" class="org.apache.solr.handler.component.TermsComponent"/>

  <requestHandler name="/autoSuggest" class="org.apache.solr.handler.component.SearchHandler">
    <arr name="components">
      <str>termsComp</str>
    </arr>
  </requestHandler>

This will give you a request handler called /autoSuggest that you will use for suggestions. Then you need to write some rails code to access this. I am not very familiar with ruby, but I believe you might want to try http://wiki.apache.org/solr/solr-ruby. Make sure you set your query type to /autoSuggest. If that won't work for you, then just use the standard http libraries to access the autoSuggest url directly and get json output. With any of these methods make sure you set the following parameters:

  terms=true
  terms.fl=source_field
  terms.lower=input_term
  terms.prefix=input_term
  terms.lower.incl=false

For direct access to the json output you will want these as well:

  indent=true
  wt=json

The terms.fl parameter specifies the field(s) you want to use as the source for suggestions. Make sure this field has very little processing done on it, maybe lowercasing and tokenization only. Here is an example url that should give you some output once things are working:

  http://localhost:8983/solr/autoSuggest?terms=true&terms.fl=spell&terms.lower=t&terms.prefix=t&terms.lower.incl=false&indent=true&wt=json

The next thing is to parse the json output and do whatever you want with the results. In my example, I just printed out each suggestion on a single line of the response because this is what the jQuery autocomplete plugin wanted. The easiest way to parse the json output is to use the json ruby library, http://json.rubyforge.org/. After you have your rails controller working, you can hook it into your FE with some javascript like I did in the example on my blog. Hope this helps. Thanks, Matt Weber eSr Technologies http://www.esr-technologies.com

On May 7, 2009, at 7:37 AM, manisha_5 wrote: Thanks a lot for the information. But I am still a bit confused about the use of TermsComponent. Like, where exactly are we going to put this code in Solr? [...]
Re: preImportDeleteQuery
On May 7, 2009, at 4:52 PM, wojtekpia wrote: Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). [...] I haven't used those special variables, but I noticed an example of $skipDoc in the wiki under the Indexing Wikipedia example (http://wiki.apache.org/solr/DataImportHandler). -- Martin
bug? No highlighting results with dismax and q.alt=*:*
For the Drupal Apache Solr Integration module, we are exploring the possibility of doing facet browsing - since we are using dismax as the default handler, this would mean issuing a query with an empty q and falling back to to q.alt='*:*' or some other q.alt that matches all docs. However, I notice when I do this that we do not get any highlights back in the results despite defining a highlight alternate field. In contrast, if I force the standard request handler then I do get text back from the highlight alternate field: select/?q=*:*qt=standardhl=truehl.fl=bodyhl.alternateField=bodyhl.maxAlternateFieldLength=256 However, I then loose the nice dismax features of weighting the results using bq and bf parameters. So, is this a bug or the intended behavior? The relevant fragment of the solrconfig.xml is this: requestHandler name=partitioned class=solr.SearchHandler default=true lst name=defaults str name=defTypedismax/str str name=q.alt*:*/str !-- example highlighter config, enable per-query with hl=true -- str name=hltrue/str str name=hl.flbody/str int name=hl.snippets3/int str name=hl.mergeContiguoustrue/str !-- instructs Solr to return the field itself if no query terms are found -- str name=f.body.hl.alternateFieldbody/str str name=f.body.hl.maxAlternateFieldLength256/str Full solrconfig.xml and other files: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/?pathrev=DRUPAL-6--1 -- Peter M. Wolanin, Ph.D. Momentum Specialist, Acquia. Inc. peter.wola...@acquia.com
Re: Autocommit blocking adds? AutoCommit Speedup?
Interesting. So is there a JIRA ticket open for this already? Any chance of getting it into 1.4? Its seriously kicking out butts right now. We write into our masters with ~50ms response times till we hit the autocommit then add/update response time is 10-30 seconds. Ouch. I'd be willing to work on submitting a patch given a better understanding of the issue. Jim -- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23438134.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Phrase matching on a text field
Hi Jay, Thank you for your response. The data relating to the string (s_title) defines *exactly* what was fed into the SOLR indexing. The string is not otherwise relevant to the question. The essence of my question is: why can the indexed text (t_title) not be phrase matched by the query on the text when the word "for" is present in the query? The following work (and I would expect them to work):

  q=s_title:"FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT"
  q=t_title:"future directions"
  q=t_title:"integrated catchment"

The following does not work (and I would expect it to work):

  q=t_title:"directions for integrated"

The following does not work (not sure if I expect it to work or not):

  q=t_title:"directions integrated"

My reading is that if the "FOR" is removed in the text indexing, it should also be removed from the text query! I also added 'enablePositionIncrements="true"' to the text query analyzer to make it the same as the text index analyzer:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

There was no change in the outcome. The definitions for text and string were exactly as in the SOLR 1.3 example schema. The section of that schema for text is shown below.

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <!-- enablePositionIncrements="true" -->
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Cheers, -- Phil

The art of being wise is the art of knowing what to overlook. -- William James

Jay Hill wrote: The string fieldtype is not tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter... [...]
StatsComponent and 1.3
Foreword: I'm not a java developer :) OSVDB.org and datalossdb.org make use of solr pretty extensively via acts_as_solr. I found myself with a real need for some of the StatsComponent stuff (mainly the sum feature), so I pulled down a nightly build and played with it. StatsComponent proved perfect, but... the nightly build output seems to be different, and thus incompatible with acts_as_solr. Now, I realize this is more or less an acts_as_solr issue, but... Is it possible, with some degree of effort (obviously) for me to essentially port some of the functionality of StatsComponent to 1.3 myself? It's that, or waiting for 1.4 to come out and someone developing support for it into acts_as_solr, or myself fixing what I have for acts_as_solr to work with the output. I'm just trying to gauge the easiest solution :) Any feedback or suggestions would be grand. Thanks, Dave Open Security Foundation
Re: Autocommit blocking adds? AutoCommit Speedup?
On Thu, May 7, 2009 at 8:37 PM, Jim Murphy jim.mur...@pobox.com wrote: Interesting. So is there a JIRA ticket open for this already? Any chance of getting it into 1.4? No ticket currently open, but IMO it could make it for 1.4. It's seriously kicking our butts right now. We write into our masters with ~50ms response times till we hit the autocommit; then add/update response time is 10-30 seconds. Ouch. It's probably been made a little worse lately since Lucene now does fsync on index files before writing the segments file that points to those files. A necessary evil to prevent index corruption. I'd be willing to work on submitting a patch given a better understanding of the issue. Great, go for it! -Yonik http://www.lucidimagination.com
Re: Backups using Java-based Replication (forced snapshot)
makes sense. I'll open an issue On Fri, May 8, 2009 at 1:53 AM, Grant Ingersoll gsing...@apache.org wrote: On the page http://wiki.apache.org/solr/SolrReplication, it says the following: "Force a snapshot on master. This is useful to take periodic backups. command: http://master_host:port/solr/replication?command=snapshoot" [...] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: preImportDeleteQuery
Are you doing a full-import or a delta-import? For delta-import there is a deletedPkQuery option which should meet your needs. On Fri, May 8, 2009 at 5:22 AM, wojtekpia wojte...@hotmail.com wrote: Hi, I'm importing data using the DIH. I manage all my data updates outside of Solr, so I use the full-import command to update my index (with clean=false). Everything works fine, except that I can't delete documents easily using the DIH. I noticed the preImportDeleteQuery attribute, but it doesn't seem to do what I'm looking for. I'm looking to do something like: preImportDeleteQuery=ItemId={select ItemId from table where status='delete'} http://issues.apache.org/jira/browse/SOLR-1059 SOLR-1059 seems to address this, but I couldn't find any documentation for it in the wiki. Can someone provide an example of how to use this? Thanks, Wojtek -- - Noble Paul | Principal Engineer| AOL | http://aol.com
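For reference, deletedPkQuery is declared on the entity in the DIH data-config.xml and runs during delta-import; a minimal sketch, with the table name item_table and the last_modified column as placeholders loosely borrowed from Wojtek's example:

  <entity name="item" pk="ItemId"
          query="select * from item_table"
          deltaQuery="select ItemId from item_table where last_modified > '${dataimporter.last_index_time}'"
          deletedPkQuery="select ItemId from item_table where status='delete'">
    <!-- column-to-field mappings go here -->
  </entity>

On each delta-import, rows returned by deletedPkQuery are removed from the index by primary key, which avoids bending preImportDeleteQuery to do incremental deletes.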
Re: Solr spring application context error
A point to keep in mind is that all the plugin code and everything else it needs must be put into the solrhome/lib directory. Where have you placed the file com/mypackage/applicationContext.xml? On Fri, May 8, 2009 at 12:19 AM, Raju444us gudipal...@gmail.com wrote: I have configured Solr using Tomcat. Everything works fine. I overrode QParserPlugin and configured it. The overridden QParserPlugin has a dependency on another project, say project1. So I made a jar of that project and copied the jar to the solr/home lib dir. The project1 project uses Spring. It has a factory class which loads the beans. I am using this factory class in QParserPlugin to get a bean. When I start my Tomcat the factory class loads fine. But the problem is it's not loading the beans, and I am getting the exception org.springframework.beans.factory.BeanDefinitionStoreException: IOException parsing XML document from class path resource [com/mypackage/applicationContext.xml]; nested exception is java.io.FileNotFoundException: class path resource [com/mypackage/applicationContext.xml] cannot be opened because it does not exist Do I need to do something else? Can anybody please help me? Thanks, Raju -- - Noble Paul | Principal Engineer| AOL | http://aol.com
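The FileNotFoundException suggests the context file was never packaged, rather than that Spring is misconfigured. A quick check, assuming the jar is named project1.jar (the name is a guess): run jar tf project1.jar and confirm that com/mypackage/applicationContext.xml appears in the listing. If it is missing, the XML was left out of the build, and no Solr-side configuration will let Spring resolve that classpath resource from solrhome/lib.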
Fwd: Solr MultiCore dataDir bug - a fix
I didn't notice that the mail was not sent to the list. Please send all your communication to the mailing list. -- Forwarded message -- From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Date: 2009/5/8 Subject: Re: Solr MultiCore dataDir bug - a fix To: pasi.j.matilai...@tieto.com Are you sure that your solrconfig.xml does not have a dataDir tag? If it is there, it is supposed to take precedence over the one you have put in solr.xml. On Fri, May 8, 2009 at 10:43 AM, pasi.j.matilai...@tieto.com wrote: Hello, I encountered the problem yesterday that multicore Solr doesn't properly handle the dataDir setting in solr.xml, regardless of whether it's specified as a nested property or as an attribute of the core element. I found a mail thread where you, on March 4th, 2009, promised to have it fixed in a day or two. (http://markmail.org/message/oylfeldy53lebsfe#query:solr%20multicore%20datadir+page:1+mid:abfbhdxxt3r3zujs+state:results) Anyway, as the current Solr trunk didn't contain the fix, I hunted the bug down myself. And as I don't want to take the time to get an account to submit the patch to Solr SVN myself, I'm sending the fix to you instead. In the current trunk, in the SolrCore constructor, at line 491, there currently is:

  if (dataDir == null) dataDir = config.get("dataDir", cd.getDataDir());

I replaced this with the following code:

  if (dataDir == null) {
    if (cd.getDataDir() != null) dataDir = cd.getDataDir();
    else dataDir = config.get("dataDir", cd.getDefaultDataDir());
  }

I'm not sure this fully represents how this is supposed to work, but it works anyway. At least when I specify dataDir as an attribute of the core element with a path relative to instanceDir:

  <!-- instanceDir resolves to solr/current/ and dataDir to solr/current/data -->
  <core name="current" instanceDir="current" dataDir="data" />

Best regards, Pasi J. Matilainen, Software Engineer, Tieto Finland Oy -- - Noble Paul | Principal Engineer| AOL | http://aol.com
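For readers following along, the two competing declarations look roughly like this; a minimal multicore solr.xml sketch using Pasi's core name (the surrounding elements are the stock multicore layout):

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <!-- dataDir given as an attribute, resolved relative to instanceDir -->
      <core name="current" instanceDir="current" dataDir="data" />
    </cores>
  </solr>

Per Noble's comment above, if this core's own solrconfig.xml also contains a dataDir element, that value is supposed to win over the attribute here.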
Re: Phrase matching on a text field
Hi, I have tracked this problem to: https://issues.apache.org/jira/browse/SOLR-879 The executive summary is that there are errors relating to text fields in both: - src/java/org/apache/solr/search/SolrQueryParser.java - example/solr/conf/schema.xml It is fixed in 1.4. Thank you Yonik Seeley for the original diagnosis and fix. Cheers, -- Phil It may be that your sole purpose in life is simply to serve as a warning to others. Phil Chadwick wrote: Hi Jay, Thank you for your response. The data relating to the string (s_title) defines *exactly* what was fed into the SOLR indexing. The string is not otherwise relevant to the question. The essence of my question is: why can the indexed text (t_title) not be phrase matched by a query on the text field when the word "for" is present in the query? The following work (and I would expect them to work): q=s_title:FUTURE DIRECTIONS FOR INTEGRATED CATCHMENT q=t_title:future directions q=t_title:integrated catchment The following does not work (and I would expect it to work): q=t_title:directions for integrated The following does not work (not sure whether I expect it to work or not): q=t_title:directions integrated My reading is that if the "FOR" is removed in the text indexing, it should also be removed from the text query! I also added enablePositionIncrements=true to the text query analyzer to make it the same as the text index analyzer:

  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

There was no change in the outcome. The definitions for text and string were exactly as in the SOLR 1.3 example schema; the section of that schema for text is shown below:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" <!-- enablePositionIncrements="true" --> />
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Cheers, -- Phil The art of being wise is the art of knowing what to overlook. -- William James Jay Hill wrote: The string fieldtype is not being tokenized, while the text fieldtype is tokenized. So the stop word "for" is being removed by a stop word filter, which doesn't happen with the string field type (no tokenizing). Have a look at the schema.xml in the example dir and look at the default configuration for both the text and string fieldtypes. The string fieldtype is not analyzed, whereas the text fieldtype has a number of different filters that take action.
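Following up on SOLR-879 for anyone hitting the same symptom: once the fix is in and enablePositionIncrements=true is honored by both the index and query analyzers, the parser should produce a phrase query with a position gap where the stopword was removed, so a query like t_title:"future directions for integrated catchment" would parse to roughly (the exact rendering is illustrative, not copied from debug output):

  PhraseQuery(t_title:"futur direct ? integr catchment")

The "?" marks the position the stopword used to occupy, which lets the phrase line up with the indexed positions again.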
RE: Core Reload issue
From my understanding, re-indexing the documents is a different thing. If you have the stop word filter for a field type, say text, then after reloading the core, a query consisting only of a stop word should be stripped by the stop word filter and so never be searched against the index. But in my case I am getting results containing the stop word; hence the issue. ~Sagar From: noble.p...@gmail.com Date: Tue, 5 May 2009 10:09:29 +0530 Subject: Re: Core Reload issue To: solr-user@lucene.apache.org If you change the conf files and reindex the documents, the changes must be reflected. Are you sure you re-indexed? On Tue, May 5, 2009 at 10:00 AM, Sagar Khetkade sagar.khetk...@hotmail.com wrote: Hi, I came across a strange problem while reloading the core in a multicore scenario. In the config of one of the cores I am making changes to the synonym and stopword files and then reloading the core. The core gets reloaded, but the changes in the stopword and synonym files do not get reflected when I query. The filters for index and query are the same. I face this problem even if I reindex the documents. But when I restart the servlet container in which Solr is embedded, the problem does not resurface. My ultimate goal is/was to have the changes made to the text files inside the config folder take effect. Is this the expected behaviour or some problem on my side? Could anyone suggest a possible workaround? Thanks in advance! Regards, Sagar Khetkade -- - Noble Paul | Principal Engineer| AOL | http://aol.com
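For anyone reproducing this, the reload being discussed is the multicore admin command, typically invoked as follows (host, port, and the core name core0 are from the stock multicore example, not Sagar's setup):

  http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0

After a successful RELOAD, query-time analysis changes (stopwords.txt, synonyms.txt on the query analyzer) should take effect immediately, while index-time analysis changes only affect documents indexed afterwards; symptoms like Sagar's, where old stop word behaviour persists until a container restart, may point at the old core still serving requests or at the files being edited in a different conf directory than the one the core actually loads.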