Replication and segment files
We sometimes have difficulty achieving a 'successful' replication. Our Operations personnel have reported the following behavior (which I cannot verify myself): a master has a set of segment files (let's say 25). A slave polls the master, gets the list of segment files that differ, and begins to download them. At some point during the download, the master merges two or more of the segments the slave has yet to fetch, so when the slave requests those files, the download fails. We're aware that a subsequent attempt usually succeeds, but I'm curious whether there are any configuration settings that can help mitigate this situation.
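One setting that may help, assuming your Solr version's ReplicationHandler supports it, is the master-side `commitReserveDuration` option. As we understand the Solr wiki, it keeps the commit point a slave is fetching from being deleted for roughly the time needed to move about 5 MB from master to slave, with a default of 10 seconds. The sketch below is only a sizing aid under that assumption. It converts a measured master-to-slave throughput into the `HH:MM:SS` string the option takes. The 5 MB payload and the safety factor are assumptions to tune, not recommended values.

```python
import math


def commit_reserve_duration(throughput_mb_per_s: float,
                            payload_mb: float = 5.0,
                            safety_factor: float = 2.0) -> str:
    """Return an HH:MM:SS value for a master's commitReserveDuration.

    Assumption: the reservation should cover transferring roughly
    payload_mb (about 5 MB, per our reading of the Solr wiki) at the
    measured master-to-slave throughput, padded by safety_factor.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = math.ceil(payload_mb / throughput_mb_per_s * safety_factor)
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# A slow 0.125 MB/s link: 5 MB / 0.125 MB/s = 40 s, doubled to 80 s.
print(commit_reserve_duration(0.125))
```

The resulting string would go into the master section of the replication handler configuration. We have not verified how the setting interacts with a particular poll interval or merge policy.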
RE: stemEnglishPossessive and contractions
Thanks Robert, exactly what I was looking for. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, October 19, 2011 1:15 PM To: solr-user@lucene.apache.org Subject: Re: stemEnglishPossessive and contractions The word delimiter filter also does other things: it treats ' as punctuation by default. So it normally splits on ', except when it is 's (in that case it removes the 's completely if you use stemEnglishPossessive). There are a couple of approaches you can use: 1. You can keep WordDelimiterFilter with this option on, but disable splitting on ' by customizing its type table. In this case specify types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM or similar. See https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. I would only do this if you want WordDelimiterFilter for other purposes; if you just want to remove possessives and don't need WordDelimiterFilter's other features, look below. 2. You can instead use EnglishPossessiveFilterFactory, which only does this exact thing (removes 's) and nothing else. On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus wrote: > We utilize a comprehensive dictionary of English words, place names, > surnames, male and female first names, ... you get the point. As such, the > possessive plural forms of these words are recognized as 'misspelled'. > > I simply thought that 'turning on' this option for the WordDelimiterFactory > would address my concerns; however, I also got an unintended consequence: > Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be > affected. Is this intended behavior? When I read 'English possessive' I > hear 'apostrophe s' and not 'apostrophe anything'. Is there something I'm > missing here? > -- lucidimagination.com
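A toy model may make Robert's explanation easier to see. It is not Lucene's code, and the real WordDelimiterFilter also splits on case changes, digits, and other punctuation. The two functions contrast "drop a trailing 's, then treat every remaining apostrophe as a delimiter" with "only drop a trailing 's". The function names are hypothetical.

```python
def word_delimiter_like(token: str, stem_possessive: bool = True) -> list[str]:
    """Toy model of WordDelimiterFilter's apostrophe handling.

    Drops a trailing 's when stem_possessive is on, then treats every
    remaining apostrophe as a delimiter (empty parts are discarded).
    """
    if stem_possessive and token.lower().endswith("'s"):
        token = token[:-2]
    return [part for part in token.split("'") if part]


def english_possessive_like(token: str) -> list[str]:
    """Toy model of EnglishPossessiveFilter: only a trailing 's is removed."""
    if token.lower().endswith("'s"):
        return [token[:-2]]
    return [token]


for word in ["Smith's", "isn't", "he'll"]:
    print(word, word_delimiter_like(word), english_possessive_like(word))
```

In the toy, "isn't" becomes `isn` and `t` under the delimiter model, whatever the possessive setting, but passes through the possessive-only model untouched. This matches the contraction behavior described in the question.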
stemEnglishPossessive and contractions
We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names... you get the point. As a result, the possessive plural forms of these words are flagged as 'misspelled'. I thought that 'turning on' this option for the WordDelimiterFilterFactory would address my concern; however, it also had an unintended consequence: contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected. Is this intended behavior? When I read 'English possessive' I hear 'apostrophe s', not 'apostrophe anything'. Is there something I'm missing here?
Extreme QTime
We serve about 25K queries of each particular query type per hour per server. QTime *averages* less than a second; however, there are always a few (1-10) whose QTimes go far above the average (10-500 seconds). If I harvest these queries from the log and re-execute them, they of course execute in under a second. Why are some of these queries running long? My first thought was that these queries might be occurring right after replication commits, which happen every 10 minutes (given that I have not established any appropriate warming queries, this seemed a logical conclusion); however, these events do not seem to cluster around a 10-minute periodic cycle. My next thought was to compare the times at which these queries executed with what I see in the log file (grep SEVERE...), but I found nothing to correlate. Do you folks have any ideas?
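One way to test the replication-commit theory more rigorously is to bucket the slow queries by their offset within the 10-minute polling period. If they coincide with new searchers opening, they should pile up at one or two offsets, though not necessarily at offset 0, because the poll schedule may not be aligned to the clock. This is a minimal sketch. It assumes, hypothetically, that each log line begins with an ISO-8601 timestamp and contains Solr's `QTime=` field. The default JDK logging layout may put the timestamp on a separate line, so the parsing would likely need adapting.

```python
import re
from collections import Counter
from datetime import datetime

QTIME_RE = re.compile(r"QTime=(\d+)")


def slow_queries(lines, threshold_ms=10_000):
    """Yield (timestamp, qtime_ms) for lines whose QTime exceeds threshold_ms.

    Hypothetical layout: '2011-10-20T14:03:12 ... QTime=12345'.
    """
    for line in lines:
        match = QTIME_RE.search(line)
        if not match:
            continue
        qtime = int(match.group(1))
        if qtime > threshold_ms:
            yield datetime.fromisoformat(line.split()[0]), qtime


def phase_histogram(events, period_minutes=10):
    """Count slow queries by minute offset within the replication period."""
    return Counter(ts.minute % period_minutes for ts, _ in events)


sample = [
    "2011-10-20T14:03:12 path=/select QTime=15000",
    "2011-10-20T14:13:40 path=/select QTime=250",
    "2011-10-20T14:23:05 path=/select QTime=42000",
]
print(phase_histogram(slow_queries(sample)))
```

A flat histogram over a long window would support the observation that the outliers are not tied to replication. A strong peak would point back toward cold caches after each commit.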
RE: MoreLikeThis assumptions
It generally helps if your solrconfig is correct. Thank you for your tolerance. -Original Message- From: Herman Kiefus [mailto:herm...@angieslist.com] Sent: Thursday, September 01, 2011 10:15 AM To: solr-user@lucene.apache.org Subject: MoreLikeThis assumptions Given a document id:n, show me those other documents with similar values in the 'Name' field: http://devsolr03:8983/solr/primary/select?q=id:182652&fl=id,Name,score&mlt=true&mlt.fl=Name My assumption is that the above query will generate the desired outcome. It does; however, given a different document (id), it does not. Both ids identify a document whose name contains the term 'smith'. Stated differently, if A is like B, C, and D, I would assume that B is like A, C, and D, but these are not the results that I'm seeing. My objective is simply to seek out similar documents (based on several fields; I'm just using one here) for any given document: a simple 'duplicate checker', if you will. Am I misguided in my assumptions?
RE: Getting MoreLikeThisHandler operational.
Thank you very much. Name mlt -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Thursday, September 01, 2011 11:06 AM To: solr-user@lucene.apache.org Subject: Re: Getting MoreLikeThisHandler operational. (11/09/01 23:24), Herman Kiefus wrote: > class="org.apache.solr.handler.component.MoreLikeThisComponent"> > >mlt > > > > but ends up returning a 500 error on a core reload. What is an appropriate > configuration entry for the MLT handler? You got the 500 error because MoreLikeThisComponent was set as the requestHandler class. Set class="solr.SearchHandler" for it instead. koji -- Check out "Query Log Visualizer" for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
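To make Koji's fix concrete, the sketch below generates a hypothetical `/mlt` handler entry with ElementTree. The point is that the handler's `class` is `solr.SearchHandler` and not the component class. The handler name, the `defaults` list, and the `mlt.fl` value are illustrative. We also assume that SearchHandler's default component chain includes MoreLikeThisComponent, which is why no `components` list is set here.

```python
import xml.etree.ElementTree as ET


def mlt_search_handler(name: str = "/mlt", mlt_fields: str = "Name") -> str:
    """Build a requestHandler entry backed by SearchHandler (illustrative values)."""
    handler = ET.Element("requestHandler", {"name": name, "class": "solr.SearchHandler"})
    defaults = ET.SubElement(handler, "lst", {"name": "defaults"})
    # Request parameters applied unless the caller overrides them.
    for key, value in (("mlt", "true"), ("mlt.fl", mlt_fields)):
        param = ET.SubElement(defaults, "str", {"name": key})
        param.text = value
    return ET.tostring(handler, encoding="unicode")


print(mlt_search_handler())
```

The printed fragment is meant to be pasted into solrconfig.xml and checked against your Solr version before a core reload.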
Getting MoreLikeThisHandler operational.
I've begun tinkering with MLT using the standard request handler. The wiki also suggests using the MoreLikeThis handler directly, but apparently it is not in the default configuration (as far as I recall, I haven't removed anything from solrconfig.xml as shipped). For example: http://devsolr03:8983/solr/primary/mlt?q=id:3197684&fl=id,Name,Score&mlt=true&mlt.fl=Name yields 'The requested resource is not available'. I tried adding this to my solrconfig.xml: mlt, but it ends up returning a 500 error on a core reload. What is an appropriate configuration entry for the MLT handler?
MoreLikeThis assumptions
Given a document id:n, show me those other documents with similar values in the 'Name' field: http://devsolr03:8983/solr/primary/select?q=id:182652&fl=id,Name,score&mlt=true&mlt.fl=Name My assumption is that the above query will generate the desired outcome. It does; however, given a different document (id), it does not. Both ids identify a document whose name contains the term 'smith'. Stated differently, if A is like B, C, and D, I would assume that B is like A, C, and D, but these are not the results that I'm seeing. My objective is simply to seek out similar documents (based on several fields; I'm just using one here) for any given document: a simple 'duplicate checker', if you will. Am I misguided in my assumptions?
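A likely reason for the asymmetry is that MoreLikeThis builds a separate query from each source document's own "interesting" terms. As we read Lucene's MoreLikeThis defaults, a term must occur at least twice in the source field (`mlt.mintf`, default 2) and in at least five documents (`mlt.mindf`, default 5) to qualify. A short Name field rarely repeats a term, so different source documents can yield very different queries. The sketch below builds the same request with both thresholds lowered. The helper name and the choice of 1/1 are assumptions for experimentation, not tuned values.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def mlt_select_url(base: str, doc_id: int, fields: list[str],
                   min_tf: int = 1, min_df: int = 1, count: int = 10) -> str:
    """Build a /select URL asking MoreLikeThisComponent for similar docs.

    Lowering mlt.mintf/mlt.mindf lets terms that occur once in a short
    field (such as a name) qualify as query terms.
    """
    params = {
        "q": f"id:{doc_id}",
        "fl": "id,Name,score",
        "mlt": "true",
        "mlt.fl": ",".join(fields),
        "mlt.mintf": min_tf,
        "mlt.mindf": min_df,
        "mlt.count": count,
    }
    return f"{base}/select?{urlencode(params)}"


url = mlt_select_url("http://devsolr03:8983/solr/primary", 182652, ["Name"])
print(url)
print(parse_qs(urlsplit(url).query)["mlt.mintf"])
```

Even with these settings, similarity need not be symmetric. Each document contributes a different set of top-scoring terms, so A's query can match B while B's query ranks other documents higher.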
RE: Text Analysis and copyField
It had crossed my mind, but for now we have a 'DictionarySource' field whose type uses the KeepWordFilterFactory with a text file containing all correctly spelled words (thanks to Scrabble), location/last/first names (courtesy of the US Census Bureau), and a few other additions (month and day names). A file this large does not seem to have a material impact on indexing. What we're seeing now (we also have a field 'TermsMisspelled' that uses the same text file with StopFilterFactory) is almost pure misspellings plus some contractions (can't, won't, don't, etc.). Thank you everyone for your help here; this is a truly fine community. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, August 24, 2011 1:00 PM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField Have you considered having two dictionaries and using ajax to query them both and intermingling the results in your suggestions? It'd be some work, but I think it might accomplish what you want. Best Erick On Tue, Aug 23, 2011 at 1:48 PM, Herman Kiefus wrote: > To close, I found this article from Hoss: > http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td > 3122408.html > > Since I cannot use one copyField directive to copy from another copyField's > dest[ination], I cannot achieve what I desire: some terms that are subject to > KeepWordFilterFactory and some that are not. > > -Original Message- > From: Erick Erickson [mailto:erickerick...@gmail.com] > Sent: Monday, August 22, 2011 1:16 PM > To: solr-user@lucene.apache.org > Subject: Re: Text Analysis and copyField > > I suspect that the things going into TermsDictionary are from fields other > than CorrectlySpelledTerms. > > In other words I don't think that anything is getting into TermsDictionary > from CorrectlySpelledTerms... > > Be careful to remove the index between schema changes, just to be sure that > you're not seeing old data. 
> > Best > Erick > > On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus > wrote: >> That's what I thought, but my experiments show differently. In actuality: >> >> I have a number of fields that are of type "text" (the default as it is >> packaged). >> >> I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in >> index-time analysis, using a file of terms which are known to be correctly >> spelled. >> >> I have a type 'textDictionary' that has no index-time analysis. >> >> I have the fields: >> > indexed="false" stored="false" multiValued="true"/> > name="TermsDictionary" type="textDictionary" indexed="true" >> stored="false" multiValued="true"/> >> >> I want 'TermsDictionary' to contain only those terms from some fields that >> are correctly spelled plus those terms from a couple other fields >> (CompanyName and ContactName) as is. I use several copyField directives as >> follows: >> >> > source="Field2" dest="CorrectlySpelledTerms"/> > source="Field3" dest="CorrectlySpelledTerms"/> >> >> > source="Contact" dest="TermsDictionary"/> > ="CorrectlySpelledTerms" dest="TermsDictionary"/> >> >> If I query 'Field1' for a term that I know is misspelled (electical) it >> yields results. >> If I query 'TermsDictionary' for the same term it yields no results. >> >> It would seem by these results that 'TermsDictionary' only contains those >> terms with misspellings stripped as a result of the text analysis on the >> field 'CorrectlySpelledTerms'. >> >> Asked another way, I think you can see what I'm getting at: a source for the >> spellchecker that only contains correctly spelled terms plus proper names; >> should I have gone about this in a different way? >> >> -Original Message- >> From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] >> Sent: Monday, August 22, 2011 9:30 AM >> To: solr-user@lucene.apache.org >> Subject: Re: Text Analysis and copyField >> >> On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus >> wrote: >>> Is my thinking correct?
>>> >>> I have a field 'F1' of type 'T1' whose index time analysis employs the >>> StopFilterFactory. >>> >>> I also have a field 'F2' of type 'T2' whose index time analysis does NOT >>> employ the StopFilterFactory. >>> >>> There is a copyField directive source="F1" dest="F2" >>> >>> F2 will not contain any stop words because they were filtered out as F1 was >>> populated. >>> >> >> No, F2 will contain stop words. Copy fields does not process input through >> a chain, it sends the original content to each field and therefore analysis >> is totally independent. >> >> -- >> Stephen Duncan Jr >> www.stephenduncanjr.com >> >
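Erick's suggestion moves the merge to the client: query a general dictionary and a proper-name dictionary separately, then interleave the suggestions. Below is a minimal sketch of that merge, written in Python rather than browser JavaScript. It assumes, hypothetically, that each response has already been reduced to `(word, frequency)` pairs. Ranking by raw frequency also assumes the two dictionaries' counts are comparable, which may not hold.

```python
def merge_suggestions(*suggestion_lists, limit=5):
    """Merge (word, frequency) suggestions from several dictionaries.

    Duplicate words keep their highest frequency; results are ordered by
    frequency (descending), then alphabetically for stable ties.
    """
    best: dict[str, int] = {}
    for suggestions in suggestion_lists:
        for word, freq in suggestions:
            best[word] = max(freq, best.get(word, 0))
    ranked = sorted(best.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]


general = [("smith", 120), ("smiths", 15)]
names = [("Smith", 300), ("Smyth", 40)]
print(merge_suggestions(general, names))
```

Matching here is case-sensitive, so "Smith" and "smith" stay distinct. A real client might normalize case first, depending on how the two dictionaries were analyzed.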
RE: Text Analysis and copyField
To close, I found this article from Hoss: http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td3122408.html Since I cannot use one copyField directive to copy from another copyField's dest[ination], I cannot achieve what I desire: some terms that are subject to KeepWordFilterFactory and some that are not. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Monday, August 22, 2011 1:16 PM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField I suspect that the things going into TermsDictionary are from fields other than CorrectlySpelledTerms. In other words I don't think that anything is getting into TermsDictionary from CorrectlySpelledTerms... Be careful to remove the index between schema changes, just to be sure that you're not seeing old data. Best Erick On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus wrote: > That's what I thought, but my experiments show differently. In actuality: > > I have a number of fields that are of type "text" (the default as it is > packaged). > > I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in > index-time analysis, using a file of terms which are known to be correctly > spelled. > > I have a type 'textDictionary' that has no index-time analysis. > > I have the fields: > indexed="false" stored="false" multiValued="true"/> name="TermsDictionary" type="textDictionary" indexed="true" > stored="false" multiValued="true"/> > > I want 'TermsDictionary' to contain only those terms from some fields that > are correctly spelled plus those terms from a couple other fields > (CompanyName and ContactName) as is. I use several copyField directives as > follows: > > source="Field2" dest="CorrectlySpelledTerms"/> source="Field3" dest="CorrectlySpelledTerms"/> > > source="Contact" dest="TermsDictionary"/> ="CorrectlySpelledTerms" dest="TermsDictionary"/> > > If I query 'Field1' for a term that I know is misspelled (electical) it > yields results. 
> If I query 'TermsDictionary' for the same term it yields no results. > > It would seem by these results that 'TermsDictionary' only contains those > terms with misspellings stripped as a result of the text analysis on the > field 'CorrectlySpelledTerms'. > > Asked another way, I think you can see what I'm getting at: a source for the > spellchecker that only contains correctly spelled terms plus proper names; > should I have gone about this in a different way? > > -Original Message- > From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] > Sent: Monday, August 22, 2011 9:30 AM > To: solr-user@lucene.apache.org > Subject: Re: Text Analysis and copyField > > On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus wrote: >> Is my thinking correct? >> >> I have a field 'F1' of type 'T1' whose index time analysis employs the >> StopFilterFactory. >> >> I also have a field 'F2' of type 'T2' whose index time analysis does NOT >> employ the StopFilterFactory. >> >> There is a copyField directive source="F1" dest="F2" >> >> F2 will not contain any stop words because they were filtered out as F1 was >> populated. >> > > No, F2 will contain stop words. Copy fields does not process input through a > chain, it sends the original content to each field and therefore analysis is > totally independent. > > -- > Stephen Duncan Jr > www.stephenduncanjr.com >
RE: Spellcheck Phrases
The angle I am trying here is to create a dictionary from indexed terms that contains only correctly spelled words. We do this by having the field from which the dictionary is built use a type that employs solr.KeepWordFilterFactory, which in turn uses a text file of known correctly spelled words (including their derived forms, for example: lead, leads, leading, etc.). This is working great for us, except for those fields in our schema that contain proper names. I can't seem to get (unfiltered) terms from those fields, along with (correctly spelled) terms from other fields, into the single field upon which the dictionary is built. -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Thursday, June 02, 2011 11:40 AM To: solr-user@lucene.apache.org Subject: RE: Spellcheck Phrases Actually, someone just pointed out to me that a patch like this is unnecessary. The code works as-is if configured like this: .01 (correct) instead of this: .01 (incorrect) I tested this and it seems to work. I'm still trying to figure out whether using this parameter actually improves the quality of our spell suggestions, now that I know how to use it properly. Sorry about the misinformation earlier. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Dyer, James Sent: Wednesday, June 01, 2011 3:02 PM To: solr-user@lucene.apache.org Subject: RE: Spellcheck Phrases Tanner, I just entered SOLR-2571 to fix the float-parsing bug that breaks "thresholdTokenFrequency". It's just a 1-line code fix, so I also included a patch that should apply cleanly to Solr 3.1. See https://issues.apache.org/jira/browse/SOLR-2571 for info and patches. This parameter appears to be absent from the wiki. And as it has always been broken for me, I haven't tested it. However, my understanding is that it should be set to the minimum fraction of documents in which a term has to occur in order for it to appear in the spelling dictionary.
For instance, in the config below, a term would have to occur in at least 1% of the documents for it to be part of the spelling dictionary. This might be a good setting for long fields, but for the short fields in my application, I was thinking of setting this to something like 1/1000 of 1% ... text spellchecker Spelling_Dictionary text ./spellchecker .01 James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Tanner Postert [mailto:tanner.post...@gmail.com] Sent: Friday, May 27, 2011 6:04 PM To: solr-user@lucene.apache.org Subject: Re: Spellcheck Phrases Are there any updates on this? Any third-party apps that can make this work as expected? On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James wrote: > Tanner, > > Currently Solr will only make suggestions for words that are not in > the dictionary, unless you specify "spellcheck.onlyMorePopular=true". > However, if you do that, then it will try to "improve" every word in > your query, even the ones that are spelled correctly (so while it > might change "brake" to "break" it might also change "leg" to "log".) > > You might be able to alleviate some of the pain by setting the > "thresholdTokenFrequency" so as to remove misspelled and rarely-used > words from your dictionary, although I personally haven't been able to > get this parameter to work. It also doesn't seem to be documented on > the wiki, but it is in the 1.4.1 source code, in class > IndexBasedSpellChecker. It's also mentioned in Smiley & Pugh's book. I > tried setting it like this, but got a ClassCastException on the float value: > > > text_spelling > > spellchecker > Spelling_Dictionary > text_spelling > true name="thresholdTokenFrequency">.001 > > > > I have it on my to-do list to look into this further but haven't yet. > If you decide to try it and can get it to work, please let me know how > you do it.
> > James Dyer > E-Commerce Systems > Ingram Content Group > (615) 213-4311 > > -Original Message- > From: Tanner Postert [mailto:tanner.post...@gmail.com] > Sent: Wednesday, February 23, 2011 12:53 PM > To: solr-user@lucene.apache.org > Subject: Spellcheck Phrases > > Right now when I search for 'brake a leg', Solr returns valid results > with no indication of misspelling, which is understandable since all > of those terms are valid words and are probably found in a few pieces > of our content. > My question is: > > Is there any way for it to recognize that the phrase should be "break a leg" > and not "brake a leg" and suggest the proper phrase? >
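James's description of `thresholdTokenFrequency` comes down to simple arithmetic: a term stays in the dictionary only if its document frequency reaches the threshold times the number of documents. The sketch below computes that cutoff. It treats the threshold as the literal config text and uses `Decimal` to avoid binary floating-point surprises. Rounding up, and treating "at least" as `>=`, are our assumptions about how Solr applies the value, not something confirmed in this thread.

```python
import math
from decimal import Decimal


def min_doc_freq(num_docs: int, threshold: str) -> int:
    """Minimum document frequency for a term, given a thresholdTokenFrequency.

    `threshold` is the config text, e.g. ".01" for 1% of documents.
    Rounding up is an assumption about Solr's behavior.
    """
    return math.ceil(Decimal(threshold) * num_docs)


def keeps_term(doc_freq: int, num_docs: int, threshold: str) -> bool:
    return doc_freq >= min_doc_freq(num_docs, threshold)


# 1% versus "1/1000 of 1%" on a hypothetical 1,000,000-document index.
for label, value in (("1%", ".01"), ("1/1000 of 1%", ".00001")):
    print(label, min_doc_freq(1_000_000, value))
```

On a million documents, 1% demands 10,000 documents per term, which would discard most terms from short fields. James's much smaller figure requires only 10.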
Spellcheck index replication
We employ one 'indexing' master that replicates to many 'query' slaves. We have also recently introduced spellchecking/DYM. It appears that replication does not 'cover' the spellchecker index. Do I understand this correctly? Further, we have seen that 'buildOnCommit' causes the spellcheck index to be [re]built on each slave; however, while the spellcheck index is being rebuilt, spellcheck queries do not produce suggestions, which makes sense. What suggestions does the community have regarding this issue, and what is working well for you?
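Because each slave builds its own spellcheck index, one possible approach is to rebuild the slaves one at a time. This is our assumption, not an answer given in this thread. Each slave would be taken out of the load balancer, rebuilt with a `spellcheck=true&spellcheck.build=true` request, and then put back, so users never hit a slave with an empty dictionary. The sketch below only plans the steps. The host names, the `/select` handler path (it must include the spellcheck component), and the rotation actions are hypothetical hooks for whatever your load balancer provides.

```python
from urllib.parse import urlencode


def rolling_rebuild_plan(slaves: list[str], handler: str = "/select") -> list[tuple[str, str]]:
    """Return ordered (action, target) steps that rebuild one slave at a time."""
    params = urlencode({
        "q": "*:*",
        "rows": 0,
        "spellcheck": "true",
        "spellcheck.build": "true",
    })
    plan = []
    for host in slaves:
        plan.append(("remove_from_rotation", host))  # hypothetical LB hook
        plan.append(("GET", f"http://{host}/solr{handler}?{params}"))
        plan.append(("add_to_rotation", host))  # hypothetical LB hook
    return plan


for step in rolling_rebuild_plan(["slave1:8983", "slave2:8983"]):
    print(*step)
```

This trades a temporary loss of capacity for never serving requests from a slave with an empty spellcheck index. With many slaves and short builds, that may be a reasonable trade.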
Dictionary of Correctly Spelled terms
My objective is to end up with a field that can be used to build the spellcheck dictionary. That field should contain only correctly spelled terms, except for terms originating from two other 'proper name' fields, which should be kept as-is. I thought I had this working, but feedback from a separate thread seems to indicate otherwise. My approach was to use copyField directives to copy terms from the fields I want to strip misspellings from into a field that uses the KeepWordFilterFactory with a file containing only correctly spelled words. That field would then be copied to the 'dictionary' field, along with the two 'proper name' fields. The 'dictionary' field has no text analysis, because I assumed it would receive terms from its source that had already been through the analysis tied to that source's type. If this is not the case, how could someone go about creating such a dictionary field (other than going outside Solr)?
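If the "outside Solr" route is taken, one option is to produce a plain word list and point a file-based spellchecker at it. As we understand it, Solr's FileBasedSpellChecker reads one word per line from its `sourceLocation`. The sketch below shows only the set logic: keep-list filtering for general text, and proper names passed through unfiltered. The regex tokenization and lowercasing are simplifying assumptions that would need to match your field analysis, and the sample data is invented.

```python
import re

TOKEN_RE = re.compile(r"[A-Za-z']+")


def build_dictionary(general_texts, proper_name_texts, keep_words):
    """Union of keep-listed general terms and unfiltered proper-name terms."""
    keep = {word.lower() for word in keep_words}
    terms = set()
    for text in general_texts:
        # Like KeepWordFilterFactory: only listed (correctly spelled) words survive.
        terms.update(t for t in (m.lower() for m in TOKEN_RE.findall(text)) if t in keep)
    for text in proper_name_texts:
        # Proper names bypass the keep list entirely.
        terms.update(m.lower() for m in TOKEN_RE.findall(text))
    return sorted(terms)


words = build_dictionary(
    ["Electical and electrical work"],
    ["Kiefus Plumbing"],
    ["and", "electrical", "work", "plumbing"],
)
print("\n".join(words))  # the contents of a one-word-per-line dictionary file
```

The misspelled "electical" is dropped because it is not on the keep list, while the surname survives because proper-name fields are never filtered.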
RE: Text Analysis and copyField
That's what I thought, but my experiments show otherwise. In actuality: I have a number of fields that are of type "text" (the default as packaged). I have a type 'textCorrectlySpelled' that uses KeepWordFilterFactory in index-time analysis, with a file of terms which are known to be correctly spelled. I have a type 'textDictionary' that has no index-time analysis. I have the fields: I want 'TermsDictionary' to contain only those terms from some fields that are correctly spelled, plus the terms from a couple of other fields (CompanyName and ContactName) as-is. I use several copyField directives as follows: If I query 'Field1' for a term that I know is misspelled (electical), it yields results. If I query 'TermsDictionary' for the same term, it yields no results. It would seem from these results that 'TermsDictionary' only contains those terms with misspellings stripped as a result of the text analysis on the field 'CorrectlySpelledTerms'. Asked another way (I think you can see what I'm getting at): how do I build a source for the spellchecker that contains only correctly spelled terms plus proper names? Should I have gone about this in a different way? -Original Message- From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] Sent: Monday, August 22, 2011 9:30 AM To: solr-user@lucene.apache.org Subject: Re: Text Analysis and copyField On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus wrote: > Is my thinking correct? > > I have a field 'F1' of type 'T1' whose index time analysis employs the > StopFilterFactory. > > I also have a field 'F2' of type 'T2' whose index time analysis does NOT > employ the StopFilterFactory. > > There is a copyField directive source="F1" dest="F2" > > F2 will not contain any stop words because they were filtered out as F1 was > populated. > No, F2 will contain stop words. Copy fields does not process input through a chain, it sends the original content to each field and therefore analysis is totally independent.
-- Stephen Duncan Jr www.stephenduncanjr.com
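Stephen's point, together with Hoss's note that a copyField destination does not act as a copy source, can be modeled in a few lines. Raw values from the incoming document are routed through the copyField rules once, and then each field's analyzer runs independently on what it received. This is a toy model with hypothetical field names and analyzers, not Solr's implementation.

```python
from typing import Callable

Analyzer = Callable[[str], list[str]]


def index_document(doc: dict[str, str],
                   copy_fields: list[tuple[str, str]],
                   analyzers: dict[str, Analyzer]) -> dict[str, list[str]]:
    """Route raw values through copyField rules once, then analyze per field."""
    raw: dict[str, list[str]] = {name: [value] for name, value in doc.items()}
    for source, dest in copy_fields:
        # Only fields present in the original document are copied, so a
        # field populated solely by copyField is never copied onward.
        if source in doc:
            raw.setdefault(dest, []).append(doc[source])
    indexed = {}
    for name, values in raw.items():
        analyze = analyzers.get(name, str.split)  # default: whitespace split
        indexed[name] = [tok for value in values for tok in analyze(value)]
    return indexed


stop = {"the", "a"}
analyzers = {"F1": lambda s: [t for t in s.split() if t not in stop]}
print(index_document({"F1": "the quick fox"}, [("F1", "F2")], analyzers))

chained = index_document(
    {"Field1": "electical work"},
    [("Field1", "CorrectlySpelledTerms"), ("CorrectlySpelledTerms", "TermsDictionary")],
    {},
)
print(chained)
```

In the first call F2 keeps "the", because it received F1's raw text and not F1's analyzed tokens. In the second call TermsDictionary receives nothing, consistent with Erick's suspicion that nothing reaches it from CorrectlySpelledTerms.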
Text Analysis and copyField
Is my thinking correct? I have a field 'F1' of type 'T1' whose index time analysis employs the StopFilterFactory. I also have a field 'F2' of type 'T2' whose index time analysis does NOT employ the StopFilterFactory. There is a copyField directive source="F1" dest="F2" F2 will not contain any stop words because they were filtered out as F1 was populated.
RE: Solr spellcheck and multiple collations
Nice catch; I was sending maxCollations with a capital M. -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Wednesday, August 17, 2011 6:10 PM To: solr-user@lucene.apache.org Subject: RE: Solr spellcheck and multiple collations I quickly went through what you've got from your last 2 posts and do not see any problems. You might want to double-check that your client is translating the constant variable you've got for "spellcheck.maxCollationTries" correctly in your query, or, if you've got it in the request handler config, that it's spelled correctly there. The other thing, obviously, is that you'll only get 1 collation if there is only 1 combination of the individual words it suggested that returns hits. You may need to play with different test queries to find one that can generate more than 1 good collation. Also, if you set spellcheck.maxCollationTries down to zero, it will return all the possibilities (up to the spellcheck.maxCollations value), even the nonsensical ones. That might be helpful for testing. Also, these params are in Solr 3.x and higher, so they won't work in 1.4 without the SOLR-2010 patch.
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Herman Kiefus [mailto:herm...@angieslist.com] Sent: Wednesday, August 17, 2011 4:55 PM To: solr-user@lucene.apache.org Subject: RE: Solr spellcheck and multiple collations Thanks James, here are the settings that only yield the one collation: static int count = 10; static bool onlyMorePopular = true; static bool extendedResults = true; static bool collate = true; static int maxCollations = 10; static int maxCollationTries = 100; static int maxCollationEvaluations = 1; static bool collateExtendedResults = true; static float accuracy = 0.7f; -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Wednesday, August 17, 2011 5:48 PM To: solr-user@lucene.apache.org Subject: RE: Solr spellcheck and multiple collations Herman, - Specify "spellcheck.maxCollations" with something higher than one to get more than 1 collation. - If you also want the spellchecker to test whether or not a particular collation will return hits, also specify "spellcheck.maxCollationTries" - If you also want to know how many hits each collation will return, also specify "spellcheck.collateExtendedResults=true" - See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollations for more information James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Herman Kiefus [mailto:herm...@angieslist.com] Sent: Wednesday, August 17, 2011 4:31 PM To: solr-user@lucene.apache.org Subject: Solr spellcheck and multiple collations After a bit of work, we have 'spellchecking' up and going and we are happy with the suggestions. I have not, however, ever been able to generate more than one collation query. Is there something simple that I have overlooked?
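The fix here was a parameter-name typo. Solr request parameter names are case-sensitive, so `spellcheck.MaxCollations` is not the same parameter as `spellcheck.maxCollations`. A client can guard against this by building the query string from a table of exact names and rejecting anything not in the table. Below is a minimal sketch in Python rather than the C# client used here. The setting names mirror the ones posted in this thread. Rendering booleans as lowercase "true"/"false" is a precaution on our part, since .NET formats booleans as "True"/"False".

```python
from urllib.parse import urlencode

# Exact, case-sensitive Solr parameter names for each client setting.
SPELLCHECK_PARAMS = {
    "count": "spellcheck.count",
    "onlyMorePopular": "spellcheck.onlyMorePopular",
    "extendedResults": "spellcheck.extendedResults",
    "collate": "spellcheck.collate",
    "maxCollations": "spellcheck.maxCollations",
    "maxCollationTries": "spellcheck.maxCollationTries",
    "maxCollationEvaluations": "spellcheck.maxCollationEvaluations",
    "collateExtendedResults": "spellcheck.collateExtendedResults",
    "accuracy": "spellcheck.accuracy",
}


def spellcheck_query(q: str, settings: dict) -> str:
    """Build a query string, failing fast on misspelled setting names."""
    params = {"q": q, "spellcheck": "true"}
    for key, value in settings.items():
        if key not in SPELLCHECK_PARAMS:
            raise KeyError(f"unknown spellcheck setting: {key!r}")
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[SPELLCHECK_PARAMS[key]] = value
    return urlencode(params)


print(spellcheck_query("brake a leg", {"collate": True, "maxCollations": 10,
                                       "maxCollationTries": 100}))
```

With this table in place, a key such as `MaxCollations` raises an error in the client instead of being silently ignored by the server.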
RE: Solr spellcheck and multiple collations
Thanks James, here are the settings that only yield the one collation: static int count = 10; static bool onlyMorePopular = true; static bool extendedResults = true; static bool collate = true; static int maxCollations = 10; static int maxCollationTries = 100; static int maxCollationEvaluations = 1; static bool collateExtendedResults = true; static float accuracy = 0.7f; -Original Message- From: Dyer, James [mailto:james.d...@ingrambook.com] Sent: Wednesday, August 17, 2011 5:48 PM To: solr-user@lucene.apache.org Subject: RE: Solr spellcheck and multiple collations Herman, - Specify "spellcheck.maxCollations" with something higher than one to get more than 1 collation. - If you also want the spellchecker to test whether or not a particular collation will return hits, also specify "spellcheck.maxCollationTries" - If you also want to know how many hits each collation will return, also specify "spellcheck.collateExtendedResults=true" - See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollations for more information James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Herman Kiefus [mailto:herm...@angieslist.com] Sent: Wednesday, August 17, 2011 4:31 PM To: solr-user@lucene.apache.org Subject: Solr spellcheck and multiple collations After a bit of work, we have 'spellchecking' up and going and we are happy with the suggestions. I have not, however, ever been able to generate more than one collation query. Is there something simple that I have overlooked?
RE: Solr spellcheck and multiple collations
If you only get one (the best) collation, then there is no point to my question; however, since you asked... The relevant sections: Solrconfig.xml - textDictionary default solr.IndexBasedSpellChecker TermsDictionary ./spellchecker 0.0 score Schema.xml - -Original Message- From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] Sent: Wednesday, August 17, 2011 5:34 PM To: solr-user@lucene.apache.org Subject: Re: Solr spellcheck and multiple collations Can you show us your schema and config? I believe that's how collation is: the best match, only one. 2011/8/17 Herman Kiefus > After a bit of work, we have 'spellchecking' up and going and we are > happy with the suggestions. I have not, however, ever been able to > generate more than one collation query. Is there something simple that I > have overlooked? > -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Solr spellcheck and multiple collations
After a bit of work, we have 'spellchecking' up and running, and we are happy with the suggestions. I have not, however, ever been able to generate more than one collation query. Is there something simple that I have overlooked?
RE: solr keeps dying every few hours.
While I can't be as specific as others here will be, we encountered the same or a similar problem. We simply loaded up our servers with 48GB and life is good. I too would like to be a bit more proactive on the provisioning front, and hopefully someone will come along and help us out. FWIW, and I'm sure someone will correct me, it seems as if the Java GC cannot keep up with cache allocation; in our case everything was fine until the nth query, and then the box would go TU. But leave it to Solr: it would simply 'restart' and start serving queries again. -Original Message- From: Jason Toy [mailto:jason...@gmail.com] Sent: Wednesday, August 17, 2011 5:15 PM To: solr-user@lucene.apache.org Subject: solr keeps dying every few hours. I have a large EC2 instance (7.5 GB RAM), and it dies every few hours with out-of-heap-memory errors. I started upping the min memory required; currently I use -Xms3072M. I insert about 50k docs an hour and I currently have about 65 million docs with about 10 fields each. Is this already too much data for one box? How do I know when I've reached the limit of this server? I have no idea how to keep control of this issue. Am I just supposed to keep upping the min RAM used for Solr? How do I know the right amount of RAM I should be using? Must I keep adding more memory as the index size grows? I'd rather the query be a little slower if I can use constant memory and have the search read from disk.
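Two rough numbers may help with "how do I know what the right amount of RAM is". First, `-Xms` only sets the initial heap size; the ceiling that triggers out-of-memory errors is `-Xmx`. Second, some caches scale with the number of documents. As a rule of thumb, a filterCache entry held as a bitset costs about maxDoc/8 bytes, and sorting on an int field fills a FieldCache array of about 4 bytes per document. The sketch applies these worst-case approximations to the 65 million documents mentioned above. Small filter results are stored more compactly, and the 512-entry cache size is an assumed value to replace with the one in your solrconfig.xml.

```python
def filter_cache_bytes(max_doc: int, entries: int) -> int:
    """Worst case: every cached filter held as a bitset of max_doc bits."""
    return entries * ((max_doc + 7) // 8)


def int_field_cache_bytes(max_doc: int, fields: int = 1) -> int:
    """One 4-byte int per document for each sorted int field."""
    return fields * max_doc * 4


MAX_DOC = 65_000_000
GIB = 1024 ** 3
print(f"filterCache, 512 full entries: {filter_cache_bytes(MAX_DOC, 512) / GIB:.2f} GiB")
print(f"one int sort field: {int_field_cache_bytes(MAX_DOC) / GIB:.2f} GiB")
```

A full filterCache alone would then need about 4.16 GB (roughly 3.87 GiB), more than a 3 GB heap can hold. This fits the "fine until the nth query" pattern, and a smaller filterCache may matter as much as more RAM.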
RE: 'Stable' 4.0 version
I should say I'm running Solr Specification Version 4.0.0.2010.12.10.08.54.56, and by the looks of the version number I'm running something from December 10 of last year. Tomás: geofilt and geodist() are supported in 3.3? Along with the location and point types? Quite frankly, 1.3/1.4, 3.3, and 4.0 all confuse me. I just had our operations personnel install versions until I got the needed functionality. -Original Message- From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] Sent: Wednesday, August 17, 2011 5:12 PM To: solr-user@lucene.apache.org Subject: Re: 'Stable' 4.0 version As far as I know, Solr's trunk is pretty stable, so you shouldn't have many problems with it if you test it correctly. Lucid's search platform is built upon the trunk ( http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise ). The one thing I would be concerned about is the index format. It might change in an incompatible way from one revision to the next, so if rebuilding your index is complicated or takes too long, this can be a problem. If your choice of version is based on the geospatial features, why don't you use the Solr 3.3 release? It already contains those features. Tomás On Wed, Aug 17, 2011 at 4:58 PM, Jaeger, Jay - DOT wrote: > > geospatial requirements > > Looking at your email address, no surprise there. 8^) > > > What insight can you share (if any) regarding moving forward to a > > later > nightly build? > > I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the > time) during some testing, and it performed well -- but we were not > doing geospatial indexing with Solr. Or are you referring to the > successor to Solr 3.3 at some future point in time (which I supposed > might also be called Solr 4 in the future -- won't that be confusing!)
> > -Original Message- > From: Herman Kiefus [mailto:herm...@angieslist.com] > Sent: Wednesday, August 17, 2011 2:55 PM > To: solr-user@lucene.apache.org > Subject: 'Stable' 4.0 version > > My organization uses Solr 4 because of our geospatial requirements. > What insight can you share (if any) regarding moving forward to a > later nightly build? Or, for those of you using 4.0 in a Production > setting, when is it that you move ahead? >
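The nightly build's date can be read straight off the specification version string. The sketch below parses it, assuming the layout seen in this thread: `MAJOR.MINOR.PATCH` followed by `YYYY.MM.DD.HH.MM.SS`. That layout is inferred from this one example and not from any documented format.

```python
from datetime import datetime


def nightly_build_time(spec_version: str) -> datetime:
    """Parse the timestamp suffix of a nightly 'Solr Specification Version'.

    Assumed layout (inferred from one example): three version numbers
    followed by year, month, day, hour, minute, second.
    """
    parts = spec_version.split(".")
    if len(parts) != 9:
        raise ValueError(f"unexpected spec version layout: {spec_version!r}")
    return datetime(*map(int, parts[3:]))


print(nightly_build_time("4.0.0.2010.12.10.08.54.56"))
```

For the version quoted above, this gives December 10, 2010, at 08:54:56.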
'Stable' 4.0 version
My organization uses Solr 4 because of our geospatial requirements. What insight can you share (if any) regarding moving forward to a later nightly build? Or, for those of you using 4.0 in a production setting, when do you move ahead?