Replication and segment files

2012-01-16 Thread Herman Kiefus
We are at times having some difficulty achieving a 'successful' replication.  
Our Operations personnel have reported the following behavior (which I cannot 
attest to): A master has a set of segment files (let's say 25).  A slave then 
polls the master, gets the list of segment files that differ and begins to 
download them.  Sometime during the download, the master combines two or more 
of the files that the slave is going to download, and when the slave attempts 
the download it fails.  We're aware that a subsequent attempt usually yields 
success, but I'm curious as to whether there are any configuration settings 
that can help mitigate this circumstance.
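
One setting that may be relevant here is the ReplicationHandler's commitReserveDuration on the master, which, as I understand it, controls how long the files of a commit point stay reserved while a slave is pulling them. A minimal sketch, assuming a stock master configuration; the one-minute value is an illustrative assumption, not a tested recommendation:

```xml
<!-- solrconfig.xml on the master (sketch; values are assumptions) -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- replicate after each commit, as in a typical master setup -->
    <str name="replicateAfter">commit</str>
    <!-- reserve a commit point's files for longer than the short default,
         so they are less likely to be merged away during a download -->
    <str name="commitReserveDuration">00:01:00</str>
  </lst>
</requestHandler>
```

Whether this helps depends on how long a typical segment download takes relative to the master's commit and merge activity.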


RE: stemEnglishPossessive and contractions

2011-10-19 Thread Herman Kiefus
Thanks Robert, exactly what I was looking for.

-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Wednesday, October 19, 2011 1:15 PM
To: solr-user@lucene.apache.org
Subject: Re: stemEnglishPossessive and contractions

The word delimiter filter also does other things: it treats ' as punctuation by 
default. So it normally splits on ', except if it's 's (in this case it removes 
the 's completely if you use this stemEnglishPossessive).

There are a couple of approaches you can use:
1. You can keep WordDelimiterFilter with this option on, but disable 
splitting on ' by customizing its type table. In this case specify 
types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM 
or similar. See
https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. I 
would only do this if you want WordDelimiterFilter for other purposes; if you 
just want to remove possessives and don't need WordDelimiterFilter's other 
features, look below.
2. You can instead use EnglishPossessiveFilterFactory, which only does this 
exact thing (remove 's) and nothing else.
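
The two approaches above could look roughly like this in schema.xml. This is a sketch under assumptions: the field type names, tokenizers and surrounding filters are hypothetical, and the type-table mapping syntax is as I understand it from SOLR-2059.

```xml
<!-- Approach 1: keep WordDelimiterFilterFactory but stop it splitting on
     the apostrophe. mycustomtypes.txt (file name from the message above)
     would contain a mapping line such as:
         ' => ALPHANUM
     (syntax assumed from SOLR-2059) -->
<fieldType name="text_wdf_possessive" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            stemEnglishPossessive="1"
            types="mycustomtypes.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Approach 2: only strip the trailing 's and leave contractions alone -->
<fieldType name="text_possessive_only" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The second form is the smaller change if nothing else from WordDelimiterFilter is needed.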

On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus  wrote:
> We utilize a comprehensive dictionary of English words, place names, 
> surnames, male and female first names, ... you get the point.  As such, the 
> possessive plural forms of these words are recognized as 'misspelled'.
>
> I simply thought that 'turning on' this option for the WordDelimiterFactory 
> would address my concerns; however, I also got an unintended consequence: 
> Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be 
> affected.  Is this intended behavior?  When I read 'English possessive' I 
> hear 'apostrophe s' and not 'apostrophe anything'.  Is there something I'm 
> missing here?
>



--
lucidimagination.com


stemEnglishPossessive and contractions

2011-10-19 Thread Herman Kiefus
We utilize a comprehensive dictionary of English words, place names, surnames, 
male and female first names, ... you get the point.  As such, the possessive 
plural forms of these words are recognized as 'misspelled'.

I simply thought that 'turning on' this option for the WordDelimiterFactory 
would address my concerns; however, I also got an unintended consequence: 
Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be 
affected.  Is this intended behavior?  When I read 'English possessive' I hear 
'apostrophe s' and not 'apostrophe anything'.  Is there something I'm missing 
here?


Extreme QTime

2011-10-12 Thread Herman Kiefus
We service about 25K of each particular query type per hour per server.  QTime 
*averages* less than a second; however, there are always a few (1-10) whose QTimes 
go way above (10 - 500 seconds) the average.  If I harvest these queries from 
the log and re-execute them they of course execute sub-second.  Why are some of 
these queries running long?

My first thought was perhaps these queries were occurring subsequent to 
replication commits, which happen every 10 minutes; however, there seems to be 
no clustering of these events around a 10 minute periodic cycle.  (Given that I 
have not established any appropriate warming queries, this seemed a logical 
conclusion).

My next thought was to compare the times that these queries executed versus what 
I see in the log file (grep SEVERE...)  But I found nothing to correlate.

Do you folks have any ideas?


RE: MoreLikeThis assumptions

2011-09-02 Thread Herman Kiefus
It generally helps if your solrconfig is correct.  Thank you for your 
tolerance. 

-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com] 
Sent: Thursday, September 01, 2011 10:15 AM
To: solr-user@lucene.apache.org
Subject: MoreLikeThis assumptions

Given a document id:n show me those other documents with similar values in the 
'Name' field:

http://devsolr03:8983/solr/primary/select?q=id:182652&fl=id,Name,score&mlt=true&mlt.fl=Name

My assumption is the above query will generate the desired outcome.  It does; 
however, given a different document (id) it does not.  Both id's identify a 
document whose name contains the term 'smith'.  Stated differently if A is like 
B, C, and D I would assume that B is like A, C, and D, but these are not the 
results that I'm seeing.

My objective is to simply seek out similar documents (based on several fields, 
I'm just using one here) for any given document; a simple 'duplicate checker' 
if you will.  Am I misguided in my assumptions?


RE: Getting MoreLikeThisHandler operational.

2011-09-01 Thread Herman Kiefus
Thank you very much.



Name


mlt



-Original Message-
From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] 
Sent: Thursday, September 01, 2011 11:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Getting MoreLikeThisHandler operational.

(11/09/01 23:24), Herman Kiefus wrote:
>  class="org.apache.solr.handler.component.MoreLikeThisComponent">
> 
>mlt
> 
> 
>
> but ends up returning a 500 error on a core reload.  What is an appropriate 
> configuration entry for the MLT handler?

You got the 500 error because MLTComponent was set as the requestHandler class.
Set class="solr.SearchHandler" for it instead.

koji
--
Check out "Query Log Visualizer" for Apache Solr 
http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
http://www.rondhuit.com/en/
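
Koji's suggestion above can be sketched roughly as follows. The handler name /mlt and the mlt.fl default of Name are assumptions taken from the URLs earlier in this thread, not from the configuration Herman ended up with, whose markup did not survive in this archive. As far as I know, solr.MoreLikeThisHandler is also a valid class for a dedicated MLT endpoint.

```xml
<!-- solrconfig.xml sketch: a SearchHandler with MoreLikeThis switched on.
     The mlt search component is part of SearchHandler's default component
     list, so it does not have to be registered separately here. -->
<requestHandler name="/mlt" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="mlt">true</str>
    <str name="mlt.fl">Name</str>
  </lst>
</requestHandler>
```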


Getting MoreLikeThisHandler operational.

2011-09-01 Thread Herman Kiefus
I've begun tinkering with MLT using the standard request handler.  The Wiki 
also suggests using the MoreLikeThis handler directly, but apparently, this is 
not in the default configuration (as I recall, I haven't removed anything from 
solrconfig.xml as shipped).  For example: 
http://devsolr03:8983/solr/primary/mlt?q=id:3197684&fl=id,Name,Score&mlt=true&mlt.fl=Name
 yields 'The requested resource is not available'.

I tried adding this to my solrconfig.xml:



   
  mlt



but ends up returning a 500 error on a core reload.  What is an appropriate 
configuration entry for the MLT handler?


MoreLikeThis assumptions

2011-09-01 Thread Herman Kiefus
Given a document id:n show me those other documents with similar values in the 
'Name' field:

http://devsolr03:8983/solr/primary/select?q=id:182652&fl=id,Name,score&mlt=true&mlt.fl=Name

My assumption is the above query will generate the desired outcome.  It does; 
however, given a different document (id) it does not.  Both id's identify a 
document whose name contains the term 'smith'.  Stated differently if A is like 
B, C, and D I would assume that B is like A, C, and D, but these are not the 
results that I'm seeing.

My objective is to simply seek out similar documents (based on several fields, 
I'm just using one here) for any given document; a simple 'duplicate checker' 
if you will.  Am I misguided in my assumptions?


RE: Text Analysis and copyField

2011-08-25 Thread Herman Kiefus
It had crossed my mind, but for now we have a 'DictionarySource' field whose 
type utilizes the KeepWordFilterFactory with a text file containing all 
correctly spelled words (thanks to Scrabble), location/last/first names 
(courtesy of the US Census Bureau) and a few other additions (month/day names).  A 
file this large does not seem to have a material impact on indexing.

What we're seeing now (we also have a field 'TermsMisspelled' that utilizes the 
same text file with StopFilterFactory) is almost pure misspellings and some 
contractions (can't, won't, don't, etc.).

Thank you everyone for your help here, this is a truly fine community.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Wednesday, August 24, 2011 1:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

Have you considered having two dictionaries and using ajax to query them both 
and intermingling the results in your suggestions? It'd be some work, but I 
think it might accomplish what you want.

Best
Erick

On Tue, Aug 23, 2011 at 1:48 PM, Herman Kiefus  wrote:
> To close, I found this article from Hoss: 
> http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td3122408.html
>
> Since I cannot use one copyField directive to copy from another copyField's 
> dest[ination], I cannot achieve what I desire: some terms that are subject to 
> KeepWordFilterFactory and some that are not.
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Monday, August 22, 2011 1:16 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Text Analysis and copyField
>
> I suspect that the things going into TermsDictionary are from fields other 
> than CorrectlySpelledTerms.
>
> In other words I don't think that anything is getting into TermsDictionary 
> from CorrectlySpelledTerms...
>
> Be careful to remove the index between schema changes, just to be sure that 
> you're not seeing old data.
>
> Best
> Erick
>
> On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus  
> wrote:
>> That's what I thought, but my experiments show differently.  In actuality:
>>
>> I have a number of fields that are of type "text" (the default as it is 
>> packaged).
>>
>> I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in 
>> index-time analysis, using a file of terms which are known to be correctly 
>> spelled.
>>
>> I have a type 'textDictionary' that has no index-time analysis.
>>
>> I have the fields:
>> <field … indexed="false" stored="false" multiValued="true"/>
>> <field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>
>>
>> I want 'TermsDictionary' to contain only those terms from some fields that 
>> are correctly spelled plus those terms from a couple other fields 
>> (CompanyName and ContactName) as is.  I use several copyField directives as 
>> follows:
>>
>> <copyField … />
>> <copyField source="Field2" dest="CorrectlySpelledTerms"/>
>> <copyField source="Field3" dest="CorrectlySpelledTerms"/>
>>
>> <copyField … />
>> <copyField source="Contact" dest="TermsDictionary"/>
>> <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
>>
>> If I query 'Field1' for a term that I know is misspelled (electical) it 
>> yields results.
>> If I query 'TermsDictionary' for the same term it yields no results.
>>
>> It would seem by these results that 'TermsDictionary' only contains those 
>> terms with misspellings stripped as a result of the text analysis on the 
>> field 'CorrectlySpelledTerms'.
>>
>> Asked another way, I think you can see what I'm getting at: a source for the 
>> spellchecker that only contains correctly spelled terms plus proper names; 
>> should I have gone about this in a different way?
>>
>> -Original Message-
>> From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
>> Sent: Monday, August 22, 2011 9:30 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Text Analysis and copyField
>>
>> On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus  
>> wrote:
>>> Is my thinking correct?
>>>
>>> I have a field 'F1' of type 'T1' whose index time analysis employs the 
>>> StopFilterFactory.
>>>
>>> I also have a field 'F2' of type 'T2' whose index time analysis does NOT 
>>> employ the StopFilterFactory.
>>>
>>> There is a copyField directive source="F1" dest="F2"
>>>
>>> F2 will not contain any stop words because they were filtered out as F1 was 
>>> populated.
>>>
>>
>> No, F2 will contain stop words.  copyField does not process input through 
>> a chain; it sends the original content to each field and therefore analysis 
>> is totally independent.
>>
>> --
>> Stephen Duncan Jr
>> www.stephenduncanjr.com
>>
>


RE: Text Analysis and copyField

2011-08-23 Thread Herman Kiefus
To close, I found this article from Hoss: 
http://lucene.472066.n3.nabble.com/CopyField-into-another-CopyField-td3122408.html

Since I cannot use one copyField directive to copy from another copyField's 
dest[ination], I cannot achieve what I desire: some terms that are subject to 
KeepWordFilterFactory and some that are not.

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Monday, August 22, 2011 1:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

I suspect that the things going into TermsDictionary are from fields other than 
CorrectlySpelledTerms.

In other words I don't think that anything is getting into TermsDictionary from 
CorrectlySpelledTerms...

Be careful to remove the index between schema changes, just to be sure that 
you're not seeing old data.

Best
Erick

On Mon, Aug 22, 2011 at 11:41 AM, Herman Kiefus  wrote:
> That's what I thought, but my experiments show differently.  In actuality:
>
> I have a number of fields that are of type "text" (the default as it is 
> packaged).
>
> I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in 
> index-time analysis, using a file of terms which are known to be correctly 
> spelled.
>
> I have a type 'textDictionary' that has no index-time analysis.
>
> I have the fields:
> <field … indexed="false" stored="false" multiValued="true"/>
> <field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>
>
> I want 'TermsDictionary' to contain only those terms from some fields that 
> are correctly spelled plus those terms from a couple other fields 
> (CompanyName and ContactName) as is.  I use several copyField directives as 
> follows:
>
> <copyField … />
> <copyField source="Field2" dest="CorrectlySpelledTerms"/>
> <copyField source="Field3" dest="CorrectlySpelledTerms"/>
>
> <copyField … />
> <copyField source="Contact" dest="TermsDictionary"/>
> <copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
>
> If I query 'Field1' for a term that I know is misspelled (electical) it 
> yields results.
> If I query 'TermsDictionary' for the same term it yields no results.
>
> It would seem by these results that 'TermsDictionary' only contains those 
> terms with misspellings stripped as a result of the text analysis on the 
> field 'CorrectlySpelledTerms'.
>
> Asked another way, I think you can see what I'm getting at: a source for the 
> spellchecker that only contains correctly spelled terms plus proper names; 
> should I have gone about this in a different way?
>
> -Original Message-
> From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com]
> Sent: Monday, August 22, 2011 9:30 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Text Analysis and copyField
>
> On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus  wrote:
>> Is my thinking correct?
>>
>> I have a field 'F1' of type 'T1' whose index time analysis employs the 
>> StopFilterFactory.
>>
>> I also have a field 'F2' of type 'T2' whose index time analysis does NOT 
>> employ the StopFilterFactory.
>>
>> There is a copyField directive source="F1" dest="F2"
>>
>> F2 will not contain any stop words because they were filtered out as F1 was 
>> populated.
>>
>
> No, F2 will contain stop words.  copyField does not process input through a 
> chain; it sends the original content to each field and therefore analysis is 
> totally independent.
>
> --
> Stephen Duncan Jr
> www.stephenduncanjr.com
>


RE: Spellcheck Phrases

2011-08-23 Thread Herman Kiefus
The angle that I am trying here is to create a dictionary from indexed terms 
that contain only correctly spelled words.  We are doing this by having the 
field from which the dictionary is created utilize a type that employs 
solr.KeepWordFilterFactory, which in turn utilizes a text file of known 
correctly spelled words (including their respective derivations example: lead, 
leads, leading, etc.).

This is working great for us with the exception being those fields in our 
schema that contain proper names.  I can't seem to get (unfiltered) terms from 
those fields along with (correctly spelled) terms from other fields into the 
single field upon which the dictionary is built.

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Thursday, June 02, 2011 11:40 AM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Actually, someone just pointed out to me that a patch like this is unnecessary. 
 The code works as-is if configured like this:

.01  (correct)

instead of this:

.01 (incorrect)

I tested this and it seems to work.  I'm still trying to figure out if using 
this parameter actually improves the quality of our spell suggestions, now that 
I know how to use it properly.

Sorry about the mis-information earlier.
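
The two configuration lines above lost their markup in this archive. Given the ClassCastException on the float value reported earlier in the thread, the difference was presumably the element type used to declare the value. A sketch under that assumption:

```xml
<lst name="spellchecker">
  <!-- other spellchecker settings omitted -->

  <!-- presumably the working form: the value is declared as a float -->
  <float name="thresholdTokenFrequency">.01</float>

  <!-- presumably the failing form: the value is declared as a string
  <str name="thresholdTokenFrequency">.01</str>
  -->
</lst>
```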

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Dyer, James
Sent: Wednesday, June 01, 2011 3:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Spellcheck Phrases

Tanner,

I just entered SOLR-2571 to fix the float-parsing bug that breaks 
"thresholdTokenFrequency".  It's just a 1-line code fix so I also included a 
patch that should cleanly apply to Solr 3.1.  See 
https://issues.apache.org/jira/browse/SOLR-2571 for info and patches.

This parameter appears absent from the wiki.  And as it has always been broken 
for me, I haven't tested it.  However, my understanding is that it should be set 
as the minimum fraction of documents in which a term has to occur in order for 
it to appear in the spelling dictionary.  For instance in the config below, a term 
would have to occur in at least 1% of the documents for it to be part of the 
spelling dictionary.  This might be a good setting for long fields but for the 
short fields in my application, I was thinking of setting this to something 
like 1/1000 of 1% ...

  text
 
  spellchecker
  Spelling_Dictionary
  text
  ./spellchecker
  .01
 


James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Tanner Postert [mailto:tanner.post...@gmail.com]
Sent: Friday, May 27, 2011 6:04 PM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck Phrases

are there any updates on this? any third party apps that can make this work as 
expected?

On Wed, Feb 23, 2011 at 12:38 PM, Dyer, James wrote:

> Tanner,
>
> Currently Solr will only make suggestions for words that are not in 
> the dictionary, unless you specify "spellcheck.onlyMorePopular=true".  
> However, if you do that, then it will try to "improve" every word in 
> your query, even the ones that are spelled correctly (so while it 
> might change "brake" to "break" it might also change "leg" to "log".)
>
> You might be able to alleviate some of the pain by setting the 
> "thresholdTokenFrequency" so as to remove misspelled and rarely-used 
> words from your dictionary, although I personally haven't been able to 
> get this parameter to work.  It also doesn't seem to be documented on 
> the wiki but it is in the 1.4.1 source code, in class 
> IndexBasedSpellChecker.  It's also mentioned in Smiley & Pugh's book.  I 
> tried setting it like this, but got a ClassCastException on the float value:
>
>   
> text_spelling
>  
>  spellchecker
>  Spelling_Dictionary
>  text_spelling
>  true   name="thresholdTokenFrequency">.001
>  
> 
>
> I have it on my to-do list to look into this further but haven't yet.  
> If you decide to try it and can get it to work, please let me know how 
> you do it.
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: Tanner Postert [mailto:tanner.post...@gmail.com]
> Sent: Wednesday, February 23, 2011 12:53 PM
> To: solr-user@lucene.apache.org
> Subject: Spellcheck Phrases
>
> right now when I search for 'brake a leg', solr returns valid results 
> with no indication of misspelling, which is understandable since all 
> of those terms are valid words and are probably found in a few pieces 
> of our content.
> My question is:
>
> is there any way for it to recognize that the phrase should be "break a leg"
> and not "brake a leg" and suggest the proper phrase?
>


Spellcheck index replication

2011-08-23 Thread Herman Kiefus
We employ one 'indexing' master that replicates to many 'query' slaves.  We 
have also recently introduced spellchecking/DYM.  It appears that replication 
does not 'cover' the spellchecker index.  Do I understand this correctly?

Further, we have seen where 'buildOnCommit' will cause the spellcheck index to 
be [re]built on each slave; however, during the time that the spellcheck index 
is being rebuilt, spellcheck queries do not produce suggestions, which makes 
sense.

What suggestions does the community have regarding this issue and/or what is 
working well for you?


Dictionary of Correctly Spelled terms

2011-08-22 Thread Herman Kiefus
My objective is to end up with a field that can be used to build the spellcheck 
dictionary; however, that field will only contain correctly spelled terms other 
than those terms originating from two other 'proper name' fields.

I thought I had this working, but feedback from a separate thread seems to 
indicate otherwise.

My approach was to use copyField directives to move terms from those fields 
that I want to strip misspellings from to a field that uses the 
KeepWordFilterFactory with a file containing only correctly spelled words.  
Further, this field would be copied to the 'dictionary' field along with the 
two other 'proper name' fields.  The 'dictionary' field has no text analysis as 
my assumption was that it would be getting those terms from the source whose 
contents were already subject to the analysis tied to its type.

If this is not the case, how could someone go about creating such a dictionary 
field (other than going outside Solr)?


RE: Text Analysis and copyField

2011-08-22 Thread Herman Kiefus
That's what I thought, but my experiments show differently.  In actuality:

I have a number of fields that are of type "text" (the default as it is 
packaged).  

I have a type 'textCorrectlySpelled' that utilizes KeepWordFilterFactory in 
index-time analysis, using a file of terms which are known to be correctly 
spelled.

I have a type 'textDictionary' that has no index-time analysis.

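I have the fields:
<field … indexed="false" stored="false" multiValued="true"/>
<field name="TermsDictionary" type="textDictionary" indexed="true" stored="false" multiValued="true"/>

I want 'TermsDictionary' to contain only those terms from some fields that are 
correctly spelled plus those terms from a couple other fields (CompanyName and 
ContactName) as is.  I use several copyField directives as follows:

<copyField … />
<copyField source="Field2" dest="CorrectlySpelledTerms"/>
<copyField source="Field3" dest="CorrectlySpelledTerms"/>

<copyField … />
<copyField source="Contact" dest="TermsDictionary"/>
<copyField source="CorrectlySpelledTerms" dest="TermsDictionary"/>
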
If I query 'Field1' for a term that I know is misspelled (electical) it yields 
results.
If I query 'TermsDictionary' for the same term it yields no results.

It would seem by these results that 'TermsDictionary' only contains those terms 
with misspellings stripped as a result of the text analysis on the field 
'CorrectlySpelledTerms'.

Asked another way, I think you can see what I'm getting at: a source for the 
spellchecker that only contains correctly spelled terms plus proper names; should 
I have gone about this in a different way?

-Original Message-
From: Stephen Duncan Jr [mailto:stephen.dun...@gmail.com] 
Sent: Monday, August 22, 2011 9:30 AM
To: solr-user@lucene.apache.org
Subject: Re: Text Analysis and copyField

On Mon, Aug 22, 2011 at 9:25 AM, Herman Kiefus  wrote:
> Is my thinking correct?
>
> I have a field 'F1' of type 'T1' whose index time analysis employs the 
> StopFilterFactory.
>
> I also have a field 'F2' of type 'T2' whose index time analysis does NOT 
> employ the StopFilterFactory.
>
> There is a copyField directive source="F1" dest="F2"
>
> F2 will not contain any stop words because they were filtered out as F1 was 
> populated.
>

No, F2 will contain stop words.  copyField does not process input through a 
chain; it sends the original content to each field and therefore analysis is 
totally independent.

--
Stephen Duncan Jr
www.stephenduncanjr.com
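
Stephen's point can be illustrated with a schema.xml sketch using the hypothetical names from the question (F1, F2, T1, T2). The tokenizer choice and the stopwords.txt file name are assumptions for illustration only.

```xml
<!-- T1 removes stop words at analysis time; T2 does not -->
<fieldType name="T1" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>

<fieldType name="T2" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<field name="F1" type="T1" indexed="true" stored="true"/>
<field name="F2" type="T2" indexed="true" stored="false"/>

<!-- F2 receives the raw text submitted for F1, before T1's analysis runs,
     and is then analyzed by T2 alone, so the stop words survive in F2 -->
<copyField source="F1" dest="F2"/>
```

For the same reason, a destination field with no filtering of its own will not inherit the filtering of a field it is copied from.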


Text Analysis and copyField

2011-08-22 Thread Herman Kiefus
Is my thinking correct?

I have a field 'F1' of type 'T1' whose index time analysis employs the 
StopFilterFactory.

I also have a field 'F2' of type 'T2' whose index time analysis does NOT employ 
the StopFilterFactory.

There is a copyField directive source="F1" dest="F2"

F2 will not contain any stop words because they were filtered out as F1 was 
populated.


RE: Solr spellcheck and multiple collations

2011-08-18 Thread Herman Kiefus
Nice catch, I was sending maxCollations with a capital M.

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Wednesday, August 17, 2011 6:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

I quickly went through what you've got from your last 2 posts and do not see 
any problems.  You might want to double-check that your client is translating 
the constant variable you've got for "spellcheck.maxCollationTries" correctly 
in your query, or, if you've got it in the request handler config, that it's 
spelled correctly there.

The other thing, obviously, is you'll only get 1 collation if there is only 1 
combination from the individual words it suggested that returns hits.  You may 
need to play with different test queries to find one that can generate more 
than 1 good collation.  Also if you set spellcheck.maxCollationTries down to 
zero it will return all the possibilities (up to the spellcheck.maxCollations 
value), even the nonsensical ones.  That might be helpful to do for testing.

Also, these params are in solr 3.x and higher.  So it won't work in 1.4 without 
the SOLR-2010 patch.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com]
Sent: Wednesday, August 17, 2011 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

Thanks James, here are the settings that only yield the one collation:

static int count = 10;
static bool onlyMorePopular = true;
static bool extendedResults = true;
static bool collate = true;
static int maxCollations = 10;
static int maxCollationTries = 100;
static int maxCollationEvaluations = 1;
static bool collateExtendedResults = true;
static float accuracy = 0.7f;

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com]
Sent: Wednesday, August 17, 2011 5:48 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

Herman,

- Specify "spellcheck.maxCollations" with something higher than one to get more 
than 1 collation.  

- If you also want the spellchecker to test whether or not a particular 
collation will return hits, also specify "spellcheck.maxCollationTries"

- If you also want to know how many hits each collation will return, also 
specify "spellcheck.collateExtendedResults=true"

- See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollations 
for more information

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com]
Sent: Wednesday, August 17, 2011 4:31 PM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck and multiple collations

After a bit of work, we have 'spellchecking' up and going and we are happy with 
the suggestions.  I have not, however, ever been able to generate more than one 
collation query.  Is there something simple that I have overlooked?


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
Thanks James, here are the settings that only yield the one collation:

static int count = 10;
static bool onlyMorePopular = true;
static bool extendedResults = true;
static bool collate = true;
static int maxCollations = 10;
static int maxCollationTries = 100;
static int maxCollationEvaluations = 1;
static bool collateExtendedResults = true;
static float accuracy = 0.7f;

-Original Message-
From: Dyer, James [mailto:james.d...@ingrambook.com] 
Sent: Wednesday, August 17, 2011 5:48 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr spellcheck and multiple collations

Herman,

- Specify "spellcheck.maxCollations" with something higher than one to get more 
than 1 collation.  

- If you also want the spellchecker to test whether or not a particular 
collation will return hits, also specify "spellcheck.maxCollationTries"

- If you also want to know how many hits each collation will return, also 
specify "spellcheck.collateExtendedResults=true"

- See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxCollations 
for more information
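
The parameters in the list above could also be set as request handler defaults, along these lines. The handler name is an assumption and the numeric values simply mirror the client settings quoted at the top of this message. Parameter names are case sensitive, so spellcheck.maxCollations has to be written exactly as shown.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <!-- return up to 10 collations instead of only the best one -->
    <str name="spellcheck.maxCollations">10</str>
    <!-- test up to 100 candidate collations against the index for hits -->
    <str name="spellcheck.maxCollationTries">100</str>
    <!-- report the hit count for each collation -->
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```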

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-
From: Herman Kiefus [mailto:herm...@angieslist.com] 
Sent: Wednesday, August 17, 2011 4:31 PM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck and multiple collations

After a bit of work, we have 'spellchecking' up and going and we are happy with 
the suggestions.  I have not, however, ever been able to generate more than one 
collation query.  Is there something simple that I have overlooked?


RE: Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
If you only get one, best, collation then there is no point to my question; 
however, since you asked...

The relevant sections:

Solrconfig.xml -



textDictionary


default
solr.IndexBasedSpellChecker
TermsDictionary
./spellchecker
0.0
score


Schema.xml -

[field type and field definitions were not preserved in this archive copy]

-Original Message-
From: Alexei Martchenko [mailto:ale...@superdownloads.com.br] 
Sent: Wednesday, August 17, 2011 5:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr spellcheck and multiple collations

Can you show us your schema and config?

I believe that's how collation is: the best match, only one.

2011/8/17 Herman Kiefus 

> After a bit of work, we have 'spellchecking' up and going and we are 
> happy with the suggestions.  I have not, however, ever been able to 
> generate more than one collation query.  Is there something simple that I 
> have overlooked?
>



-- 

*Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | 
ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Solr spellcheck and multiple collations

2011-08-17 Thread Herman Kiefus
After a bit of work, we have 'spellchecking' up and going and we are happy with 
the suggestions.  I have not, however, ever been able to generate more than one 
collation query.  Is there something simple that I have overlooked?


RE: solr keeps dying every few hours.

2011-08-17 Thread Herman Kiefus
While I can't be as specific as others here will be, we encountered the 
same/similar problem.  We simply loaded up our servers with 48GB and life is 
good.  I too would like to be a bit more proactive on the provisioning front 
and hopefully someone will come along and help us out.

FWIW and I'm sure someone will correct me, but it seems as if the Java GC 
cannot keep up with cache allocation; in our case everything was fine until the 
nth query and then the box would go TU.  But leave it to Solr, it would simply 
'restart' and start serving queries again.

-Original Message-
From: Jason Toy [mailto:jason...@gmail.com] 
Sent: Wednesday, August 17, 2011 5:15 PM
To: solr-user@lucene.apache.org
Subject: solr keeps dying every few hours.

I have a large ec2 instance(7.5 gb ram), it dies every few hours with out of 
heap memory issues.  I started upping the min memory required, currently I use 
-Xms3072M .
I insert about 50k docs an hour and I currently have about 65 million docs with 
about 10 fields each. Is this already too much data for one box? How do I know 
when I've reached the limit of this server? I have no idea how to keep control 
of this issue.  Am I just supposed to keep upping the min ram used for solr? 
How do I know what the accurate amount of ram I should be using is? Must I keep 
adding more memory as the index size grows? I'd rather the query be a little 
slower if I can use constant memory and have the search read from disk.


RE: 'Stable' 4.0 version

2011-08-17 Thread Herman Kiefus
I should say I'm running: Solr Specification Version: 4.0.0.2010.12.10.08.54.56 
and by the looks of the version number I'm running something from Dec 10 of 
last year.

Tomas: geofilt and geodist() are supported in 3.3?  Along with the location and 
point type?  Quite frankly, 1.3/1.4, 3.3, 4.0 all confuse me.  I just had our 
operations personnel install versions until I got the needed functionality.

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] 
Sent: Wednesday, August 17, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: Re: 'Stable' 4.0 version

As far as I know, Solr's trunk is pretty stable, so you shouldn't have many 
problems with it if you test it correctly. Lucid's search platform is built 
upon the trunk ( 
http://www.lucidimagination.com/products/lucidworks-search-platform/enterprise
).
The one thing I would be concerned about is the index format. It might change in 
an incompatible way from one revision to the next, so if rebuilding your 
index is complicated or takes too long this can be a problem.

If your choice of version is based on the geospatial stuff, why don't you use 
the Solr 3.3 release? It already contains those features.

Tomás

On Wed, Aug 17, 2011 at 4:58 PM, Jaeger, Jay - DOT wrote:

> > geospatial requirements
>
> Looking at your email address, no surprise there.  8^)
>
> > What insight can you share (if any) regarding moving forward to a 
> > later nightly build?
>
> I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the
> time) during some testing, and it performed well -- but we were not 
> doing geospatial indexing with Solr.  Or are you referring to the 
> successor to Solr 3.3 at some future point in time (which I supposed 
> might also be called Solr 4 in the future -- won't that be confusing!)
>
> -Original Message-
> From: Herman Kiefus [mailto:herm...@angieslist.com]
> Sent: Wednesday, August 17, 2011 2:55 PM
> To: solr-user@lucene.apache.org
> Subject: 'Stable' 4.0 version
>
> My organization uses Solr 4 because of our geospatial requirements.  
> What insight can you share (if any) regarding moving forward to a 
> later nightly build?  Or, for those of you using 4.0 in a Production 
> setting, when is it that you move ahead?
>


'Stable' 4.0 version

2011-08-17 Thread Herman Kiefus
My organization uses Solr 4 because of our geospatial requirements.  What 
insight can you share (if any) regarding moving forward to a later nightly 
build?  Or, for those of you using 4.0 in a Production setting, when is it that 
you move ahead?