RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread Dyer, James
Instead of dataConfig=data-config.xml, use config=data-config.xml.

From: sami 
Sent: Friday, March 1, 2019 3:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Index database with SolrJ using xml file directly throws an error

Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.


It works absolutely fine. But I want to use the SolrJ API instead of the
built-in execute function. The data-config.xml and solrconfig.xml work fine
with my database.

I am using the same data-config.xml file and solrconfig.xml file to do the
indexing with program mentioned in my query.

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
// I tried this too, without the full path, as you suggested:
params.set("dataConfig", "data-config.xml");
server.query(params);

I checked the XML file for any bogus characters too. But the same files work
fine with the built-in DIH, just not with the code. What could it be?



--
Sent from: 
http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Dyer, James
The parameter "dataConfig" should hold an actual xml document to override the 
data-config.xml file you store in zookeeper (cloud) or the configuration 
directory (standalone).  Typically you do not use this parameter.  Instead, 
specify the "config" parameter with the filename (e.g. data-config.xml).  This 
file is the DIH configuration, not solrconfig.xml as you are using.  It is just 
the filename, or path starting at the base configuration directory, not a full 
path as you are using.  Unless you want users to override the DIH configuration 
at request time, it is best to specify the filename using the "config" 
parameter in the request handler's invariant section in solrconfig.xml.
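A minimal JDK-only sketch of the request described above: the `/dataimport` handler, `command=full-import`, and `config` carrying the DIH file name relative to the configuration directory, rather than `dataConfig` carrying a file path. The class and method names are hypothetical; in the SolrJ code from this thread the equivalent change would presumably be `params.set("config", "data-config.xml")` in place of the `dataConfig` line.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds a DIH full-import URL for a Solr core. */
final class DihFullImportUrl {

    static String build(String coreUrl, String dihConfigFile) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        params.put("clean", "true");
        params.put("commit", "true");
        // "config" names the DIH file (relative to the conf directory),
        // not a full path and not solrconfig.xml.
        params.put("config", dihConfigFile);

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return coreUrl + "/dataimport?" + query;
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

Alternatively, as James suggests, the file name can be fixed in the handler's invariants so clients never send it at all.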

From: sami 
Sent: Thursday, February 28, 2019 8:36 AM
To: solr-user@lucene.apache.org
Subject: Index database with SolrJ using xml file directly throws an error

I would like to index my database using SolrJ Java API. I have already tried
to use DIH directly from the Solr server. It works and indexes well. But
when I try to use the same XML config file with SolrJ, it throws an
error.

**Solr version 7.6.0 SolrJ 7.6.0**

Here is the full code I am using:

String url = "http://localhost:8983/solr/test";
String dataConfig =
"D:/solr-7.6.0/server/solr/test/conf/solrconfig.xml";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig", dataConfig);
server.query(params);

But using this piece of code throws an error.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/test: Data Config problem: Content
is not allowed in Prolog.

Am I doing it right? Reference:
https://stackoverflow.com/questions/31446644/how-to-do-solr-dataimport-i-e-from-rdbms-using-java-api/54905578#54905578

Is there any other way to index directly?





RE: [External] Setting Spellcheck for solr only for zero result

2018-09-26 Thread Dyer, James
Neel,

I do not think there is a way to entirely bypass spellchecking if there are 
results returned, and I'm not so sure performance would noticeably improve if 
it did this.  Clients can easily check to see if results were returned and can 
ignore the spellcheck response in these cases, if desired.
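A minimal sketch of that client-side check, assuming the client has already read `numFound` and the suggested words out of the response (the class and method names are hypothetical). With SolrJ, those inputs would typically come from `QueryResponse.getResults().getNumFound()` and `QueryResponse.getSpellCheckResponse()`.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical client-side policy: show spelling suggestions only for zero-hit queries. */
final class ZeroHitSuggestionPolicy {

    /**
     * Solr may still compute suggestions when the query has hits;
     * the client simply discards them whenever numFound > 0.
     */
    static List<String> suggestionsToShow(long numFound, List<String> suggestions) {
        if (numFound > 0) {
            return Collections.emptyList();
        }
        return suggestions;
    }
}
```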

The one exception to this is if you are using "spellcheck.collate=true" with  
"spellcheck.maxCollationTries" set to a value > 0.  In this case, if your main 
query uses "q.op=OR" or a low "mm" value, you might want to force it to only 
return collations with all matching words.  In this case you would use 
something like "spellcheck.collateParam.mm=100%" to be sure it only returned 
re-written queries for which all the words matched.

The "spellcheck.maxResultsForSuggest" parameter is designed to be used in 
conjunction with "spellcheck.alternativeTermCount" to produce 
did-you-mean-style suggestions when a query returns only a few hits and at 
least some of the terms were in the index (but may be misspelled nevertheless).

James Dyer
Ingram Content Group

-Original Message-
From: neel choudhury [mailto:findneel2...@gmail.com] 
Sent: Sunday, September 23, 2018 2:58 PM
To: solr-user@lucene.apache.org
Subject: [External] Setting Spellcheck for solr only for zero result

I am looking to set up spellcheck for Solr correctly. For performance
reasons (and to avoid confusion) I don't want to give any suggestion for any
query which returns at least one result. Solr provides a parameter,
spellcheck.maxResultsForSuggest. For my use case I need to set it to 0, as I
only want suggestions when no result is returned. However, looking into the
code of SpellCheckComponent in Solr, I saw that a value of 0 for
spellcheck.maxResultsForSuggest is ignored because of the greater-than check. Is
there a way I can suppress spelling suggestions when even 1 result is returned?

private Integer maxResultsForSuggest(ResponseBuilder rb) {
  SolrParams params = rb.req.getParams();
  float maxResultsForSuggestParamValue =
      params.getFloat(SpellingParams.SPELLCHECK_MAX_RESULTS_FOR_SUGGEST, 0.0f);
  Integer maxResultsForSuggest = null;

  // A value of 0 (the default) never enters this branch, so the result stays null.
  if (maxResultsForSuggestParamValue > 0.0f) {
    ...
  }

  return maxResultsForSuggest;
}


RE: [External] [Solr 7.1.0] spellcheck.maxCollationTries > 0 no results

2018-08-09 Thread Dyer, James
It doesn't appear to me that the collator works with "spellcheck.q".  Looking 
at the unit test (SpellCheckCollatorTest.java), this is not a use-case that is 
being tested.  I opened https://issues.apache.org/jira/browse/SOLR-12650 to 
track this bug.

As a workaround, you can remove "spellcheck.q" and it might work.  You also 
probably want smaller values for spellcheck.count and 
spellcheck.maxCollationTries, maybe 10-20 for these.
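A sketch of that workaround applied to a request's parameter map before sending it. The class name and the cap value are hypothetical; the only behavior it assumes is the one stated above (drop `spellcheck.q` so spellchecking works from the main query, and keep `spellcheck.count` and `spellcheck.maxCollationTries` modest).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper applying the SOLR-12650 workaround to request parameters. */
final class CollationWorkaround {

    /** Returns a copy of the parameters; the input map is left untouched. */
    static Map<String, String> apply(Map<String, String> original, int cap) {
        Map<String, String> params = new LinkedHashMap<>(original);
        // The collator does not appear to honor spellcheck.q, so drop it;
        // the spellchecker then works from the main q parameter.
        params.remove("spellcheck.q");
        // Keep suggestion and collation-check counts small (e.g. 10-20).
        capInt(params, "spellcheck.count", cap);
        capInt(params, "spellcheck.maxCollationTries", cap);
        return params;
    }

    // Assumes the value, when present, is an integer; parseInt throws otherwise.
    private static void capInt(Map<String, String> params, String key, int cap) {
        String value = params.get(key);
        if (value != null && Integer.parseInt(value.trim()) > cap) {
            params.put(key, Integer.toString(cap));
        }
    }
}
```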

James Dyer
Ingram Content Group

From: agorriz [mailto:agor...@tacitknowledge.com]
Sent: Wednesday, August 08, 2018 8:38 AM
To: solr-user@lucene.apache.org
Subject: [External] [Solr 7.1.0] spellcheck.maxCollationTries > 0 no results

I have a problem with Solr suggested terms: when I search for a misspelled
phrase or word, for example "halogan balbs" (0 results found), I want a
suggestion which will lead to results (e.g. "halogen bulbs").

I'm able to get a suggested phrase by enabling spellcheck.collation and
setting spellcheck.maxCollationTries = 0, but unfortunately the suggested phrase
does not always generate results (e.g. searching for "fence panel" (1 result)
suggests "face paper" (0 results)).

According to the documentation, in order to bypass the problem of 0 results on
the collated query I can configure spellcheck.maxCollationTries > 0, but by
doing so I noticed that the returned collation is always empty, even when
the single suggested words collated would generate results.

My question is, why is that happening and how can I avoid it?

Following is an example query for "halogan balbs" that does not work as I'm
expecting:

http://localhost:8983/solr/master_Product_default/select?fq=(catalogId:%22ProductCatalog%22%20AND%20catalogVersion:%22Online%22)=((code_string:halogan^100.0))%20OR%20((code_string:balbs^100.0))%20OR%20((code_string:%22halogan%20balbs%22~10.0^100.0)%20OR%20(brand.search_text_mv:%22halogan%20balbs%22~10.0^300.0)%20OR%20(categoryName_text_en_mv:%22halogan%20balbs%22~10.0^700.0)%20OR%20(type.search_text_mv:%22halogan%20balbs%22~10.0^800.0)%20OR%20(name_text_en:%22halogan%20balbs%22~10.0^500.0))=20=default=halogan%20balbs=true=true=true=true=100=500

that query returns the following:

"spellcheck":{
  "suggestions":[
    "halogan",{
      "numFound":1,
      "startOffset":0,
      "endOffset":7,
      "origFreq":0,
      "suggestion":[{
          "word":"halogen",
          "freq":84}]},
    "balb",{
      "numFound":1,
      "startOffset":8,
      "endOffset":13,
      "origFreq":0,
      "suggestion":[{
          "word":"bulb",
          "freq":198}]}],
  "correctlySpelled":false,
  "collations":[]}}

Note that halogen and bulb are returned as individual suggestions but collations
is empty, whereas if I run the query with "spellcheck.maxCollationTries=0"
then I get "halogen bulb" as the suggested collation query:

"spellcheck":{
  "suggestions":[
    "halogan",{
      "numFound":1,
      "startOffset":0,
      "endOffset":7,
      "origFreq":0,
      "suggestion":[{
          "word":"halogen",
          "freq":84}]},
    "balb",{
      "numFound":1,
      "startOffset":8,
      "endOffset":13,
      "origFreq":0,
      "suggestion":[{
          "word":"bulb",
          "freq":198}]}],
  "correctlySpelled":false,
  "collations":[
    "collation",{
      "collationQuery":"halogen bulb",
      "hits":0,
      "misspellingsAndCorrections":[
        "halogan","halogen",
        "balb","bulb"]}]}}

I would expect this behaviour to happen if searching for "halogen bulb"
returns 0 results, but in this particular case the search returns results:

http://localhost:8983/solr/master_Product_default/select?fq=(catalogId:%22ProductCatalog%22%20AND%20catalogVersion:%22Online%22)=((code_string:halogen^100.0))%20OR%20((code_string:bulb^100.0))%20OR%20((code_string:%22halogen%20bulb%22~10.0^100.0)%20OR%20(brand.search_text_mv:%22halogen%20bulb%22~10.0^300.0)%20OR%20(categoryName_text_en_mv:%22halogen%20bulb%22~10.0^700.0)%20OR%20(type.search_text_mv:%22halogen%20bulb%22~10.0^800.0)%20OR%20(name_text_en:%22halogen%20bulb%22~10.0^500.0))=20=default=halogen%20bulb=true=true=true=true=100=500

returns:

"response":{"numFound":42,"start":0,"docs":[
{...}







RE: Error configuring Spell Checker

2018-04-17 Thread Dyer, James
(moving to solr-user@lucene.apache.org)

Gene,

I can reproduce your problem if I misspell the "spellcheck.dictionary" 
parameter in my query.  But I see your query has "direct" which matches the 
"name" element of one of your spellcheckers.  I think the actual problem in 
your case might be that you have separate search component sections in your 
configuration.  This might be causing only the last Search Component named 
"spellcheck" to be active.  I believe you need to have just one:


 
  direct
  ...
 
 
  index
  ...
 
 
  wordbreak
  ...
 
 


James Dyer
Ingram Content Group
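A small JDK-only sketch for spotting the situation James suspects: more than one search component with the same name in solrconfig.xml. It assumes the standard `<searchComponent name="..." class="...">` element; the class and method names are hypothetical, and the check only counts elements, it does not validate the spellchecker definitions inside them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical check: how many searchComponent elements share a given name? */
final class SearchComponentCounter {

    static int countNamed(String solrconfigXml, String componentName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrconfigXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse solrconfig.xml", e);
        }
        // Finds searchComponent elements anywhere in the document.
        NodeList nodes = doc.getElementsByTagName("searchComponent");
        int count = 0;
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            if (componentName.equals(e.getAttribute("name"))) {
                count++;
            }
        }
        return count;
    }
}
```

A count above 1 for "spellcheck" would match the symptom described: only one of the definitions ends up active.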

From: genel [mailto:g...@tekdata.com] 
Sent: Monday, April 16, 2018 12:25 PM
To: java-u...@lucene.apache.org
Subject: Error configuring Spell Checker
Importance: Low

We've been using Solr for quite a while. I'm attempting to install spell
checking.

I think I have the basic configuration correct, because the wordbreak
component seems to work, but none of the others do. 

I consistently get an NPE error:

java.lang.NullPointerException
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:147)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:499)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Unknown Source)

Relevant part of solrconfig:





direct
text_spell
solr.DirectSolrSpellChecker
internal
0.5
2
2
5
4
0.01
.01




index
solr.IndexBasedSpellChecker
./spellchecker
text_spell
true





wordbreak
solr.WordBreakSolrSpellChecker
text_spell
true
true
10

 


explicit 
20 
text

direct

on
true
10
5
5
true
true
10
5


spellcheck



Relevant part of schema:

url:

http://localhost:8983/solr/gene/spell?spellcheck.q=rainb=true

I've tried just about everything I can think of, what am I missing?



--
Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

2017-01-17 Thread Dyer, James
This sounds a lot like SOLR-4489.  However, it looks like this was fixed prior 
to your version (4.5).  So it could be that you found another case where this 
bug still exists.

The other possibility is that the default Query Converter cannot handle all 
cases, and the query you are sending may be beyond its abilities.  Even in that 
case, it would be nice if it failed more gracefully than this.

Could you provide the query parameters you are sending and also how you have 
spellcheck configured?

James Dyer
Ingram Content Group


-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Thursday, January 05, 2017 8:22 AM
To: 'solr-user@lucene.apache.org' 
Subject: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

I am seeing many exceptions like this in my Solr [5.4.1] log:
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
    at java.lang.StringBuilder.replace(StringBuilder.java:262)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:236)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:93)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:238)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:203)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
    ...
    at java.lang.Thread.run(Thread.java:745)

What am I potentially facing here?

Thx
Clemens


RE: Can't get spelling suggestions to work properly

2017-01-17 Thread Dyer, James
Jimi,

Generally speaking, spellcheck does not work well against fields with stemming 
or other "heavy" analysis.  I would copy the text to a field that is tokenized 
on whitespace with little else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words in the index.  So if 
the user misspells a word but the misspelling is actually some other word that 
is indexed, it will never suggest.  You can override this behavior by 
specifying "spellcheck.alternativeTermCount" with a value >0.  This is how 
many suggestions it should give for words that indeed exist in the index.  This 
can be the same value as "spellcheck.count", but you may wish to set it to a 
lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to 
"spellcheck.alternativeTermCount", but in my opinion, the latter gives a better 
experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, 
then the spellchecker will not suggest anything if more results are returned 
than the value you specify.  This is helpful in providing "did you mean"-style 
suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written 
query that returns results, then specify "spellcheck.collate=true" and set 
"spellcheck.maxCollationTries" to a value >0 (possibly 5-10).  This will cause 
it to internally check the re-written queries (a.k.a. collations) and report back 
on how many results you get for each.  If you are using "q.op=OR" or a low 
value for "mm", then you will likely want to override this with something like 
"spellcheck.collateParam.mm=100%".  Otherwise every combination will get reported 
as returning results.
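A minimal sketch that gathers the parameters described above into one request map. The class name and the argument values passed in are assumptions, not settings from this thread; `spellcheck.collateParam.mm` is set to 100% here, as recommended in the other threads on this page, so that a collation only counts as having hits when all of its words match. With SolrJ, each entry could be copied onto a `SolrQuery` with `set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical builder for the did-you-mean spellcheck parameters discussed above. */
final class SpellcheckParams {

    static Map<String, String> didYouMean(int count, int alternativeTermCount,
                                          int maxResultsForSuggest, int maxCollationTries) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("spellcheck", "true");
        p.put("spellcheck.count", Integer.toString(count));
        // Also suggest for query words that do exist in the index (often a lower value).
        p.put("spellcheck.alternativeTermCount", Integer.toString(alternativeTermCount));
        // No suggestions once the query returns more hits than this.
        p.put("spellcheck.maxResultsForSuggest", Integer.toString(maxResultsForSuggest));
        // Test-run candidate collations and report the hit count for each.
        p.put("spellcheck.collate", "true");
        p.put("spellcheck.maxCollationTries", Integer.toString(maxCollationTries));
        // With q.op=OR or a low mm, require every word to match during those test runs.
        p.put("spellcheck.collateParam.mm", "100%");
        return p;
    }
}
```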

I hope this and the other comments you've gotten help demystify spellcheck 
configuration.  I do agree it is fairly complicated and frustrating to get it 
just right.

James Dyer
Ingram Content Group

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good 
thing: now it shows spelling suggestions even for correctly spelled words.

I think what I would need is the logic of SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX, 
but with a configurable limit instead of it being hard coded to 0, i.e. just as 
maxQueryFrequency works.
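Jimi's proposal can be sketched as a document-frequency threshold. This is hypothetical code, not an existing Solr option; reading the threshold as a whole number of documents when it is 1 or more, and as a fraction of the index otherwise, mirrors how we understand maxQueryFrequency to be interpreted. A threshold of 0 reproduces SUGGEST_WHEN_NOT_IN_INDEX.

```java
/** Hypothetical rule: suggest corrections only for terms that are rare enough in the index. */
final class SuggestThreshold {

    /**
     * @param docFreq   number of documents containing the query term
     * @param threshold >= 1: absolute document count; < 1: fraction of numDocs
     * @param numDocs   number of documents in the index
     */
    static boolean shouldSuggest(int docFreq, float threshold, int numDocs) {
        double limit = threshold >= 1f ? threshold : (double) threshold * numDocs;
        // threshold == 0 gives limit == 0: suggest only when the term is absent.
        return docFreq <= limit;
    }
}
```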

/Jimi

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot. Although setting 
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I 
also had to set "spellcheck.alternativeTermCount". With that done, I now get 
suggestions when searching for 'mycet' (a misspelling of the Swedish word 
'mycket', that didn't return suggestions before).

However, I'm still not able to fully understand how to configure this 
properly, because with this change there are now other misspelled searches that 
no longer give suggestions. The problem here is stemming, I suspect, because 
the main search fields use stemming, so in some cases one can get lots of 
results for spellings that don't exist in the index at all (or at least not 
in the spelling field). How can I configure this component so that those 
suggestions are still included? Do I need to set maxResultsForSuggest to a 
really high number, like Integer.MAX_VALUE? I feel that such a setting would 
defeat the purpose of that parameter, in a way, but I'm not sure how else to 
solve this.

Also, there is one other thing I wonder about the spelling suggestions that 
you might have the answer to. Is there a way to make the logic case 
insensitive, but the presentation case sensitive? For example, a search for 
'georg washington' now would return 'george washington' as a suggestion, but 
'George Washington' would be even better.

Regards
/Jimi


-Original Message-
From: alessandro.benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) we don't provide misspelled suggestions if we set the param to 1 and we 
have a minimum doc freq of 1 for the term.

2) we don't provide misspelled suggestions if the doc frequency of the term is 
greater than the max limit set.

Let us explore the code :

if (suggestMode==SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
  return new SuggestWord[0];
}
/// If we are working in "Not in Index Mode" , with a document frequency >0 we 
get 

RE: CachedSqlEntityProcessor with delta-import

2016-10-21 Thread Dyer, James
Sowmya,

My memory is that the cache feature does not work with Delta Imports.  In fact, 
I believe that nearly all DIH features except straight JDBC imports do not work 
with Delta Imports.  My advice is to not use the Delta Import feature at all as 
the same result can (often more efficiently) be accomplished by following the 
approach outlined here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
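A sketch of that approach as we understand the linked page: run `command=full-import` with `clean=false`, and write the entity query so it returns every row on a clean import but only changed rows otherwise, using the DIH variables `${dataimporter.request.clean}` and `${dataimporter.last_index_time}`. The class name and the table/column names passed in are placeholders; check the exact query template against the wiki page before relying on it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helpers for the "delta import via full-import" pattern. */
final class DeltaViaFullImport {

    /** Request parameters: a full-import that keeps existing documents. */
    static Map<String, String> requestParams() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("command", "full-import");
        p.put("clean", "false"); // do not wipe the index first
        p.put("commit", "true");
        return p;
    }

    /** Entity query: all rows on a clean import, only rows changed since the last run otherwise. */
    static String entityQuery(String table, String modifiedColumn) {
        return "SELECT * FROM " + table
                + " WHERE '${dataimporter.request.clean}' != 'false'"
                + " OR " + modifiedColumn + " > '${dataimporter.last_index_time}'";
    }
}
```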

James Dyer
Ingram Content Group

-Original Message-
From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com] 
Sent: Tuesday, October 18, 2016 10:07 AM
To: solr-user@lucene.apache.org
Subject: CachedSqlEntityProcessor with delta-import

Good morning,

Can CachedSqlEntityProcessor be used with delta-import? In my setup when 
running a delta-import with CachedSqlEntityProcessor, the child entity values 
are not correctly updated for the parent record. I am on Solr 4.3. Has anyone 
experienced this and if so how to resolve it?

Thanks,
Sowmya.



RE: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

2016-07-29 Thread Dyer, James
You need to set the "spellcheck.maxCollationTries" parameter to a value greater 
than zero.  The higher the value, the more queries it checks for hits, and the 
longer it could potentially take.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.maxCollationTriesParameter

James Dyer
Ingram Content Group

-Original Message-
From: SRINI SOLR [mailto:srini.s...@gmail.com] 
Sent: Friday, July 22, 2016 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

Hi all - please help me here

On Thursday, July 21, 2016, SRINI SOLR  wrote:
> Hi All -
> Could you please help me on spell check on multi-word phrase as a whole...
> Scenario -
> I have a problem with solr spellcheck suggestions for multi word phrases.
> With the query for 'red chillies'
>
>
q=red+chillies=xml=true=true=true=true
>
> I get
>
> 
> 
> 2
> 4
> 12
> 0
> 
> chiller4
> challis2
> 
> 
> false
> red chiller
> 
>
> The problem is, even though 'chiller' has 4 results in the index, 'red
> chiller' has none. So we end up suggesting a phrase with 0 results.
>
> What can I do to make spellcheck work on the whole phrase only?
>
> Please help me here ...


RE: using spell check on phrases

2016-06-10 Thread Dyer, James
Kaveh,

If your query has "mm" set to zero or a low value, then you may want to 
override this when the spellchecker checks possible collations.  For example:

spellcheck.collateParam.mm=100%

You may also want to consider adding "spellcheck.maxResultsForSuggest" to your 
query, so that it will return spelling suggestions even when the query returns 
some results.  Also if you set "spellcheck.alternativeTermCount", then it will 
try to correct all of the query keywords, including those that exist in the 
dictionary.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking for more 
information.

James Dyer
Ingram Content Group

-Original Message-
From: kaveh minooie [mailto:ka...@plutoz.com] 
Sent: Monday, June 06, 2016 8:19 PM
To: solr-user@lucene.apache.org
Subject: using spell check on phrases

Hi everyone

I am using Solr 6 with DirectSolrSpellChecker and the edismax parser. The 
problem that I am having is that when the query is a phrase, every 
single word in the phrase needs to be misspelled for the spell checker to 
get activated and give suggestions. If only one of the words is 
misspelled, then it just says that the spelling is correct:
true

I was wondering if anyone has encountered this situation before and 
knows how to solve it?

thanks,

-- 
Kaveh Minooie



RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-22 Thread Dyer, James
See the old docs at 
https://wiki.apache.org/solr/SpellCheckComponent#Configuration

In particular, you need this line in solrconfig.xml:

./spellchecker


James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, January 22, 2016 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

OK, but IndexBasedSpellChecker needs a directory where all indexes are
stored to do spell check. I don't have any idea about
IndexBasedSpellChecker. If you could send me a sample configuration for it,
that would help me. Thanks

On Fri, Jan 22, 2016 at 1:45 AM Dyer, James <james.d...@ingramcontent.com>
wrote:

> But if you really need more than 2 edits, I think IndexBasedSpellChecker
> supports it.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, January 21, 2016 11:29 AM
> To: solr-user
> Subject: Re: How get around solr's spellcheck maxEdit limit of 2?
>
> bq: ...is anyway to increase that maxEdit
>
> IIUC, increasing maxEdit beyond 2 increases the space/time required
> unacceptably, that limit is there on purpose, put there by people who
> know their stuff.
>
> Best,
> Erick
>
> On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki <nitinml...@gmail.com>
> wrote:
> > I am using Solr for spell correction. Solr is limited to a maxEdit of 2.
> > Is there any way to increase that maxEdit without using phonetic mapping?
> > Please, any suggestions?
>
>


RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-21 Thread Dyer, James
But if you really need more than 2 edits, I think IndexBasedSpellChecker 
supports it.
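For context on what the limit counts: an "edit" is one step of edit distance (inserting, deleting or substituting a single character). The textbook Levenshtein sketch below is not the code Solr or Lucene uses, but it shows the scale of a limit of 2: "halogan" to "halogen" is 1 edit and "balbs" to "bulb" is 2, while "kitten" to "sitting" is 3 and would fall outside it.

```java
/** Textbook Levenshtein distance: insertions, deletions and substitutions each cost 1. */
final class EditDistance {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty string into the first j chars of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of a
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```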

James Dyer
Ingram Content Group

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, January 21, 2016 11:29 AM
To: solr-user
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

bq: ...is anyway to increase that maxEdit

IIUC, increasing maxEdit beyond 2 increases the space/time required
unacceptably, that limit is there on purpose, put there by people who
know their stuff.

Best,
Erick

On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki  wrote:
> I am using Solr for spell correction. Solr is limited to a maxEdit of 2. Is
> there any way to increase that maxEdit without using phonetic mapping?
> Please, any suggestions?



RE: Spellcheck response format differs between a single core and SolrCloud

2016-01-11 Thread Dyer, James
Ryan,

The json response format changed for Solr 5.0.  See 
https://issues.apache.org/jira/browse/SOLR-3029 .  Is the single-core solr 
running a 4.x version with the cloud solr running 5.x?  If they are both on 
the same major version, then we have a bug.

James Dyer
Ingram Content Group


-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Monday, January 11, 2016 12:32 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck response format differs between a single core and SolrCloud

Hello,

I am using the spellcheck component for spelling suggestions and I've used
the same configurations in two separate projects, the only difference is
one project uses a single core and the other is a collection on SolrCloud
with three shards. The single core has about 56K docs and the one on
SolrCloud has 1M docs. Strangely, the format of the response is slightly
different between the two and I'm not sure why (particularly the collations
part). I was wondering if anyone can shed some light on this? Below is my
configuration and the results I'm getting.

This is in my "/select" searchHandler:


on
false
5
2
5
true
true
5
3

And my spellcheck component:



  
  
default
spelling
solr.DirectSolrSpellChecker
internal
0.5
2
1
5
4
0.01
  


Examples of each output can be found here:
https://gist.github.com/ryac/ceff8da00ec9f5b84106

Thanks,
Ryan


RE: DIH Caching w/ BerkleyBackedCache

2015-12-16 Thread Dyer, James
Todd,

I have no idea if this will perform acceptably with so many multiple values.  I 
doubt the solr/patch code was really optimized for such a use case.  In my 
production environment, I have je-6.2.31.jar on the classpath.  I don't think 
I've tried it with other versions.

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Dyer, James
Brian,

Be sure to have...

transformer="RegexTransformer"

...in your  tag.  It’s the RegexTransformer class that looks for 
"splitBy".

See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more 
information.
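One detail that often trips up pipe-delimited data: as we understand it, RegexTransformer treats `splitBy` as a regular expression (the value is handed to Java's `String.split`), so a literal pipe must be escaped, presumably written `splitBy="\|"` in the XML attribute. The JDK-only sketch below shows the difference; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Demonstrates that a split pattern is a regex, so "|" must be escaped. */
final class SplitByDemo {

    static List<String> split(String value, String regex) {
        return Arrays.asList(value.split(regex));
    }
}
```

For example, `split("a|b|c", "\\|")` yields the three values a, b and c, while the unescaped `"|"` is an empty alternation that matches between every character and splits the string into single characters instead.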

James Dyer
Ingram Content Group


-Original Message-
From: Brian Narsi [mailto:bnars...@gmail.com] 
Sent: Friday, December 04, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler - Multivalued fields - splitBy

I have the following:





I believe I had the following working (splitting on pipe delimited)



But it does not work now.



In fact, I have now even tried



But I cannot get the values to split into an array.

Any thoughts/suggestions what may be wrong?

Thanks,


RE: Spellcheck error

2015-12-03 Thread Dyer, James
Matt,

Can you give some information about how your spellcheck field is analyzed, and
also whether you're using a custom query converter?  Also, try placing the bare
terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre,
then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you
give the exact query you're using?

This is the very same bug as in the 3 tickets you mention.  We clearly haven't
solved all of the possible ways this bug can be triggered, but we cannot fix
it unless we can come up with a unit test that reliably reproduces it.  At
the very least, we should handle these problems better than by throwing a
StringIndexOutOfBoundsException like this.

Long term, there is probably a better design we could come up with for how 
terms are identified within queries and how collations are generated.

James Dyer
Ingram Content Group


-Original Message-
From: Matt Pearce [mailto:m...@flax.co.uk] 
Sent: Thursday, December 03, 2015 10:40 AM
To: solr-user
Subject: Spellcheck error

Hi,

We're using Solr 5.3.1, and we're getting a 
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done 
some investigation, and it looks like the problem is that the corrected 
string is shorter than the original query.

For example, the search term is "theatre", the suggested correction is 
"there". The error is being thrown when replacing the original query 
with the shorter replacement.
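To see how a shorter correction can end up as a negative index, here is a simplified model (not the actual SpellCheckCollator code; Span and collate are hypothetical names). A collation is built by replacing token spans from left to right while tracking how much earlier replacements changed the string's length. That bookkeeping only holds if spans are applied in ascending, non-overlapping order; a stale or out-of-order offset can hand StringBuilder.replace a negative start, so the sketch checks for it and fails with a clearer error instead.

```java
import java.util.Arrays;
import java.util.List;

class CollationSketch {
    // Hypothetical: a [start, end) span of the original query and its correction.
    static final class Span {
        final int start;
        final int end;
        final String correction;

        Span(int start, int end, String correction) {
            this.start = start;
            this.end = end;
            this.correction = correction;
        }
    }

    static String collate(String query, List<Span> spans) {
        StringBuilder sb = new StringBuilder(query);
        int shift = 0;   // net length change from replacements applied so far
        int lastEnd = 0; // end of the previous span, in original-query offsets
        for (Span s : spans) {
            if (s.start < lastEnd || s.end < s.start || s.end > query.length()) {
                throw new IllegalArgumentException(
                    "span [" + s.start + "," + s.end + ") is out of order or out of range");
            }
            sb.replace(s.start + shift, s.end + shift, s.correction);
            shift += s.correction.length() - (s.end - s.start);
            lastEnd = s.end;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "theatre" (7 chars) -> "there" (5 chars) shifts every later offset by -2.
        System.out.println(collate("theatre mbile",
            Arrays.asList(new Span(0, 7, "there"), new Span(8, 13, "mobile"))));
    }
}
```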

This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
    at java.lang.StringBuilder.replace(StringBuilder.java:262)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)

The error looks very similar to those described in 
https://issues.apache.org/jira/browse/SOLR-4489, 
https://issues.apache.org/jira/browse/SOLR-3608 and 
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.

Any suggestions would be appreciated, or should I open a JIRA ticket?

Thanks,

Matt

-- 
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk



RE: DIH Caching w/ BerkleyBackedCache

2015-11-20 Thread Dyer, James
Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"?  Looking
at the BerkleyBackedCache code, if this is set to true, it deletes the cache and
assumes the current update is to fully repopulate it.  If you want to do an
incremental update to the cache, it needs to be false.  You might also need to
specify "clean=false", but I'm not sure if this is a requirement.

I've used DIH with BerkleyBackedCache for a few years and it works well for us.
But rather than using it inline, we have a number of DIH handlers that just
build caches; then, when they're all built, a final DIH handler joins data from
the caches and indexes it to Solr.  We also do what you are doing, with several
handlers running at once, each doing part of the data.

But I have to warn you that this code hasn't been maintained by anyone.  I'm using
an older DIH jar (4.6) with a newer Solr.  I think there might have been an API
change or something that prevented the uncommitted caching code from working
with newer versions, but I honestly forget.  This is probably a viable solution
if you don't want to write any code, but it might take some trial and error
to get it working.

James Dyer
Ingram Content Group


-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DIH Caching with Delta Import

2015-10-21 Thread Dyer, James
The DIH Cache feature does not work with delta import.  Actually, much of DIH 
does not work with delta import.  The workaround you describe is similar to the 
approach described here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which 
in my opinion is the best way to implement partial updates with DIH.
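As we understand that wiki page, the idea is to run a full-import with clean=false and let the entity query itself decide which rows come back: every row when a clean rebuild is requested, otherwise only rows modified since the last index time. The selection logic is modeled in memory below. Row, its timestamps and selectIds are hypothetical; in practice the condition is written into the SQL WHERE clause (see the wiki page for its exact form).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class DeltaViaFullImportSketch {
    // Hypothetical source row: a primary key plus a last-modified timestamp.
    static final class Row {
        final int id;
        final Instant lastModified;

        Row(int id, Instant lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    // clean=true: select everything; clean=false: only rows changed since the last run.
    static List<Integer> selectIds(List<Row> rows, boolean clean, Instant lastIndexTime) {
        List<Integer> ids = new ArrayList<>();
        for (Row r : rows) {
            if (clean || r.lastModified.isAfter(lastIndexTime)) {
                ids.add(r.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Instant lastIndexTime = Instant.parse("2015-10-20T00:00:00Z");
        List<Row> rows = List.of(
            new Row(1, Instant.parse("2015-10-19T12:00:00Z")),
            new Row(2, Instant.parse("2015-10-21T08:00:00Z")));
        System.out.println(selectIds(rows, true, lastIndexTime));  // [1, 2]
        System.out.println(selectIds(rows, false, lastIndexTime)); // [2]
    }
}
```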

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, October 20, 2015 8:02 PM
To: solr-user@lucene.apache.org
Subject: DIH Caching with Delta Import

It appears that DIH entity caching (e.g. SortedMapBackedCache) does not work
with deltas... is this simply a bug with the DIH cache support or somehow by
design?

Any ideas on a workaround for this? Ideally, I could just omit the
"cacheImpl" attribute but that leaves the query (using the default processor
in my case) without the appropriate where clause including the "cacheKey"
and "cacheLookup". Should SqlEntityProcessor be smart enough to ignore the
cache with deltas and simply append a where clause which includes the
"cacheKey" and "cacheLookup"? Or possibly just include a where clause which
includes ('${dih.request.command}' = 'full-import' or cacheKey =
cacheLookup)? I suppose those could be used to mitigate the issue but I was
hoping for possibly a better solution.

Any help would be greatly appreciated. Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-with-Delta-Import-tp4235598.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DIH parallel processing

2015-10-15 Thread Dyer, James
Nabil,

What we do is have multiple DIH request handlers configured in solrconfig.xml.
Then in the SQL query we put something like "where mod(id, ${partition})=0".
Then an external script calls a full import on each request handler at the same
time and monitors the responses.  This isn't the most elegant solution, but it
gets around the fact that DIH is single-threaded.
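The partitioning idea can be sketched in plain Java, with one thread standing in for each DIH request handler. Each hypothetical importPartition call takes the ids whose value modulo the number of partitions equals its own partition number, so together the handlers cover every id exactly once. In a real setup each task would instead send a full-import request to its own handler and poll its status; that HTTP part is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedImportSketch {
    // Stand-in for one handler: "imports" the ids that fall in its partition.
    static int importPartition(List<Integer> ids, int partitions, int partition,
                               ConcurrentHashMap<Integer, Integer> seen) {
        int count = 0;
        for (int id : ids) {
            if (Math.floorMod(id, partitions) == partition) {
                seen.merge(id, 1, Integer::sum); // record how often each id was imported
                count++;
            }
        }
        return count;
    }

    // Runs all partitions at once and waits for each, like the external script.
    static ConcurrentHashMap<Integer, Integer> runAll(List<Integer> ids, int partitions)
            throws Exception {
        ConcurrentHashMap<Integer, Integer> seen = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(partitions);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int p = 0; p < partitions; p++) {
                final int partition = p;
                results.add(pool.submit(() -> importPartition(ids, partitions, partition, seen)));
            }
            for (Future<Integer> f : results) {
                f.get(); // rethrows if any partition failed
            }
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) ids.add(i);
        System.out.println(runAll(ids, 4).size()); // 1000: every id imported once
    }
}
```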

James Dyer
Ingram Content Group


-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr] 
Sent: Thursday, October 15, 2015 3:58 AM
To: Solr-user
Subject: DIH parallel processing

Hi All,
I'm using DIH to index more than 15M records from SQL Server to Solr. This takes
more than 2 hours, and a large part of that time is spent fetching data from the
database. I'm thinking about a solution that loads in parallel (threads) within the
same DIH, with each thread loading part of the data.
Do you have any experience with this kind of situation?
Regards,
Nabil.


RE: File-based Spelling

2015-10-13 Thread Dyer, James
Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.
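Roughly, the older index-based spellcheckers store each dictionary word together with its character n-grams, and candidate corrections are words that share many n-grams with the misspelled word. That is why raw 2-grams show up in the output. The sketch below is a simplified illustration of that idea, not the exact fields Lucene's SpellChecker writes; ngrams and sharedGrams are hypothetical helpers.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NGramSketch {
    // All contiguous character n-grams of a word, e.g. n = 2: "mark" -> [ma, ar, rk].
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // How many of a's n-grams also occur in b: a crude candidate score.
    static int sharedGrams(String a, String b, int n) {
        Set<String> bGrams = new HashSet<>(ngrams(b, n));
        int shared = 0;
        for (String g : ngrams(a, n)) {
            if (bGrams.contains(g)) shared++;
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("mark", 2));                    // [ma, ar, rk]
        System.out.println(sharedGrams("fenbers", "fenders", 2)); // 4 (fe, en, er, rs)
    }
}
```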

James Dyer
Ingram Content Group


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov] 
Sent: Monday, October 12, 2015 2:38 PM
To: Solr User Group
Subject: File-based Spelling

Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and I see nothing to 
suggest any sort of problem/error in solr.log.  However, in my 
./data/spFile/ directory, there are only two files: segments_2 with only 
71 bytes in it, and a zero-byte write.lock file.  For a source 
dictionary having 480,000 words in it, I was expecting a bit more 
substance in the ./data/spFile directory.  Something doesn't seem right 
with this.

Moreover, I ran a query on the word Fenbers, which isn't listed in the 
linux.words file, but there are several similar words.  The results I 
got back were odd, and suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

But I expected suggestions like fenders, embers, and fenberry, etc. I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I played with configurables like 
changing the fieldType from text_en to string and the characterEncoding 
from UTF-8 to ASCII, etc., but nothing seemed to yield any different 
results.

Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!

Thanks,
Mark


RE: Spell Check and Privacy

2015-10-12 Thread Dyer, James
Arnon,

Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a
non-zero value.  This will give you rewritten queries that are guaranteed to
return hits, given the original query and filters.  If you are using an "mm"
value other than 100%, you will also want to specify
"spellcheck.collateParam.mm=100%" (or, if using "q.op=OR", then use
"spellcheck.collateParam.q.op=AND").

Of course, the first section of the spellcheck result will still show every 
possible suggestion, so your client needs to discard these and not divulge them 
to the user.  If you need to know word-by-word how the collations were 
constructed, then specify "spellcheck.collateExtendedResults=true".  Use the 
extended collation results for this information and not the first section of 
the spellcheck results.
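A sketch of those request parameters assembled into a query string with java.net.URLEncoder (the Charset overload needs Java 10 or later). The query text and the maxCollationTries value of 5 are arbitrary examples, not recommendations, and the helper names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class CollateParamsSketch {
    // Spellcheck parameters discussed above; the values are illustrative.
    static Map<String, String> collateParams(String userQuery) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", userQuery);
        p.put("spellcheck", "true");
        p.put("spellcheck.collate", "true");
        // Non-zero: collations are tried against the index before being returned.
        p.put("spellcheck.maxCollationTries", "5");
        // Require every term to match while collations are being tried.
        p.put("spellcheck.collateParam.mm", "100%");
        // Word-by-word detail on how each collation was built.
        p.put("spellcheck.collateExtendedResults", "true");
        return p;
    }

    // URL-encodes each key and value and joins them with '&'.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(collateParams("atorney fees")));
    }
}
```

The client would then read only the collations (and their extended details) from the response and ignore the per-word suggestion list.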

This is all fairly well-documented on the old solr wiki:  
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

James Dyer
Ingram Content Group

-Original Message-
From: Arnon Yogev [mailto:arn...@il.ibm.com] 
Sent: Monday, October 12, 2015 2:33 AM
To: solr-user@lucene.apache.org
Subject: Spell Check and Privacy

Hi,

Our system supports many users from different organizations and with 
different ACLs. 
We are considering adding spell check ("did you mean") functionality using
DirectSolrSpellChecker. However, a privacy concern was raised, as this
might lead to private information being revealed between users via the
suggested terms. Using the FileBasedSpellChecker is another option, but
naturally a static list of terms is not optimal.

Is there a best practice or a suggested method for these kinds of cases?

Thanks,
Arnon



RE: String index out of range exception from Spell check

2015-09-28 Thread Dyer, James
This looks similar to SOLR-4489, which is marked fixed for version 4.5.  If 
you're using an older version, the fix is to upgrade.  

Also see SOLR-3608, which is similar but here it seems as if the user's query 
is more than spellcheck was designed to handle.  This should still be looked at 
and possibly we can come up with a way to handle these cases.

A way to work around these bugs is to strip your query down to raw terms, 
separated by spaces, and use "spellcheck.q" with the raw terms only.
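A rough sketch of that stripping step. rawTerms is a hypothetical helper: it drops field: prefixes, boost or fuzzy suffixes such as ^2 or ~1, and the AND/OR/NOT operators, then keeps runs of Unicode letters, combining marks and digits. It is deliberately crude (hyphenated words get split and phrase quotes are simply dropped), so adjust it to the query syntax your users actually send.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RawTermsSketch {
    private static final Pattern FIELD_PREFIX = Pattern.compile("\\b\\w+:");
    private static final Pattern MODIFIER = Pattern.compile("[~^][0-9.]*");
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{M}\\p{N}]+");

    // Reduces a query to its bare terms, separated by single spaces.
    static String rawTerms(String query) {
        String cleaned = FIELD_PREFIX.matcher(query).replaceAll(" ");
        cleaned = MODIFIER.matcher(cleaned).replaceAll(" ");
        List<String> terms = new ArrayList<>();
        Matcher m = WORD.matcher(cleaned);
        while (m.find()) {
            String term = m.group();
            // Upper-case boolean operators are query syntax, not words to check.
            if (!term.equals("AND") && !term.equals("OR") && !term.equals("NOT")) {
                terms.add(term);
            }
        }
        return String.join(" ", terms);
    }

    public static void main(String[] args) {
        String q = "title:(simphony AND mbile^2) +theatre~1";
        System.out.println("spellcheck.q=" + rawTerms(q)); // spellcheck.q=simphony mbile theatre
    }
}
```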

James Dyer
Ingram Content Group


-Original Message-
From: davidphilip cherian [mailto:davidphilipcher...@gmail.com] 
Sent: Sunday, September 27, 2015 3:50 PM
To: solr-user@lucene.apache.org
Subject: String index out of range exception from Spell check

There are irregular exceptions from the spell check component. Below is the
stack trace. This does not happen for every q term, but I have often seen
it occurring for specific queries after enabling spellcheck.collate.



String index out of range: -3



java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
    at java.lang.StringBuilder.replace(StringBuilder.java:266)
    at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
    at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
    at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
    at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:497)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
    at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
    at java.lang.Thread.run(Thread.java:722)



500


RE: Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-25 Thread Dyer, James
Max,

If you know the entire list of words you want to spellcheck against, you can 
use FileBasedSpellChecker.  See 
http://wiki.apache.org/solr/FileBasedSpellChecker .
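If you go the FileBasedSpellChecker route, the source file is, to our understanding, a plain text list with one word per line. Below is a small sketch that merges words taken from your data with extra custom words into such a file. The file name, the word lists and the writeDictionary helper are hypothetical, and the lower-casing assumes your spellcheck analysis lower-cases too.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

class DictionaryFileSketch {
    // Writes the union of both word lists, lower-cased, de-duplicated and sorted,
    // one word per line.
    static Path writeDictionary(Path target, Collection<String> fieldWords,
                                Collection<String> customWords) throws IOException {
        TreeSet<String> words = new TreeSet<>();
        for (String w : fieldWords) words.add(w.trim().toLowerCase(Locale.ROOT));
        for (String w : customWords) words.add(w.trim().toLowerCase(Locale.ROOT));
        words.remove(""); // skip blank entries
        return Files.write(target, words, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("spellings", ".txt");
        writeDictionary(file, List.of("Symphony", "mobile"), List.of("fenberry", "mobile"));
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8)); // [fenberry, mobile, symphony]
    }
}
```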

If, however, you have a field you want to spellcheck against but also want 
additional words added, consider using a copy of the field for spellcheck 
purposes, and then index the additional terms to that field.   You may be able 
to accomplish this easily, for instance, by using index-time synonyms in the 
analysis chain for the spellcheck field.  Or you could just append them to any 
document (more than once if you want to boost the term frequency).

Keep in mind that while this will work fine for regular word-by-word spell 
suggestions, collations are not going to work well with these approaches.

James Dyer
Ingram Content Group

-Original Message-
From: Max Chadwick [mailto:mpchadw...@gmail.com] 
Sent: Monday, August 24, 2015 9:43 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck / Suggestions : Append custom dictionary to SOLR default 
index

Is there a way to append a set of words to the out-of-the-box Solr index when
using the spellcheck / suggestions feature?


RE: exclude folder in dataimport handler.

2015-08-20 Thread Dyer, James
I took a quick look at FileListEntityProcessor#init, and it looks like it 
applies the excludes regex to the filename element of the path only, and not 
to the directories.

If your filenames do not have a naming convention that would let you use it 
this way, you might be able to write a transformer to get what you want.
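To make that concrete, here is a plain-Java illustration (not FileListEntityProcessor's code) of the difference between testing an excludes pattern against the file name alone, which is what the processor appears to do, and checking the directories on the path, which a custom filter or transformer would have to do. The paths and helper names are made up. The sketch uses find() for the regex; whether the real processor uses find() or a full match is worth checking.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

class ExcludesSketch {
    // Pattern tested against the last path element (the file name) only.
    static boolean excludedByFileName(String path, Pattern excludes) {
        Path fileName = Paths.get(path).getFileName();
        return fileName != null && excludes.matcher(fileName.toString()).find();
    }

    // Hypothetical alternative: skip the file if any parent directory has this name.
    static boolean excludedByDirectory(String path, String directoryName) {
        Path parent = Paths.get(path).getParent();
        if (parent == null) return false;
        for (Path element : parent) {
            if (element.toString().equals(directoryName)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Pattern excludes = Pattern.compile("templatedata");
        String file = "/data/Malathy/templatedata/page.xml";
        System.out.println(excludedByFileName(file, excludes));        // false
        System.out.println(excludedByDirectory(file, "templatedata")); // true
    }
}
```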

James Dyer
Ingram Content Group


-Original Message-
From: coolmals [mailto:coolm...@gmail.com] 
Sent: Thursday, August 20, 2015 12:57 PM
To: solr-user@lucene.apache.org
Subject: exclude folder in dataimport handler.

I am importing files from my file system and want to exclude the files in a
folder called templatedata from the import. How do I configure that in the
entity? excludes=templatedata doesn't seem to work.

 entity name=files dataSource=null rootEntity=false
processor=FileListEntityProcessor
baseDir=E:\Malathy\ fileName=.*\.* excludes=templatedata
pk=id 
onError=skip
recursive=true




--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Possibly this English-specific analysis in your text_suggest field is
interfering:  solr.EnglishPossessiveFilterFactory ?

Another guess is that you're receiving more than 5 results and
maxResultsForSuggest is set to 5.

But I'm not sure.  Maybe someone can help if you provide more information.

Can you provide a few example documents that contain Bangla text, then the full
query request with a misspelled Bangla word (from the example documents you
provide), then the full spellcheck response, and the total # of documents
returned?

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 5:20 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check not showing any suggestions for other language

Solr spell check is not showing any suggestions for another language. I have
indexed multiple languages (English and Bangla) in the same core. It shows
suggestions for misspelled English words, but for a misspelled Bangla word it
shows correctlySpelled = false without any suggestions.

Please check my configuration for spell check below

solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">

    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">product_name</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>

  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_suggest</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggest</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="field">suggest</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">5</int>
  </lst>
</searchComponent>


schema.xml

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Can you try putting your queried keyword in spellcheck.q ?

James Dyer
Ingram Content Group


-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 10:13 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr spell check not showing any suggestions for other language

Dear James

Thank you for your reply.

I tested the analyzer without “solr.EnglishPossessiveFilterFactory” but still had no
luck. I also updated the analyzer; please find it below.

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="${resources}/StopWords/stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


With the above configuration for “text_suggest”, I got the following results.

For a correctly spelled Bangla word, সহজ, the Solr response is:
(Note: I set rows to 0 to skip the results.)

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2</int>
  <lst name="params">
    <str name="q">সহজ</str>
    <str name="indent">true</str>
    <str name="rows">0</str>
    <str name="wt">xml</str>
    <str name="_">1438787238383</str>
  </lst>
</lst>
<result name="response" numFound="249" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">true</bool>
  </lst>
</lst>
</response>

For an incorrectly spelled Bangla word, সহগ, where I just changed the last
letter, the Solr response is:

<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">7</int>
  <lst name="params">
    <str name="q">সহগ</str>
    <str name="indent">true</str>
    <str name="rows">0</str>
    <str name="wt">xml</str>
    <str name="_">1438787208052</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
  <lst name="suggestions">
    <bool name="correctlySpelled">false</bool>
  </lst>
</lst>
</response>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950p4221033.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check mutliwords

2015-07-30 Thread Dyer, James
Talha,

In your configuration, you have this set:

<str name="spellcheck.maxResultsForSuggest">5</str>

...which means it will consider the query correctly spelled and offer no
suggestions if there are more than 5 results. You could omit this parameter and
it will always suggest when possible.

Possibly, a better option would be to add spellcheck.collateParam.mm=100% or
spellcheck.collateParam.q.op=AND, so when testing collations against the
index, it will require all the terms to match something.  See
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX for
more information.
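A toy model of both points, assuming default OR matching for the main query. As we read the Solr reference guide, with maxResultsForSuggest=5 suggestions are only offered when the query returns 5 or fewer hits, and a collation tried with all terms required only counts documents containing every term. The documents and helper names below are invented for illustration.

```java
import java.util.List;
import java.util.Set;

class SuggestThresholdSketch {
    // Hits for a query: any term may match (OR), or all terms must match (mm=100%).
    static int hits(List<Set<String>> docs, List<String> terms, boolean requireAll) {
        int count = 0;
        for (Set<String> doc : docs) {
            boolean match = requireAll
                ? doc.containsAll(terms)
                : terms.stream().anyMatch(doc::contains);
            if (match) count++;
        }
        return count;
    }

    // Suggestions are offered only when the hit count does not exceed the threshold.
    static boolean offersSuggestions(int hits, int maxResultsForSuggest) {
        return hits <= maxResultsForSuggest;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("symphony", "mobile"), Set.of("mobile"), Set.of("mobile"),
            Set.of("mobile"), Set.of("mobile"), Set.of("mobile"), Set.of("symphony"));
        int h = hits(docs, List.of("simphony", "mobile"), false); // "mobile" alone matches 6
        System.out.println(h + " hits, suggest? " + offersSuggestions(h, 5)); // 6 hits, suggest? false
        System.out.println(hits(docs, List.of("symphony", "mobile"), true));  // 1
    }
}
```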

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, July 22, 2015 9:34 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check mutliwords

I could not figure out why my configured Solr spell checker is not giving the
desired output. In my indexed data, the query symphony+mobile matches around
3.5K+ docs and the spell checker detects it as correctly spelled. When I
misspell symphony in the query symphony+mobile, it shows only results for
mobile and the spell checker detects this query as correctly spelled. I have
searched this query in different combinations. Please find the search result stats:

Query: symphony 
ResultFound: 1190
SpellChecker: correctly spelled

Query: mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: simphony
ResultFound: 0
SpellChecker: symphony 
Collation Hits: 1190

Query: symphony+mobile
ResultFound: 3585
SpellChecker: correctly spelled 

Query: simphony+mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: symphony+mbile
ResultFound: 1190
SpellChecker: correctly spelled 

In the last two queries it should suggest something for the misspelled words
simphony and mbile.

Please find my configuration below. Only the spell check configuration is given.

solrconfig.xml

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">

    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">product_name</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>

  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">text_suggest</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggest</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
  </lst>

  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="field">suggest</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
    <int name="minBreakLength">5</int>
  </lst>

</searchComponent>

schema.xml

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.UAX29URLEmailTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
  </analyzer>
</fieldType>



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-mutliwords-tp4218580.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Protwords in solr spellchecker

2015-07-10 Thread Dyer, James
Kamal,

Given the constraint that you cannot re-index the data, your best bet might be 
to simply filter out the suggestions at the application level, or maybe even 
have a proxy do it.

Another option: you might be able to extend DirectSolrSpellChecker and
override #getSuggestions(), calling the superclass method, then post-filtering your
stop words out of the response.  You'll want to request a few more terms so you're
more likely to get results even if a term or two get filtered out.  You can
specify your custom spell checker in solrconfig.xml.
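The post-filtering step itself is simple. Below is a sketch of just that part, independent of Solr: the subclass wiring is not shown, filter is a hypothetical helper, and "blocked" stands in for whatever word list you maintain (compared lower-cased here). The spellchecker's ranking is preserved, and requesting a few extra candidates leaves room for the ones that get dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestionFilterSketch {
    // Drops blocked words and keeps at most `wanted` suggestions, in original order.
    static List<String> filter(List<String> suggestions, Set<String> blocked, int wanted) {
        List<String> kept = new ArrayList<>();
        for (String s : suggestions) {
            if (kept.size() >= wanted) break;
            if (!blocked.contains(s.toLowerCase(Locale.ROOT))) kept.add(s);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Four candidates requested so that two remain after filtering.
        List<String> candidates = List.of("sec", "set", "sect", "sew");
        System.out.println(filter(candidates, Set.of("sec"), 2)); // [set, sect]
    }
}
```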

James Dyer
Ingram Content Group


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Friday, July 10, 2015 7:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Protwords in solr spellchecker

So let's try to analyse the situation from the spellchecking point of view.
First of all, we follow David's suggestion and add the StopWordsFilter, with our
configured bad words, to the query-time analysis.

*Starting scenario*
- we have the protected words in our index, we still want them to be in
there

Let's explore the different kinds of spellcheckers available. Where do they
take the suggestions from?

*Index Based Spellchecker*
The suggestions will come from an auxiliary index.

*Direct Spellchecker*
The suggestions will come from the current index.

*File based spellchecker*
It uses an external file to get the spelling suggestions from, so we can
curate this file properly with only good words, and we are fine.
But I guess you would like to use a blacklist; in this case we are going to
have a whitelist.

*Query Time*
At query time *the query is analysed* and a token stream is provided.
Then, depending on the implementation, we trigger a different lookup.
In the case of the Direct Spellchecker, if I remember well:
for each token an FST with all the supported inflections is generated, an
intersection happens with the index FST (based on the field), and the
suggestion is returned.

Unfortunately, a proper *query time analysis will not help*.
When we analyse the query, we have the misspelled word sexe, which is not
going to be recognised as the bad word.
Then the inflections are calculated, the FST built, and the intersection
will actually produce the feared suggestion sex.
This is because the word is in the index.

If we can't modify the index, the *Direct Spellchecker is not an option*, if
my understanding is correct.

Let's see if the Index Based spellchecker can help...
Unfortunately, in this case too, the auxiliary index produced is based on
the analysed form of the original field.

If you really cannot re-index the content, I would suggest an
implementation based on a concept similar to the AnalyzingSuggester in Solr.

Happy to clarify any further questions.








2015-07-10 9:31 GMT+01:00 davidphilip cherian <davidphilipcher...@gmail.com>:

 Hi Kamal,

 Not necessarily. You can have different filters applied at index time and
 query time (note that the order in which filters are defined matters). You
 could just add the stop filter at query time.
 Define your own custom field type (similar to 'text_en' in schema.xml) and
 perhaps use the standard or whitespace tokenizer followed by the stop filter
 at query time.

 Tip: Use the analysis tool available in the Solr admin page to further
 understand the analysis chain of field types.

 HTH



 On Fri, Jul 10, 2015 at 1:03 PM, Kamal Kishore Aggarwal 
 kkroyal@gmail.com wrote:

  Hi David,
 
  This one is a good suggestion. But if I add these *adult* keywords to the
  stopwords.txt file, it will require re-indexing the data related to these
  keywords.

  How can I see the change instantly? Is there any other approach you can
  suggest?
 
 
 
 
  On Thu, Jul 9, 2015 at 12:09 PM, davidphilip cherian 
  davidphilipcher...@gmail.com wrote:
 
   The best bet is to use solr.StopFilterFactory.
   Add all such words to stopwords.txt and add this filter to your
   analyzer.

   Reference links

   https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory

   https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter

   HTH
  
  
   On Thu, Jul 9, 2015 at 11:50 AM, Kamal Kishore Aggarwal 
   kkroyal@gmail.com wrote:
  
Hi Team,

I am currently working with Java 1.7 and Solr 4.8.1 on Tomcat 7. Is there
any feature that would keep the following kinds of words from appearing as
spelling suggestions?

For example: somebody searches for sexe, and I do not want to show him sex
as the spelling suggestion via Solr. How can I stop these kinds of keywords
from being shown as suggestions?

Any help is appreciated.


Regards
Kamal Kishore
Solr Beginner
   
  
 




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti


RE: Spell checking the synonym list?

2015-07-09 Thread Dyer, James
Ryan,

If you use index-time synonyms on the spellcheck field, this will give you what 
you want.

For instance, if the document has "lawyer" and you index both terms 
"lawyer,attorney", then the spellchecker will see that "atorney" is 1 edit 
away from an indexed term and will suggest "attorney". 
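To make the "1 edit away" reasoning concrete, here is a textbook dynamic-programming Levenshtein distance in Java. It is a sketch for illustration only and is not necessarily the exact distance measure a Solr spellchecker uses internally.

```java
final class EditDistanceSketch {

    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // "atorney" needs one inserted 't' to become the indexed synonym "attorney".
        System.out.println(levenshtein("atorney", "attorney")); // 1
    }
}
```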

You'll need to have the same synonyms set up against the query field, but you 
have the option of making these query-time synonyms if you prefer.

James Dyer
Ingram Content Group

-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Thursday, July 09, 2015 2:28 AM
To: solr-user@lucene.apache.org
Subject: Spell checking the synonym list?

Hi all,

I'm wondering if it's possible to have spell checking performed on terms in
the synonym list?

For example, let's say I have documents with the word "lawyer" in them and
I add "lawyer, attorney" to the synonyms.txt file. Then a query is made for
the word "atorney". Is there any way to provide spell checking on this?

Thanks,
Ryan


RE: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread Dyer, James
Elisabeth,

Currently ConjunctionSolrSpellChecker only supports adding 
WordBreakSolrSpellChecker to IndexBased-, FileBased-, or DirectSolrSpellChecker.  
In the future, it would be great if it could handle other spellchecker 
combinations.  For instance, if you had an (e)dismax query that searches 
multiple fields, you could have a separate spellchecker for each of them.

But CSSC is not hardened for this more general usage, as hinted in the API doc. 
The check done to ensure all spellcheckers use the same StringDistance object 
is, I believe, a safeguard against using this class for functionality it is not 
able to correctly support.  It looks to me like SOLR-6271 was opened to fix the 
bug whereby it compares references on the StringDistance.  This is not a 
problem with WBSSC because it does not support string distance at all.
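A check like this fails for "same class, different instances" when the class does not override equals(), because Object.equals falls back to reference identity. The stand-alone illustration below uses hypothetical classes (not Lucene's StringDistance implementations) and shows one possible class-based comparison.

```java
final class DistanceEqualitySketch {

    // Hypothetical distance class that does NOT override equals(),
    // so equals() means "is the very same object".
    static final class PlainDistance {
    }

    // Hypothetical variant treating any two instances of the same class as equal.
    static final class ClassEqualDistance {
        @Override
        public boolean equals(Object other) {
            return other != null && other.getClass() == getClass();
        }

        @Override
        public int hashCode() {
            return getClass().hashCode();
        }
    }

    public static void main(String[] args) {
        System.out.println(new PlainDistance().equals(new PlainDistance()));           // false
        System.out.println(new ClassEqualDistance().equals(new ClassEqualDistance())); // true
    }
}
```

Comparing by class only makes sense if the distance measure has no configuration state; a parameterized measure would also need to compare its parameters.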

What you're hoping for, however, is for the requirement that the string 
distances be the same to be removed entirely.  You could try modifying the code 
by removing the check.  However, beware that you might not get the results you 
desire!  But should this happen, please go ahead and fix it for your use case 
and then donate the code.  This is something I've personally wanted for a long 
time.

James Dyer
Ingram Content Group


-Original Message-
From: elisabeth benoit [mailto:elisaelisael...@gmail.com] 
Sent: Tuesday, April 14, 2015 7:37 AM
To: solr-user@lucene.apache.org
Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

Hello,

I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
FileBasedSpellChecker in the same request.

I've applied the change from patch 135.patch (cf. SOLR-6271). I tried running
the command patch -p1 -i 135.patch --dry-run but it didn't work, maybe
because the patch was a fix for Solr 4.9, so I just replaced the line in
ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
  throw new IllegalArgumentException(
      "All checkers need to use the same StringDistance!!! 1:" +
      checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

but still, when I send a spellcheck request, I get the error

msg: All checkers need to use the same StringDistance!!!
1:org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08

From the error message I gather both spellcheckers use the same distanceMeasure,
LuceneLevenshteinDistance, but they're not the same instance of
LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?

Thanks,
Elisabeth


RE: Solr phonetics with spelling

2015-03-10 Thread Dyer, James
Ashish,

I would not recommend using spellcheck against a phonetic-analyzed field.  
Instead, you can use copyField to create a separate field that is lightly 
analyzed and use the copy for spelling.  

James Dyer
Ingram Content Group


-Original Message-
From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] 
Sent: Tuesday, March 10, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Solr phonetics with spelling

Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within analyzer in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-18 Thread Dyer, James
It will try to give you suggestions up to the number you specify, but if fewer 
are available it will not give you any more.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Thanks James,
  I tried the same thing:
spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5
suggestions for both life and hope, not as you described: * The spellchecker
will try to return you up to 10 suggestions for hope, but only up to 5
suggestions for life. *


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James james.d...@ingramcontent.com
wrote:

 Here is an example to illustrate what I mean...

 - query q=text:(life AND
 hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
 - suppose at least one document in your dictionary field has life in it
 - also suppose zero documents in your dictionary field have hope in them
 - The spellchecker will try to return you up to 10 suggestions for hope,
 but only up to 5 suggestions for life

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 11:35 AM
 To: solr-user@lucene.apache.org
 Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

 Hi James,
 How can you say that count doesn't use the
 index/dictionary? Then where do the suggestions come from?

 On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
 james.d...@ingramcontent.com
 wrote:

  See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
  the following section, for details.
 
  Briefly, count is the # of suggestions it will return for terms that are
  *not* in your index/dictionary.  alternativeTermCount is the # of
  alternatives you want returned for terms that *are* in your dictionary.
  You can set them to the same value, unless you want fewer suggestions when
  the term is in the dictionary.
 
  James Dyer
  Ingram Content Group
 
  -Original Message-
  From: Nitin Solanki [mailto:nitinml...@gmail.com]
  Sent: Tuesday, February 17, 2015 5:27 AM
  To: solr-user@lucene.apache.org
  Subject: spellcheck.count v/s spellcheck.alternativeTermCount
 
  Hello Everyone,
I am confused about the difference between spellcheck.count and
  spellcheck.alternativeTermCount in Solr. Could anyone explain in detail?
 



RE: Why collations are coming even I set the value of spellcheck.count to zero(0)

2015-02-18 Thread Dyer, James
I think when you set count/alternativeTermCount to zero, the defaults (10?) 
are used instead.  Instead of setting these to zero, just use 
spellcheck=false.  These 2 parameters control suggestions, not collations.

To turn off collations, set spellcheck.collate=false.  Also, I wouldn't set 
maxCollationTries as high as 100, as it could (sometimes) potentially check 
100 possible collations against the index and that would be very slow.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Wednesday, February 18, 2015 2:37 AM
To: solr-user@lucene.apache.org
Subject: Why collations are coming even I set the value of spellcheck.count to 
zero(0)

Hi Everyone,
I have set the value of spellcheck.count = 0 and
spellcheck.alternativeTermCount = 0. Even so, collations are returned when
I search with any misspelled query. Why?
I also set the value of spellcheck.maxCollations = 100 and
spellcheck.maxCollationTries = 100. As I understand it, collations are built
from suggestions. So, do I misunderstand collations, or is there some other
configuration issue? Any help, please?


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the 
following section, for details.

Briefly, count is the # of suggestions it will return for terms that are 
*not* in your index/dictionary.  alternativeTermCount is the # of 
alternatives you want returned for terms that *are* in your dictionary.  You 
can set them to the same value, unless you want fewer suggestions when the 
term is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello Everyone,
  I am confused about the difference between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Could anyone explain in detail?


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
Here is an example to illustrate what I mean...

- query q=text:(life AND 
hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has life in it
- also suppose zero documents in your dictionary field have hope in them
- The spellchecker will try to return you up to 10 suggestions for hope, but 
only up to 5 suggestions for life
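The rule in this example can be sketched as a tiny function: the suggestion cap depends on whether the term exists in the dictionary field. This is a simplified model that assumes document frequency alone decides "in the dictionary" and ignores other settings such as spellcheck.maxResultsForSuggest and spellcheck.onlyMorePopular; the frequencies are hypothetical. As noted in this thread, these are upper bounds, so fewer suggestions come back when fewer are available.

```java
import java.util.Map;

final class SuggestionLimitSketch {

    // Simplified model: terms absent from the dictionary (docFreq == 0) get up to
    // `count` suggestions; terms present get up to `alternativeTermCount`.
    static int suggestionLimit(int docFreq, int count, int alternativeTermCount) {
        return docFreq == 0 ? count : alternativeTermCount;
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for q=text:(life AND hope)
        Map<String, Integer> docFreq = Map.of("life", 42, "hope", 0);
        int count = 10;
        int alternativeTermCount = 5;
        for (String term : new String[] {"life", "hope"}) {
            int limit = suggestionLimit(docFreq.get(term), count, alternativeTermCount);
            System.out.println(term + " -> up to " + limit + " suggestions");
        }
        // life -> up to 5 suggestions
        // hope -> up to 10 suggestions
    }
}
```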

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James,
How can you say that count doesn't use the
index/dictionary? Then where do the suggestions come from?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
 the following section, for details.

 Briefly, count is the # of suggestions it will return for terms that are
 *not* in your index/dictionary.  alternativeTermCount is the # of
 alternatives you want returned for terms that *are* in your dictionary.
 You can set them to the same value, unless you want fewer suggestions when
 the term is in the dictionary.

 James Dyer
 Ingram Content Group

 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Tuesday, February 17, 2015 5:27 AM
 To: solr-user@lucene.apache.org
 Subject: spellcheck.count v/s spellcheck.alternativeTermCount

 Hello Everyone,
   I am confused about the difference between spellcheck.count and
 spellcheck.alternativeTermCount in Solr. Could anyone explain in detail?



RE: Collations are not working fine.

2015-02-13 Thread Dyer, James
Nitin,

Can you post the full spellcheck response when you query:

q=gram_ci:gone wthh thes wint&wt=json&indent=true&shards.qt=/spell
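Since the spaces in q must be URL-encoded and the parameters joined with &, here is a small stdlib-only Java sketch that assembles that request's query string. The host, core, and handler come from the thread; the helper itself is hypothetical and not part of SolrJ.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SpellRequestUrlSketch {

    // Join key=value pairs with '&', URL-encoding each value (spaces become '+').
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "gram_ci:gone wthh thes wint");
        params.put("wt", "json");
        params.put("indent", "true");
        params.put("shards.qt", "/spell");
        System.out.println("http://localhost:8983/solr/wikingram/spell?" + queryString(params));
    }
}
```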

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,
  I did as you told me and used
WordBreakSolrSpellChecker instead of shingles. But collations are still not
coming back or working.
For instance, I tried to get the collation gone with the wind by searching
gone wthh thes wint on field=gram_ci but didn't succeed, even though I am
getting the suggestions *with* for wthh, *the* for thes, and *wind* for wint.
Also, gone with the wind appears 167 times in my documents.
I don't know whether I am missing something.
Please check my Solr configuration below:

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:gone wthh thes
wint&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">gram</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">5</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml: *

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
multiValued="false"/>

</fieldType>
<fieldType name="textSpellCi" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


RE: Collations are not working fine.

2015-02-10 Thread Dyer, James
Nitin,

I have not tested using shingles with collations but my guess here is the 
collation feature is not going to work as expected with a shingled index.  So 
try re-indexing without the shingles and see if it gives you more intuitive 
results.  If that helps, and if you want to still correct whitespace errors, 
then consider using WordBreakSolrSpellChecker instead of shingles (the main 
solr example demonstrates how).  

Beyond that, without some queries *and* the full spellcheck response, and an 
explanation as to why you feel the spellcheck response is wrong, I'm not sure 
you will get much more help with this.

Here is what hits in the collation response means:

 By hits, it means if you replaced the q parameter on the original
 query but left everything else the same (filters, etc), this is how many
 results you would get.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Monday, February 09, 2015 11:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,
   I have not done stemming, and my
spellcheck.alternativeTermCount is set equal to spellcheck.count. Below, I
have pasted my solrconfig.xml and schema.xml configuration.


*URL: *
localhost:8983/solr/wikingram/spell?q=gram_ci:deligh&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpellCi</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">gram_ci</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">0</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">2</int>
    <float name="maxQueryFrequency">0.9</float>
    <str name="comparatorClass">freq</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">gram_ci</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">25</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.maxResultsForSuggest">1</str>
    <str name="spellcheck.alternativeTermCount">25</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.maxCollations">50</str>
    <str name="spellcheck.maxCollationTries">50</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

*Schema.xml: *

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true"
multiValued="false"/>

</fieldType>
<fieldType name="textSpellCi" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>
  </analyzer>
</fieldType>

On Tue, Feb 10, 2015 at 1:23 AM, Dyer, James james.d...@ingramcontent.com
wrote:

 Nitin,

 My guess here is that your spellcheck field is a field that has stemming.
 This might be why you get a collation that returns wind even though the
 user queried wnd and it does not get any suggestions.  Perhaps wnd is
 stemmed the same as wind ?  (Spellcheck usually works best if you
 copyField the query field to something that is tokenized but not heavily
 analyzed, and use the copy as the spellcheck dictionary.)

 The other problem might be because wind is in the index but you are not
 using spellcheck.alternativeTermCount.  If you set this to the same value
 as spellcheck.count, then it will give suggestions even when words exist
 in the index.

 By hits, it means if you replaced the q parameter on the original
 query but left everything else the same (filters, etc), this is how many
 results you would get.

 If you need more help, please include in your message the pertinent
 sections of solrconfig.xml, schema.xml and also the full query url you are
 using and the full spellcheck response.

 James Dyer
 Ingram Content Group


 -Original Message-
 From: Nitin Solanki [mailto:nitinml...@gmail.com]
 Sent: Monday, February 09, 2015 7:47 AM
 To: solr-user@lucene.apache.org
 Subject: Collations are not working fine.

 I am working on spell checking in Solr. I have implemented Suggestions and
 collations in my spell checker component.

 Most of the time collations work fine, but in a few cases they fail.

 *Working*:
 I tried query:*gone wthh thes wnd*: In this wnd doesn't give suggestion
 wind but collation

RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Okke,

My first guess is that the additional results from the word break spellchecker 
are causing additional per-term results and the correct answer is not making the 
list.  So you might need to increase spellcheck.count and/or 
spellcheck.alternativeTermCount .

My second guess is that the correct answer is still in the per-term results, but 
now that wordbreak is producing additional results, it sits so far down the list 
that it never gets tested as a possible collation.  In this case, if 
you're already getting your maximum collations back, just not the one you 
wanted, then increase spellcheck.maxCollations.  Otherwise, try increasing 
spellcheck.maxCollationTries.

If this doesn't help, then go ahead and post the pertinent sections of 
solrconfig.xml, schema.xml, and show what you change when adding wordbreak.  
Then also include before & after query URLs with the full spellcheck responses.

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 8:49 AM
To: solr-user@lucene.apache.org
Subject: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Because of a lot of misspellings in content I am using alternativeTermCount
and maxResultsForSuggest to get suggestions even if terms are in index.
However when adding wordbreak dictionary the collation that was given before
is now empty.

Is there a way to make this work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Okke,

There is no way to have it correct both spelling and whitespace in the same 
correction.  So unfortunately there is no easy fix for your use-case.  The old 
shingle method of correcting whitespace might work for this, but it might also 
introduce other problems.

I saw your comments on SOLR-5386 and I appreciate your reminder about that 
issue.  The easiest workaround is to put spellcheck.maxCollationTries=0 in 
all of your warming queries.  Better yet, just use spellcheck=false in the 
warming queries because having spellcheck enabled in the warming queries serves 
no purpose but to make searchers take longer to open.

James Dyer
Ingram Content Group

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 9:51 AM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Thank you for that answer James.

Increasing spellcheck.count did the trick.

Funny result: for the query holywood the suggestion is holy wood instead of
hollywood, even though I have an mm of 100%.

Any way to fix that?

BTW when using maxCollationTries Solr hangs on core reload. Apparently an
old bug, but hard to find as logs show nothing.

Below the results for holywood: 

"suggestions":[
  "holywood",{
    "numFound":4,
    "startOffset":0,
    "endOffset":8,
    "origFreq":4,
    "suggestion":[{
        "word":"holy wood",
        "freq":70559},
      {
        "word":"hollywood",
        "freq":2649},
      {
        "word":"holyrood",
        "freq":14},
      {
        "word":"homewood",
        "freq":737}]},
  "correctlySpelled",false,
  "collation","(holy wood)"]}}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185368.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
I think the problem is that when it combines suggestions from DirectSolrSpellChecker 
and WordBreakSolrSpellChecker, it gets two lists of possibilities in edit 
distance order.  And when it combines these lists, all it does is interleave 
the 2 lists: 1 from the 1st list, then 1 from the 2nd list, then 1 from the 
1st, etc.  

So I think if you ran the query with just Direct, you'd see 1 list for each 
potentially misspelled word, and then if you ran the query with just WordBreak, 
you'd see a different list for each potentially misspelled word.  And then 
when running with both spellcheckers, you'll see them interleaved 
every-other-one.

It might (or might not) depend on the order you specify the 2 spellcheckers in 
solrconfig.xml.  Maybe (not sure here) the first one is guaranteed to provide 
the first suggestion, so long as it provides at least one.  You might want to 
check whether you have WordBreak specified first, and if so, switch them.  
This matters because when collations are tested, it just goes through the lists, 
top to bottom, and tries the various combinations until either maxCollationTries 
or maxCollations is exhausted.  And it will give you the good collations it 
finds in the order it finds them.
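The merge behaviour described above (alternate one suggestion from each spellchecker's list) can be sketched as below. This is a hypothetical illustration of the description, not ConjunctionSolrSpellChecker's code; the words come from the holywood response in this thread, but which list each belongs to is assumed. The follow-up test later in this thread suggests the real ordering may be by frequency instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class InterleaveSketch {

    // Take one element from each list in turn; once one list runs out,
    // the remainder of the other list is appended in order.
    static <T> List<T> interleave(List<T> first, List<T> second) {
        List<T> merged = new ArrayList<>(first.size() + second.size());
        Iterator<T> a = first.iterator();
        Iterator<T> b = second.iterator();
        while (a.hasNext() || b.hasNext()) {
            if (a.hasNext()) {
                merged.add(a.next());
            }
            if (b.hasNext()) {
                merged.add(b.next());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> direct = List.of("hollywood", "holyrood", "homewood"); // assumed Direct list
        List<String> wordBreak = List.of("holy wood");                      // assumed WordBreak list
        System.out.println(interleave(direct, wordBreak));
        // [hollywood, holy wood, holyrood, homewood]
    }
}
```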

Possibly, an easy workaround is to just increase maxCollations by 1 and 
then use the collation with the most hits.  This will incur a small performance 
penalty, though, every time it has to find collations, as testing the 
possibilities is expensive.
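On the client side, the workaround above amounts to picking the collation with the highest hit count from the response. A minimal sketch with a hypothetical Collation holder; the hit counts are invented for illustration only.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class BestCollationSketch {

    // Hypothetical holder for one collation and its reported hit count.
    static final class Collation {
        final String query;
        final long hits;

        Collation(String query, long hits) {
            this.query = query;
            this.hits = hits;
        }
    }

    // Return the collation with the most hits, or empty if there are none.
    static Optional<Collation> mostHits(List<Collation> collations) {
        return collations.stream().max(Comparator.comparingLong(c -> c.hits));
    }

    public static void main(String[] args) {
        // Invented hit counts, for illustration only.
        List<Collation> found = List.of(new Collation("(holy wood)", 3), new Collation("hollywood", 250));
        mostHits(found).ifPresent(c -> System.out.println(c.query)); // hollywood
    }
}
```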

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 11:55 AM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

James,

That is very useful information. I tested it and can confirm that disabling
spellcheck in warmer solves core reload problem. 

Now, in my use case I'm not trying to spellcheck and correct whitespace.
If holy wood were queried with an mm of 100% it would have fewer hits than
hollywood, and this would then be the best correction.

Is there a way to do this?

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185423.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Got it.  I took a quick look at the code and I see it uses the maximum frequency 
of the terms.  And in your case, one of these terms (holy or wood) occurs about 
71,000 times.  It wouldn't be too difficult to change this to use the average 
frequency of the terms or the minimum.  But currently the only options are to 
use the maximum or the sum of the frequencies.  Possibly the minimum is a 
better predictor of how relevant a suggestion is, though.
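To see why the aggregation choice matters, here is a sketch comparing max, sum, and min over the per-term frequencies of a word-break suggestion. The per-term numbers are hypothetical (the thread only reports the combined value of roughly 71,000), and this is not WordBreakSpellChecker's code.

```java
import java.util.Arrays;

final class CombinedFrequencySketch {

    static int max(int... freqs) {
        return Arrays.stream(freqs).max().orElse(0);
    }

    static int min(int... freqs) {
        return Arrays.stream(freqs).min().orElse(0);
    }

    static int sum(int... freqs) {
        return Arrays.stream(freqs).sum();
    }

    public static void main(String[] args) {
        // Hypothetical document frequencies for "holy" and "wood".
        int holy = 1200;
        int wood = 71000;
        System.out.println("max=" + max(holy, wood) + " sum=" + sum(holy, wood) + " min=" + min(holy, wood));
        // max=71000 sum=72200 min=1200
    }
}
```

With these made-up numbers, "holy wood" outranks "hollywood" (freq 2649 in the response above) under max or sum, but would rank below it under min.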

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 1:27 PM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

I did some testing and the order of dictionaries doesn't seem to have an
effect. They are sorted by frequency. So if mm were applied, holy wood would
have a lower frequency, which would solve this problem.

"suggestions":[
  "holywood",{
    "numFound":4,
    "startOffset":0,
    "endOffset":8,
    "origFreq":4,
    "suggestion":[{
        "word":"holy wood",
        "freq":71828},
      {
        "word":"hollywood",
        "freq":2669},
      {
        "word":"holyrood",
        "freq":14},
      {
        "word":"homewood",
        "freq":737}]},
  "correctlySpelled",false,
  "collation","(holy wood)",
  "collation","hollywood"]}}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185461.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
I opened LUCENE-6237 for this.  I can't promise when I or someone else will 
actually complete this, but it wouldn't be very difficult to do either.  Seeing 
your use-case, I think this would be a nice little improvement.

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 3:25 PM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Yeah that should work. Is this something you will change in the code?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185489.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Collations are not working fine.

2015-02-09 Thread Dyer, James
Nitin,

My guess here is that your spellcheck field is a field that has stemming.  This 
might be why you get a collation that returns wind even though the user 
queried wnd and it does not get any suggestions.  Perhaps wnd is stemmed 
the same as wind ?  (Spellcheck usually works best if you copyField the 
query field to something that is tokenized but not heavily analyzed, and use 
the copy as the spellcheck dictionary.)

The other problem might be because wind is in the index but you are not using 
spellcheck.alternativeTermCount.  If you set this to the same value as 
spellcheck.count, then it will give suggestions even when words exist in the 
index.

By hits, it means if you replaced the q parameter on the original query but 
left everything else the same (filters, etc), this is how many results you 
would get.

If you need more help, please include in your message the pertinent sections of 
solrconfig.xml, schema.xml and also the full query url you are using and the 
full spellcheck response.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Monday, February 09, 2015 7:47 AM
To: solr-user@lucene.apache.org
Subject: Collations are not working fine.

I am working on spell checking in Solr. I have implemented Suggestions and
collations in my spell checker component.

Most of the time collations work fine, but in a few cases they fail.

*Working*:
I tried the query *gone wthh thes wnd*: here wnd doesn't get the suggestion
wind, but the collation comes out right: gone with the wind, hits = 117


*Not working:*
But when I tried the query *gone wthh thes wint*: here wint does get the
suggestion wind, but the collation is not right. Instead of gone with
the wind it gives gone with the west, hits = 1.

I would also like to know what *hits* means in collations.


RE: Solr 4.9 Calling DIH concurrently

2015-02-04 Thread Dyer, James
Yes, that is what I mean.  In my case, for each /dataimport in the defaults 
section, I also put something like this:

<str name="currentPartition">1</str>

...and then reference it in the data-config.xml with 
${dataimporter.request.currentPartition} .  This way the same data-config.xml 
can be used for each handler.

As I said before, while this works (and this is what I do in production), it 
seems generally preferable to write code for this use-case.

James Dyer
Ingram Content Group


-Original Message-
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com] 
Sent: Tuesday, February 03, 2015 4:24 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.9 Calling DIH concurrently

Thanks James. After a lot of searching and reading, I think I now understand
your answer a little.
If I understand correctly, my solrconfig.xml will have a section like this:

<requestHandler name="/dataimport1" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config1.xml</str>
  </lst>
</requestHandler>

<requestHandler name="/dataimport2" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config1.xml</str>
  </lst>
</requestHandler>

...

<requestHandler name="/dataimport8" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config1.xml</str>
  </lst>
</requestHandler>


Is this correct? If it is, then I can call 8 such requests with
<maxIndexingThreads>8</maxIndexingThreads>

and Solr will commit data when the
<ramBufferSizeMB>100</ramBufferSizeMB>

limit of 100MB is reached per thread.

Thanks again for your time.

Thanks
Meena






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744p4183750.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Solr 4.9 Calling DIH concurrently

2015-02-03 Thread Dyer, James
DIH is single-threaded.  There was once a threaded option, but it was buggy and 
was subsequently removed.  

What I do is partition my data and run multiple DIH request handlers at the 
same time.  It means redundant sections in solrconfig.xml and it's not very 
elegant, but it works.

For instance, for a SQL query, I add something like this: where mod(id, 
${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}.
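Here is a sketch of the partitioning arithmetic behind that WHERE clause: each handler gets a currentPartition, and rows with mod(id, numPartitions) equal to it are selected, so every id lands in exactly one partition. It assumes non-negative ids and partitions numbered 0 to numPartitions-1 (numbering them 1 to N would skip ids divisible by N). The whereClause helper is hypothetical; in DIH the values come from ${dataimporter.request.*} as shown above.

```java
import java.util.ArrayList;
import java.util.List;

final class ModPartitionSketch {

    // Partition a non-negative id belongs to under mod(id, numPartitions).
    static int partitionOf(long id, int numPartitions) {
        return (int) Math.floorMod(id, (long) numPartitions);
    }

    // Hypothetical helper: the WHERE fragment one DIH handler would use.
    static String whereClause(int numPartitions, int currentPartition) {
        return "where mod(id, " + numPartitions + ")=" + currentPartition;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        for (int p = 0; p < numPartitions; p++) {
            System.out.println(whereClause(numPartitions, p));
        }
        // Bucket ids 1..100 and show that partition 0 holds the multiples of 8.
        List<List<Long>> buckets = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            buckets.add(new ArrayList<>());
        }
        for (long id = 1; id <= 100; id++) {
            buckets.get(partitionOf(id, numPartitions)).add(id);
        }
        System.out.println(buckets.get(0)); // ids divisible by 8
    }
}
```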

I think, though, most users who want to make the most of multithreading 
write their own program and use the SolrJ API to send the updates.
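For the "write your own program" route, the general shape is a thread pool sending batches of documents in parallel. This stdlib-only sketch shows only the concurrency pattern: sendBatch is a hypothetical placeholder for whatever client call you use (for example a SolrJ add), and documents are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelBatchSketch {

    // Hypothetical stand-in for sending one batch to Solr;
    // here it only counts how many documents were "sent".
    static void sendBatch(List<String> batch, AtomicInteger sent) {
        sent.addAndGet(batch.size());
    }

    // Split docs into fixed-size batches and send them from a thread pool.
    static int indexInParallel(List<String> docs, int batchSize, int threads) {
        AtomicInteger sent = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < docs.size(); start += batchSize) {
                List<String> batch = docs.subList(start, Math.min(start + batchSize, docs.size()));
                pending.add(pool.submit(() -> sendBatch(batch, sent)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for each batch and surface any worker failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("doc-" + i);
        }
        System.out.println(indexInParallel(docs, 100, 8)); // 1000
    }
}
```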

James Dyer
Ingram Content Group


-Original Message-
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com] 
Sent: Tuesday, February 03, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.9 Calling DIH concurrently

Hi 

I am using Solr 4.9 and need to index millions of documents from a database. I
am using DIH and sending requests that fetch by id. Is there a way to run
multiple indexing threads concurrently in DIH?
I want to take advantage of the
maxIndexingThreads
parameter. How do I do it? I am just invoking the DIH handler using SolrJ's
HttpSolrServer
and issuing the requests sequentially:
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1

http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-02-02 Thread Dyer, James
1 is not too small a value; in fact, it's the default.  Of course, the 
more combinations it has to try, the slower it will run, but the penalty is 
small enough that you're not going to notice.  The only problem you might have is that if 
you use a lot of 1-character stop words, you might get them back as 
nonsense suggestions (assuming you do not filter stop words for your spelling 
dictionary field but do remove them on the query field).  But I'd try it if I 
were you.  It's probably the best option in your case.

James Dyer
Ingram Content Group

-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Friday, January 30, 2015 5:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

Nice! It works indeed!
Sorry I didn't notice that before.

But what if I want the same for the iPhone?
I mean suggesting "i phone" to users who searched for "iphone". Isn't a
minBreakLength of 1 just too small?

On Saturday, January 31, 2015, Dyer, James-2 [via Lucene] 
ml-node+s472066n4183176...@n3.nabble.com wrote:

 You need to decrease this to 2 or less, because the length of "go" is 2.

 <int name="minBreakLength">3</int>

 James Dyer
 Ingram Content Group



RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread Dyer, James
You need to decrease this to 2 or less, because the length of "go" is 2.

<int name="minBreakLength">3</int>

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Wednesday, January 28, 2015 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I tried increasing my alternativeTermCount to 5 and enabling extended results.
I also added a filter (fq) parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "go pro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485581792"
    }
  },
  "response": {
    "numFound": 27,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK00150020",
        "codice_s": "5.BAT.27407",
        "id": "27407",
        "marchio": "GO PRO",
        "barcode_interno_s": "185323000958",
        "prezzo_acquisto_d": 16.12,
        "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
        "descrizione": "BATTERIA GO PRO HERO ",
        "prezzo_vendita_d": 39.9,
        "categoria": "Batterie",
        "_version_": 1491583424191791000
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "go pro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 6,
        "origFreq": 433,
        "suggestion": [
          {
            "word": "gopro",
            "freq": 2
          }
        ]
      },
      "correctlySpelled",
      false,
      "collation",
      [
        "collationQuery",
        "gopro",
        "hits",
        3,
        "misspellingsAndCorrections",
        [
          "go pro",
          "gopro"
        ]
      ]
    ]
  }
}

While querying for gopro is not:

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "q": "gopro",
      "indent": "true",
      "fq": "marchio:\"GO PRO\"",
      "rows": "1",
      "wt": "json",
      "spellcheck.extendedResults": "true",
      "_": "1422485629480"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "codice_produttore_s": "DK0030010",
        "codice_s": "5.VID.39163",
        "id": "38814",
        "marchio": "GO PRO",
        "barcode_interno_s": "818279012477",
        "prezzo_acquisto_d": 150.84,
        "data_aggiornamento_dt": "2014-12-24T00:00:00Z",
        "descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
        "prezzo_vendita_d": 219,
        "categoria": "Fotografia",
        "_version_": 1491583425479442400
      }
    ]
  },
  "spellcheck": {
    "suggestions": [
      "gopro",
      {
        "numFound": 1,
        "startOffset": 0,
        "endOffset": 5,
        "origFreq": 2,
        "suggestion": [
          {
            "word": "giro",
            "freq": 6
          }
        ]
      },
      "correctlySpelled",
      false
    ]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread Dyer, James
Try using something larger than 2 for alternativeTermCount.  5 is probably ok 
here.  If that doesn't work, then post the exact query you are using and the 
full extended spellcheck results.
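If it is easier to experiment per request than to edit solrconfig.xml, a small Java helper can build the query string. A sketch only: spellcheckUrl is a hypothetical helper, the core name in the usage below is invented, and the /select handler is assumed to have the spellcheck component attached as in the configuration in this thread. Parameter values are URL-encoded with java.net.URLEncoder.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SpellcheckRequest {
    // Builds a /select URL asking for suggestions even for terms already in the index,
    // with extended results so the full spellcheck section can be inspected.
    static String spellcheckUrl(String coreUrl, String q, int alternativeTermCount) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps parameter order stable
        params.put("q", q);
        params.put("spellcheck", "true");
        params.put("spellcheck.alternativeTermCount", String.valueOf(alternativeTermCount));
        params.put("spellcheck.extendedResults", "true");
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return coreUrl + "/select?" + query;
    }
}
```

For example, spellcheckUrl("http://localhost:8983/solr/products", "go pro", 5) encodes the space in "go pro" as "+".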

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 3:59 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I have this in my solrconfig:

<requestHandler name="/select" class="solr.SearchHandler">

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">100</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

</requestHandler>

Although my spellchecker does work (it makes suggestions for misspelled terms),
it doesn't work for the example above, i.e. for terms that are both valid
("gopro" = 100 docs; "go pro" = 150 other docs).
I want to suggest "gopro" for the search term "go pro" and vice versa, even
though both are perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
You need to set spellcheck.alternativeTermCount to a value greater than zero.
Without it, the spellchecker will never offer suggestions for a term that exists in the index.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.alternativeTermCountParameter

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 9:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

Good, I'll try.
But imagine I have 100 documents containing go pro and 150 documents
containing gopro.
Suggestions of the other term do not come up in any case.

2015-01-27 16:21 GMT+01:00 Dyer, James-2 [via Lucene] 
ml-node+s472066n4182254...@n3.nabble.com:

 I think the word break spellchecker will do what you want.  But, if I were
 you, I'd dial back maxChanges to 1 or 2.  You don't want it slicing a
 word into 10 parts or trying to combine 10 adjacent words.  You also need
 the minBreakLength to be no more than 2, if you want it to break go
 (length=2) off of gopro.

 James Dyer
 Ingram Content Group



RE: SpellingQueryConverter and query parsing

2015-01-27 Thread Dyer, James
Having worked with the spellchecking code for the last few years, I've often 
wondered the same thing, but I never looked seriously into it.  I'm sure 
there are some serious hurdles, hence the Query Converter.  The easy 
thing to do here is to use spellcheck.q and pass in space-delimited 
keywords.  This bypasses the query converter entirely for custom situations 
like yours.
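One way to follow this advice with a custom parser is to reduce the user's query to bare keywords yourself and send the result as spellcheck.q. The sketch below is deliberately naive and assumes a Lucene-like syntax: toSpellcheckQ is a hypothetical helper that drops boolean operators, leading +/- modifiers, field prefixes such as name:, quotes and parentheses. A real implementation would reuse the custom parser's own tokenization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class SpellcheckQ {
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT", "&&", "||");

    // Turns e.g. "+name:(rock OR piont)" into "rock piont" for use as spellcheck.q.
    static String toSpellcheckQ(String userQuery) {
        List<String> keywords = new ArrayList<>();
        for (String token : userQuery.trim().split("\\s+")) {
            if (OPERATORS.contains(token)) {
                continue; // boolean operators are not words to spellcheck
            }
            String t = token.replaceAll("^[+\\-]+", ""); // leading +/- modifiers
            int colon = t.indexOf(':');
            if (colon >= 0) {
                t = t.substring(colon + 1);              // field prefix, e.g. name:
            }
            t = t.replaceAll("[\"()]", "");              // phrase quotes and grouping
            if (!t.isEmpty()) {
                keywords.add(t);
            }
        }
        return String.join(" ", keywords);
    }
}
```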

But please, if you find a way to plug the actual query parser into spellcheck, 
consider opening a JIRA issue and contributing the code, even if what you end up with 
isn't in a final polished state for general use.

James Dyer
Ingram Content Group


-Original Message-
From: Scott Stults [mailto:sstu...@opensourceconnections.com]
Sent: Tuesday, January 27, 2015 11:26 AM
To: solr-user@lucene.apache.org
Subject: SpellingQueryConverter and query parsing

Hello!

SpellingQueryConverter parses the incoming query in sort of a quick and
dirty way with a regular expression. Is there a reason the query string
isn't parsed with the _actual_ parser, if one was configured for that type
of request? Even better, could the parsed query object be added to the
response in some way so that the query wouldn't need to be parsed twice?
The individual terms could then be visited and substituted in-place without
needing to worry about preserving the meaning of operators in the query.

The motive for my question is that I may need to implement a QueryConverter
because I'm using a custom parser, and using that parser in the
QueryConverter itself seems like the right thing to do. That wasn't done
in SpellingQueryConverter, though, so I want to find out why before I go
blundering into a known minefield.


Thanks!
-Scott


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
I think the word break spellchecker will do what you want.  But, if I were you, 
I'd dial back maxChanges to 1 or 2.  You don't want it slicing a word into 10 
parts or trying to combine 10 adjacent words.  You also need the 
minBreakLength to be no more than 2, if you want it to break go (length=2) 
off of gopro.  
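To see why minBreakLength matters for "gopro", here is a toy model of a single word break (maxChanges = 1): try every split point and keep the splits where both halves are at least minBreakLength characters long and both occur in the dictionary. This is a simplified illustration, not Lucene's WordBreakSpellChecker, and the dictionary in the usage below is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class WordBreakToy {
    // Returns "left right" candidates produced by exactly one break in `word`.
    static List<String> singleBreaks(String word, Set<String> dictionary, int minBreakLength) {
        int min = Math.max(1, minBreakLength); // a piece can never be empty
        List<String> out = new ArrayList<>();
        for (int i = min; i <= word.length() - min; i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (dictionary.contains(left) && dictionary.contains(right)) {
                out.add(left + " " + right);
            }
        }
        return out;
    }
}
```

With a dictionary containing "go", "pro", "i" and "phone", "gopro" yields nothing at minBreakLength 3 but "go pro" at 2; likewise "iphone" only yields "i phone" once minBreakLength is 1.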

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 2:58 AM
To: solr-user@lucene.apache.org
Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker

I indexed an electronics e-commerce product catalog.

This is a typical document from my collection:


"docs": [
  {
    "prezzo_vendita_d": 39.9,
    "codice_produttore_s": "DK00150020",
    "codice_s": "5.BAT.27407",
    "descrizione": "BATTERIA GO PRO HERO ",
    "barcode_interno_s": "185323000958",
    "categoria": "Batterie",
    "prezzo_acquisto_d": 16.12,
    "marchio": "GO PRO",
    "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
    "id": "27407",
    "_version_": 1491274123542790100
  },
  {
    "codice_produttore_s": "DK0052043",
    "codice_s": "05.SP.42760",
    "id": "42760",
    "marchio": "SP GADGETS",
    "barcode_interno_s": "4028017520430",
    "prezzo_acquisto_d": 34.4,
    "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
    "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
    "prezzo_vendita_d": 59.95,
    "_version_": 1491274406746390500
  }
  ...
]
I want my spellchecker to suggest "go pro" to users searching for "gopro"
(without whitespace).

I also want users searching for "go pro" to find "gopro" products, too.

Here's a little bit of my configuration:

*schema.xml*
<field name="marchio" type="string" indexed="true" stored="true"/>
<field name="categoria" type="string" indexed="true" stored="true"/>
<field name="fornitore" type="string" indexed="true" stored="true"/>
<field name="descrizione" type="string" indexed="true" stored="true"/>

<field name="catch_all_original" type="text_general" indexed="true"
       stored="false" multiValued="true" />
<field name="catch_all" type="text_it" indexed="true" stored="false"
       multiValued="true" />

<copyField source="marchio" dest="catch_all" />
<copyField source="categoria" dest="catch_all" />
<copyField source="descrizione" dest="catch_all" />
<copyField source="fornitore" dest="catch_all" />

<copyField source="marchio" dest="catch_all_original" />
<copyField source="categoria" dest="catch_all_original" />
<copyField source="descrizione" dest="catch_all_original" />
<copyField source="fornitore" dest="catch_all_original" />
...

<fieldType name="text_it" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1" />

    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_it.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_it.txt" format="snowball" />
    <filter class="solr.ItalianLightStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
            preserveOriginal="1" />

    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_it.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_it.txt" format="snowball" />

    <filter class="solr.ItalianLightStemFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>


*solr-config.xml*
<requestHandler name="/select" class="solr.SearchHandler">

  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">catch_all</str>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
   

RE: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory

2015-01-27 Thread Dyer, James
Can you give a little more information as to how you have the spellchecker 
configured in solrconfig.xml?  Also, it would help if you showed a query and 
the spellcheck response, and then explained what you wanted it to return vs. what 
it actually returned.  

My guess is that the stop words you mention exist in your spelling index and 
you're not using the alternativeTermCount parameter, which tells it to 
suggest for terms that exist in the index.

I take it also that you're using shingles to get word-break suggestions?  You might 
have better luck with this using WordBreakSolrSpellChecker instead of shingles.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, January 27, 2015 5:06 AM
To: solr-user@lucene.apache.org
Subject: Stop word suggestions are coming when I indexed sentence using 
ShingleFilterFactory

Hi,
  I am getting suggestions for both correct words and misspelled
words, but not for stop words. Why? I am not even using
solr.StopFilterFactory.


Schema.xml :

<field name="gram" type="textSpell" indexed="true" stored="true"
       required="true" multiValued="false"/>

<fieldType name="textSpell" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>

    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>

  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>

    <filter class="solr.ShingleFilterFactory" maxShingleSize="5"
            minShingleSize="2" outputUnigrams="true"/>

  </analyzer>
</fieldType>


RE: can't make sense of spellchecker results when using techproducts example

2015-01-09 Thread Dyer, James
Chris,

- DirectSpellChecker has a setting for minPrefix, which the techproducts 
example sets to 1 (also the default).  So it will never try to correct the 
first character.  I think this is both a performance optimization and is based 
on the assumption that we rarely misspell the first character.  This is why it 
will not correct "hell" to "dell".  I think it will allow you to set this to 
0 if you want your sample query to work.
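The effect of minPrefix can be illustrated with a small filter: a candidate correction is only considered if its first minPrefix characters match the misspelled term. This is an illustrative sketch of the rule described above, not DirectSpellChecker's code; the candidate list in the usage below is taken from the suggestions in Chris's response plus "dell".

```java
import java.util.List;
import java.util.stream.Collectors;

final class MinPrefixToy {
    // Keeps only candidates sharing the first minPrefix characters with the term.
    static List<String> filter(String term, List<String> candidates, int minPrefix) {
        String prefix = term.substring(0, Math.min(minPrefix, term.length()));
        return candidates.stream()
                .filter(c -> c.startsWith(prefix))
                .collect(Collectors.toList());
    }
}
```

With candidates [hello, dell, here, hold], filtering "hell" at minPrefix 1 drops "dell", while minPrefix 0 keeps every candidate.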

- The maxCollationTries feature rewrites q / spellcheck.q and then, 
using all the other parameters, queries internally to see if there are any hits.  
This doesn't play very well with q.op=OR / mm=1.  So when you see a 
collation like "here ultrasharp" / "heat ..." etc., it is indeed getting 
some hits, so it considers it a valid query rewrite, despite the absurdity.  
We could improve this example config by adding 
spellcheck.collateParam.q.op=AND to the defaults.  (When using dismax, you 
would add spellcheck.collateParam.mm=100%.)  Also, while the collateParam 
functionality is in the old Solr wiki, it doesn't seem to be in the reference 
manual, so we probably should add it, as this would be pretty important for a 
lot of users.
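Why an OR query validates absurd collations can be seen with a toy index. Under q.op=OR a rewritten query has hits if any single term matches some document; under AND every term must match the same document. The two "documents" in the usage below are invented term sets, and the hit counting is a simplification of what the collation check does internally.

```java
import java.util.List;
import java.util.Set;

final class CollationHits {
    // Counts documents (modeled as term sets) matching the query terms.
    static long hits(List<Set<String>> docs, List<String> queryTerms, boolean requireAll) {
        return docs.stream()
                .filter(doc -> requireAll
                        ? doc.containsAll(queryTerms)                  // q.op=AND / mm=100%
                        : queryTerms.stream().anyMatch(doc::contains)) // q.op=OR / mm=1
                .count();
    }
}
```

With one document containing "ultrasharp" and another containing "here", the collation "here ultrasharp" gets 2 hits under OR but 0 under AND, so only the AND setting rejects it.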

- Unless you are using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you 
need not use spellcheck.build.  It's a no-op for both Direct and WordBreak, as 
these do not use sidecar indexes.

So without changing the config, these queries illustrate the spellchecker 
pretty well, including the word-break functionality.

http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND

Spellcheck has a lot of gotchas, and I wish we could dream up a way to 
make it easy for people.  I remember it being a struggle for me when I was a 
new user, and I know we get lots of questions on the user list about it.

My apologies to you for not answering this sooner.

James Dyer
Ingram Content Group


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, December 17, 2014 6:49 PM
To: solr-user@lucene.apache.org
Subject: can't make sense of spellchecker results when using techproducts 
example


Ok, so I've been working on updating the ref guide to account for the new 
way to run the examples in 5.0.

The spell checking page...

https://cwiki.apache.org/confluence/display/solr/Spell+Checking

...has some examples that loosely correlate to the techproducts 
example, but even if you ignore the specifics of those examples, I need 
help understanding the basic behavior of the spellchecker as configured in 
the techproducts example.

Assuming you run this...

bin/solr -e techproducts

with that example running and those docs indexed, this URL gives me 
results I can't explain...

http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true

(see below)

1) "dell" is not listed as a possible suggestion for "hell".  (Even if 
the dictionary thinks "hold" is a better suggestion, why isn't "dell" even 
included in the list of possibilities?)

2) in the collation section, I can't make any sense of what these 
results mean -- how is "hello ultrasharp" a suggested collationQuery when 
*none* of the example docs contain both "hello" and "ultrasharp"?

http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp


So WTF is up with these spell check results?


<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">15</int>
</lst>
<str name="command">build</str>
<result name="response" numFound="0" start="0">
</result>
<lst name="spellcheck">
   <lst name="suggestions">
     <lst name="hell">
       <int name="numFound">6</int>
       <int name="startOffset">0</int>
       <int name="endOffset">4</int>
       <int name="origFreq">0</int>
       <arr name="suggestion">
         <lst>
           <str name="word">hello</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">here</str>
           <int name="freq">2</int>
         </lst>
         <lst>
           <str name="word">heat</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">hold</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">html</str>
           <int name="freq">1</int>
         </lst>
         <lst>
           <str name="word">héllo</str>
           <int name="freq">1</int>
         </lst>
       </arr>
     </lst>
     <lst name="ultrashar">
       <int name="numFound">1</int>
       <int name="startOffset">5</int>
       <int name="endOffset">14</int>
       <int name="origFreq">0</int>
       <arr name="suggestion">
         <lst>
           <str name="word">ultrasharp</str>
           <int name="freq">1</int>
         </lst>
       </arr>
     </lst>
   </lst>
   <bool name="correctlySpelled">false</bool>
   <lst name="collations">
     <lst name="collation">
       <str name="collationQuery">hello ultrasharp</str>
   int 

RE: Spellchecker delivers far too few suggestions

2014-12-18 Thread Dyer, James
Martin,

If you would like to get suggestions even for terms occurring in the index, set 
spellcheck.alternativeTermCount to a value greater than 0.  You can use the same value 
as for spellcheck.count, or a lower value if you want fewer results than for 
terms not in the index.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.alternativeTermCount}}Parameter

With this, you might also want to set spellcheck.maxResultsForSuggest to a 
value greater than 0.  This will prevent the spellchecker from doing this work when the query 
returns enough results that you wouldn't want to suggest anything to the user.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxResultsForSuggest}}Parameter
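A rough mental model of how these parameters interact, written as a small Java function. This is a simplification, not the SpellCheckComponent source: it ignores onlyMorePopular, accuracy thresholds and per-dictionary details, and it assumes that when maxResultsForSuggest is set and the query returns more hits than that, no suggestions are produced at all. A null maxResultsForSuggest stands for "parameter not set".

```java
final class SuggestModel {
    // Upper bound on suggestions for one query term under the simplified rules above.
    static int maxSuggestions(boolean termInIndex, long hits, int count,
                              int alternativeTermCount, Integer maxResultsForSuggest) {
        if (maxResultsForSuggest != null && hits > maxResultsForSuggest) {
            return 0; // enough hits: treat the query as fine and skip the work
        }
        if (!termInIndex) {
            return count; // unknown term: up to spellcheck.count suggestions
        }
        // Term exists in the index: suggestions only if alternativeTermCount > 0.
        return Math.max(alternativeTermCount, 0);
    }
}
```

In Martin's case, "bnak" exists in the index, so with alternativeTermCount left at 0 the model (like the real component) returns no suggestions for it.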

Used with the maxCollationTries parameter, you should be getting fairly good 
did-you-mean-style suggestions.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxCollationTries}}Parameter

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Martin Dietze [mailto:mdie...@gmail.com] 
Sent: Thursday, December 18, 2014 3:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecker delivers far too few suggestions

On 17 December 2014 at 18:08, Erick Erickson erickerick...@gmail.com wrote:
 This is seeming like a puzzler...

I’ve got to the point that I do get suggestions if I find no document
at all. The problem was seemingly caused by the way I quoted my search
queries.

Still I don’t get suggestions for terms that are in the index. For
instance, if I create a document that contains the term “bnak”, I
would like to display a result like: “found one occurrence of ‘bnak’,
but did you mean: list of suggestions”.

Is there a setting I’ve missed?


-- 
-- mdie...@gmail.com --/-- mar...@the-little-red-haired-girl.org 
- / http://herbert.the-little-red-haired-girl.org / -



RE: Multiword mispellings

2014-12-18 Thread Dyer, James
Matt,

Unfortunately this kind of correction is not supported.  The word break spell 
checker works independently from the distance-based spellcheckers so it cannot 
correct both whitespace problems and other misspellings together.  

If you really need this, then you'll need to go with the shingle approach where 
you create your spellcheck field with both the base terms and also shingles 
(adjacent terms combined as 1 term).  In this case, rock piont would be 
considered a single term and the string difference would be 2, with one 
insertion (the space) and one transposition.  I believe there is a field 
analyzer out there that will do this for you.  I think you're supposed to set 
it up for both at index time (to catch when the user omits whitespace) and 
query time (to catch when the user adds whitespace).
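The distance arithmetic above can be checked with the optimal-string-alignment variant of Damerau-Levenshtein distance, where an insertion, deletion, substitution or adjacent transposition each costs 1. This is a textbook implementation for illustration only; Lucene's spellcheckers use their own distance measures, so their scores may differ.

```java
final class Osa {
    // Optimal string alignment distance between a and b.
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= b.length(); j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
                // Adjacent transposition, e.g. "io" vs "oi".
                if (i > 1 && j > 1 && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }
}
```

"rock piont" vs "rockpoint" comes out at 2 (the space plus the "io"/"oi" swap), while "rockpiont" and "rock point" are each 1 away from "rockpoint".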

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 18, 2014 2:40 PM
To: solr-user@lucene.apache.org
Subject: Multiword mispellings

Is it possible for Solr's SpellCheckComponent to suggest "Rockpoint" if the
user mistypes "Rock piont"? Currently I have it making the correct
suggestions when I have "Rockpiont" or "Rock point", but not for the example I
gave. Here are the relevant parts of my config files:

https://gist.github.com/halogenandtoast/c7f9335f7fa94f7b03d8


RE: WordBreakSolrSpellChecker Usage

2014-12-16 Thread Dyer, James
Matt,

Seeing the response, my guess is that you have "point" in your index, and that it 
has a higher frequency than "rockpoint".  By default the spellchecker will 
never try to correct something that exists in your index.  Adding 
spellcheck.onlyMorePopular=true might help, but only if the correction has a 
higher frequency than the original.  Try using 
spellcheck.alternativeTermCount=n instead of 
spellcheck.onlyMorePopular=true.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
for more information.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Monday, December 15, 2014 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

I think you were right about maxChanges; that does seem to get rid of the
ridiculous values. However, I don't seem to be getting anything reasonable.
Most variations look something like:

http://localhost:8982/solr/development/select?q=Rock+point&fq=type%3ACompany&wt=ruby&indent=true&defType=edismax&qf=name_text&stopwords=true&lowercaseOperators=true&spellcheck=true&spellcheck.count=20&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=1&spellcheck.maxCollationTries=10&spellcheck.accuracy=0.5

{
  'responseHeader'=>{
    'status'=>0,
    'QTime'=>20},
  'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
  },
  'spellcheck'=>{
    'suggestions'=>[
      'rock',{
        'numFound'=>5,
        'startOffset'=>0,
        'endOffset'=>4,
        'origFreq'=>3,
        'suggestion'=>[{
            'word'=>'rocky',
            'freq'=>3},
          {
            'word'=>'brook',
            'freq'=>6},
          {
            'word'=>'york',
            'freq'=>460},
          {
            'word'=>'oak',
            'freq'=>7},
          {
            'word'=>'boca',
            'freq'=>3}]},
      'correctlySpelled',false]}}


I'm going to post both my solrconfig.xml and schema.xml because maybe
I'm just doing something crazy. They can both be found here:
https://gist.github.com/halogenandtoast/76fd5dcfae1c4edeba30


On Thu, Dec 11, 2014 at 1:19 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 Matt,

 There is no exact number here, but I would think most people would want
 count to be maybe 10-20.  Increasing this incurs a very small performance
 penalty for each term it generates suggestions for, but you probably won't
 notice a difference.  For maxCollationTries, 5 is a reasonable number, but
 you might see improved collations if this is also perhaps 10.  With this
 one, you get a much larger performance penalty, but only when it needs to
 try more combinations to return the maxCollations.  In your case you have
 maxCollations at 5 also, right?  I would reduce this to the maximum number of
 rewritten queries your application or users are actually going to use.  In
 a lot of cases, 1 is the right number here.  This would improve performance
 for you in some cases.

 Possibly the reason “Rock point”  “Rockpoint” is failing is because you
 have maxChanges set to 10.  This tells it you are willing for it to break
 a word into 10 separate parts, or to combine up to 10 adjacent words into
 1.  Having taken a quick glance at the code, I think what is happening is
 it is trying things like r ock p oint and r o ck p o int, etc and never
 getting to your intended result.  In a typical scenario I would set
 maxChanges to 1-3, and often 1 is probably the most appropriate value
 here.

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Matt Mongeau [mailto:halogenandto...@gmail.com]
 Sent: Thursday, December 11, 2014 11:34 AM
 To: solr-user@lucene.apache.org
 Subject: Re: WordBreakSolrSpellChecker Usage

 Is there a suggested value for this. I bumped them up to 20 and still
 nothing has seemed to change.

 On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James james.d...@ingramcontent.com
 
 wrote:

  My first guess here, is seeing it works some of the time but not others,
  is that these values are too low:
 
  str name=spellcheck.maxCollationTries5/str
  str name=spellcheck.count5/str
 
  You know spellcheck.count is too low if the suggestion you want is not in
  the suggestions part of the response, but increasing it makes it get
  included.
 
  You know that spellcheck.maxCollationTries is too low if it exists in
  suggestions but it is not getting suggested in the collation section.
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
 
  -Original Message-
  From: Matt Mongeau [mailto:halogenandto...@gmail.com]
  Sent: Wednesday, December 10, 2014 12:43 PM
  To: solr-user@lucene.apache.org
  Subject: Fwd: WordBreakSolrSpellChecker Usage
 
  If I have my search component setup like this
  https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have
 an
  entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
 
  This doesn't seem to be the case, but it works for Blackstone

RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
My first guess here, seeing that it works some of the time but not others, is 
that these values are too low:

<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.count">5</str>

You know spellcheck.count is too low if the suggestion you want is not in the 
suggestions part of the response, but increasing it makes it get included.

You know that spellcheck.maxCollationTries is too low if the word you want 
appears in the suggestions but is not used in the collation section.
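The two rules of thumb above can be written down as a small check. This is a hypothetical helper, not part of Solr or SolrJ; it assumes you have already extracted the suggestion words for the misspelled term and the collation strings from the response, and its substring test for collations is deliberately crude.

```java
import java.util.List;

// Hypothetical helper (not part of Solr or SolrJ): applies the two rules of
// thumb above to values already parsed out of a spellcheck response.
final class SpellcheckTuningHint {

    static String hint(String wanted, List<String> suggestions, List<String> collations) {
        if (!suggestions.contains(wanted)) {
            // The word never reached the candidate list for this term.
            return "raise spellcheck.count";
        }
        for (String collation : collations) {
            // Crude substring test: is the wanted word used in any collation?
            if (collation.contains(wanted)) {
                return "ok";
            }
        }
        // The candidate exists, but no collation using it was returned.
        return "raise spellcheck.maxCollationTries";
    }

    public static void main(String[] args) {
        System.out.println(hint("rockpoint", List.of("rockpoint", "rocky"), List.of("rocky")));
    }
}
```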

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Wednesday, December 10, 2014 12:43 PM
To: solr-user@lucene.apache.org
Subject: Fwd: WordBreakSolrSpellChecker Usage

If I have my search component setup like this
https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?

This doesn't seem to be the case, but it works for Blackstone with Black
stone. Any ideas on what I might be doing wrong?


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
Matt,

There is no exact number here, but I would think most people would want count 
to be maybe 10-20.  Increasing this incurs a very small performance penalty for 
each term it generates suggestions for, but you probably won't notice a 
difference.  For maxCollationTries, 5 is a reasonable number, but you might 
see improved collations if this is also perhaps 10.  With this one you get a 
much larger performance penalty, but only when it needs to try more combinations 
to return the maxCollations.  In your case you have this at 5 also, right?  I 
would reduce this to the maximum number of re-written queries your application 
or users are actually going to use.  In a lot of cases, 1 is the right number 
here.  This would improve performance for you in some cases.

Possibly the reason “Rock point” -> “Rockpoint” is failing is because you have 
maxChanges set to 10.  This tells it you are willing for it to break a word 
into 10 separate parts, or to combine up to 10 adjacent words into 1.  Having 
taken a quick glance at the code, I think what is happening is it is trying 
things like "r ock p oint" and "r o ck p o int", etc., and never getting to your 
intended result.  In a typical scenario I would set maxChanges to 1-3, and 
often 1 is probably the most appropriate value here.
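To get a feel for why a large maxChanges is costly, the sketch below counts how many ways a single token can be cut when up to k breaks are allowed. It is only an illustration of how the search space grows, assuming every gap between characters is a candidate break point; the real WordBreakSpellChecker also consults document frequencies and its other tuning limits, so it evaluates far fewer candidates than this bound.

```java
// Illustration only: counts how many ways a single token of length n can be
// cut into pieces using between 1 and maxChanges breaks, assuming every gap
// between adjacent characters is a candidate break point.
final class BreakCount {

    // Binomial coefficient C(n, k); each intermediate division is exact.
    static long choose(int n, int k) {
        if (k < 0 || k > n) {
            return 0;
        }
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    static long candidates(int tokenLength, int maxChanges) {
        int gaps = tokenLength - 1;
        long total = 0;
        for (int breaks = 1; breaks <= maxChanges; breaks++) {
            total += choose(gaps, breaks);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(candidates("rockpoint".length(), 1));
        System.out.println(candidates("rockpoint".length(), 10));
    }
}
```

For the nine-letter "rockpoint" this gives 8 single-break splits with maxChanges=1, but 255 possible splits once up to 10 breaks are allowed, before combining adjacent words is even considered.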

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 11, 2014 11:34 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

Is there a suggested value for this. I bumped them up to 20 and still
nothing has seemed to change.




RE: Word Break Spell Checker Implementation algorithm

2014-10-21 Thread Dyer, James
David,

I do not know of a published algorithm for this.  All it does is this: for 
terms with 0 frequency, it checks the document frequency of the various parts 
that can be made from those terms by breaking them and/or by combining adjacent 
terms. There are tuning parameters available that let you limit how much work 
it will do to try and find a suitable replacement.  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html
 .

This of course is slower than indexing shingles as the work is done at query 
time vs index time.  But it saves the added index size and indexing time 
required to index the shingles separately.
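A much-simplified sketch of that idea: for a term that does not occur in the index, try each two-way split and keep the splits whose halves both occur; for two adjacent query terms, try gluing them together. The in-memory Map standing in for index document frequencies is an assumption for illustration, and the class is hypothetical; Lucene's WordBreakSpellChecker supports multiple breaks, frequency thresholds and other rules not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified illustration of break/combine spelling suggestions.
// docFreq stands in for index document frequencies (hypothetical data).
final class WordBreakSketch {

    // For a term absent from the index, return "left right" splits where both parts exist.
    static List<String> breakSuggestions(String term, Map<String, Integer> docFreq) {
        List<String> out = new ArrayList<>();
        if (docFreq.getOrDefault(term, 0) > 0) {
            return out; // term exists; nothing to correct
        }
        for (int i = 1; i < term.length(); i++) {
            String left = term.substring(0, i);
            String right = term.substring(i);
            if (docFreq.getOrDefault(left, 0) > 0 && docFreq.getOrDefault(right, 0) > 0) {
                out.add(left + " " + right);
            }
        }
        return out;
    }

    // For two adjacent query terms, suggest the joined form if it exists in the index.
    static List<String> combineSuggestions(String first, String second, Map<String, Integer> docFreq) {
        List<String> out = new ArrayList<>();
        String joined = first + second;
        if (docFreq.getOrDefault(joined, 0) > 0) {
            out.add(joined);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = Map.of("black", 12, "stone", 9, "rockpoint", 4);
        System.out.println(breakSuggestions("blackstone", docFreq));
        System.out.println(combineSuggestions("rock", "point", docFreq));
    }
}
```

All of this work happens per query, which is the query-time cost James contrasts with indexing shingles.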

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: David Philip [mailto:davidphilipshe...@gmail.com] 
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what is the algorithm behind the word break spell
checker component? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David


RE: Data Import Handler for CSV file

2014-10-10 Thread Dyer, James
Nabil,

Unfortunately, the out-of-the-box functionality for DIH lacks a lot of what the 
csv handler has to offer.  There is a LineEntityProcessor (see 
http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor), but this 
will just output each line in a field called rawLine.  It is up to you to 
then write a Transformer that will split it on commas (or better, use a lib 
like commons-csv to process it).
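As a rough sketch of that splitting step, the helper below turns one rawLine value into named fields, honouring double-quoted values so that commas inside quotes are not treated as separators. It is plain Java with hypothetical names; in a real import you would call something like it from your own DIH Transformer's transformRow(Map<String,Object> row, Context context), and a library such as commons-csv handles more edge cases (multi-line records, other escaping rules) than this does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: splits one CSV line (the DIH "rawLine" value) into named columns.
// Handles double-quoted fields and "" as an escaped quote; no multi-line records.
final class RawLineSplitter {

    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString()); // field separator
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Maps values to column names by position; extra values are ignored.
    static Map<String, String> toRow(String rawLine, List<String> columns) {
        List<String> values = splitCsvLine(rawLine);
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.size() && i < values.size(); i++) {
            row.put(columns.get(i), values.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow("42,\"Smith, John\",NY", List.of("id", "name", "state")));
    }
}
```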

There is an extension available as an old patch that will give 
LineEntityProcessor the ability to handle delimited and fixed-width files.  
However, you'll need to apply the patch yourself and build DIH from source.   
See https://issues.apache.org/jira/browse/SOLR-2549 .

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr] 
Sent: Thursday, October 09, 2014 4:26 PM
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: Data Import Handler for CSV file

Hi Ahmet,
 
Thank you for this reply. I agree with you that the CSV update handler is fast, 
but we always need to specify the columns in the HTTP request. In addition, I 
can't find documentation on how to use the CSV update handler from SolrJ.

Could you please send me an example of a DIH configuration that loads a CSV file?

Regards,
Nabil.


On Thursday, October 9, 2014 9:05 PM, Ahmet Arslan iori...@yahoo.com.INVALID wrote:
 


Hi Nabil,

What's wrong with the CSV update handler? It is quite fast.

By the way, DIH has a line entity processor, so yes, it is doable with existing 
DIH components.

Ahmet



On Thursday, October 9, 2014 9:58 PM, nabil Kouici koui...@yahoo.fr wrote:





Hi All,

Is it possible in Solr to have a DIH that loads from a CSV file? Currently I'm 
using the update/csv handler, but it does not meet my needs.

Regards,
NKI.



RE: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity

2014-10-02 Thread Dyer, James
Try using the cacheKey/cacheLookup parameters instead:

<entity 
 name="en1" 
 pk="id" 
 transformer="DateFormatTransformer" 
 query="SELECT id, product FROM table WHERE product = 'abc'">

  <entity 
   name="en2" 
   cacheKey="id"
   cacheLookup="en1.id"
   transformer="DateFormatTransformer" 
   cacheImpl="SortedMapBackedCache"
   query="SELECT id, code FROM table2" 
  />
</entity>
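Conceptually, cacheKey/cacheLookup turns the child entity into one query whose rows are cached by key and then probed once per parent row, instead of one SELECT per parent. The sketch below models that with an in-memory TreeMap; the row data and class name are hypothetical, and this is not DIH's SortedMapBackedCache code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model of a keyed child-entity cache: load all child rows once,
// index them by the cacheKey column, then look up each parent's id.
final class CachedSubEntityJoin {

    // Build the cache once: child key -> all child values with that key (sorted by key).
    static TreeMap<String, List<String>> buildCache(List<String[]> childRows) {
        TreeMap<String, List<String>> cache = new TreeMap<>();
        for (String[] row : childRows) {
            // row[0] plays the role of cacheKey="id", row[1] the "code" column
            cache.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return cache;
    }

    // For each parent id (cacheLookup="en1.id"), fetch the matching child values.
    static Map<String, List<String>> join(List<String> parentIds, TreeMap<String, List<String>> cache) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String id : parentIds) {
            result.put(id, cache.getOrDefault(id, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> table2 = List.of(new String[] {"1", "A"}, new String[] {"1", "B"}, new String[] {"2", "C"});
        System.out.println(join(List.of("1", "2", "3"), buildCache(table2)));
    }
}
```

The child query runs once without any per-parent WHERE clause; the join happens through the key lookup.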

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: stockii [mailto:stock.jo...@googlemail.com] 
Sent: Thursday, October 02, 2014 9:19 AM
To: solr-user@lucene.apache.org
Subject: DIH - cacheImpl=SortedMapBackedCache - empty rows from sub entity

Hello

I am struggling with cacheImpl=SortedMapBackedCache.

I want to refactor my ugly entities, so I am trying out sub-entities with
caching.
My problem is that my cached subquery does not return any values from the
select. But why?

This is my entity:
<entity name="en1" pk="id" transformer="DateFormatTransformer" 
  query="SELECT id, product FROM table WHERE product = 'abc'">

<entity name="en2" pk="id" transformer="DateFormatTransformer"
cacheImpl="SortedMapBackedCache"
query="SELECT id, code FROM table2" 
where="id = '${en1.id}'"/>
</entity>


This is very fast and clear and nice... but it does not work. Nothing from
table2 is coming into my index =(
But if I remove the line with cacheImpl="SortedMapBackedCache", all data is
present, but every row is selected one by one.
I thought that this construct would hopefully replace my ugly big join query
in a single entity!?







RE: Spellchecking and suggesting part numbers

2014-09-24 Thread Dyer, James
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your 
application pick out the suggestions that make changes on the right side.

Another option is to use DirectSolrSpellChecker (usually a better choice 
anyhow) and set the minPrefix parameter.  This requires the first n characters 
on the left side to match before it will make suggestions.  Taking a quick look 
at the code, it seems to me it also won't try to correct anything in this 
prefix region.  So perhaps you can set this to 2-4 (default=1).  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
 .
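For the first option, the application-side filtering might look like the sketch below: it drops suggestions that change any of the first n characters and orders the rest so that changes further to the right come first. The class name and the prefix length of 4 are illustrative assumptions, not Solr settings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical post-processing of spellcheck suggestions for part numbers:
// prefer suggestions whose first difference from the query is furthest right.
final class PartNumberSuggestions {

    // Index of the first differing character (or the shorter length if one is a prefix of the other).
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return n;
    }

    static List<String> rank(String query, List<String> suggestions, int requiredPrefix) {
        List<String> kept = new ArrayList<>();
        for (String s : suggestions) {
            // Keep only suggestions that leave the first requiredPrefix characters unchanged.
            if (firstDifference(query, s) >= requiredPrefix) {
                kept.add(s);
            }
        }
        // Later first difference = change further to the right = ranked first.
        kept.sort(Comparator.comparingInt((String s) -> firstDifference(query, s)).reversed());
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(rank("ABCD1234", List.of("ABCE1234", "ABCD1244"), 4));
    }
}
```

With a required prefix of 4, ABCE1234 is discarded and ABCD1244 is kept; with 0, both are kept and ABCD1244 is ranked first.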

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] 
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is:

<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">did_you_mean_part</str>
  </lst>
</searchComponent>
<requestHandler name="/spell_part" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="df">did_you_mean_part</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_part</str>
  </arr>
</requestHandler>


<fieldType name="did_you_mean_part" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\s]+" replacement=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
</fieldType>

Can we tweak the setup such that we should get more relevant part numbers?

Thanks,
Alexander




RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Nathaniel,

Can you show us all of the parameters you are sending to the spellchecker?  
When you specify alternativeTermCount with spellcheck.q=quidam, what are 
the terms you expect to get back?  Also, are you getting any query results 
back?  If you are using a q that returns results, or more results than you 
specify for spellcheck.maxResultsForSuggest, spellcheck won't give you 
anything regardless of what you put for spellcheck.q.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to fake the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries quidam~1 and quidam~2.

I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for quidam~1 include 
suffixes like quodammodo, which makes sense for a suggester but isn't 
what I want here.

Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, i.e.:

  <searchComponent name="fuzzyterms" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">fuzzy1</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">1</int>
      ...
    </lst>
    <lst name="spellchecker">
      <str name="name">fuzzy2</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <int name="maxEdits">2</int>
      ...
    </lst>
  </searchComponent>

However the parameter spellcheck.alternativeTermCount has no effect, so 
the query spellcheck.q=quidam gives no results, but 
spellcheck.q=quiam (which doesn't exist in the index) gives the 
expected terms.

Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Did you try spellcheck.alternativeTermCount with DirectSolrSpellChecker?  You 
can set it to whatever low value you actually want it to return back to you 
(perhaps 20 suggestions max?).

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:36 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns

quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua

Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return

quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.

The request handler is:

 <requestHandler name="/spellcheck" 
class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck.dictionary">fuzzy1</str>
      <str name="spellcheck.count">20</str>
      <int name="spellcheck.alternativeTermCount">100</int>
    </lst>
    <arr name="last-components">
      <str>fuzzyterms</str>
    </arr>
  </requestHandler>

Thanks!

Nathaniel



RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
You cannot use 100% because, as you say, 1 is interpreted as 1 document.  But 
you can use something like 99.9% (0.999).
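A sketch of the rule as described in this thread: a value of 1 or more is an absolute document count, a value below 1 is a fraction of the documents in the index, and terms above the threshold get no suggestions. The class is hypothetical; Lucene's exact rounding and boundary handling for the fractional case may differ from this simple comparison.

```java
// Hypothetical model of DirectSpellChecker's maxQueryFrequency rule as described above.
final class MaxQueryFrequency {

    static boolean mayCorrect(int termDocFreq, int numDocs, float maxQueryFrequency) {
        if (maxQueryFrequency >= 1f) {
            // Absolute number of documents.
            return termDocFreq <= maxQueryFrequency;
        }
        // Fraction of all documents in the index.
        return termDocFreq <= maxQueryFrequency * numDocs;
    }

    public static void main(String[] args) {
        System.out.println(mayCorrect(3, 1000, 1f));
        System.out.println(mayCorrect(3, 1000, 0.999f));
    }
}
```

This is why 1 blocks every term found in more than one document, while 0.999 lets almost every indexed term be corrected.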

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 11:39 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes", when in fact I need a high value 
like 0.99, so that every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)

Nathaniel

On Mon, Sep 22, 2014 at 6:17 , Dyer, James 
james.d...@ingramcontent.com wrote:
 DirectSpellChecker defaults to not suggest anything for terms that 
 occur in 1% or more of the total documents in the index.  You can set 
 this higher in solrconfig.xml either with a fractional percent or a 
 whole-number absolute number of documents.
 
 See 
 http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29
  
 
 James Dyer
 Ingram Content Group
 (615) 213-4311
 
 
 -Original Message-
 From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
 Sent: Monday, September 22, 2014 9:41 AM
 To: solr-user@lucene.apache.org
 Subject: RE: fuzzy terms, DirectSolrSpellChecker and 
 alternativeTermCount
 
 Yep, I tried it both as a default param in the request handler (as in 
 the config I sent), and in the request, but with no effect... That's 
 what surprised me, since it seems it should work.
 

RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-11 Thread Dyer, James
Thomas,

Yes, you are right about the problem being with the beginning of the word 
needing correction.  If you are using DirectSolrSpellChecker, you need to set 
the minPrefix parameter to 0.  Otherwise the default (1) requires the first 
character to match for it to try and correct it.

See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
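To see why ichtscheiben and ransport get no suggestions, here is a small illustration that computes plain Levenshtein distance and applies a minPrefix-style rule (the first minPrefix characters must be identical). The class is hypothetical and is not how Lucene generates candidates; it only shows that these terms are one edit away from indexed forms yet fail a prefix requirement of 1.

```java
// Illustration only (not Lucene's implementation): plain Levenshtein distance
// combined with a minPrefix-style rule that the first minPrefix characters match.
final class PrefixEditCheck {

    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    static boolean acceptable(String query, String candidate, int minPrefix, int maxEdits) {
        // regionMatches returns false if either string is shorter than minPrefix.
        if (!query.regionMatches(0, candidate, 0, minPrefix)) {
            return false;
        }
        return levenshtein(query, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(acceptable("ichtscheiben", "Sichtscheiben", 1, 2));
        System.out.println(acceptable("ichtscheiben", "Sichtscheiben", 0, 2));
    }
}
```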

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Thursday, September 11, 2014 3:46 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Spellcheck suggestions only return from /select handler when 
returning search results

 Hi James, hi list,

I can confirm the existence of data that's within
1 Levenshtein step from ichtscheiben:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "name,spell",
      "indent": "true",
      "q": "name:Sichtscheiben",
      "_": "1410423419758",
      "wt": "json",
      "rows": "50"
    }
  },
  "response": {
    "numFound": 6,
    "start": 0,
    "docs": [
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" }
    ]
  }
}

Multiple records exist that should match.

The note for alternativeTermCount is appreciated.

I've tried another term: Transport. I get suggestions when I use Transpor
and Transpo, even Transpotr, but ransport doesn't yield any suggestions.
Maybe it's a question of the beginning of a word and has nothing really to
do with stemming.


RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-10 Thread Dyer, James
Thomas,

It looks like you've set things up correctly in that while the user is 
searching against a stemmed field (name), spellcheck is checking against a 
lightly-analyzed copy of it (spell).  This is the right way to do it as 
spellcheck against stemmed forms is usually undesirable.

But as you've experienced, you will sometimes get results (due to stemming) and 
also suggestions (because the spellchecker is looking at unstemmed forms).  If 
you do not want spellcheck to return anything when you get results, you can set 
spellcheck.maxResultsForSuggest=0.

Now keeping in mind we're comparing unstemmed forms, can you verify you indeed 
have something in your index that is within 2 edits of ichtscheiben ?  My 
guess is you probably don't, which would be why you do not get spelling results 
in that case.

Also, even if you do have something within 2 edits, if ichtscheiben occurs in 
your index, by default it won't try to correct it at all (even if the query 
returns nothing, maybe because of filters or other required terms on the 
query).  In this case you need to set spellcheck.alternativeTermCount to a 
non-zero value (try maybe 5).

See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and following sections.
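The two conditions in this reply can be summarized as a small decision function. This is a simplified, hypothetical model of the behaviour described in this thread, not Solr's actual code path, which also depends on the spellchecker implementation and other parameters.

```java
// Simplified, hypothetical model of when spellcheck offers a correction for a
// query term, based on the two rules described above. Not Solr's code.
final class SuggestionGate {

    /**
     * @param maxResultsForSuggest null means the parameter is not set
     */
    static boolean offersCorrection(long numFound, Integer maxResultsForSuggest,
                                    boolean termInIndex, int alternativeTermCount) {
        if (maxResultsForSuggest != null && numFound > maxResultsForSuggest) {
            // The query already returned more hits than the threshold.
            return false;
        }
        if (termInIndex) {
            // Indexed terms are only corrected when alternativeTermCount is non-zero.
            return alternativeTermCount > 0;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(offersCorrection(6, 0, false, 0));
        System.out.println(offersCorrection(0, 0, true, 5));
    }
}
```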

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Wednesday, September 10, 2014 5:00 AM
To: Solr user
Subject: Solr Spellcheck suggestions only return from /select handler when 
returning search results

 Hi,

I'm experimenting with the Spellcheck component and have therefore
used the example configuration for spell checking to try things out. My
solrconfig.xml looks like this:

 <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>
  <!-- Multiple Spell Checkers can be declared and used by this
       component
  -->
  <!-- a spellchecker built from a field of the main index -->
  <lst name="spellchecker">
   <str name="name">default</str>
   <str name="field">spell</str>
   <str name="classname">solr.DirectSolrSpellChecker</str>
   <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
   <str name="distanceMeasure">internal</str>
   <!-- uncomment this to require suggestions to occur in 1% of the documents
   <float name="thresholdTokenFrequency">.01</float>
   -->
  </lst>
  <!-- a spellchecker that can break or combine words. See "/spell" handler below
       for usage -->
  <lst name="spellchecker">
   <str name="name">wordbreak</str>
   <str name="classname">solr.WordBreakSolrSpellChecker</str>
   <str name="field">spell</str>
   <str name="combineWords">true</str>
   <str name="breakWords">true</str>
   <int name="maxChanges">10</int>
  </lst>

 </searchComponent>

And I've added the spellcheck component to my
/select request handler:

<requestHandler name="/select" class="solr.SearchHandler">
  ...
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I have built up the
spellchecker source in the schema.xml from the name field:

<field name="spell" type="spell" indexed="true" stored="true" required="false"
multiValued="false"/>
<copyField source="name" dest="spell" maxChars="3" />
 ...
<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>

As I'm querying the /select request handler,
I should get spellcheck suggestions with my results. However, I rarely
get a suggestion. Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can identify, I only get suggestions when I get real search
results. I get results for the first 2 examples, because the German
StemFilterFactory translates Sichtscheibe and Sichtscheiben into
Sichtscheib, so there are matches found. However, the third query
should result in a suggestion, as the Levenshtein distance is less than
in the second example.

Suggestions, improvements, corrections?

 


RE: Solr spellcheck returns more than 1 word for a 1 word spellcheck

2014-09-02 Thread Dyer, James
This is the WordBreakSolrSpellChecker, which is there to correct spelling 
errors involving misplaced whitespace (or is it "white space"??).  To disable it, 
remove this or a similar line from your requestHandler in solrconfig.xml:

<str name="spellcheck.dictionary">wordbreak</str>

Keep in mind, if you want the best of both worlds, you can keep this there and, 
using the collation feature, it will try to pick the combination of 
spelling corrections that best fixes your user's query. See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and 
following sections.
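To see how suggestions like "coo le" and "co o le" can arise, here is a simplified Java sketch of word breaking: it lists every way to split a term into pieces that all exist in a vocabulary, using at most a given number of breaks (loosely modeled on maxChanges). It is not WordBreakSolrSpellChecker's actual algorithm, which applies additional limits such as minimum term frequency; the vocabulary below is a hypothetical stand-in for the index terms mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordBreakSketch {

    // Every way to cover `word` with dictionary pieces using at most maxBreaks
    // breaks; each result is the pieces joined by single spaces.
    static List<String> segmentations(String word, Set<String> dictionary, int maxBreaks) {
        List<String> results = new ArrayList<>();
        if (dictionary.contains(word)) {
            results.add(word); // the whole word, zero breaks
        }
        if (maxBreaks == 0) {
            return results;
        }
        for (int i = 1; i < word.length(); i++) {
            String head = word.substring(0, i);
            if (!dictionary.contains(head)) {
                continue;
            }
            for (String tail : segmentations(word.substring(i), dictionary, maxBreaks - 1)) {
                results.add(head + " " + tail);
            }
        }
        return results;
    }

    // Only multi-piece results count as "word break" suggestions.
    static List<String> breakSuggestions(String word, Set<String> dictionary, int maxBreaks) {
        List<String> out = new ArrayList<>();
        for (String s : segmentations(word, dictionary, maxBreaks)) {
            if (s.contains(" ")) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary modeled on terms mentioned in the thread.
        Set<String> dictionary = Set.of("cooler", "cable", "coo", "co", "o", "le");
        System.out.println(breakSuggestions("coole", dictionary, 10));
    }
}
```

Because truncated fragments such as "coo", "co" and "o" are real index terms, the sketch finds "co o le" and "coo le" for "coole"; lowering the break limit to 1 leaves only "coo le".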

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Monday, September 01, 2014 6:44 AM
To: Solr user
Subject: Solr spellcheck returns more than 1 word for a 1 word spellcheck

 I'm in the process of incorporating Solr spellchecking in our product.
For that, I've created a new field:

<field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>

<copyField source="name" dest="spell" maxChars="3" />

And in the fieldType definitions:

<fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

Then I feed the names of products into the corresponding
core. They can have a lot of words (examples):

 door lock rear left

Door brake, door in front + rear fitting.

However, the names get pretty long, and in the source data they have been
truncated. This sometimes leaves parts of words at the end:

 The water pump can evacuate some coo

I have created a spellcheck component, feeding off the `spell` field
defined earlier. Now for the problem.

Sometimes, when I look up a
slightly misspelled word, I get results I do not expect. Example
request:

 http://solr.url:8983/solr/en/spell?q=coole

This is (part of)
the response:

 <str name="word">cooler</str><int name="freq">21</int>
 <str name="word">coo le</str><int name="freq">2</int>
 <str name="word">cable</str><int name="freq">334</int>
 <str name="word">co o le</str><int name="freq">4</int>
 [...]

Now, as you can see, the
misspelled `coole` should have been `cooler`, and it's the first
suggestion. However, the second and fourth suggestions baffle me. After a
bit of research, I found this to be multiple words clunked together. As
I described above, `coo` was a part of a name that was truncated. I
found `co` the same way, and the source data contains a small number of
`o` characters on their own (product number names).

Now, my question
is: Why is Solr suggesting `multiple words` pasted together for a
spellcheck for a single word? Is there a way to prevent Solr from
pasting together word parts to forge suggestions? 
 


RE: Spellchecking suggestions won't collate

2014-08-20 Thread Dyer, James
Because "my" is the 7th suggestion down the list, it is going to need more than 
30 tries to figure out the one that can give some hits.  You can increase 
maxCollationTries if you're willing to endure the performance penalty of 
trying so many replacement queries.  This case actually highlights why 
DirectSpellChecker by default doesn't even bother with short words like this.

Rather than letting the spellchecker check words this small, possibly you can 
just scan the user's input and make any words shorter than 4 characters 
optional?  Or even just use an mm below 100% (65%?).  I realize this will give 
you a small loss of precision, but the recall will be better and you'll have to 
rely less on spellcheck.  
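The first idea above, making short words optional, might look like this minimal Java sketch. It assumes whitespace-separated input with no operators of its own, and that the rewritten string is parsed so that `+`-prefixed clauses are required while bare clauses are optional (for example with q.op=OR). How edismax combines explicit `+` operators with mm varies, so check it against your Solr version. The class name and the length-threshold parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class OptionalShortWords {

    // Marks terms of at least minRequiredLength characters as required ("+term")
    // and leaves shorter terms as optional clauses.
    static String rewrite(String userQuery, int minRequiredLength) {
        List<String> clauses = new ArrayList<>();
        for (String term : userQuery.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only for blank input
            }
            clauses.add(term.length() >= minRequiredLength ? "+" + term : term);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Mi Next Promo", 4)); // Mi +Next +Promo
    }
}
```

With this rewrite, a misspelled short word like "Mi" no longer has to match for "Next Promo" to find hits, so the spellchecker is leaned on less.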

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 15, 2014 3:21 PM
To: Solr User List
Subject: Spellchecking suggestions won't collate

It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
    "status":0,
    "QTime":31,
    "params":{
      "spellcheck":"on",
      "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
      "spellcheck.maxResultsForSuggest":"5",
      "spellcheck.maxCollations":"3",
      "spellcheck.maxCollationTries":"30",
      "qf":"BUS_BUSINESS_NAME_PHRASE",
      "q.alt":"*:*",
      "spellcheck.collate":"true",
      "spellcheck.onlyMorePopular":"false",
      "defType":"edismax",
      "debugQuery":"true",
      "echoParams":"all",
      "spellcheck.count":"10",
      "spellcheck.alternativeTermCount":"10",
      "indent":"true",
      "q":"Mi Next Promo",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
    "suggestions":[
      "mi",{
        "numFound":10,
        "startOffset":0,
        "endOffset":2,
        "suggestion":["mr",
          "mp",
          "mid",
          "mix",
          "mb",
          "mj",
          "my",
          "md",
          "mc",
          "ma"]},
      "next",{
        "numFound":3,
        "startOffset":3,
        "endOffset":7,
        "suggestion":["nest",
          "news",
          "neil"]},
      "promo",{
        "numFound":4,
        "startOffset":8,
        "endOffset":13,
        "suggestion":["photo",
          "prime",
          "pronto",
          "prof"]}]},

The actual business name is My Next Promo which I'm hoping would be the 
collation value.

Thanks,

Corey



RE: Spell check collation

2014-08-14 Thread Dyer, James
DirectSolrSpellChecker defaults to a minimum term length of 4.  So you'd need 
to bring this down with <int name="minQueryLength">1</int>.

But you might not like the results from this.  See: 
http://lucene.apache.org/core/4_6_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinQueryLength%28int%29

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Thursday, August 14, 2014 12:18 PM
To: Solr User List
Subject: Spell check collation

Solr 4.6

Current settings for my handler:
<str name="defType">edismax</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.maxCollations">3</str>
<str name="spellcheck.maxCollationTries">30</str>
<str name="qf">BUS_BUSINESS_NAME_PHRASE</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.alternativeTermCount">10</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.onlyMorePopular">false</str>
<str name="spellcheck.accuracy">0.2</str>

<lst name="spellchecker">
  <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="distanceMeasure">internal</str>
  <float name="accuracy">0.5</float>
  <int name="maxEdits">2</int>
  <int name="minPrefix">1</int>
  <int name="maxInspections">5</int>
  <float name="thresholdTokenFrequency">0.01</float>
  <float name="maxQueryFrequency">0.0001</float>
  <str name="buildOnCommit">true</str>
</lst>
</searchComponent>

I'm querying:
h G's collision centre

hoping for a spell check suggestion of:
J G's collision centre

But there are no suggestions. Is there a term length limitation to 
spellchecking?

Thanks,

Corey



RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Dyer, James
Harun,

What do you mean by the terminal console?  Do you mean to say the admin gui 
freezes but you can still issue queries to solr directly through your browser?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Tuesday, August 12, 2014 2:46 AM
To: solr-user@lucene.apache.org
Subject: Re: When I use minimum match and maxCollationTries parameters together 
in edismax, Solr gets stuck

I tried again to make sure. The server starts and I can see the web admin GUI, 
but I can't navigate between tabs. It just says "loading". But on the terminal 
console everything seems normal.

Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 12.08.2014 09:42, Harun Reşit Zafer wrote:
 It happens once the server is fully started. And when it gets stuck, 
 sometimes I have to restart the server; sometimes I'm able to edit the 
 solrconfig.xml and reload it.

 Harun Reşit Zafer
 TÜBİTAK BİLGEM BTE
 Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
 T +90 262 675 3268
 W  http://www.hrzafer.com

 On 11.08.2014 17:32, Dyer, James wrote:
 Harun,

 Just to clarify, is this happening during startup when a warmup query 
 is running, or is this once the server is fully started? This might 
 be another instance of https://issues.apache.org/jira/browse/SOLR-5386 .

 James Dyer
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr]
 Sent: Monday, August 11, 2014 8:39 AM
 To: solr-user@lucene.apache.org
 Subject: When I use minimum match and maxCollationTries parameters 
 together in edismax, Solr gets stuck

 Hi,

 In the following configuration, when I uncomment both the mm and
 maxCollationTries lines and run a query on /select, Solr gets stuck
 with no exception.

 I tried different values for both parameters and found that values for
 mm less than 40% still work.


 <requestHandler name="/select" class="solr.SearchHandler">
   <!-- default values for query parameters can be specified, these
        will be overridden by parameters in the request
     -->
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="defType">edismax</str>
     <int name="timeAllowed">1000</int>
     <str name="qf">title^3 title_s^2 content</str>
     <str name="pf">title content</str>
     <str name="fl">id,title,content,score</str>
     <float name="tie">0.1</float>
     <str name="lowercaseOperators">true</str>
     <str name="stopwords">true</str>
     <!-- <str name="mm">75%</str>-->
     <int name="rows">10</int>

     <str name="spellcheck">on</str>
     <str name="spellcheck.dictionary">default</str>
     <str name="spellcheck.dictionary">wordbreak</str>
     <str name="spellcheck.onlyMorePopular">true</str>
     <str name="spellcheck.count">5</str>
     <str name="spellcheck.maxResultsForSuggest">5</str>
     <str name="spellcheck.extendedResults">false</str>
     <str name="spellcheck.alternativeTermCount">2</str>
     <str name="spellcheck.collate">true</str>
     <str name="spellcheck.collateExtendedResults">true</str>
     <str name="spellcheck.maxCollationTries">5</str>
     <!-- <str name="spellcheck.collateParam.mm">100%</str>-->

     <str name="spellcheck.maxCollations">3</str>
   </lst>

   <arr name="last-components">
     <str>spellcheck</str>
   </arr>

 </requestHandler>

 Any idea? Thanks








RE: SqlEntityProcessor

2014-08-11 Thread Dyer, James
I've heard of a user adding a separate entity / section to the end of their 
data-config.xml with a SqlEntityProcessor and an UPDATE statement.  It would 
run after your main entity / section.  I have not tried it myself, and surely 
DIH was not designed to do this, but it might work.

A better solution might be to write a class implementing EventListener that 
does the db update you want and put an onImportEnd listener in your 
configuration.  See 
http://wiki.apache.org/solr/DataImportHandler#EventListeners for details.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Christof Lorenz [mailto:loc...@web.de] 
Sent: Sunday, August 10, 2014 6:52 AM
To: solr-user@lucene.apache.org
Subject: SqlEntityProcessor

Hi folks,

I am searching for a way to update a certain column in the RDBMS for
each item as soon as the item has been indexed by Solr.
The column will be the indicator in the delta-query to select un-indexed
items.
We don't want to use the default timestamp-based mechanism.

Any ideas how we could implement this ?

Regards,
Lochri




RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-11 Thread Dyer, James
Harun,

Just to clarify, is this happening during startup when a warmup query is 
running, or is this once the server is fully started?  This might be another 
instance of https://issues.apache.org/jira/browse/SOLR-5386 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Monday, August 11, 2014 8:39 AM
To: solr-user@lucene.apache.org
Subject: When I use minimum match and maxCollationTries parameters together in 
edismax, Solr gets stuck

Hi,

In the following configuration, when I uncomment both the mm and 
maxCollationTries lines and run a query on /select, Solr gets stuck 
with no exception.

I tried different values for both parameters and found that values for 
mm less than 40% still work.


<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="timeAllowed">1000</int>
    <str name="qf">title^3 title_s^2 content</str>
    <str name="pf">title content</str>
    <str name="fl">id,title,content,score</str>
    <float name="tie">0.1</float>
    <str name="lowercaseOperators">true</str>
    <str name="stopwords">true</str>
    <!-- <str name="mm">75%</str>-->
    <int name="rows">10</int>

    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- <str name="spellcheck.collateParam.mm">100%</str>-->

    <str name="spellcheck.maxCollations">3</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>

</requestHandler>

Any idea? Thanks


-- 
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com


RE: Data Import handler and join select

2014-08-07 Thread Dyer, James
Alejandro,

You can use a sub-entity with a cache using DIH.  This will solve the 
n+1-select problem and make it run quickly.  Unfortunately, the only built-in 
cache implementation is in-memory so it doesn't scale.  There is a fast, 
disk-backed cache using bdb-je, which I use in production.  See 
https://issues.apache.org/jira/browse/SOLR-2613 .  You will need to build this 
yourself and include it on the classpath, and obtain a copy of bdb-je from 
Oracle.  While bdb-je is open source, its license is incompatible with ASL so 
this will never officially be part of Solr.

Once you have a disk-backed cache, you can specify it on the child entity like 
this:
<entity name="parent" query="select id, ... from parent table">
  <entity
    name="child"
    query="select foreignKey, ... from child_table"
    cacheKey="foreignKey"
    cacheLookup="parent.id"
    processor="SqlEntityProcessor"
    transformer="..."
    cacheImpl="BerkleyBackedCache"
  />
</entity>

If you don't want to go down this path, you can achieve this all with one 
query, if you include an ORDER BY to sort by whatever field is used as Solr's 
uniqueKey, and add a dummy row at the end with a UNION:

SELECT p.uniqueKey, ..., 'A' as lastInd from PRODUCTS p 
INNER JOIN DESCRIPTIONS d ON p.uniqueKey = d.productKey
UNION SELECT 0 as uniqueKey, ... , 'B' as lastInd from dual 
ORDER BY lastInd, uniqueKey

Then your transformer would need to keep the lastUniqueKey in an instance 
variable and keep a running map of everything it's seen for that key.  When the 
key changes, or if on the last row, send that map as the document.  Otherwise, 
the transformer returns null.  This will collect data from each row seen onto 
one document.
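Here is a plain-Java sketch of the row-collapsing logic just described. It does not implement DIH's Transformer interface; the column names (uniqueKey, lastInd, language, description) and the sample rows are hypothetical, and it assumes the rows arrive grouped by key with the 'B' sentinel row last. Repeated values for a column are kept once, which suits constant product columns but would also hide genuine duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RowCollapser {
    private Object lastKey;                    // key of the document being assembled
    private Map<String, List<Object>> current; // values collected so far for lastKey

    // Feed one row; returns the finished document when the key changes or the
    // sentinel row (lastInd = "B") arrives, otherwise null.
    public Map<String, List<Object>> accept(Map<String, Object> row) {
        Object key = row.get("uniqueKey");
        boolean sentinel = "B".equals(row.get("lastInd"));
        Map<String, List<Object>> finished = null;
        if (current != null && (sentinel || !key.equals(lastKey))) {
            finished = current; // the previous key is complete
            current = null;
        }
        if (sentinel) {
            return finished; // the dummy row itself is never indexed
        }
        if (current == null) {
            current = new LinkedHashMap<>();
            lastKey = key;
        }
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (e.getKey().equals("lastInd")) {
                continue; // bookkeeping column, not a document field
            }
            List<Object> values = current.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
            if (!values.contains(e.getValue())) {
                values.add(e.getValue());
            }
        }
        return finished;
    }

    // Helpers that build hypothetical joined rows and the dummy last row.
    public static Map<String, Object> row(Object key, String language, String description) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("uniqueKey", key);
        r.put("language", language);
        r.put("description", description);
        r.put("lastInd", "A");
        return r;
    }

    public static Map<String, Object> sentinel() {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("uniqueKey", 0);
        r.put("lastInd", "B");
        return r;
    }

    public static void main(String[] args) {
        RowCollapser collapser = new RowCollapser();
        List<Map<String, Object>> rows = List.of(
                row(1, "es", "desc-es"),
                row(1, "en", "desc-en"),
                row(2, "en", "desc-en-2"),
                sentinel());
        for (Map<String, Object> r : rows) {
            Map<String, List<Object>> doc = collapser.accept(r);
            if (doc != null) {
                System.out.println(doc); // one merged document per key
            }
        }
    }
}
```

Returning null for every row except the one that closes a key is what lets multiple joined rows end up as a single Solr document.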

Keep in mind also, that in a lot of cases like this, it might just be easiest 
to write a program that uses SolrJ to send your documents rather than trying to 
make DIH's features fit your use-case.  

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com] 
Sent: Thursday, August 07, 2014 1:43 AM
To: solr-user@lucene.apache.org
Subject: Data Import handler and join select

Hi,

I have one problem while indexing with the data import handler while doing a
join select. I have two tables, one with products and another one with
descriptions for each product in several languages.

So it would be:

Products: ID, NAME, BRAND, PRICE, ...
Descriptions: ID, LANGUAGE, DESCRIPTION

I would like to have every product indexed as a document with a multivalued
field "language" which contains every language that has an associated
description, and several dynamic fields "description_", one for each language.

So it would be for example:

Id: 1
Name: Product
Brand: Brand
Price: 10
Languages: [es,en]
Description_es: Descripción en español
Description_en: English description

Our first approach was using sub-entities for the data import handler and
after implementing some transformers we had everything indexed as we
wanted. The sub-entity process added the descriptions for each language to
the solr document and then indexed them.

The problem was performance. I've read that using sub-entities affected
performance greatly, so we changed our process in order to use a join
instead.

Performance was greatly improved this way, but now we have a problem. Each
time a row is processed, a Solr document is generated and indexed into Solr,
but rather than being added to any previous data, it replaces it.

If we had the previous example the query resulting from the join would be:

Id - Name - Brand - Price - Language - Description
1 - Product - Brand - 10 - es - Descripción en español
1 - Product - Brand - 10 - en - English description

So when indexing as both have the same id the only information I get is the
second row.

Is there any way for data import handler to manage this and allow the
documents to be indexed updating any previous data?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


RE: Change order of spell checker suggestions issue

2014-08-07 Thread Dyer, James
Corey,

Looking more carefully at your responses than I did last time I answered this 
question, it looks like every correction is 2 edits in this example.  

unie -> unity (e->t, insert y)
unie -> unger (i->g, insert r)
unie -> unick (e->c, insert k)
unie -> united (insert t, insert d)
unie -> unique (insert q, u)
unie -> unity (e->t, insert y)
unie -> unser (i->s, insert r)
unie -> unyi (i->y, e->i)

So both "score" and "freq" will sort these by frequency, since the edit 
distances all tie.  Usually when I'm in doubt of something working like it 
should, I try to come up with 
more than 1 clear-cut example.
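The per-word counts above can be double-checked with a textbook dynamic-programming Levenshtein distance. This Java sketch is illustrative only: DirectSolrSpellChecker's internal measure is not necessarily identical (it may, for example, treat transpositions differently), but none of these words involve a transposition, and every one of them comes out at 2 edits from "unie".

```java
public class UnieEdits {

    // Plain Levenshtein distance via the full DP table.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) {
            d[i][0] = i; // delete all of a's first i chars
        }
        for (int j = 0; j <= b.length(); j++) {
            d[0][j] = j; // insert all of b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                        d[i - 1][j - 1] + cost);
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        String[] suggestions = {"unity", "unger", "unick", "united", "unique", "unser", "unyi"};
        for (String s : suggestions) {
            System.out.println("unie -> " + s + ": " + levenshtein("unie", s) + " edits");
        }
    }
}
```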

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Thursday, August 07, 2014 11:31 AM
To: Solr User List
Subject: Change order of spell checker suggestions issue

Solr Rev: 4.6 Lucidworks: 2.6.3

This is sort of a repeat question, sorry.

In the solrconfig.xml, will changing the value for the comparatorClass affect 
the sort of suggestions returned?

This is my spellcheck component:
<searchComponent class="com.lucid.spellchecking.LucidSpellCheckComponent" name="spellcheck">
  <lst name="defaults">
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
  </lst>

  <str name="queryAnalyzerFieldType">textSpell</str>

  <lst name="spellchecker">
    <str name="classname">org.apache.solr.spelling.DirectSolrSpellChecker</str>
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <str name="comparatorClass">score</str>
    <float name="thresholdTokenFrequency">1</float>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

Searching for unie produces the following suggestions. But the suggestions 
appear to me to be by frequency (I've indicated Levenshtein distance in []):

<lst>
  <str name="word">unity</str> [ 3 ]
  <int name="freq">1200</int>
</lst>
<lst>
  <str name="word">unger</str> [ 3 ]
  <int name="freq">119</int>
</lst>
<lst>
  <str name="word">unick</str> [ 3 ]
  <int name="freq">16</int>
</lst>
<lst>
  <str name="word">united</str> [ 4 ]
  <int name="freq">16</int>
</lst>
<lst>
  <str name="word">unique</str> [ 4 ]
  <int name="freq">10</int>
</lst>
<lst>
  <str name="word">unity</str> [ 3 ]
  <int name="freq">7</int>
</lst>
<lst>
  <str name="word">unser</str> [ 3 ]
  <int name="freq">7</int>
</lst>
<lst>
  <str name="word">unyi</str> [ 2 ]
  <int name="freq">7</int>
</lst>

Is something configured incorrectly or am I just needing more coffee?



RE: Debug DirectSolrSpellChecker Suggestion Sort Order

2014-08-01 Thread Dyer, James
Query results default to sorting by score.  But spelling suggestions sort by edit 
distance, with frequency as a secondary sort.  

unie -> unger = 2 edits
unie -> unick = 2 edits
unie -> united = 3 edits
unie -> unique = 3 edits
... etc ...
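A minimal Java sketch of that ordering, assuming plain Levenshtein distance as the primary key and frequency (highest first) as the secondary key; the alphabetical third key is an added tie-breaker, not something the reply states. The word "unit" and its frequency are hypothetical, included only so one candidate has a different edit distance. Note that under plain Levenshtein, "united" and "unique" also come out at 2 edits from "unie", as the 2014-08-07 reply works out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SuggestionOrder {

    public static final class Suggestion {
        public final String word;
        public final int freq;
        public final int edits;

        Suggestion(String word, int freq, int edits) {
            this.word = word;
            this.freq = freq;
            this.edits = edits;
        }

        @Override
        public String toString() {
            return word + " (edits=" + edits + ", freq=" + freq + ")";
        }
    }

    // Plain Levenshtein distance using two DP rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Fewer edits first, then higher frequency, then alphabetical.
    static final Comparator<Suggestion> ORDER =
            Comparator.comparingInt((Suggestion s) -> s.edits)
                    .thenComparing(Comparator.comparingInt((Suggestion s) -> s.freq).reversed())
                    .thenComparing(Comparator.comparing((Suggestion s) -> s.word));

    public static List<Suggestion> rank(String query, String[] words, int[] freqs) {
        List<Suggestion> out = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            out.add(new Suggestion(words[i], freqs[i], levenshtein(query, words[i])));
        }
        out.sort(ORDER);
        return out;
    }

    public static void main(String[] args) {
        // Frequencies from the thread; "unit" is a hypothetical 1-edit term.
        String[] words = {"unique", "unick", "unger", "unit", "united"};
        int[] freqs = {10, 16, 119, 3, 16};
        rank("unie", words, freqs).forEach(System.out::println);
    }
}
```

The low-frequency "unit" ranks first because distance dominates; among the 2-edit words, frequency decides.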

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 01, 2014 3:01 PM
To: 'solr-user@lucene.apache.org'
Subject: Debug DirectSolrSpellChecker Suggestion Sort Order

Everything that I read says that the default sort order is by Score, yet this 
appears to me to be sorted by frequency:

<lst name="suggestions">
  <lst name="unie">
    <int name="numFound">10</int>
    <int name="startOffset">0</int>
    <int name="endOffset">4</int>
    <int name="origFreq">0</int>
    <arr name="suggestion">
      <lst>
        <str name="word">unger</str>
        <int name="freq">119</int>
      </lst>
      <lst>
        <str name="word">unick</str>
        <int name="freq">16</int>
      </lst>
      <lst>
        <str name="word">united</str>
        <int name="freq">16</int>
      </lst>
      <lst>
        <str name="word">unique</str>
        <int name="freq">10</int>
      </lst>
      <lst>
        <str name="word">unity</str>
        <int name="freq">7</int>
      </lst>
      <lst>
        <str name="word">unser</str>
        <int name="freq">7</int>
      </lst>
      <lst>
        <str name="word">unyi</str>
        <int name="freq">7</int>
      </lst>
      <lst>
        <str name="word">utke</str>
        <int name="freq">5</int>
      </lst>
      <lst>
        <str name="word">uribe</str>
        <int name="freq">3</int>
      </lst>
      <lst>

 

RE: Searching words with spaces for word without spaces in solr

2014-07-31 Thread Dyer, James
If a user is searching on "ice cream" but your index has "icecream", you can 
treat this like a spelling error.  WordBreakSolrSpellChecker would identify the 
fact that while "ice cream" is not in your index, "icecream" is, and then you can 
re-query for the corrected version without the space.

The problem with solving this with analyzers is that you can analyze 
"ice-cream" as either "ice cream" or "icecream" (split or catenate on hyphen).  
You can even analyze "IceCream" => "Ice Cream" (split on case change).  But how 
is your analyzer going to know that "icecream" should index as two tokens: 
"ice" "cream"?  You're asking analysis to do too much in this case.  This is 
where spellcheck can bridge the gap.
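The combining half of that idea can be sketched in Java: for each pair of adjacent query tokens, propose the concatenation when it is a known index term. This is a simplification of what WordBreakSolrSpellChecker's combineWords option does; it assumes the tokens are already analyzed (e.g. lowercased), and the vocabulary and method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class WordCombineSketch {

    // For each adjacent token pair whose concatenation is a known term, return the
    // whole query rewritten with that pair joined (one suggestion per pair).
    static List<String> combineSuggestions(List<String> tokens, Set<String> vocabulary) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String joined = tokens.get(i) + tokens.get(i + 1);
            if (!vocabulary.contains(joined)) {
                continue;
            }
            List<String> rewritten = new ArrayList<>(tokens.subList(0, i));
            rewritten.add(joined);
            rewritten.addAll(tokens.subList(i + 2, tokens.size()));
            out.add(String.join(" ", rewritten));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical index vocabulary.
        Set<String> vocabulary = Set.of("icecream", "cone");
        System.out.println(combineSuggestions(List.of("ice", "cream", "cone"), vocabulary));
    }
}
```

The rewritten query ("icecream cone" here) is what you would re-run, which is the "re-query for the corrected version" step described above.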

Of course, if you have a discrete list of words you want split like this, then 
you can do it with analysis using index-time synonyms.  In this case, you need 
to provide it with the list.  See 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 for more information.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: sunshine glass [mailto:sunshineglassof2...@gmail.com] 
Sent: Thursday, July 31, 2014 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

I am not clear on this. This link is related to spell check. Can you
elaborate on it more?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James james.d...@ingramcontent.com
wrote:

 In addition to the analyzer configuration you're using, you might want to
 also use WordBreakSolrSpellChecker to catch possible matches that can't
 easily be solved through analysis.  For more information, see the section
 for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking

 James Dyer
 Ingram Content Group
 (615) 213-4311

 -Original Message-
 From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
 Sent: Wednesday, July 30, 2014 9:38 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Searching words with spaces for word without spaces in solr

 This is the new configuration:

 <fieldType name="text" class="solr.TextField"
  positionIncrementGap="100">
    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
      outputUnigrams="true" tokenSeparator=""/>
      <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"
      language="English" protected="protwords.txt"/>
      <filter class="solr.SynonymFilterFactory"
      synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
      expand="true"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
      words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
     />
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
      outputUnigrams="true" tokenSeparator=""/>
      <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
      <filter class="solr.SnowballPorterFilterFactory"
      language="English" protected="protwords.txt"/>
    </fieldType>
 
 
 These are current docs in my index:

 <result name="response" numFound="3" start="0">
 <doc>
 <str name="id">2</str>
 <str name="title">Icecream</str>
 <long name="_version_">1475063961342705664</long>
 </doc>
 <doc>
 <str name="id">3</str>
 <str name="title">Ice-cream</str>
 <long name="_version_">1475063961344802816</long>
 </doc>
 <doc>
 <str name="id">1</str>
 <str name="title">Ice Cream</str>
 <long name="_version_">1475063961203245056</long>
 </doc>
 </result>
 </response>

 Query:
 http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

 Response:

 <result name="response" numFound="2" start="0">
 <doc>
 <str name="id">1</str>
 <str name="title">Ice Cream</str>
 <long name="_version_">1475063961203245056</long>
 </doc>
 <doc>
 <str name="id">3</str>
 <str name="title">Ice-cream</str>
 <long name="_version_">1475063961344802816</long>
 </doc>
 </result>
 <lst name="debug">
 <str name="rawquerystring">title:ice cream</str>
 <str name="querystring">title:ice cream</str>
 <str name="parsedquery">
 (+(title:ice DisjunctionMaxQuery((title:cream/no_coord
 </str>
 <str name="parsedquery_toString">+(title:ice (title:cream))</str>
 <lst name="explain">
 <str name="1">
 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
 [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
 termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
 idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
 in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
 termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0

RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Dyer, James
Jia,

I agree that for the spellcheckers to work, you need <arr 
name="last-components"> instead of <arr name="components">.

But the "x-box" => "xbox" example ought to be solved by analyzing using 
WordDelimiterFilterFactory and catenateWords="1" at query-time.  Did you 
re-index after changing your analysis chain (you need to)?  Perhaps you can 
show your full analyzer configuration, and someone here can help you find the 
problem. Also, the Analysis page on the solr Admin UI is invaluable for 
debugging text-field analyzer problems.
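As a rough illustration of what catenateWords="1" adds for "x-box", this Java sketch splits a token on non-alphanumeric characters and, when asked, appends the concatenated parts. It is a deliberate simplification with a hypothetical class name: the real WordDelimiterFilter can also split on case changes and letter/number transitions depending on its options, and it assigns token positions, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSketch {

    // Splits on runs of non-alphanumeric (ASCII) characters such as '-' and, if
    // requested, also emits the parts joined together.
    static List<String> delimit(String token, boolean catenateWords) {
        List<String> parts = new ArrayList<>();
        for (String p : token.split("[^\\p{Alnum}]+")) {
            if (!p.isEmpty()) {
                parts.add(p); // skip the empty piece a leading delimiter produces
            }
        }
        List<String> out = new ArrayList<>(parts);
        if (catenateWords && parts.size() > 1) {
            out.add(String.join("", parts));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(delimit("x-box", true));
        System.out.println(delimit("x-box", false));
    }
}
```

Only the catenated form gives "xbox", which is why the option has to be active in the query-time chain for "x-box" to reach the indexed term.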

Getting "x box" to analyze to "xbox" is trickier (but possible).  The 
WordBreakSpellChecker is probably your best option if you have cases like this 
in your data & users' queries.

Of course, if you have a finite number of products that have spelling variants 
like this, SynonymFilterFactory might be all you need.  I would recommend using 
index-time synonyms for your case rather than query-time synonyms.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Wednesday, July 16, 2014 7:42 AM
To: solr-user@lucene.apache.org; j...@ece.ubc.ca
Subject: Re: questions on Solr WordBreakSolrSpellChecker and 
WordDelimiterFilterFactory

Hi Jia,

What happens when you use 

 <arr name="last-components">

instead of 

 <arr name="components">

Ahmet


On Wednesday, July 16, 2014 3:07 AM, j...@ece.ubc.ca j...@ece.ubc.ca wrote:



Hello everyone :)

I have a product called "xbox" indexed, and when the user searches for
either "x-box" or "x box", I want the "xbox" product to be
returned.  I'm new to Solr, and from reading online, I thought I needed
to use WordDelimiterFilterFactory for the "x-box" case, and
WordBreakSolrSpellChecker for the "x box" case. Is this correct?

(1) In my schema file, this is what I changed:
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>

But I don't see the xbox product returned when the search term is
x-box, so I must have missed something

(2) I tried to use  WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
never got used:

<searchComponent name="wc_spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">wc_textSpell</str>

  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spellCheck</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.3</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
    <float name="thresholdTokenFrequency">0.004</float>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="df">SpellCheck</str>
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.build"> true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">false</str>
  </lst>
  <arr name="components">
    <str>wc_spellcheck</str>
  </arr>
</requestHandler>

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
but the response returned is this:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="spellcheck.build">true</str>
<str name="spellcheck">true</str>
</lst>
</lst>
<str name="command">build</str>
<result name="response" numFound="0" start="0"/>
</response>

What's the correct way to build the dictionary?
Even though my requestHandler has name="/spellcheck", I wasn't able to
use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
... Is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:
<searchComponent name="wc_spellcheck"
class="solr.SpellCheckComponent">

  <str name="queryAnalyzerFieldType">wc_textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">spellCheck</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spellcheck"
class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
       

RE: Endeca to Solr Migration

2014-07-02 Thread Dyer, James
We migrated a big application from Endeca (6.0, I think) several years ago.  
We were not using any of the business UI tools, but we found that Solr is a lot 
more flexible and performant than Endeca.  But with more flexibility comes more 
that you need to know.

The hardest thing was to migrate the Endeca dimensions to Solr facets.  We had 
Endeca-API-specific dependencies throughout the application, even in the 
presentation layer.  We ended up writing a bridge API that allowed us to keep 
our Endeca-specific code and translate the queries to Solr queries.  We are 
storing a cross-reference between the N values from Endeca and key/value 
pairs to translate something like N=4000 to fq=Language:English.  With Solr, 
there is more you need to do in your app that the backend doesn't manage for 
you.  In the end, though, it lets you separate your concerns better.
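A minimal sketch of that kind of cross-reference, assuming a hypothetical in-memory table from Endeca N dimension-value IDs to Solr field/value pairs. The `EndecaBridge` class, the IDs, and the assumption that several IDs arrive separated by `+` or whitespace are illustrative only, not the bridge API described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical bridge: translates Endeca "N" dimension-value IDs into Solr fq strings.
final class EndecaBridge {
    private final Map<String, String[]> crossRef; // N value -> {field, value}

    EndecaBridge(Map<String, String[]> crossRef) {
        this.crossRef = crossRef;
    }

    /** Splits an N parameter on '+' or whitespace and returns one fq per mapped ID. */
    List<String> toFilterQueries(String nParam) {
        List<String> fqs = new ArrayList<>();
        for (String id : nParam.split("[+\\s]+")) {
            String[] pair = crossRef.get(id);
            if (pair == null) {
                continue; // unmapped IDs (e.g. a root "0") produce no filter
            }
            // Quote multi-word values so the fq stays a single field:value clause.
            String value = pair[1].matches(".*\\s.*") ? "\"" + pair[1] + "\"" : pair[1];
            fqs.add(pair[0] + ":" + value);
        }
        return fqs;
    }
}
```

For example, with `4000 -> {Language, English}` in the table, `toFilterQueries("4000")` yields `Language:English`, which the application would send as an `fq` parameter.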

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mrg81 [mailto:maya...@gmail.com] 
Sent: Saturday, June 28, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: Endeca to Solr Migration

Hello --

I wanted to get some details on Endeca to Solr Migration. I am
interested in a few topics:

1. We would like to migrate the Faceted Navigation, Boosting individual
records and a few other items. 
2. But the biggest question is about the UI [Experience Manager] - I have
not found a tool that comes close to Experience Manager. I did read about
Hue [In response to Gareth's question on Migration], but it seems that we
will have to do a lot of customization to use that. 

Questions:

1. Is there a UI that we can use? Is it possible to un-hook the Experience
Manager UI and point to Solr?
2. How long does a typical migration take? Assuming that we have to migrate
the Faceted Navigation and Boosted records? 

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-to-Solr-Migration-tp4144582.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Spell checker - limit on number of misspelt words in a search term.

2014-06-23 Thread Dyer, James
I do not believe there is such a setting.  Most likely you will need to 
increase the value for maxCollationTries to get it to discover the correct 
combination. Just be sure not to set this too high as queries with a lot of 
misspelled words (or for something your index simply doesn't have) will take 
longer to complete.
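To see why three misspellings can exhaust the tries, consider a simplified model (an assumption: Solr ranks candidates its own way rather than in the plain odometer order used here). Each candidate collation picks one suggestion per misspelled word, so the number of candidates is the product of the per-word suggestion counts. With 5 suggestions per word, two misspelled words give 25 candidates and three give 125, and only the first maxCollationTries of them are ever checked against the index. `CollationCandidates` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of collation candidates: one suggestion chosen per misspelled word.
final class CollationCandidates {

    /** Total number of combinations: the product of the suggestion-list sizes. */
    static long combinationCount(List<List<String>> suggestionsPerWord) {
        long total = 1;
        for (List<String> s : suggestionsPerWord) {
            total *= s.size();
        }
        return total;
    }

    /** Produces at most maxTries combinations; every suggestion list must be non-empty. */
    static List<String> firstCandidates(List<List<String>> suggestionsPerWord, int maxTries) {
        List<String> out = new ArrayList<>();
        int n = suggestionsPerWord.size();
        int[] idx = new int[n]; // current suggestion index for each word
        while (out.size() < maxTries) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) {
                if (i > 0) sb.append(' ');
                sb.append(suggestionsPerWord.get(i).get(idx[i]));
            }
            out.add(sb.toString());
            // Advance like an odometer, rightmost word first.
            int pos = n - 1;
            while (pos >= 0 && ++idx[pos] == suggestionsPerWord.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) break; // every combination has been produced
        }
        return out;
    }
}
```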

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Tuesday, June 17, 2014 4:49 PM
To: solr-user@lucene.apache.org
Subject: Spell checker - limit on number of misspelt words in a search term.

Hi All,

I am using the Direct Spell checker component and I have collate=true in
my solrconfig.xml.

The issue that I noticed is that, when I have a search term with up to two
words in it and both of them are misspelled, I get a collation query as
a suggestion in the spellchecker output. If I increase the search term
length to 3 words and spell all of them incorrectly, then I do not get a
collation query as an output in the spell checker suggestions.

Is there a setting in the solrconfig.xml file that controls this behavior
by restricting collation suggestions to search terms with at most two
misspelled words? If so, I would need to change that property.

Can anyone please let me know how to do so?

Thanks.

Sent from my mobile.


RE: Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Dyer, James
I believe it will return the terms that are most similar to the queried terms 
but have a greater term frequency than the queried terms.  It doesn't actually 
care what the term frequencies are, only that they are greater than the 
frequencies of the terms you queried on.
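The rule described above can be modeled as a plain filter. This is a sketch of the described semantics, not Solr's source: keep only candidates whose document frequency is strictly greater than the queried term's, then order them by similarity. The `OnlyMorePopular` class and the similarity scores in the usage are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Models the described onlyMorePopular rule; not Solr's implementation.
final class OnlyMorePopular {
    /** A spelling candidate with its similarity to the queried term and its document frequency. */
    static final class Candidate {
        final String word;
        final double similarity;
        final int freq;

        Candidate(String word, double similarity, int freq) {
            this.word = word;
            this.similarity = similarity;
            this.freq = freq;
        }
    }

    /** Keeps candidates strictly more frequent than the queried term, most similar first. */
    static List<String> filter(List<Candidate> candidates, int queryTermFreq) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate c : candidates) {
            // No absolute threshold: the only test is "more frequent than the queried term".
            if (c.freq > queryTermFreq) {
                kept.add(c);
            }
        }
        kept.sort((a, b) -> Double.compare(b.similarity, a.similarity)); // most similar first
        List<String> words = new ArrayList<>();
        for (Candidate c : kept) {
            words.add(c.word);
        }
        return words;
    }
}
```

Note that a very popular but less similar word can still be returned, because popularity only acts as a gate, never as a ranking.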

I do not know your use case, but you may want to consider using 
spellcheck.alternativeTermCount instead of onlyMorePopular.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and 
https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153
 for why.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alistair [mailto:ali...@gmail.com] 
Sent: Monday, June 09, 2014 3:06 AM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all,

I was wondering what the onlyMorePopular option for spellchecking uses
as its threshold. Will it always pick the suggestion that returns the most
results, or does it base its result on some threshold that can be
configured?

Thanks!

Ali.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spellcheck-onlyMorePopular-threshold-tp4140727.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread Dyer, James
If "wrangle" is not in your index, and if it is within the max # of edits, then 
it should suggest "wrangler".

Are you getting anything back from spellcheck at all?  What is the exact query 
you are using?  How is the spellcheck field analyzed?  If you're using 
stemming, then "wrangle" and "wrangler" might be stemmed to the same word. (By 
the way, you shouldn't spellcheck against a stemmed or otherwise 
heavily-analyzed field.)
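To check the edit-distance claim in the question, here is a textbook Levenshtein distance (single-character insertions, deletions, and substitutions). It is a standalone sketch, not DirectSolrSpellChecker's internal measure, which is Levenshtein-based but may treat some edits (such as transpositions) differently. Both "wrangle" and "wranglr" come out one edit from "wrangler", so edit distance alone does not explain the missing suggestion.

```java
// Standalone Levenshtein distance using two rolling rows of the DP table.
final class Levenshtein {
    /** Minimum number of single-character insertions, deletions, or substitutions. */
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning an empty prefix of a into b's first j chars
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```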

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Monday, June 02, 2014 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: DirectSpellChecker not returning expected suggestions.

OK, I just realized that "wrangle" is a proper English word; that's probably
why I don't get a suggestion for "wrangler" in this case. However, in my
test index there is no "wrangle" present, so even though this is a proper
English word, since it does not occur in the index, shouldn't
Solr suggest "wrangler" to me?


On Mon, Jun 2, 2014 at 2:00 PM, S.L simpleliving...@gmail.com wrote:

 I do not get any suggestion (when I search for "wrangle"); however, I
 correctly get the suggestion "wrangler" when I search for "wranglr". I am
 using the Direct and WordBreak spellcheckers in combination; I have not
 tried using anything else.

 Is Solr's distance calculation different from the Levenshtein distance
 calculation? I have set maxEdits to 1, assuming that this corresponds to
 the maxDistance.

 Thanks for your help!


 On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com 
 david.w.smi...@gmail.com wrote:

 What do you get then?  Suggestions, but not the one you’re looking for, or
 is it deemed correctly spelled?

 Have you tried another spellChecker impl, for troubleshooting purposes?

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley


 On Sat, May 31, 2014 at 12:33 AM, S.L simpleliving...@gmail.com wrote:

  Hi All,
 
  I have a small test index of 400 documents; it happens to have an entry
  for "wrangler". When I search for "wranglr", I correctly get the collation
  suggestion "wrangler"; however, when I search for "wrangle", I do not
  get a suggestion for "wrangler".
 
  The Levenshtein distance between "wrangle" and "wrangler" is the same as the
  Levenshtein distance between "wranglr" and "wrangler", so I am wondering why I
  do not get a suggestion for "wrangle".
 
  Below is my Direct spell checker configuration.
 
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">suggestAggregate</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the
      internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <str name="comparatorClass">score</str>

    <!-- minimum accuracy needed to be considered a valid spellcheck
      suggestion -->
    <float name="accuracy">0.7</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1
      or 2 -->
    <int name="maxEdits">1</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">3</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be
      considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the
      documents -->
    <!--
    <float name="thresholdTokenFrequency">.01</float>
    -->
  </lst>
 





RE: Wordbreak spellchecker excessive breaking.

2014-05-30 Thread Dyer, James
I am not sure why changing spellcheck parameters would prevent your server from 
restarting.  One thing to check is to see if you have warming queries running 
that involve spellcheck.  I think I remember from long ago there was (maybe 
still is) an obscure bug where sometimes it will lock up in rare cases when 
spellcheck is used in warming queries.  I do not remember exactly what caused 
this or if it was ever fixed.

Besides that, you might want to post a stack trace or describe what happens 
when it doesn't restart.  Perhaps someone here will know what the problem is.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Friday, May 30, 2014 12:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Wordbreak spellchecker excessive breaking.

James,

Thanks for clearly stating this; I was not able to find this documented
anywhere. Yes, I am using it with another spell checker (Direct) with
collation on. I will try maxChanges and let you know.

On a side note, whenever I change the spellchecker parameters, I need to
rebuild the index and delete the Solr data directory beforehand, as my
Tomcat instance would not even start otherwise. Can you let me know why?

Thanks.







RE: Wordbreak spellchecker excessive breaking.

2014-05-27 Thread Dyer, James
You can do this if you set it up like in the main Solr example:

<lst name="spellchecker">
<str name="name">wordbreak</str>
<str name="classname">solr.WordBreakSolrSpellChecker</str>
<str name="field">name</str>
<str name="combineWords">true</str>
<str name="breakWords">true</str>
<int name="maxChanges">10</int>
</lst>

The combineWords and breakWords flags let you tell it which kind of 
wordbreak correction you want.  maxChanges controls the maximum number of 
words it can break 1 word into, or the maximum number of words it can combine.  
It is reasonable to set this to 1 or 2.
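A rough illustration of why a large maxChanges permits "m o b": the sketch below enumerates every way to break a word using at most a given number of breaks. It is a simplified model that assumes each break counts as one change. The real WordBreakSolrSpellChecker also requires each piece to exist in the index and applies other limits. `WordSplits` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Enumerates candidate word breaks under a maximum number of breaks (simplified model).
final class WordSplits {
    /** All ways to break word using at most maxBreaks breaks (0 breaks = the word itself). */
    static List<String> splits(String word, int maxBreaks) {
        List<String> out = new ArrayList<>();
        collect(word, 0, maxBreaks, new StringBuilder(), out);
        return out;
    }

    private static void collect(String word, int start, int breaksLeft,
                                StringBuilder prefix, List<String> out) {
        // Option 1: the rest of the word becomes the final piece.
        out.add(prefix + word.substring(start));
        if (breaksLeft == 0) {
            return;
        }
        // Option 2: end a piece at position 'end' and break there.
        for (int end = start + 1; end < word.length(); end++) {
            int len = prefix.length();
            prefix.append(word, start, end).append(' ');
            collect(word, end, breaksLeft - 1, prefix, out);
            prefix.setLength(len); // backtrack
        }
    }
}
```

With one break allowed, "mob" yields only "mob", "m ob" and "mo b". Two or more breaks also allow "m o b", which is why a small maxChanges keeps the breaking in check.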

The best way to use this is in conjunction with a regular spellchecker like 
DirectSolrSpellChecker.  When used together with the collation functionality, 
it should take a query like "mob ile" and, depending on what actually returns 
results from your data, suggest either "mobile" or perhaps "mob lie" or both.  
The one thing it cannot do is fix a transposition or misspelling and combine or 
break words in one shot.  That is, it cannot detect that "mob lie" should 
become "mobile".

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Saturday, May 24, 2014 4:21 PM
To: solr-user@lucene.apache.org
Subject: Wordbreak spellchecker excessive breaking.

I am using the Solr wordbreak spellchecker, and the issue is that when I search
for a term like "mob ile", expecting that the wordbreak spellchecker would
actually return a suggestion for "mobile", it breaks the search term into
letters like "m o b". I have two issues with this behavior.

 1. How can I make Solr combine "mob ile" to "mobile"?
 2. Notwithstanding the fact that my search term "mob ile" is being broken
incorrectly into individual letters, I realize that the wordbreak is
needed in certain cases. How do I control the wordbreak so that it does not
break it into letters like "m o b", which seems like excessive breaking to
me?

Thanks.


RE: solr 4.2.1 spellcheck strange results

2014-05-16 Thread Dyer, James
To achieve what you want, you need to specify a lightly analyzed field (no 
stemming) for spellcheck.  For instance, if your solr.SpellCheckComponent in 
solrconfig.xml is set up with a "field" of "title_full", then try using 
"title_full_unstemmed".  Also, if you are specifying a 
"queryAnalyzerFieldType", it should be the same as your unstemmed text field.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: HL [mailto:freemail.grha...@gmail.com] 
Sent: Saturday, May 10, 2014 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr 4.2.1 spellcheck strange results

Hi

I am querying the Solr server spellcheck, and the results I get back,
although they look OK at first glance,
suggest that Solr is replying as if it had searched with the
wrong key.

So while I query the server with the word
καρδυα
Solr responds as if it were querying the database with the word
καρδυ, dropping the last character.
---
<lst name="spellcheck">
<lst name="suggestions">
<lst name="καρδυ">
---

Ideally, Solr should properly indicate that the suggestions correspond 
with καρδυα rather than καρδυ.

Is there a way to make Solr respond with the original search word from 
the query in its response, instead of the one that is getting the hits?

Regards,
Harry



Here is the complete Solr response:
---
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">23</int>
<lst name="params">
<str name="spellcheck">true</str>
<str name="fl">*,score</str>
<str name="start">0</str>
<str name="q">καρδυα</str>
<str name="spellcheck.q">καρδυα</str>
<str name="qf">
title_short^750 title_full_unstemmed^600 title_full^400 title^500 
title_alt^200 title_new^100 series^50 series2^30 author^300 
author_fuller^150 contents^10 topic_unstemmed^550 topic^500 
geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10 
allfields fulltext isbn issn
</str>
<str name="spellcheck.dictionary">basicSpell</str>
<str name="json.nl">arrarr</str>
<str name="qt">dismax</str>
<str name="wt">xml</str>
<str name="rows">0</str>
</lst>
</lst>
<result name="response" numFound="0" start="0" maxScore="0.0"/>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="καρδυ">
<int name="numFound">3</int>
<int name="startOffset">0</int>
<int name="endOffset">6</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">καρδ</str>
<int name="freq">5</int>
</lst>
<lst>
<str name="word">καρδι</str>
<int name="freq">3</int>
</lst>
<lst>
<str name="word">καρυ</str>
<int name="freq">1</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
</lst>
</lst>
</response>




RE: Spell check [or] Did you mean this with Phrase suggestion

2014-05-16 Thread Dyer, James
Have you looked at spellcheck.collate, which re-writes the entire query with 
one or more corrected words?  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate .  There are 
several options shown at this link that control how the collate feature 
works.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: vanitha venkatachalam [mailto:venkatachalam.vani...@gmail.com] 
Sent: Thursday, May 08, 2014 4:14 AM
To: solr-user@lucene.apache.org
Subject: Spell check [or] Did you mean this with Phrase suggestion

Hi,
We need a spell check component that suggests the actual full phrase, not just
words.

Say we have a list of brands: Nike corporation, Samsung electronics.

When I search for "tamsong", I would like to get "samsung
electronics" (the full phrase) as the suggestion, not just "samsung" (a word).
Please help.
-- 
regards,
Vanitha


RE: spellcheck if docsfound below threshold

2014-05-16 Thread Dyer, James
It's spellcheck.maxResultsForSuggest.

http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl] 
Sent: Monday, May 12, 2014 2:12 AM
To: solr-user@lucene.apache.org
Subject: spellcheck if docsfound below threshold

Hi,

Is there a setting to only include spellcheck if the number of documents
found is below a certain threshold?

Or would we need to rerun the request with the spellcheck parameters based
on the docs found?

Kind regards,

Jan Verweij


RE: spellcheck.q and local parameters

2014-04-28 Thread Dyer, James
spellcheck.q is supposed to take a list of raw query terms, so what you're 
trying to do in your example won't work.  What you should do instead is 
space-delimit the actual query terms that exist in qq (and nothing else) and use 
that as your value for spellcheck.q.
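A minimal client-side sketch of that advice. It assumes the application already holds the user's raw input, which is the value it sends as qq. The input is reduced to space-delimited terms, and that same string is sent as spellcheck.q instead of a local-params reference such as {!v=$qq}. The `SpellcheckParams` helper and its choice of characters to strip are hypothetical, and it does not handle boolean keywords such as AND/OR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives a plain-terms spellcheck.q from the user's raw query.
final class SpellcheckParams {
    /** Drops quotes, parentheses and leading +/- operators, then collapses whitespace. */
    static String rawTerms(String userQuery) {
        String s = userQuery.replaceAll("[\"()]", " ");
        s = s.replaceAll("(^|\\s)[+-]+", "$1"); // a +/- operator at the start of a term
        return s.trim().replaceAll("\\s+", " ");
    }

    /** Request parameters where spellcheck.q carries the same terms the user typed into qq. */
    static Map<String, String> build(String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qq", userQuery);
        params.put("spellcheck", "true");
        params.put("spellcheck.q", rawTerms(userQuery));
        return params;
    }
}
```

Hyphens inside a term (as in "x-box") are kept, since only a leading operator is removed.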

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] 
Sent: Monday, April 28, 2014 3:01 PM
To: solr-user@lucene.apache.org
Subject: spellcheck.q and local parameters

Hi,

I'm having some trouble using the spellcheck.q parameter. The user's query is 
defined in the qq parameter and q parameter contains several other parameters 
for boosting.
I would like to use the qq parameter as a default for spellcheck.q.
I tried several ways of adding the qq parameter in the spellcheck.q parameter, 
but it doesn't seem to work. Is this at all possible or do I need to write a 
custom QueryConverter?

This is the configuration:

<str name="q">_query_:"{!edismax qf=$qfQuery pf=$pfQuery bq=$boostQuery 
bf=$boostFunction v=$qq}"</str>
<str name="spellcheck.q">{!v=$qq}</str>

I haven't included all the variables, because they seem unnecessary.

Regards,
Jeroen



RE: Volatile spellcheck index

2014-02-05 Thread Dyer, James
Alejandro,

Assuming you're using Solr 3.x, under:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
 ...
 </lst>
</searchComponent>

...you can add:

<str name="spellcheckIndexDir">./spellchecker</str>

...then the spell check index will be created on-disk and not in memory.

But in Solr 4.0, the default spellcheck implementation changed to 
org.apache.solr.spelling.DirectSolrSpellChecker, which does not create a 
separate index for spellchecking, so "build" does nothing, and you need not 
worry at all about these things.  The wiki still says "experimental" here, but 
that is woefully out-of-date.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com] 
Sent: Wednesday, February 05, 2014 3:41 AM
To: solr-user@lucene.apache.org
Subject: Volatile spellcheck index

Hi,

I'm having a problem with the spell check index building. I've configured
the spell checker component to have the index built on optimize.

  <!-- Spell Check http://wiki.apache.org/solr/SpellCheckComponent -->
  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">spell</str>

  <lst name="spellchecker">
  <str name="name">spellchecker</str>
  <str name="field">spell</str>
  <str name="accuracy">0.7</str>
  <str name="buildOnOptimize">true</str>
  </lst>
  </searchComponent>

  <!-- A request handler for demonstrating the spellcheck component.
       See http://wiki.apache.org/solr/SpellCheckComponent for details -->
  <requestHandler name="/spell" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck.dictionary">spellchecker</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.extendedResults">false</str>
      <str name="spellcheck.count">1</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>

After the index process I launch an optimize request and the spellcheck
index is generated and everything is working fine. However, if I restart
Solr the spell check is not working anymore until I execute another
optimize request.

So, is this the expected way of working? Is the spell check index deleted
after every server restart? Is there any way to make it persistent?

And just one more question, I remember in previous Solr versions the
spellcheck had even its own folder under the data folder, so, for example I
could see if the spell check index had been generated just listing the
files under that folder. Does that folder still exist? Is there any way of
knowing if the spell check index has been generated without executing a
query that is supposed to return a correction?

Thanks in advance




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42



RE: How to override rollback behavior in DIH

2014-01-17 Thread Dyer, James
Peter,

I think you can override org.apache.solr.handler.dataimport.SolrWriter to have 
a custom (no-op) rollback method.  Your new writer should implement 
org.apache.solr.handler.dataimport.DIHWriter.  Use the "writerImpl" 
request parameter to specify the new class.

Unfortunately, it isn't actually this easy because your new writer is going to 
have to know what to do for all the other methods.  That is, there is no easy 
way to tell it how to write/commit/etc. to Solr.  The default SolrWriter has a 
lot of hardcoded parameters it gets sent on construction in 
DataImportHandler#handleRequestBody.  You would have to somehow duplicate this 
construction in your own custom class.  See SOLR-3671 for an explanation of 
this dilemma.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter 
Keegan
Sent: Friday, January 17, 2014 7:51 AM
To: solr-user@lucene.apache.org
Subject: Re: How to override rollback behavior in DIH

Following up on this a bit - my main index is updated by a SolrJ client in
another process. If the DIH fails, the SolrJ client is never informed of
the index rollback, and any pending updates are lost. For now, I've made
sure that the DIH processor never throws an exception, but this makes it a
bit harder to detect the failure via the admin interface.

Thanks,
Peter


On Tue, Jan 14, 2014 at 11:12 AM, Peter Keegan peterlkee...@gmail.com wrote:

 I have a custom data import handler that creates an ExternalFileField from
 a source that is different from the main index. If the import fails (in my
 case, a connection refused in URLDataSource), I don't want to roll back any
 uncommitted changes to the main index. However, this seems to be the
 default behavior. Is there a way to override the IndexWriter rollback?

 Thanks,
 Peter




RE: Spellchecking problem

2013-12-20 Thread Dyer, James
If you are using spellcheck.maxCollationTries with a value greater than 0, the 
"collation" section of your spellcheck response will give query corrections 
that are proven to produce hits.  Possibly you were looking at the first 
section, where it gives individual word suggestions?  Or maybe one of your query 
parameters is misspelled (check case and that you have "spellcheck." in front 
of all of them)?  If you can't figure it out, provide us the entire query 
string you're using, the spellcheck response you get back and also the relevant 
portions of solrconfig.xml.
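Conceptually, maxCollationTries turns collation into a verify-before-suggest loop. The sketch below models that loop. A hypothetical `hitCount` function stands in for re-running the candidate query against the index with the original filters (such as fq). This illustrates the idea only and is not Solr's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Models "only return collations proven to produce hits"; not Solr's code.
final class CollationCheck {
    /**
     * Tests at most maxTries candidate collations and returns those with hits,
     * stopping early once maxCollations verified collations have been found.
     */
    static List<String> verified(List<String> candidates, ToIntFunction<String> hitCount,
                                 int maxTries, int maxCollations) {
        List<String> result = new ArrayList<>();
        int tries = 0;
        for (String candidate : candidates) {
            if (tries >= maxTries || result.size() >= maxCollations) {
                break;
            }
            tries++;
            if (hitCount.applyAsInt(candidate) > 0) { // keep only corrections that match documents
                result.add(candidate);
            }
        }
        return result;
    }
}
```

If the only productive candidate sits beyond the try budget, nothing is returned. That is why raising maxCollationTries can make a missing collation appear.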

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Friday, December 20, 2013 7:43 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking problem

Hello,

I have a problem with spellchecking.
I use Solr to index e-commerce products (DVDs, CDs, books, etc.).
There is only one collection, but in the index there is the field typology (of
product).
When I build the spellchecking indexes, they are built together.
How can I get suggestions for only one typology?

I read that if I use spellcheck.collate=true and maxCollationTries > 0,
Solr evaluates every suggestion with the fq parameter of the query. In my query
I have, for example, fq=typology:book,
but it doesn't work. Why?

I also tried collationparameter.fq=typology:book,
with the same result.

I use Solr 4.3.
Thank you


-- 
*Gastone Penzo*



RE: Spellchecking problem

2013-12-20 Thread Dyer, James
Gastone,

You may, at least while developing, specify 
spellcheck.collateExtendedResults=true so you can see for sure it has 
verified how many hits each collation would return.

But my guess is that your mm parameter makes pretty much anything return some 
hits.  You might want to specify spellcheck.collateParam.mm=100% or something 
like that to restrict collations to only those queries that return hits if all 
the terms were required.

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .
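When these parameters are sent as a raw HTTP query string, every name=value pair needs its `&` separator, and values such as `100%` must be URL-encoded (the `%` becomes `%25`). Below is a minimal JDK-only sketch (Java 10+ for `URLEncoder.encode(String, Charset)`) that builds such a string. The base URL, core name, and parameter values are placeholders, not a recommended configuration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a URL-encoded spellcheck request string using only the JDK.
final class SpellcheckUrl {
    /** Joins parameters as name=value pairs separated by '&', URL-encoding both parts. */
    static String queryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    static String example(String baseUrl, String userQuery) {
        Map<String, String> p = new LinkedHashMap<>(); // keeps parameters in insertion order
        p.put("q", userQuery);
        p.put("spellcheck", "true");
        p.put("spellcheck.collate", "true");
        p.put("spellcheck.maxCollationTries", "10");
        p.put("spellcheck.collateExtendedResults", "true"); // report hits per collation while developing
        p.put("spellcheck.collateParam.mm", "100%");         // require every term when verifying collations
        return baseUrl + "?" + queryString(p);
    }
}
```

`URLEncoder` encodes spaces as `+`, which Solr decodes back to spaces, so `q=otto+maialotto` arrives as "otto maialotto".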

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Friday, December 20, 2013 8:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking problem

Thank you for your answer.

This is the query string:

http://seshat:9000/solr/browse/?q=otto+maialotto&fq=shelf:GIO&qf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0 category_label^0&pf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0
category_label^0&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.q=otto+il+maialotto&mm=2%3C-1+5%3C80%25

shelf is the field that represents the typology of product, and GIO is the
typology (games).

The problem is the collation:
the result gives "Otto il polpo", which is the name of a product of another
typology (Book).
Why?

The result is this:

<lst name="spellcheck">
<lst name="suggestions">
<lst name="otto il maialotto">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">17</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">otto il polpo</str>
<int name="freq">2</int>
</lst>
<lst>
<str name="word">gigetto il maialetto  vol.0</str>
<int name="freq">2</int>
</lst>
<lst>
<str name="word">sotto il mare  vol.0</str>
<int name="freq">2</int>
</lst>
<lst>
<str name="word">sotto il mare</str>
<int name="freq">2</int>
</lst>
<lst>
<str name="word">otto il rinoceronte</str>
<int name="freq">2</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">true</bool>
<str name="collation">(otto il polpo)</str>
</lst>
</lst>

This is the configuration:


<str name="queryAnalyzerFieldType">textSpell</str>

<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spellcheckdef</str>
  <str name="spellcheckIndexDir">spellchecker</str>
  <str name="spellcheck">on</str>
  <str name="spellcheck.onlyMorePopular">false</str>
  <str name="spellcheck.extendedResults">true</str>
  <str name="spellcheck.count">6</str>
  <str name="spellcheck.collate">true</str>
  <float name="thresholdTokenFrequency">.001</float>
</lst>

  </searchComponent>

Thanks










-- 
*Gastone Penzo*



RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try running it without debug mode.  DIH's 
debug mode limits the number of documents it will take in, so that might be all 
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor

Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec



RE: SOLR DIH - Sub Entity with different datasource not working

2013-12-13 Thread Dyer, James
Without more of the stack trace I don't think you'll get much help.  However, 
it's my experience that exceptions that begin with "Unable to execute query" 
mean the db didn't like something about one or both queries.  I think it would 
have listed the actual query it didn't like somewhere in there, depending on 
your db driver.  If memory serves, I think the Oracle driver also states in 
the exception why it didn't like the query.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lokn [mailto:nlokesh...@gmail.com] 
Sent: Friday, December 13, 2013 3:40 AM
To: solr-user@lucene.apache.org
Subject: SOLR DIH - Sub Entity with different datasource not working

Hi,
I have a data-config.xml with 2 data sources, with the entity and its
sub-entity connecting to datasource1 and datasource2 respectively.
When I do the full import, it gives this error:
Exception :  org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: 

This is my db-data-config.xml configuration file:

<dataSource name="datasource1" type="JdbcDataSource"
    driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@blah" user="aaa"
    password="aaa" />

<dataSource name="datasource2" type="JdbcDataSource"
    driver="oracle.jdbc.OracleDriver" url="jdbc:oracle:thin:@blah" user="bbb"
    password="bbb" />

<document>
  <entity name="item1" dataSource="datasource1" query="select NAME,
      CREATED_BY from CSPR_TABLE">
    <field column="NAME" name="name"/>
    <field column="CREATED_BY" name="user"/>

    <entity name="itemSubEntity" dataSource="datasource1"
        query="select USER_ID , PREF from PREF where USER_ID =
        '${item1.CREATED_BY}'">
      <field column="USER_ID" name="uid"/>
      <field column="PREF" name="pref"/>
    </entity>
  </entity>
</document>

Let me know if there is anything wrong in this.

Thanks,
Lokesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-DIH-Sub-Entity-with-different-datasource-not-working-tp4106550.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Data Import Handler

2013-11-13 Thread Dyer, James
In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:

<requestHandler name="/dih"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    ...
    <str name="dsDriver">${datasource.driver}</str>
    <str name="dsUrl">${datasource.url}</str>
    ...
  </lst>
</requestHandler>

In data-config.xml, put:
<dataSource name="ds" driver="${dataimporter.request.dsDriver}"
    url="${dataimporter.request.dsUrl}" />

Hope this works for you.
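Below is a toy Java model of the three-step chain above. It assumes the substitution behaves like plain ${name} text replacement; Solr's own resolver does more than this. Values from solrcore.properties fill in the handler defaults, and data-config.xml then reads those defaults under the dataimporter.request. prefix. PlaceholderChain, resolve and dataSourceAttributes are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the three-step substitution chain; not Solr's actual resolver.
public class PlaceholderChain {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value from lookup; unknown names are left as-is.
    static String resolve(String template, Map<String, String> lookup) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = lookup.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Walk solrcore.properties -> handler defaults -> data-config.xml attributes.
    static String dataSourceAttributes(Properties core) {
        // Step 1: expose solrcore.properties entries as substitution variables.
        Map<String, String> coreVars = new HashMap<>();
        for (String key : core.stringPropertyNames()) {
            coreVars.put(key, core.getProperty(key));
        }
        // Step 2: the handler's defaults are resolved from those variables.
        Map<String, String> defaults = new HashMap<>();
        defaults.put("dsDriver", resolve("${datasource.driver}", coreVars));
        defaults.put("dsUrl", resolve("${datasource.url}", coreVars));
        // Step 3: data-config.xml sees request parameters under dataimporter.request.
        Map<String, String> dihVars = new HashMap<>();
        defaults.forEach((k, v) -> dihVars.put("dataimporter.request." + k, v));
        return resolve("driver=\"${dataimporter.request.dsDriver}\" url=\"${dataimporter.request.dsUrl}\"", dihVars);
    }

    public static void main(String[] args) {
        Properties core = new Properties();
        core.setProperty("datasource.url", "jdbc:xxx:yyy");
        core.setProperty("datasource.driver", "com.some.driver");
        System.out.println(dataSourceAttributes(core));
        // driver="com.some.driver" url="jdbc:xxx:yyy"
    }
}
```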

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James, can you elaborate on how to process driver=${dataimporter.request.driver}, 
url=${dataimporter.request.url}, etc., and where to put all of these? 
My purpose is to configure my DB details (url, uname, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with dataimporter.request, you can
include variables like these as request parameters:

<dataSource name="ds" driver="${dataimporter.request.driver}"
    url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:

<requestHandler name="/dih"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="driver">${dih.driver}</str>
    <str name="url">${dih.url}</str>
  </lst>
</requestHandler>

Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the dataconfig.xml file?

I want to provide database details (db_url, uname, password) from my own
properties file instead of the dataconfig.xml file.







RE: [Spellcheck] NullPointerException on QueryComponent.mergeIds

2013-11-12 Thread Dyer, James
Jean-Marc,

This might not solve the particular problem you're having, but to get 
spellcheck to work properly in a distributed environment, be sure to set the 
shards.qt parameter to the name of your request handler.  See 
http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support .
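Here is a sketch of the shards.qt advice. The hypothetical helper below copies qt into shards.qt, unless shards.qt was already set. It uses the handler name from Jean-Marc's configuration, so the top-level request and the per-shard sub-requests name the same handler. It only builds the parameter string and does not contact Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch: make the per-shard requests use the same handler as the top-level request.
public class DistributedSpellParams {

    // Copy qt into shards.qt unless the caller already set shards.qt explicitly.
    static Map<String, String> withShardsQt(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>(params);
        String qt = params.get("qt");
        if (qt != null) {
            out.putIfAbsent("shards.qt", qt);
        }
        return out;
    }

    // Join parameters as k=v pairs; values here are assumed to be URL-safe already.
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("qt", "ri_spell_fr_FR");
        params.put("q", "sistem");
        params.put("distrib", "true");
        System.out.println("/select?" + toQueryString(withShardsQt(params)));
        // /select?qt=ri_spell_fr_FR&q=sistem&distrib=true&shards.qt=ri_spell_fr_FR
    }
}
```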

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jean-Marc Desprez [mailto:jm.desp...@gmail.com] 
Sent: Tuesday, November 12, 2013 8:57 AM
To: solr-user@lucene.apache.org
Subject: [Spellcheck] NullPointerException on QueryComponent.mergeIds

Hello,

I'm following this tutorial: http://wiki.apache.org/solr/SolrCloud with
Solr 4.5.0.

I'm at the very first step: only two replicas and two shards, and I have only
*one* document in the index.

When I try to get a spellcheck, I get this error:
java.lang.NullPointerException
at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)

I do not understand what I'm doing wrong and how I can get an error on
mergeIds with only one document in the index (merge this doc with ... ??)

Some technical details :
URL :
http://127.0.0.1:8983/solr/bench/select?shards.qt=ri_spell_fr_FR&q=sistem&distrib=true
If I set distrib to false, no error.

My uniqueKey is indexed and stored:

<field name="ref" type="string" indexed="true" stored="true"
    multiValued="false" />
<uniqueKey>ref</uniqueKey>


My conf:
<requestHandler name="ri_spell_fr_FR" class="solr.SearchHandler"
    lazy="true">
  <lst name="defaults">
    <bool name="spellcheck">true</bool>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">3</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.dictionary">ri_spell_fr_FR</str>
    <str name="spellcheck.build">false</str>
  </lst>

  <arr name="components">
    <str>spellcheck_fr_FR</str>
  </arr>
</requestHandler>

<searchComponent name="spellcheck_fr_FR" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">suggest_fr_FR</str>

  <lst name="spellchecker">
    <str name="name">ri_spell_fr_FR</str>
    <str name="field">spell_fr_FR</str>
    <str name="spellcheckIndexDir">./spellchecker_fr_FR</str>
    <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
  </lst>

  ...

</searchComponent>

With this URL:
http://127.0.0.1:8983/solr/bench/select?qt=ri_spell_fr_FR&q=sistem

I get no error, but the response is empty:
<response><lst name="responseHeader"><int name="status">0</int><int
name="QTime">1</int></lst></response>


Thanks
Jean-Marc


RE: spellcheck solr 4.3.1

2013-11-11 Thread Dyer, James
There are 2 parameters you want to consider:

First is spellcheck.maxResultsForSuggest.  Because you have an OR query, 
you'll get hits if only 1 query term is in the index.  This parameter lets you 
tune it to make it suggest if the query returns n or fewer hits.  My memory 
tells me, however, that if you leave this parameter out entirely, it will still 
return suggestions for OR queries with some misspelled words (false memory on 
my part?).  Possibly you have this set to 1?  Omitting it might be a better 
option.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest 
.

Second is spellcheck.collateParam.XX, which lets you override certain query parameters when 
the spellchecker is testing collations against the index.  For instance, if you 
have q.op=OR, the spellchecker may return collations that have only 
1 correct term.  The reason is that it simply checks whether a collation will return any 
hits.  So you can override this with spellcheck.collateParam.q.op=AND.  The 
same can be done for mm if using edismax.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .
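The following minimal Java sketch builds a request that combines both suggestions. It assumes a query whose default operator is OR. spellcheck.maxResultsForSuggest is left out unless a value is passed, and each collateParam override gets the spellcheck.collateParam. prefix. The maxCollationTries value of 5, the mm override (meaningful only with (e)dismax) and the helper name are illustrative choices.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a spellcheck request; the parameter values are illustrative choices.
public class CollateParamOverrides {

    // maxResultsForSuggest may be null, in which case the parameter is omitted entirely.
    static Map<String, String> spellcheckParams(String q, Integer maxResultsForSuggest,
                                                Map<String, String> collateOverrides) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("spellcheck", "true");
        p.put("spellcheck.collate", "true");
        // Collations are only tested against the index when this is greater than 0.
        p.put("spellcheck.maxCollationTries", "5");
        if (maxResultsForSuggest != null) {
            p.put("spellcheck.maxResultsForSuggest", maxResultsForSuggest.toString());
        }
        // Each override applies only to the queries used to test collations.
        collateOverrides.forEach((name, value) -> p.put("spellcheck.collateParam." + name, value));
        return p;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("q.op", "AND");
        overrides.put("mm", "100%"); // only relevant with (e)dismax
        // "kim larsenn" relies on the default operator, assumed here to be OR.
        spellcheckParams("kim larsenn", null, overrides)
                .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```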

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Daniel Borup [mailto:d...@alpha-solutions.dk] 
Sent: Monday, November 11, 2013 7:38 AM
To: solr-user@lucene.apache.org
Subject: spellcheck solr 4.3.1

Hey

I am running Solr 4.3.1 and am implementing spellcheck using 
solr.DirectSolrSpellChecker. Everything seems to be working fine, but I have 
one issue.

If I search for
http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsen

I get some hits, and the spellcheck component returns the following structure:

<lst name="spellcheck">
<lst name="suggestions">
<bool name="correctlySpelled">true</bool>
</lst>
</lst>
I would have liked any suggestions that were found to be returned.

If I do a search for
http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsenn

with larsen misspelled (larsenn), the spellcheck component returns the following:

<lst name="spellcheck">
<lst name="suggestions">
<lst name="larsenn">
<int name="numFound">1</int>
<int name="startOffset">8</int>
<int name="endOffset">15</int>
<int name="origFreq">0</int>
<arr name="suggestion">
<lst>
<str name="word">larsen</str>
<int name="freq">12</int>
</lst>
</arr>
</lst>
<bool name="correctlySpelled">false</bool>
<lst name="collation">
<str name="collationQuery">kim AND larsen</str>
<int name="hits">12</int>
<lst name="misspellingsAndCorrections">
<str name="kim">kim</str>
<str name="larsenn">larsen</str>
</lst>
</lst>
</lst>
</lst>

From my point of view this is correct. But if I do the same search as above, just 
as an OR search (http://localhost:8765/solr/MainIndex/spell?q=kim%20OR%20larsenn),
the spellcheck component returns some results and:

<lst name="spellcheck">
<lst name="suggestions">
<bool name="correctlySpelled">true</bool>
</lst>
</lst>

According to Solr, larsenn is now spelled correctly, and I cannot understand this 
behavior. Is there a setting to make the spellcheck component always return 
suggestions, or a way to get suggestions for an OR search with one misspelled word?






Med venlig hilsen / Best regards

Daniel Borup
Tel: (+45) 28 87 69 18
E-mail: d...@alpha-solutions.dk

Alpha Solutions A/S
Sølvgade 10, 1.sal, DK-1307 Copenhagen K
Tel: (+45) 70 20 65 38
Web: www.alpha-solutions.dk





RE: Data Import Handler

2013-11-06 Thread Dyer, James
If you prepend the variable name with dataimporter.request, you can include 
variables like these as request parameters:

<dataSource name="ds" driver="${dataimporter.request.driver}"
    url="${dataimporter.request.url}" />

/dih?driver=some.driver.class&url=jdbc:url:something
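When these values are passed on the request URL, each parameter needs its own & separator. Values containing characters such as : are safer URL-encoded. Here is a small Java sketch that uses only the JDK. The command=full-import parameter and the DihRequestUrl class are illustrative additions, not part of the original reply.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: build a DIH request URL whose parameters feed ${dataimporter.request.*}.
public class DihRequestUrl {

    // Build "<handler>?k1=v1&k2=v2" with URL-encoded keys and values.
    static String build(String handlerPath, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        params.forEach((k, v) -> query.add(
                URLEncoder.encode(k, StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return handlerPath + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("command", "full-import");
        params.put("driver", "some.driver.class");
        params.put("url", "jdbc:url:something");
        System.out.println(build("/dih", params));
        // /dih?command=full-import&driver=some.driver.class&url=jdbc%3Aurl%3Asomething
    }
}
```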

If you want to include these in solrcore.properties, you can additionally add 
each property to solrconfig.xml like this:

<requestHandler name="/dih"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="driver">${dih.driver}</str>
    <str name="url">${dih.url}</str>
  </lst>
</requestHandler>

Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the dataconfig.xml file?

I want to provide database details (db_url, uname, password) from my own
properties file instead of the dataconfig.xml file.



RE: Need additional data processing in Data Import Handler prior to indexing

2013-10-29 Thread Dyer, James
Would an onImportEnd event listener serve your needs?

See http://wiki.apache.org/solr/DataImportHandler#EventListeners
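For reference, a listener is a class implementing DIH's EventListener interface, which has a single onEvent(Context) method. It is registered through the onImportEnd attribute of the <document> element in data-config.xml. The sketch below uses local stand-ins for EventListener and Context so it compiles without Solr on the classpath. The EnrichAfterImport class and its log are hypothetical. onImportEnd fires once, after the import finishes, so any enrichment it triggers would work on already-imported data rather than row by row.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins so the sketch compiles without Solr; DIH's real types live in
// org.apache.solr.handler.dataimport. Only the onEvent(Context) shape is mirrored.
public class ImportEndHook {

    interface Context {} // stand-in; DIH's real Context offers much more

    interface EventListener {
        void onEvent(Context ctx);
    }

    // Hypothetical post-import step, e.g. kicking off a batch enrichment job.
    static class EnrichAfterImport implements EventListener {
        final List<String> log = new ArrayList<>();

        @Override
        public void onEvent(Context ctx) {
            // Runs once after the whole import has finished, not once per document.
            log.add("import finished; starting enrichment pass");
        }
    }

    public static void main(String[] args) {
        EnrichAfterImport listener = new EnrichAfterImport();
        listener.onEvent(new Context() {}); // DIH would make this call itself at import end
        System.out.println(listener.log);
    }
}
```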

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Dileepa Jayakody [mailto:dileepajayak...@gmail.com] 
Sent: Tuesday, October 29, 2013 3:48 PM
To: solr-user@lucene.apache.org
Subject: Need additional data processing in Data Import Handler prior to indexing

Hi All,

I'm a newbie to Solr, and I have a requirement to import data from a MySQL
database, enhance the imported content to identify the Persons mentioned, and
index them as a separate field in Solr along with the other fields defined
for the original db query.

I'm using Apache Stanbol [1] for the content enhancement requirement.
Stanbol returns enhancement results for 'Person'-type data in the content.

The data flow will be:
mysql-db -> Solr data-import handler -> Stanbol enhancer -> Solr index

For the above requirement I need to perform additional processing in the
data-import handler prior to indexing, to send a request to Stanbol and
process the enhancement response. I found some related examples of
customizing the query results of the MySQL data import handler in
db-data-config.xml by using a transformer script.
For my requirement, the data-import handler needs to send a request
to Stanbol and process the response prior to indexing, but I'm not sure if
this can be achieved using simple JavaScript.

Is there a better way of achieving my requirement? Maybe writing a
custom filter in Solr?
Please share your thoughts. I'd appreciate any pointers, as I'm a Solr
beginner.

Thanks,
Dileepa


[1] https://stanbol.apache.org



RE: Spellcheck with Distributed Search (sharding).

2013-10-24 Thread Dyer, James
Is it that your request handler is named /suggest but you are setting 
shards.qt to /suggestion?
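One hypothetical way to catch this kind of mismatch before sending the request is to compare qt and shards.qt against the handler names declared in solrconfig.xml. The declared set is typed in by hand here: /suggest comes from the config below, and /select is an assumption. Nothing is read from Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check: compare routing parameters against the handler
// names declared in solrconfig.xml (entered by hand here).
public class HandlerNameCheck {

    // Return one message per routing parameter that names an undeclared handler.
    static List<String> problems(Set<String> declaredHandlers, Map<String, String> params) {
        List<String> out = new ArrayList<>();
        for (String key : List.of("qt", "shards.qt")) {
            String name = params.get(key);
            if (name != null && !declaredHandlers.contains(name)) {
                out.add(key + "=" + name + " names no declared request handler");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("/select", "/suggest");
        // Parameters from the failing query: qt and shards.qt both say /suggestion.
        System.out.println(problems(declared, Map.of("qt", "/suggestion", "shards.qt", "/suggestion")));
        // Using the declared name yields no problems.
        System.out.println(problems(declared, Map.of("qt", "/suggest", "shards.qt", "/suggest")));
    }
}
```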

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Luis Cappa Banda [mailto:luisca...@gmail.com] 
Sent: Thursday, October 24, 2013 6:22 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellcheck with Distributed Search (sharding).

Any idea?


2013/10/23 Luis Cappa Banda luisca...@gmail.com

 More info:

 When I send the query to a single Solr server, it works:
 http://solr1:8080/events/data/suggest?q=m&wt=json

 {
   "responseHeader": {
     "status": 0,
     "QTime": 1
   },
   "response": {
     "numFound": 0,
     "start": 0,
     "docs": [ ]
   },
   "spellcheck": {
     "suggestions": [
       "m",
       {
         "numFound": 4,
         "startOffset": 0,
         "endOffset": 1,
         "suggestion": [
           "marca",
           "marcacom",
           "mis",
           "mispelotas"
         ]
       }
     ]
   }
 }


 But when I choose the request handler this way, it doesn't:
 http://solr1:8080/events/data/select?qt=/suggest&wt=json&q=*:*




 2013/10/23 Luis Cappa Banda luisca...@gmail.com

 Hello!

 I've been trying to enable spellchecking with sharding by following the
 steps from the wiki, but it fails. :-( Here is what I do:

 *solrconfig.xml*

 <searchComponent name="suggest" class="solr.SpellCheckComponent">
   <lst name="spellchecker">
     <str name="name">suggest</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">suggestion</str>
     <str name="buildOnOptimize">true</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="df">suggestion</str>
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest</str>
     <str name="spellcheck.count">10</str>
   </lst>
   <arr name="last-components">
     <str>suggest</str>
   </arr>
 </requestHandler>


 *Note:* I have two shards (solr1 and solr2) and both have the same
 solrconfig.xml. Also, both indexes were optimized to create the spellchecker
 indexes.

 *Query*

 solr1:8080/events/data/select?q=m&qt=/suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data

 *Response*

 {
   "responseHeader": {
     "status": 404,
     "QTime": 12,
     "params": {
       "shards": "solr1:8080/events/data,solr2:8080/events/data",
       "shards.qt": "/suggestion",
       "q": "m",
       "wt": "json",
       "qt": "/suggestion"
     }
   },
   "error": {
     "msg": "Server at http://solr1:8080/events/data returned non ok status:404, message:Not Found",
     "code": 404
   }
 }

 Other query syntaxes that I tried and that don't work:


 http://solr1:8080/events/data/select?q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data


 http://solr1:8080/events/data/select?q=*:*&spellcheck.q=m&qt=suggestion&shards.qt=/suggestion&wt=json&shards=solr1:8080/events/data,solr2:8080/events/data


 Any idea of what I'm doing wrong?

 Thank you very much in advance!

 Best regards,

 --
 - Luis Cappa




 --
 - Luis Cappa




-- 
- Luis Cappa


