RE: Index database with SolrJ using xml file directly throws an error

2019-03-01 Thread Dyer, James
Instead of dataConfig=data-config.xml, use config=data-config.xml.
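
As a rough sketch of what that request looks like on the wire, the following builds the DIH full-import URL with "config" naming the file. The class and method names are made up for illustration, and the /dataimport handler path and core URL are taken from the code quoted below. In SolrJ the equivalent change would be params.set("config", "data-config.xml") in place of the "dataConfig" line.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of SolrJ: builds the DIH request as a plain URL.
final class DihImportUrl {

    // Joins the parameters into a URL-encoded query string (needs Java 10+).
    static String toQueryString(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Full-import request that names the DIH configuration file via "config".
    static String fullImportUrl(String coreUrl, String configFile) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("command", "full-import");
        params.put("clean", "true");
        params.put("commit", "true");
        params.put("config", configFile); // file name only, not a full path
        return coreUrl + "/dataimport?" + toQueryString(params);
    }

    public static void main(String[] args) {
        System.out.println(fullImportUrl("http://localhost:8983/solr/test", "data-config.xml"));
    }
}
```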

From: sami 
Sent: Friday, March 1, 2019 3:05 AM
To: solr-user@lucene.apache.org
Subject: RE: Index database with SolrJ using xml file directly throws an error

Hi James,

Thanks for your reply. I am not absolutely sure I understood everything
correctly here. I would like to index my database to start with a fresh index.
I have already done it with the DIH execute function.

>

It works absolutely fine. But I want to use the SolrJ API instead of the
built-in execute function. The data-config.xml and solrconfig.xml work fine
with my database.

I am using the same data-config.xml and solrconfig.xml files to do the
indexing with the program mentioned in my query.

String url = "http://localhost:8983/solr/test";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig", "data-config.xml"); // I tried this too, as you suggested not using the full path.
server.query(params);

I checked the xml file for any bogus characters too. But the same files work
fine with the built-in DIH, just not with the code. What could it be?



--
Sent from: 
http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


RE: Index database with SolrJ using xml file directly throws an error

2019-02-28 Thread Dyer, James
The parameter "dataConfig" should hold an actual xml document to override the 
data-config.xml file you store in zookeeper (cloud) or the configuration 
directory (standalone).  Typically you do not use this parameter.  Instead, 
specify the "config" parameter with the filename (eg. data-config.xml).  This 
file is the DIH configuration, not solrconfig.xml as you are using.  It is just 
the filename, or path starting at the base configuration directory, not a full 
path as you are using.  Unless you want users to override the DIH configuration 
at request time, it is best to specify the filename using the "config" 
parameter in the request handler's invariant section in solrconfig.xml.
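
To illustrate why a file path in "dataConfig" fails, here is a small sketch using only the JDK XML parser. It assumes (as described above) that the server tries to parse the "dataConfig" value as an XML document; the class name and the sample <dataConfig> string are invented for the example. Feeding a path to an XML parser is the kind of input that typically produces a "Content is not allowed in prolog" error, since the text does not start with XML markup.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical pre-flight check, not a Solr or SolrJ class.
final class DataConfigCheck {

    // Returns true when the text parses as a well-formed XML document.
    static boolean isXmlDocument(String text) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // Not well-formed XML, for example a file path or a bare file name.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A path is not XML, so it cannot be used as the value of "dataConfig".
        System.out.println(isXmlDocument("D:/solr-7.6.0/server/solr/test/conf/solrconfig.xml"));
        // An actual DIH document is what "dataConfig" expects.
        System.out.println(isXmlDocument("<dataConfig><document/></dataConfig>"));
    }
}
```

If the value is only a file name, it belongs in "config" instead.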

From: sami 
Sent: Thursday, February 28, 2019 8:36 AM
To: solr-user@lucene.apache.org
Subject: Index database with SolrJ using xml file directly throws an error

I would like to index my database using SolrJ Java API. I have already tried
to use DIH directly from the Solr server. It works and indexes well. But
when I would like to use the same XML config file with SolrJ it throws an
error.

**Solr version 7.6.0 SolrJ 7.6.0**

Here is the full code I am using:

String url = "http://localhost:8983/solr/test";
String dataConfig =
"D:/solr-7.6.0/server/solr/test/conf/solrconfig.xml";
HttpSolrClient server = new HttpSolrClient.Builder(url).build();
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("qt", "/dataimport");
params.set("command", "full-import");
params.set("clean", "true");
params.set("commit", "true");
params.set("optimize", "true");
params.set("dataConfig",dataConfig);
server.query(params);

But using this piece of code throws an error.

Exception in thread "main"
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://localhost:8983/solr/test: Data Config problem: Content
is not allowed in Prolog.

Am I doing it right? Reference:
https://stackoverflow.com/questions/31446644/how-to-do-solr-dataimport-i-e-from-rdbms-using-java-api/54905578#54905578

Is there any other way to index directly?





RE: [External] Setting Spellcheck for solr only for zero result

2018-09-26 Thread Dyer, James
Neel,

I do not think there is a way to entirely bypass spellchecking if there are 
results returned, and I'm not so sure performance would noticeably improve if 
it did this.  Clients can easily check to see if results were returned and can 
ignore the spellcheck response in these cases, if desired.

The one exception to this is if you are using "spellcheck.collate=true" with  
"spellcheck.maxCollationTries" set to a value > 0.  In this case, if your main 
query uses "q.op=OR" or a low "mm" value, you might want to force it to only 
return collations with all matching words.  In this case you would use 
something like "spellcheck.collateParam.mm=100%" to be sure it only returned 
re-written queries for which all the words matched.

The "spellcheck.maxResultsForSuggest" parameter is designed to be used in 
conjunction with "spellcheck.alternativeTermCount" to produce 
did-you-mean-style suggestions when a query returns only a few hits and at 
least some of the terms were in the index (but may be misspelled nevertheless).
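
A minimal sketch of the client-side check mentioned in the first paragraph: suggestions are dropped whenever the main query already found documents. The class and method names are hypothetical, and how numFound and the collations are read from the response is left to the client.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical client-side helper; the names are illustrative, not a Solr or SolrJ API.
final class SuggestionPolicy {

    // Returns the collations to display: none when the main query already matched
    // documents, otherwise whatever the spellcheck section of the response offered.
    static List<String> collationsToShow(long numFound, List<String> collations) {
        if (numFound > 0 || collations == null) {
            return Collections.emptyList();
        }
        return collations;
    }

    public static void main(String[] args) {
        List<String> fromSolr = Arrays.asList("halogen bulbs");
        System.out.println(collationsToShow(0, fromSolr));  // zero hits: keep suggestions
        System.out.println(collationsToShow(42, fromSolr)); // hits found: ignore suggestions
    }
}
```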

James Dyer
Ingram Content Group

-Original Message-
From: neel choudhury [mailto:findneel2...@gmail.com] 
Sent: Sunday, September 23, 2018 2:58 PM
To: solr-user@lucene.apache.org
Subject: [External] Setting Spellcheck for solr only for zero result

I am trying to set up spellcheck for Solr correctly. For performance
reasons (and to avoid confusion) I don't want to give any suggestions for any
query which returns at least one result. Solr provides a parameter,
spellcheck.maxResultsForSuggest. For my use case I need to set it to 0, as I
only want suggestions when no result is returned. However, looking into the
code of SpellCheckComponent in Solr, I saw that a value of 0 for
spellcheck.maxResultsForSuggest is ignored because of the greater-than sign. Is
there a way I can suppress spelling suggestions even if 1 result is returned?

private Integer maxResultsForSuggest(ResponseBuilder rb) {
  SolrParams params = rb.req.getParams();
  float maxResultsForSuggestParamValue =
      params.getFloat(SpellingParams.SPELLCHECK_MAX_RESULTS_FOR_SUGGEST, 0.0f);
  Integer maxResultsForSuggest = null;

  if (maxResultsForSuggestParamValue > 0.0f) {
    ...
  }

  return maxResultsForSuggest;
}


RE: [External] [Solr 7.1.0] spellcheck.maxCollationTries > 0 no results

2018-08-09 Thread Dyer, James
It doesn't appear to me that the collator works with "spellcheck.q".  Looking 
at the unit test (SpellCheckCollatorTest.java), this is not a use-case that is 
being tested.  I opened https://issues.apache.org/jira/browse/SOLR-12650 to 
track this bug.

As a workaround, you can remove "spellcheck.q" and it might work.  You also 
probably want smaller values for spellcheck.count and 
spellcheck.maxCollationTries, maybe 10-20 for these.
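
The workaround could be applied to an existing parameter set roughly like this. It is only a sketch: the class name is made up, and the cap of 20 is an assumption taken from the upper end of the "10-20" range suggested above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that rewrites request parameters before they are sent to Solr.
final class CollationWorkaround {

    static final int CAP = 20; // assumed upper bound, from the "10-20" suggestion

    // Returns a copy of the parameters without "spellcheck.q" and with the
    // two potentially expensive settings capped.
    static Map<String, String> apply(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>(params);
        out.remove("spellcheck.q");
        cap(out, "spellcheck.count");
        cap(out, "spellcheck.maxCollationTries");
        return out;
    }

    // Lowers a numeric parameter to CAP when it is present and larger.
    private static void cap(Map<String, String> params, String key) {
        String value = params.get(key);
        if (value != null && Integer.parseInt(value) > CAP) {
            params.put(key, String.valueOf(CAP));
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("spellcheck.q", "halogan balbs");
        params.put("spellcheck.count", "100");
        params.put("spellcheck.maxCollationTries", "500");
        System.out.println(apply(params));
    }
}
```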

James Dyer
Ingram Content Group

From: agorriz [mailto:agor...@tacitknowledge.com]
Sent: Wednesday, August 08, 2018 8:38 AM
To: solr-user@lucene.apache.org
Subject: [External] [Solr 7.1.0] spellcheck.maxCollationTries > 0 no results

I have a problem with Solr suggested terms. When I search for a misspelled
phrase or word, for example "halogan balbs" (0 results found), I want a
suggestion which will lead to results (e.g. "halogen bulbs").

I'm able to get a suggested phrase by enabling spellcheck.collate and
spellcheck.maxCollationTries = 0, but unfortunately the suggested phrase
does not always generate results (e.g. searching for "fence panel" (1 result)
suggests "face paper" (0 results)).

According to documentation, in order to bypass the problem of 0 results on
the collated query I can configure spellcheck.maxCollationTries > 0, but by
doing so I noticed that the returned collation is always empty, even when
the single suggested words collated would generate results.

My question is, why is that happening and how can I avoid it?

The following is an example query for "halogan balbs" that does not work as I
expect:

http://localhost:8983/solr/master_Product_default/select?fq=(catalogId:%22ProductCatalog%22%20AND%20catalogVersion:%22Online%22)&q=((code_string:halogan^100.0))%20OR%20((code_string:balbs^100.0))%20OR%20((code_string:%22halogan%20balbs%22~10.0^100.0)%20OR%20(brand.search_text_mv:%22halogan%20balbs%22~10.0^300.0)%20OR%20(categoryName_text_en_mv:%22halogan%20balbs%22~10.0^700.0)%20OR%20(type.search_text_mv:%22halogan%20balbs%22~10.0^800.0)%20OR%20(name_text_en:%22halogan%20balbs%22~10.0^500.0))&rows=20&spellcheck.dictionary=default&spellcheck.q=halogan%20balbs&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.count=100&spellcheck.maxCollationTries=500

that query returns the following:

"spellcheck":{
"suggestions":[
"halogan",{
"numFound":1,
"startOffset":0,
"endOffset":7,
"origFreq":0,
"suggestion":[{
"word":"halogen",
"freq":84}]},
"balb",{
"numFound":1,
"startOffset":8,
"endOffset":13,
"origFreq":0,
"suggestion":[{
"word":"bulb",
"freq":198}]}],
"correctlySpelled":false,
"collations":[]}}

Note that halogen and bulb are returned as single suggestions but collations
is empty, whilst if I run the query with "spellcheck.maxCollationTries=0"
then I get "halogen bulb" as suggested collation query:

"spellcheck":{
"suggestions":[
"halogan",{
"numFound":1,
"startOffset":0,
"endOffset":7,
"origFreq":0,
"suggestion":[{
"word":"halogen",
"freq":84}]},
"balb",{
"numFound":1,
"startOffset":8,
"endOffset":13,
"origFreq":0,
"suggestion":[{
"word":"bulb",
"freq":198}]}],
"correctlySpelled":false,
"collations":[
"collation",{
"collationQuery":"halogen bulb",
"hits":0,
"misspellingsAndCorrections":[
"halogan","halogen",
"balb","bulb"]}]}}

I would expect this behaviour to happen if searching for "halogen bulb"
returns 0 results, but in this particular case the search returns results:

http://localhost:8983/solr/master_Product_default/select?fq=(catalogId:%22ProductCatalog%22%20AND%20catalogVersion:%22Online%22)&q=((code_string:halogen^100.0))%20OR%20((code_string:bulb^100.0))%20OR%20((code_string:%22halogen%20bulb%22~10.0^100.0)%20OR%20(brand.search_text_mv:%22halogen%20bulb%22~10.0^300.0)%20OR%20(categoryName_text_en_mv:%22halogen%20bulb%22~10.0^700.0)%20OR%20(type.search_text_mv:%22halogen%20bulb%22~10.0^800.0)%20OR%20(name_text_en:%22halogen%20bulb%22~10.0^500.0))&rows=20&spellcheck.dictionary=default&spellcheck.q=halogen%20bulb&spellcheck=true&spellcheck.collate=true&spellcheck.extendedResults=true&spellcheck.collateExtendedResults=true&spellcheck.count=100&spellcheck.maxCollationTries=500

returns:

"response":{"numFound":42,"start":0,"docs":[
{...}







RE: Error configuring Spell Checker

2018-04-17 Thread Dyer, James
(moving to solr-user@lucene.apache.org)

Gene,

I can reproduce your problem if I misspell the "spellcheck.dictionary" 
parameter in my query.  But I see your query has "direct" which matches the 
"name" element of one of your spellcheckers.  I think the actual problem in 
your case might be that you have separate <searchComponent> sections in your 
configuration.  This might be causing only the last Search Component named 
"spellcheck" to be active.  I believe you need to have just one:


 
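<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
 <lst name="spellchecker">
  <str name="name">direct</str>
  ...
 </lst>
 <lst name="spellchecker">
  <str name="name">index</str>
  ...
 </lst>
 <lst name="spellchecker">
  <str name="name">wordbreak</str>
  ...
 </lst>
</searchComponent>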

James Dyer
Ingram Content Group
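
On the point above about having just one search component named "spellcheck": one way to check a solrconfig.xml for duplicates is to count the matching elements with the JDK DOM parser. This is a sketch under the assumption that the components are declared as searchComponent elements with a name attribute; the class name and the sample XML in main are invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical diagnostic, not part of Solr.
final class SearchComponentCounter {

    // Counts <searchComponent> elements whose name attribute equals componentName.
    static int countNamed(String solrconfigXml, String componentName) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(solrconfigXml)))
                    .getElementsByTagName("searchComponent");
            int count = 0;
            for (int i = 0; i < nodes.getLength(); i++) {
                Element component = (Element) nodes.item(i);
                if (componentName.equals(component.getAttribute("name"))) {
                    count++;
                }
            }
            return count;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse the configuration", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<searchComponent name=\"spellcheck\" class=\"solr.SpellCheckComponent\"/>"
                + "<searchComponent name=\"spellcheck\" class=\"solr.SpellCheckComponent\"/>"
                + "</config>";
        // A count above 1 suggests the separate sections should be merged into one.
        System.out.println(countNamed(xml, "spellcheck"));
    }
}
```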

From: genel [mailto:g...@tekdata.com] 
Sent: Monday, April 16, 2018 12:25 PM
To: java-u...@lucene.apache.org
Subject: Error configuring Spell Checker
Importance: Low

We've been using SOLR for quite awhile. I'm attempting to install spell
checking.

I think I have the basic configuration correct, because the wordbreak
component seems to work, but none of the others do. 

I consistently get an NPE error 


java.lang.NullPointerException
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:147)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
 at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
 at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:457)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:223)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:499)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Unknown Source)

Relevant part of solrconfig:





direct
text_spell
solr.DirectSolrSpellChecker
internal
0.5
2
2
5
4
0.01
.01




index
solr.IndexBasedSpellChecker
./spellchecker
text_spell
true





wordbreak
solr.WordBreakSolrSpellChecker
text_spell
true
true
10

 


explicit 
20 
text

direct

on
true
10
5
5
true
true
10
5


spellcheck



Relevant part of schema:


























url:

http://localhost:8983/solr/gene/spell?spellcheck.q=rainb&spellcheck=true

I've tried just about everything I can think of, what am I missing?



--
Sent from: http://lucene.472066.n3.nabble.com/Lucene-Java-Users-f532864.html




RE: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

2017-01-17 Thread Dyer, James
This sounds a lot like SOLR-4489.  However it looks like this was fixed prior 
to your version (4.5).  So it could be you found another case where this bug 
still exists.

The other thing is the default Query Converter cannot handle all cases, and it 
could be the query you are sending is beyond its abilities?  Even in this case, 
it'd be nice if it failed more gracefully than this.

Could you provide the query parameters you are sending and also how you have 
spellcheck configured?

James Dyer
Ingram Content Group


-Original Message-
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch] 
Sent: Thursday, January 05, 2017 8:22 AM
To: 'solr-user@lucene.apache.org' 
Subject: StringIndexOutOfBoundsException "in" SpellCheckCollator.getCollation

I am seeing many exceptions like this in my Solr [5.4.1] log:
null:java.lang.StringIndexOutOfBoundsException: String index out of range: -2
at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
at java.lang.StringBuilder.replace(StringBuilder.java:262)
at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:236)
at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:93)
at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:238)
at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:203)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:273)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2073)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:658)
...
at java.lang.Thread.run(Thread.java:745)

What am I potentially facing here?

Thx
Clemens


RE: Can't get spelling suggestions to work properly

2017-01-17 Thread Dyer, James
Jimi,

Generally speaking, spellcheck does not work well against fields with stemming, 
or other "heavy" analysis.  I would <copyField> to a field that is tokenized 
on whitespace with little else, and use that field for spellcheck.

By default, the spellchecker does not suggest for words in the index.  So if 
the user misspells a word but the misspelling is actually some other word that 
is indexed, it will never suggest.  You can override this behavior by 
specifying  "spellcheck.alternativeTermCount" with a value >0.  This is how 
many suggestions it should give for words that indeed exist in the index.  This 
can be the same value as "spellcheck.count", but you may wish to set it to a 
lower value.

I do not recommend using "spellcheck.onlyMorePopular".  It is similar to 
"spellcheck.alternativeTermCount", but in my opinion, the latter gives a better 
experience.

You might also wish to set "spellcheck.maxResultsForSuggest".  If you set this, 
then the spellchecker will not suggest anything if more results are returned 
than the value you specify.  This is helpful in providing "did you mean"-style 
suggestions for queries that return few results.

If you would like to ensure the suggestions combine nicely into a re-written 
query that returns results, then specify both "spellcheck.collate=true" and 
"spellcheck.maxCollationTries" to a value >0 (possibly 5-10).  This will cause 
it to internally check the re-written queries (aka. Collations) and report back 
on how many results you get for each.  If you are using "q.op=OR" or a low 
value for "mm", then you will likely want to override this with something like 
"spellcheck.collateParam.mm=100%".  Otherwise every combination will get reported 
as returning results.
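
Pulling the parameters from the paragraphs above into one place, a request could be assembled roughly as follows. This is a sketch: the class and method names are invented and the numeric values passed in main are placeholders, not recommendations. The value passed for "spellcheck.collateParam.mm" here is 100%, the value James gives in his other replies in this archive for requiring all words to match.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parameter builder, not a SolrJ class.
final class SpellcheckParams {

    // Builds "did you mean"-style spellcheck parameters for a query request.
    static Map<String, String> didYouMean(int count, int alternativeTermCount,
                                          int maxResultsForSuggest, int maxCollationTries,
                                          String collateMm) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("spellcheck", "true");
        p.put("spellcheck.count", String.valueOf(count));
        // Also suggest for terms that exist in the index.
        p.put("spellcheck.alternativeTermCount", String.valueOf(alternativeTermCount));
        // Suppress suggestions when the query returns more hits than this.
        p.put("spellcheck.maxResultsForSuggest", String.valueOf(maxResultsForSuggest));
        // Test the re-written queries so that collations report their hit counts.
        p.put("spellcheck.collate", "true");
        p.put("spellcheck.maxCollationTries", String.valueOf(maxCollationTries));
        // Override "mm" only for the collation test queries.
        p.put("spellcheck.collateParam.mm", collateMm);
        return p;
    }

    public static void main(String[] args) {
        didYouMean(10, 5, 5, 5, "100%").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```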

I hope this and the other comments you've gotten help demystify spellcheck 
configuration.  I do agree it is fairly complicated and frustrating to get it 
just right.

James Dyer
Ingram Content Group

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:16 AM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

I just noticed why setting maxResultsForSuggest to a high value was not a good 
thing: now it shows spelling suggestions even on correctly spelled words.

I think what I would need is the logic of SuggestMode. 
SUGGEST_WHEN_NOT_IN_INDEX, but with a configurable limit instead of it being 
hard coded to 0, i.e. just as maxQueryFrequency works.

/Jimi

-Original Message-
From: jimi.hulleg...@svensktnaringsliv.se 
[mailto:jimi.hulleg...@svensktnaringsliv.se] 
Sent: Friday, January 13, 2017 5:56 PM
To: solr-user@lucene.apache.org
Subject: RE: Can't get spelling suggestions to work properly

Hi Alessandro,

Thanks for your explanation. It helped a lot. Although setting 
"spellcheck.maxResultsForSuggest" to a value higher than zero was not enough. I 
also had to set "spellcheck.alternativeTermCount". With that done, I now get 
suggestions when searching for 'mycet' (a misspelling of the Swedish word 
'mycket', that didn't return suggestions before).

Although, I'm still not able to fully understand how to configure this 
properly. With this change there are now other misspelled searches that 
no longer give suggestions. The problem here is stemming, I suspect. Because 
the main search fields use stemming, in some cases one can get lots of 
results for spellings that don't exist in the index at all (or, at least not 
in the spelling-field). How can I configure this component so that those 
suggestions are still included? Do I need to set maxResultsForSuggest to a 
really high number? Like Integer.MAX_VALUE? I feel that such a setting would 
defeat the purpose of that parameter, in a way. But I'm not sure how else to 
solve this.

Also, there is one other thing I wonder about the spelling suggestions, that 
you might have the answer to. Is there a way to make the logic case 
insensitive, but the presentation case sensitive? For example, a search for 
'georg washington' now would return 'george washington' as a suggestion, but 
'George Washington' would be even better.

Regards
/Jimi


-Original Message-
From: alessandro.benedetti [mailto:abenede...@apache.org] 
Sent: Thursday, January 12, 2017 5:14 PM
To: solr-user@lucene.apache.org
Subject: Re: Can't get spelling suggestions to work properly

Hi Jimi,
taking a look at the *maxQueryFrequency* param:

Your understanding is correct.

1) we don't provide misspelled suggestions if we set the param to 1, and we 
have a minimum of 1 doc freq for the term .

2) we don't provide misspelled suggestions if the doc frequency of the term is 
greater than the max limit set.

Let us explore the code :

if (suggestMode==SuggestMode.SUGGEST_WHEN_NOT_IN_INDEX && docfreq > 0) {
  return new SuggestWord[0];
}
/// If we are working in "Not in Index Mode" , with a document frequency >0 we 
get n

RE: CachedSqlEntityProcessor with delta-import

2016-10-21 Thread Dyer, James
Sowmya,

My memory is that the cache feature does not work with Delta Imports.  In fact, 
I believe that nearly all DIH features except straight JDBC imports do not work 
with Delta Imports.  My advice is to not use the Delta Import feature at all as 
the same result can (often more-efficiently) be accomplished following the 
approach outlined here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport
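
As a rough sketch of that approach: the entity query in data-config.xml carries a condition on the "clean" request parameter, and a delta run is then just a full-import with clean=false. The handler path, core URL and the last_modified column are assumptions for illustration, and the class name is made up.

```java
// Hypothetical URL builder, not part of SolrJ or DIH.
final class DeltaViaFullImport {

    // Assumed shape of the WHERE clause in the entity query (column name is hypothetical):
    //   '${dataimporter.request.clean}' != 'false'
    //     OR last_modified > '${dataimporter.last_index_time}'
    // With clean=true every row matches; with clean=false only recently changed rows do.

    // Builds the import URL; deltaOnly=true keeps existing documents (clean=false).
    static String importUrl(String coreUrl, boolean deltaOnly) {
        return coreUrl + "/dataimport?command=full-import&clean=" + (deltaOnly ? "false" : "true");
    }

    public static void main(String[] args) {
        System.out.println(importUrl("http://localhost:8983/solr/test", true));
        System.out.println(importUrl("http://localhost:8983/solr/test", false));
    }
}
```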

James Dyer
Ingram Content Group

-Original Message-
From: Mohan, Sowmya [mailto:sowmya.mo...@icf.com] 
Sent: Tuesday, October 18, 2016 10:07 AM
To: solr-user@lucene.apache.org
Subject: CachedSqlEntityProcessor with delta-import

Good morning,

Can CachedSqlEntityProcessor be used with delta-import? In my setup when 
running a delta-import with CachedSqlEntityProcessor, the child entity values 
are not correctly updated for the parent record. I am on Solr 4.3. Has anyone 
experienced this and if so how to resolve it?

Thanks,
Sowmya.



RE: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

2016-07-29 Thread Dyer, James
You need to set the "spellcheck.maxCollationTries" parameter to a value greater 
than zero.  The higher the value, the more queries it checks for hits, and the 
longer it could potentially take.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.maxCollationTriesParameter

James Dyer
Ingram Content Group

-Original Message-
From: SRINI SOLR [mailto:srini.s...@gmail.com] 
Sent: Friday, July 22, 2016 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.3.1 - Spell-Checker with MULTI-WORD PHRASE

Hi all - please help me here

On Thursday, July 21, 2016, SRINI SOLR  wrote:
> Hi All -
> Could you please help me on spell check on multi-word phrase as a whole...
> Scenario -
> I have a problem with solr spellcheck suggestions for multi word phrases.
> With the query for 'red chillies'
>
> q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true
>
> I get
>
> 
> 
> 2
> 4
> 12
> 0
> 
> chiller4
> challis2
> 
> 
> false
> red chiller
> 
>
> The problem is, even though 'chiller' has 4 results in index, 'red
> chiller' has none. So we end up suggesting a phrase with 0 result.
>
> What can I do to make spellcheck work on the whole phrase only?
>
> Please help me here ...


RE: using spell check on phrases

2016-06-10 Thread Dyer, James
Kaveh,

If your query has "mm" set to zero or a low value, then you may want to 
override this when the spellchecker checks possible collations.  For example:

spellcheck.collateParam.mm=100%

You may also want to consider adding "spellcheck.maxResultsForSuggest" to your 
query, so that it will return spelling suggestions even when the query returns 
some results.  Also if you set "spellcheck.alternativeTermCount", then it will 
try to correct all of the query keywords, including those that exist in the 
dictionary.

See https://cwiki.apache.org/confluence/display/solr/Spell+Checking for more 
information.

James Dyer
Ingram Content Group

-Original Message-
From: kaveh minooie [mailto:ka...@plutoz.com] 
Sent: Monday, June 06, 2016 8:19 PM
To: solr-user@lucene.apache.org
Subject: using spell check on phrases

Hi everyone

I am using Solr 6 with DirectSolrSpellChecker and the edismax parser. The 
problem that I am having is that when the query is a phrase, every 
single word in the phrase needs to be misspelled for the spell checker to 
get activated and give suggestions. If only one of the words is 
misspelled then it just says that the spelling is correct:
true

I was wondering if anyone has encountered this situation before and 
knows how to solve it?

thanks,

-- 
Kaveh Minooie



RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-22 Thread Dyer, James
See the old docs at 
https://wiki.apache.org/solr/SpellCheckComponent#Configuration

In particular, you need this line in solrconfig.xml:

<str name="spellcheckIndexDir">./spellchecker</str>


James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, January 22, 2016 11:20 AM
To: solr-user@lucene.apache.org
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

Ok, but IndexBasedSpellChecker needs a directory where all indexes are
stored to do spell check. I don't have any idea about
IndexBasedSpellChecker. If you could send me a sample configuration for it, it
would help me. Thanks

On Fri, Jan 22, 2016 at 1:45 AM Dyer, James 
wrote:

> But if you really need more than 2 edits, I think IndexBasedSpellChecker
> supports it.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Thursday, January 21, 2016 11:29 AM
> To: solr-user
> Subject: Re: How get around solr's spellcheck maxEdit limit of 2?
>
> bq: ...is anyway to increase that maxEdit
>
> IIUC, increasing maxEdit beyond 2 increases the space/time required
> unacceptably, that limit is there on purpose, put there by people who
> know their stuff.
>
> Best,
> Erick
>
> On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki 
> wrote:
> > I am using Solr for spell correction. Solr is limited to a maxEdit of 2.
> > Is there any way to increase that maxEdit without using phonetic mapping?
> > Any suggestions, please?
>
>


RE: How get around solr's spellcheck maxEdit limit of 2?

2016-01-21 Thread Dyer, James
But if you really need more than 2 edits, I think IndexBasedSpellChecker 
supports it.

James Dyer
Ingram Content Group

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, January 21, 2016 11:29 AM
To: solr-user
Subject: Re: How get around solr's spellcheck maxEdit limit of 2?

bq: ...is anyway to increase that maxEdit

IIUC, increasing maxEdit beyond 2 increases the space/time required
unacceptably, that limit is there on purpose, put there by people who
know their stuff.

Best,
Erick

On Thu, Jan 21, 2016 at 12:39 AM, Nitin Solanki  wrote:
> I am using Solr for spell correction. Solr is limited to a maxEdit of 2.
> Is there any way to increase that maxEdit without using phonetic mapping?
> Any suggestions, please?



RE: Spellcheck response format differs between a single core and SolrCloud

2016-01-11 Thread Dyer, James
Ryan,

The json response format changed for Solr 5.0.  See 
https://issues.apache.org/jira/browse/SOLR-3029 .  Is the single-core solr 
running a 4.x version with the cloud solr running 5.x ?  If they are both on 
the same major version, then we have a bug.

James Dyer
Ingram Content Group


-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Monday, January 11, 2016 12:32 AM
To: solr-user@lucene.apache.org
Subject: Spellcheck response format differs between a single core and SolrCloud

Hello,

I am using the spellcheck component for spelling suggestions and I've used
the same configurations in two separate projects, the only difference is
one project uses a single core and the other is a collection on SolrCloud
with three shards. The single core has about 56K docs and the one on
SolrCloud has 1M docs. Strangely, the format of the response is slightly
different between the two and I'm not sure why (particularly the collations
part). I was wondering if anyone can shed some light on this? Below is my
configuration and the results I'm getting.

This is in my "/select" searchHandler:


on
false
5
2
5
true
true
5
3

And my spellcheck component:



  
  
default
spelling
solr.DirectSolrSpellChecker
internal
0.5
2
1
5
4
0.01
  


Examples of each output can be found here:
https://gist.github.com/ryac/ceff8da00ec9f5b84106

Thanks,
Ryan


RE: DIH Caching w/ BerkleyBackedCache

2015-12-16 Thread Dyer, James
Todd,

I have no idea if this will perform acceptably with so many multiple values.  I 
doubt the solr/patch code was really optimized for such a use case.  In my 
production environment, I have je-6.2.31.jar on the classpath.  I don't think 
I've tried it with other versions.

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Wednesday, December 16, 2015 10:21 AM
To: solr-user@lucene.apache.org
Subject: RE: DIH Caching w/ BerkleyBackedCache

James,

I apologize for the late response.


Dyer, James-2 wrote
> With the DIH request, are you specifying "cacheDeletePriorData=false"

We are not specifying that property (it looks like it defaults to "false").
I'm actually seeing this issue when running a full clean/import.

It appears that the Berkeley DB "cleaner" is always removing the oldest file
once there are three. In this case, I'll see two 1GB files and then as the
third file is being written (after ~200MB) the oldest 1GB file will fall off
(i.e. get deleted). I'm only utilizing ~13% disk space at the time. I'm
using Berkeley DB version 4.1.6 with Solr 4.8.1. I'm not specifying any
other configuration properties other than what I mentioned before. I simply
cannot figure out what is going on with the "cleaner" logic that would deem
that file "lowest utilized". Any other Berkeley DB/system configuration I
could consider that would affect this?

It's possible that this caching simply might not be suitable for our data
set where one document might contain a field with tens of thousands of
values... maybe this is the bottleneck with using this database as every add
copies in the prior data and then the "cleaner" removes the old stuff. Maybe
it's working like it should but just incredibly slow... I can get a full
index without caching in about two hours, however, when using this caching
it was still running after 24 hours (still caching the sub-entity).

Thanks again for the reply.

Respectfully,
Todd



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4245777.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Data Import Handler - Multivalued fields - splitBy

2015-12-04 Thread Dyer, James
Brian,

Be sure to have...

transformer="RegexTransformer"

...in your <entity> tag.  It’s the RegexTransformer class that looks for 
"splitBy".

See https://wiki.apache.org/solr/DataImportHandler#RegexTransformer for more 
information.
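
One related thing worth checking, which may or may not apply to the configuration in question: "splitBy" is documented as a regular expression, and an unescaped pipe is the regex alternation operator rather than a literal character. The sketch below assumes the value is handled like a Java regex; the class name and sample data are made up. In the XML attribute the escaped form would be written as splitBy="\|".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo of regex splitting; it does not call any DIH code.
final class SplitByDemo {

    // Splits a column value using splitBy as a Java regular expression.
    static String[] split(String value, String splitBy) {
        return Pattern.compile(splitBy).split(value);
    }

    public static void main(String[] args) {
        // Escaped pipe: splits on the literal "|" character.
        System.out.println(Arrays.toString(split("red|green|blue", "\\|")));
        // Unescaped pipe: an empty alternation, so the three values are not recovered.
        System.out.println(Arrays.toString(split("red|green|blue", "|")));
    }
}
```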

James Dyer
Ingram Content Group


-Original Message-
From: Brian Narsi [mailto:bnars...@gmail.com] 
Sent: Friday, December 04, 2015 3:10 PM
To: solr-user@lucene.apache.org
Subject: Data Import Handler - Multivalued fields - splitBy

I have the following:





I believe I had the following working (splitting on pipe delimited)



But it does not work now.



In-fact now I have even tried



But I cannot get the values to split into an array.

Any thoughts/suggestions what may be wrong?

Thanks,


RE: Spellcheck error

2015-12-03 Thread Dyer, James
Matt,

Can you give some information about how your spellcheck field is analyzed, and 
also whether you're using a custom query converter?  Also, try placing the bare 
terms you want checked in spellcheck.q (e.g., if your query is q=+movie +theatre, 
then spellcheck.q=movie theatre).  Does it work in this case?  Also, could you 
give the exact query you're using?
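
If the client builds the request, the bare terms for spellcheck.q can be derived from the user's query. A minimal sketch, assuming simple queries with +/- markers, field: prefixes, parentheses, quotes and boolean keywords; the helper name is hypothetical and it is not a full query parser:

```java
class SpellcheckQ {
    // Reduces a query string to bare terms separated by single spaces.
    static String bareTerms(String q) {
        StringBuilder out = new StringBuilder();
        for (String tok : q.trim().split("\\s+")) {
            String t = tok.replaceAll("^[+\\-!]+", "") // required/prohibited markers
                          .replaceAll("^\\w+:", "")      // a leading field: prefix
                          .replaceAll("[()\"]", "");     // grouping and quotes
            if (t.isEmpty() || t.equals("AND") || t.equals("OR") || t.equals("NOT")) {
                continue; // skip boolean operators and empty leftovers
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(t);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bareTerms("+movie +theatre")); // movie theatre
    }
}
```

The result would be sent as spellcheck.q alongside the unchanged q parameter.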

This is the very same bug as in the 3 tickets you mention.  We clearly haven't 
solved all of the possible ways this bug can be triggered.  But we cannot fix 
this unless we can come up with a unit test that reliably reproduces it.  At 
the very least, we should handle these problems better than throwing SIOOB like 
this.
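
For anyone trying to build that unit test, here is one way such an exception could arise. This is a simplified sketch and not the actual SpellCheckCollator code: it assumes corrections are applied left to right with a running offset shift, and that the analyzer reports two tokens covering the same span (a hypothetical input). With "theatre" corrected to the shorter "there", the second replace starts at index -2:

```java
class CollationOffsetDemo {
    // Applies corrections to the query left to right, shifting later offsets by
    // the length difference accumulated so far.
    static String collate(String query, int[][] tokenOffsets, String[] corrections) {
        StringBuilder sb = new StringBuilder(query);
        int shift = 0;
        for (int i = 0; i < corrections.length; i++) {
            int start = tokenOffsets[i][0];
            int end = tokenOffsets[i][1];
            sb.replace(start + shift, end + shift, corrections[i]);
            shift += corrections[i].length() - (end - start);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // One token per span: the replacement works.
        System.out.println(collate("theatre", new int[][] {{0, 7}}, new String[] {"there"}));
        // Two tokens reporting the same span: after the first replace the shift is -2,
        // so the second replace is called with a negative start index and throws.
        try {
            collate("theatre", new int[][] {{0, 7}, {0, 7}}, new String[] {"there", "there"});
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```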

Long term, there is probably a better design we could come up with for how 
terms are identified within queries and how collations are generated.

James Dyer
Ingram Content Group


-Original Message-
From: Matt Pearce [mailto:m...@flax.co.uk] 
Sent: Thursday, December 03, 2015 10:40 AM
To: solr-user
Subject: Spellcheck error

Hi,

We're using Solr 5.3.1, and we're getting a 
StringIndexOutOfBoundsException from the SpellCheckCollator. I've done 
some investigation, and it looks like the problem is that the corrected 
string is shorter than the original query.

For example, the search term is "theatre", the suggested correction is 
"there". The error is being thrown when replacing the original query 
with the shorter replacement.

This is the stack trace:
java.lang.StringIndexOutOfBoundsException: String index out of range: -2
 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:824)
 at java.lang.StringBuilder.replace(StringBuilder.java:262)
 at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
 at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
 at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:237)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:202)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:277)

The error looks very similar to those described in 
https://issues.apache.org/jira/browse/SOLR-4489, 
https://issues.apache.org/jira/browse/SOLR-3608 and 
https://issues.apache.org/jira/browse/SOLR-2509, most of which are closed.

Any suggestions would be appreciated, or should I open a JIRA ticket?

Thanks,

Matt

-- 
Matt Pearce
Flax - Open Source Enterprise Search
www.flax.co.uk



RE: DIH Caching w/ BerkleyBackedCache

2015-11-20 Thread Dyer, James
Todd,

With the DIH request, are you specifying "cacheDeletePriorData=false"?  Looking 
at the BerkleyBackedCache code, if this is set to true, it deletes the cache and 
assumes the current update is to fully repopulate it.  If you want to do an 
incremental update to the cache, it needs to be false.  You might also need to 
specify "clean=false", but I'm not sure if this is a requirement.

I've used DIH with BerkleyBackedCache for a few years and it works well for us. 
But rather than using it inline, we have a number of DIH handlers that just 
build caches; then, when they're all built, a final DIH joins data from the 
caches and indexes it to Solr.  Like you, we also run several handlers 
at once, each handling part of the data.

But I have to warn you that this code hasn't been maintained by anyone.  I'm using 
an older DIH jar (4.6) with a newer Solr.  I think there might have been an API 
change or something that prevented the uncommitted caching code from working 
with newer versions, but I honestly forget.  This is probably a viable solution 
if you don't want to write any code, but it might take some trial and error 
to get it to work.

James Dyer
Ingram Content Group


-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, November 17, 2015 8:11 AM
To: solr-user@lucene.apache.org
Subject: Re: DIH Caching w/ BerkleyBackedCache

Mikhail Khludnev wrote
> It's worth to mention that for really complex relations scheme it might be
> challenging to organize all of them into parallel ordered streams.

This will most likely be the issue for us which is why I would like to have
the Berkley cache solution to fall back on, if possible. Again, I'm not sure
why but it appears that the Berkley cache is overwriting itself (i.e.
cleaning up unused data) when building the database... I've read plenty of
other threads where it appears folks are having success using that caching
solution.


Mikhail Khludnev wrote
> threads... you said? Which ones? Declarative parallelization in
> EntityProcessor worked only with certain 3.x version.

We are running multiple DIH instances which query against specific
partitions of the data (i.e. mod of the document id we're indexing).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142p4240562.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DIH Caching with Delta Import

2015-10-21 Thread Dyer, James
The DIH Cache feature does not work with delta import.  Actually, much of DIH 
does not work with delta import.  The workaround you describe is similar to the 
approach described here: 
https://wiki.apache.org/solr/DataImportHandlerDeltaQueryViaFullImport , which 
in my opinion is the best way to implement partial updates with DIH.

James Dyer
Ingram Content Group

-Original Message-
From: Todd Long [mailto:lon...@gmail.com] 
Sent: Tuesday, October 20, 2015 8:02 PM
To: solr-user@lucene.apache.org
Subject: DIH Caching with Delta Import

It appears that DIH entity caching (e.g. SortedMapBackedCache) does not work
with deltas... is this simply a bug with the DIH cache support or somehow by
design?

Any ideas on a workaround for this? Ideally, I could just omit the
"cacheImpl" attribute but that leaves the query (using the default processor
in my case) without the appropriate where clause including the "cacheKey"
and "cacheLookup". Should SqlEntityProcessor be smart enough to ignore the
cache with deltas and simply append a where clause which includes the
"cacheKey" and "cacheLookup"? Or possibly just include a where clause which
includes ('${dih.request.command}' = 'full-import' or cacheKey =
cacheLookup)? I suppose those could be used to mitigate the issue but I was
hoping for possibly a better solution.

Any help would be greatly appreciated. Thank you.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-with-Delta-Import-tp4235598.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: DIH parallel processing

2015-10-15 Thread Dyer, James
Nabil,

What we do is have multiple dih request handlers configured in solrconfig.xml.  
Then in the sql query we put something like "where mod(id, ${partition})=0".  
Then an external script calls a full import on each request handler at the same 
time and monitors the response.  This isn't the most elegant solution but it 
gets around the fact that DIH is single-threaded.
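
A minimal sketch of what the external script has to work out, assuming N handlers named /dataimport0 .. /dataimportN-1 (hypothetical names) whose SQL each selects rows with mod(id, N) equal to that handler's partition number. It only builds the request URLs; firing them concurrently and polling command=status is left out:

```java
import java.util.ArrayList;
import java.util.List;

class DihPartitions {
    // Partition that a row with the given id falls into, assuming each handler's
    // SQL filter has the form mod(id, partitions) = partition.
    static int partitionOf(long id, int partitions) {
        return (int) Math.floorMod(id, (long) partitions);
    }

    // One full-import URL per handler. The handler names are hypothetical and must
    // match the request handlers configured in solrconfig.xml.
    static List<String> importUrls(String coreUrl, int partitions) {
        List<String> urls = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            // clean=false is an assumption: the index is cleared separately so that
            // parallel imports do not delete each other's documents.
            urls.add(coreUrl + "/dataimport" + p + "?command=full-import&clean=false");
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : importUrls("http://localhost:8983/solr/test", 4)) {
            System.out.println(url); // an external script would request these at the same time
        }
    }
}
```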

James Dyer
Ingram Content Group


-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr] 
Sent: Thursday, October 15, 2015 3:58 AM
To: Solr-user
Subject: DIH parallel processing

Hi All,
I'm using DIH to index more than 15M from SQL Server to Solr. This takes more 
than 2 hours. A large part of this time is consumed by fetching data from the 
database. I'm thinking about a solution with parallel (threaded) loading in the 
same DIH, where each thread loads a part of the data.
Do you have any experience with this kind of situation?
Regards, Nabil. 


RE: File-based Spelling

2015-10-13 Thread Dyer, James
Mark,

The older spellcheck implementations create an n-gram sidecar index, which is 
why you're seeing your name split into 2-grams like this.  See the IR Book by 
Manning et al, section 3.3.4 for more information.  Based on the results you're 
getting, I think it is loading your file correctly.  You should now try a query 
against this spelling index, using words *not* in the file you loaded that are 
within 1 or 2 edits from something that is in the dictionary.  If it doesn't 
yield suggestions, then post the relevant sections of the solrconfig.xml, 
schema.xml and also the query string you are trying.
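
To illustrate why the output looks the way it does, this is roughly how a word breaks into character n-grams. It is a minimal sketch only; the real sidecar index may use several gram sizes and extra start/end fields, so the layout below is an assumption:

```java
import java.util.ArrayList;
import java.util.List;

class NGrams {
    // Returns all contiguous character n-grams of the word, left to right.
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("fenbers", 2)); // [fe, en, nb, be, er, rs]
    }
}
```

A misspelled word shares most of its grams with nearby dictionary words, which is how candidates are found before they are ranked by edit distance.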

James Dyer
Ingram Content Group


-Original Message-
From: Mark Fenbers [mailto:mark.fenb...@noaa.gov] 
Sent: Monday, October 12, 2015 2:38 PM
To: Solr User Group
Subject: File-based Spelling

Greetings!

I'm attempting to use a file-based spell checker.  My sourceLocation is 
/usr/share/dict/linux.words, and my spellcheckIndexDir is set to 
./data/spFile.  BuildOnStartup is set to true, and I see nothing to 
suggest any sort of problem/error in solr.log.  However, in my 
./data/spFile/ directory, there are only two files: segments_2 with only 
71 bytes in it, and a zero-byte write.lock file.  For a source 
dictionary having 480,000 words in it, I was expecting a bit more 
substance in the ./data/spFile directory.  Something doesn't seem right 
with this.

Moreover, I ran a query on the word Fenbers, which isn't listed in the 
linux.words file, but there are several similar words.  The results I 
got back were odd, and suggestions included the following:
fenber
f en be r
f e nb er
f en b er
f e n be r
f en b e r
f e nb e r
f e n b er
f e n b e r

But I expected suggestions like fenders, embers, and fenberry, etc. I 
also ran a query on Mark (which IS listed in linux.words) and got back 
two suggestions in a similar format.  I played with configurables like 
changing the fieldType from text_en to string and the characterEncoding 
from UTF-8 to ASCII, etc., but nothing seemed to yield any different 
results.

Can anyone offer suggestions as to what I'm doing wrong?  I've been 
struggling with this for more than 40 hours now!  I'm surprised my 
persistence has lasted this long!

Thanks,
Mark


RE: Spell Check and Privacy

2015-10-12 Thread Dyer, James
Arnon,

Use "spellcheck.collate=true" with "spellcheck.maxCollationTries" set to a 
non-zero value.  This will give you re-written queries that are guaranteed to 
return hits, given the original query and filters.  If you are using an "mm" 
value other than 100%, you will also want to specify 
"spellcheck.collateParam.mm=100%" (or, if using "q.op=OR", then use 
"spellcheck.collateParam.q.op=AND").

Of course, the first section of the spellcheck result will still show every 
possible suggestion, so your client needs to discard these and not divulge them 
to the user.  If you need to know word-by-word how the collations were 
constructed, then specify "spellcheck.collateExtendedResults=true".  Use the 
extended collation results for this information and not the first section of 
the spellcheck results.
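
Put together, the request parameters might look like the sketch below. The parameter names are the ones mentioned above; the "acl" filter field, the value 5 for maxCollationTries and the helper names are assumptions for illustration only:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class CollateParams {
    // Request parameters asking for collations that are re-checked against the index.
    static Map<String, String> params(String q, String fq) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", q);
        p.put("fq", fq);                                    // the user's ACL filter (hypothetical field)
        p.put("spellcheck", "true");
        p.put("spellcheck.collate", "true");
        p.put("spellcheck.maxCollationTries", "5");         // any non-zero value; 5 is arbitrary
        p.put("spellcheck.collateExtendedResults", "true");
        p.put("spellcheck.collateParam.mm", "100%");
        return p;
    }

    // URL-encodes the parameters into a query string.
    static String toQueryString(Map<String, String> p) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : p.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(params("movie theatre", "acl:group1")));
    }
}
```

Because the collations are tested with the same filters as the original request, only collations that return hits visible to that user should come back.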

This is all fairly well-documented on the old solr wiki:  
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate

James Dyer
Ingram Content Group

-Original Message-
From: Arnon Yogev [mailto:arn...@il.ibm.com] 
Sent: Monday, October 12, 2015 2:33 AM
To: solr-user@lucene.apache.org
Subject: Spell Check and Privacy

Hi,

Our system supports many users from different organizations and with 
different ACLs. 
We consider adding a spell check ("did you mean") functionality using 
DirectSolrSpellChecker. However, a privacy concern was raised, as this 
might lead to private information being revealed between users via the 
suggested terms. Using the FileBasedSpellChecker is another option, but 
naturally a static list of terms is not optimal.

Is there a best practice or a suggested method for these kinds of cases?

Thanks,
Arnon



RE: String index out of range exception from Spell check

2015-09-28 Thread Dyer, James
This looks similar to SOLR-4489, which is marked fixed for version 4.5.  If 
you're using an older version, the fix is to upgrade.  

Also see SOLR-3608, which is similar but here it seems as if the user's query 
is more than spellcheck was designed to handle.  This should still be looked at 
and possibly we can come up with a way to handle these cases.

A way to work around these bugs is to strip your query down to raw terms, 
separated by spaces, and use "spellcheck.q" with the raw terms only.

James Dyer
Ingram Content Group


-Original Message-
From: davidphilip cherian [mailto:davidphilipcher...@gmail.com] 
Sent: Sunday, September 27, 2015 3:50 PM
To: solr-user@lucene.apache.org
Subject: String index out of range exception from Spell check

There are irregular exceptions from spell check component. Below is the
stack trace. This is not common for all the q terms but have often seen
them occurring for specific queries after enabling spellcheck.collate
method.



String index out of range: -3



java.lang.StringIndexOutOfBoundsException: String index out of range: -3
 at java.lang.AbstractStringBuilder.replace(AbstractStringBuilder.java:789)
 at java.lang.StringBuilder.replace(StringBuilder.java:266)
 at org.apache.solr.spelling.SpellCheckCollator.getCollation(SpellCheckCollator.java:235)
 at org.apache.solr.spelling.SpellCheckCollator.collate(SpellCheckCollator.java:92)
 at org.apache.solr.handler.component.SpellCheckComponent.addCollationsToResponse(SpellCheckComponent.java:230)
 at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:197)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:226)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
 at org.eclipse.jetty.server.Server.handle(Server.java:497)
 at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
 at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
 at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
 at java.lang.Thread.run(Thread.java:722)



500


RE: Spellcheck / Suggestions : Append custom dictionary to SOLR default index

2015-08-25 Thread Dyer, James
Max,

If you know the entire list of words you want to spellcheck against, you can 
use FileBasedSpellChecker.  See 
http://wiki.apache.org/solr/FileBasedSpellChecker .

If, however, you have a field you want to spellcheck against but also want 
additional words added, consider using a copy of the field for spellcheck 
purposes, and then index the additional terms to that field.   You may be able 
to accomplish this easily, for instance, by using index-time synonyms in the 
analysis chain for the spellcheck field.  Or you could just append them to any 
document (more than once if you want to boost the term frequency).

Keep in mind that while this will work fine for regular word-by-word spell 
suggestions, collations are not going to work well with these approaches.

James Dyer
Ingram Content Group

-Original Message-
From: Max Chadwick [mailto:mpchadw...@gmail.com] 
Sent: Monday, August 24, 2015 9:43 PM
To: solr-user@lucene.apache.org
Subject: Spellcheck / Suggestions : Append custom dictionary to SOLR default 
index

Is there a way to append a set of words to the out-of-the-box Solr index when
using the spellcheck / suggestions feature?


RE: exclude folder in dataimport handler.

2015-08-20 Thread Dyer, James
I took a quick look at FileListEntityProcessor#init, and it looks like it 
applies the "excludes" regex to the filename element of the path only, and not 
to the directories.

If your filenames do not have a naming convention that would let you use it 
this way, you might be able to write a transformer to get what you want.
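
To make the behaviour concrete, here is a small sketch of a regex tested against the file name only, next to the kind of parent-directory check a custom transformer could do instead. Both helpers are hypothetical illustrations, not the FileListEntityProcessor code, and whether Solr uses find() or matches() was not checked:

```java
import java.io.File;
import java.util.regex.Pattern;

class ExcludesDemo {
    // True when the regex matches somewhere in the file name (directories are ignored).
    static boolean excludedByName(File f, String excludesRegex) {
        return Pattern.compile(excludesRegex).matcher(f.getName()).find();
    }

    // True when any parent directory of the file has the given name.
    static boolean underDirectory(File f, String dirName) {
        for (File p = f.getParentFile(); p != null; p = p.getParentFile()) {
            if (p.getName().equals(dirName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        File f = new File("/data/templatedata/a.xml"); // made-up path
        System.out.println(excludedByName(f, "templatedata")); // false: the name is only "a.xml"
        System.out.println(underDirectory(f, "templatedata")); // true
    }
}
```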

James Dyer
Ingram Content Group


-Original Message-
From: coolmals [mailto:coolm...@gmail.com] 
Sent: Thursday, August 20, 2015 12:57 PM
To: solr-user@lucene.apache.org
Subject: exclude folder in dataimport handler.

I am importing files from my file system and want to exclude files in a
folder called templatedata from the import. How do I configure that in the entity? 
excludes="templatedata" doesn't seem to work.

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/exclude-folder-in-dataimport-handler-tp4224267.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Can you try putting your queried keyword in "spellcheck.q" ?

James Dyer
Ingram Content Group


-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 10:13 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr spell check not showing any suggestions for other language

Dear James

Thank you for your reply.

I tested the analyser without “solr.EnglishPossessiveFilterFactory” but still no
luck. I also updated the analyser; please find it below.


  



  



With the above configuration for “text_suggest” I got the following results.

For a correct Bangla word, সহজ, the Solr response is below.
Note: I set rows to 0 to skip results.


  0
  2
  
সহজ
true
0
xml
1438787238383
  




  
true
  



For an incorrect Bangla word, সহগ, where I just changed the last letter, the Solr
response is:


  0
  7
  
সহগ
true
0
xml
1438787208052
  




  
false
  






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950p4221033.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check not showing any suggestions for other language

2015-08-05 Thread Dyer, James
Talha,

Possibly this english-specific analysis in your "text_suggest" field is 
interfering:  solr.EnglishPossessiveFilterFactory ?

Another guess is you're receiving more than 5 results and 
"maxResultsForSuggest" is set to 5.

But I'm not sure.  Maybe someone can help with more information from you?

Can you provide a few document examples that have Bangla text, then the full 
query request with a misspelled Bangla word (from the document examples you 
provide), then the full spellcheck response, and the total # of documents 
returned ? 

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, August 05, 2015 5:20 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check not showing any suggestions for other language

Solr spell check is not showing any suggestions for other languages. I have
indexed multiple languages (English and Bangla) in the same core. It shows
suggestions for a wrongly spelt English word, but for a wrongly spelt
Bangla word it shows "correctlySpelled = false" without any
suggestions.

Please check my spell check configuration below.

solrconfig.xml


  

explicit
10
product_name

on
default
wordbreak
true
5
2
5
true
true
5
3

  
  
spellcheck
  



  text_suggest

  
default
suggest
solr.DirectSolrSpellChecker
internal
0.5
  

  
wordbreak
suggest
solr.WordBreakSolrSpellChecker
true
true
10
5
  



schema.xml


  





  
  




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-not-showing-any-suggestions-for-other-language-tp4220950.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Solr spell check mutliwords

2015-07-30 Thread Dyer, James
Talha,

In your configuration, you have this set:

spellcheck.maxResultsForSuggest = 5

...which means it will consider the query "correctly spelled" and offer no 
suggestions if there are more than 5 results. You could omit this parameter and 
it will always suggest when possible.  

Possibly, a better option would be to add "spellcheck.collateParam.mm=100%" or 
"spellcheck.collateParam.q.op=AND", so when testing collations against the 
index, it will require all the terms to match something.  See 
https://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX for 
more information.

James Dyer
Ingram Content Group

-Original Message-
From: talha [mailto:talh...@gmail.com] 
Sent: Wednesday, July 22, 2015 9:34 AM
To: solr-user@lucene.apache.org
Subject: Solr spell check mutliwords

I could not figure out the actual reason why my configured Solr spell checker is not
giving the desired output. In my indexed data, the query symphony+mobile has around
3.5K+ docs and the spell checker detects it as correctly spelled. When I
misspell "symphony" in the query (simphony+mobile), it shows only results for
"mobile" and the spell checker detects this query as correctly spelled. I have
tried this query in different combinations. Please find the search result stats below.

Query: symphony 
ResultFound: 1190
SpellChecker: correctly spelled

Query: mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: simphony
ResultFound: 0
SpellChecker: symphony 
Collation Hits: 1190

Query: symphony+mobile
ResultFound: 3585
SpellChecker: correctly spelled 

Query: simphony+mobile
ResultFound: 2850
SpellChecker: correctly spelled

Query: symphony+mbile
ResultFound: 1190
SpellChecker: correctly spelled 

In the last two queries it should suggest something for the misspelled words
"simphony" and "mbile".

Please find my configuration below. Only the spell check configuration is given.

solrconfig.xml

  
  

explicit
10
product_name

on
default
wordbreak
true
5
2
5
true
true
5
3

  
  
spellcheck
  
  

  

  text_suggest

  
default
suggest
solr.DirectSolrSpellChecker
internal
0.5
  

  
wordbreak
suggest
solr.WordBreakSolrSpellChecker
true
true
10
5
  

  

schema.xml

  
  





  
  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spell-check-mutliwords-tp4218580.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Protwords in solr spellchecker

2015-07-10 Thread Dyer, James
Kamal,

Given the constraint that you cannot re-index the data, your best bet might be 
to simply filter out the suggestions at the application level, or maybe even 
have a proxy do it.

Possibly another option: you might be able to extend DirectSolrSpellChecker and 
override #getSuggestions(), calling the superclass implementation and then 
post-filtering your stop words out of the response.  You'll want to request a 
few more terms so you're more likely to get results even if a term or two gets 
filtered out.  You can specify your custom spell checker in solrconfig.xml.
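
Either way, the filtering step itself is small. A minimal sketch of post-filtering a list of suggestions against a block list, independent of any Solr class; the helper name and the sample words are made up:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class SuggestionFilter {
    // Keeps at most 'wanted' suggestions, dropping any that appear in the block list.
    static List<String> filter(List<String> suggestions, Set<String> blocked, int wanted) {
        List<String> kept = new ArrayList<>();
        for (String s : suggestions) {
            if (kept.size() == wanted) {
                break;
            }
            if (!blocked.contains(s.toLowerCase(Locale.ROOT))) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Four suggestions were requested so that two survive after filtering.
        List<String> raw = Arrays.asList("alpha", "blocked", "beta", "gamma");
        Set<String> blocked = new HashSet<>(Arrays.asList("blocked"));
        System.out.println(filter(raw, blocked, 2)); // [alpha, beta]
    }
}
```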

James Dyer
Ingram Content Group


-Original Message-
From: Alessandro Benedetti [mailto:benedetti.ale...@gmail.com] 
Sent: Friday, July 10, 2015 7:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Protwords in solr spellchecker

So let's try to analyse the situation from the spellchecking point of view.
First of all, we follow David's suggestion and add the StopWordsFilter, with
our configured "bad" words, to the query-time analysis.

*Starting scenario*
- we have the protected words in our index, and we still want them to be in
there

Let's explore the different kinds of spellcheckers available. Where do they
take the suggestions from?

*Index Based Spellchecker*
The suggestions will come from an auxiliary index.

*Direct Spellchecker*
The suggestions will come from the current index.

*File based spellchecker*
It uses an external file to get the spelling suggestions from, so we can
curate this file properly with only good words, and we are fine.
But I guess you would like to use a blacklist, in this case we are going to
have a white list.

*Query Time*
At query time *the query is analysed* and a token stream is provided.
Then, depending on the implementation, we trigger a different lookup.
In the case of the Direct Spellchecker, if I remember correctly:
for each token an FST with all the supported inflections is generated, it is
intersected with the index FST (based on the field), and the
suggestion is returned.

Unfortunately a proper *query time analysis will not help*.
When we analyse the query we have the misspelled word "sexe", which is not
going to be recognised as the bad word.
Then the inflections are calculated, the FST is built, and the intersection
will actually produce the feared suggestion "sex".
This is because the word is in the index.

If we can't modify the index, the *Direct Spellchecker is not an option*, if
my understanding is correct.

Let's see if the Index Based spellchecker can help …
Unfortunately, in this case too, the auxiliary index produced is based on
the analysed form of the original field.

If you really cannot re-index the content, I would suggest an
implementation based on a concept similar to the AnalyzingSuggester in Solr.

Open to clarify your further questions.








2015-07-10 9:31 GMT+01:00 davidphilip cherian 
:

> Hi Kamal,
>
> Not necessarily. You can have different filters applied at index time and
> query time. (note that the order in which filters are defined matters). You
> could just add the stop filter at query time.
> Have your own custom data type defined (similar to 'text_en' that will be
> in schema.xml) and perhaps use standard/whitespace tokenizer followed by
> stop filter at query time.
>
> Tip: Use analysis tool that is available in solr admin page to further
> understand the analysis chain of data types.
>
> HTH
>
>
>
> On Fri, Jul 10, 2015 at 1:03 PM, Kamal Kishore Aggarwal <
> kkroyal@gmail.com> wrote:
>
> > Hi David,
> >
> > This one is a good suggestion. But, if add these *adult* keywords in the
> > stopwords.txt file, it will be requiring the re-indexing of these
> keywords
> > related data.
> >
> > How can I see the change instantly. Is there any other great suggestion
> > that you can suggest me.
> >
> >
> >
> >
> > On Thu, Jul 9, 2015 at 12:09 PM, davidphilip cherian <
> > davidphilipcher...@gmail.com> wrote:
> >
> > > The best bet is to use solr.StopFilterFactory.
> > > Have all such words added to stopwords.txt and add this filter to your
> > > analyzer.
> > >
> > > Reference links
> > >
> > >
> >
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StopFilterFactory
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-StopFilter
> > >
> > > HTH
> > >
> > >
> > > On Thu, Jul 9, 2015 at 11:50 AM, Kamal Kishore Aggarwal <
> > > kkroyal@gmail.com> wrote:
> > >
> > > > Hi Team,
> > > >
> > > > I am currently working with Java-1.7, Solr-4.8.1 with tomcat 7. Is
> > there
> > > > any feature by which I can refrain the following words to appear in
> > spell
> > > > suggestion.
> > > >
> > > > For example: Somebody searches for sexe, I does not want to show him
> > sex
> > > as
> > > > the spell suggestion via solr. How can I stop these type of keywords
> to
> > > be
> > > > shown in suggestion.
> > > >
> > > > Any help is appreciated.
> > > >
> > > >
> > > > Regards
> > > > Kamal Kishore
> > > > Solr Beginner
> > > >
> > >
> >
>



RE: Spell checking the synonym list?

2015-07-09 Thread Dyer, James
Ryan,

If you use index-time synonyms on the spellcheck field, this will give you what 
you want.

For instance, if the document has "lawyer" and you index both terms 
"lawyer","attorney", then the spellchecker will see that "atorney" is 1 edit 
away from an indexed term and will suggest "attorney". 
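
To see why the synonym has to be in the index: a plain Levenshtein calculation puts "atorney" one edit from "attorney" but far from "lawyer". This is a textbook sketch; Solr's own distance implementation may score slightly differently, so treat the numbers as illustrative:

```java
class EditDistance {
    // Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("atorney", "attorney")); // 1
        System.out.println(distance("atorney", "lawyer"));   // well beyond 2 edits
    }
}
```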

You'll need to have the same synonyms set up against the query field, but you 
have the option of making these query-time synonyms if you prefer.

James Dyer
Ingram Content Group

-Original Message-
From: Ryan Yacyshyn [mailto:ryan.yacys...@gmail.com] 
Sent: Thursday, July 09, 2015 2:28 AM
To: solr-user@lucene.apache.org
Subject: Spell checking the synonym list?

Hi all,

I'm wondering if it's possible to have spell checking performed on terms in
the synonym list?

For example, let's say I have documents with the word "lawyer" in them and
I add "lawyer, attorney" in the synonyms.txt file. Then a query is made for
the word "atorney". Is there any way to provide spell checking on this?

Thanks,
Ryan


RE: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

2015-04-14 Thread Dyer, James
Elisabeth,

Currently ConjunctionSolrSpellChecker only supports adding 
WordBreakSolrSpellChecker to IndexBased-, FileBased-, or DirectSolrSpellChecker.  
In the future, it would be great if it could handle other Spell Checker 
combinations.  For instance, if you had an (e)dismax query that searches 
multiple fields, you could have a separate spellchecker for each of them.

But CSSC is not hardened for this more general usage, as hinted in the API doc. 
 The check done to ensure all spellcheckers use the same stringdistance object, 
I believe, is a safeguard against using this class for functionality it is not 
able to correctly support.  It looks to me that SOLR-6271 was opened to fix the 
bug in that it is comparing references on the stringdistance.  This is not a 
problem with WBSSC because this one does not support string distance at all.
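
The reference-comparison point can be shown in isolation. In the sketch below, PlainDistance is a made-up stand-in for a distance class that does not override equals(); it is an assumption that the Lucene class behaves the same way in 4.10. Comparing by class is just one possible looser check, not necessarily what SOLR-6271 does:

```java
class DistanceEqualityDemo {
    // Stand-in for a distance implementation that inherits Object.equals().
    static class PlainDistance { }

    // Without an equals() override this is the same as a reference comparison.
    static boolean sameByEquals(Object a, Object b) {
        return a.equals(b);
    }

    // A looser check: both checkers use the same kind of distance measure.
    static boolean sameByClass(Object a, Object b) {
        return a.getClass() == b.getClass();
    }

    public static void main(String[] args) {
        PlainDistance first = new PlainDistance();
        PlainDistance second = new PlainDistance();
        System.out.println(sameByEquals(first, second)); // false: two separate instances
        System.out.println(sameByClass(first, second));  // true: same class
    }
}
```

A class-based comparison would ignore any per-instance configuration of the distance measure, which may or may not matter for a given setup.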

What you're hoping for, however, is for the requirement that the string 
distances be the same to be removed entirely.  You could try modifying the code 
by removing the check.  However, beware that you might not get the results you 
desire!  But should this happen, please go ahead and fix it for your use case 
and then donate the code.  This is something I've personally wanted for a long 
time.

James Dyer
Ingram Content Group


-Original Message-
From: elisabeth benoit [mailto:elisaelisael...@gmail.com] 
Sent: Tuesday, April 14, 2015 7:37 AM
To: solr-user@lucene.apache.org
Subject: using DirectSpellChecker and FileBasedSpellChecker with Solr 4.10.1

Hello,

I am using Solr 4.10.1 and trying to use DirectSolrSpellChecker and
FileBasedSpellChecker in the same request.

I've applied the change from patch 135.patch (cf. SOLR-6271). I tried running
the command "patch -p1 -i 135.patch --dry-run" but it didn't work, maybe
because the patch was a fix for Solr 4.9, so I just replaced these lines in
ConjunctionSolrSpellChecker

else if (!stringDistance.equals(checker.getStringDistance())) {
    throw new IllegalArgumentException(
        "All checkers need to use the same StringDistance.");
}


by

else if (!stringDistance.equals(checker.getStringDistance())) {
    throw new IllegalArgumentException(
        "All checkers need to use the same StringDistance!!! 1:" +
        checker.getStringDistance() + " 2: " + stringDistance);
}

as it was done in the patch

but still, when I send a spellcheck request, I get the error

"msg": "All checkers need to use the same StringDistance!!!
1:org.apache.lucene.search.spell.LuceneLevenshteinDistance@15f57db3 2:
org.apache.lucene.search.spell.LuceneLevenshteinDistance@280f7e08"

From the error message I gather both spellcheckers use the same distanceMeasure,
LuceneLevenshteinDistance, but they're not the same instance of
LuceneLevenshteinDistance.

Is the condition all right? What should be done to fix this properly?

Thanks,
Elisabeth


RE: Solr phonetics with spelling

2015-03-10 Thread Dyer, James
Ashish,

I would not recommend using spellcheck against a phonetic-analyzed field.  
Instead, you can use <copyField> to create a separate field that is lightly 
analyzed and use the copy for spelling.  

James Dyer
Ingram Content Group


-Original Message-
From: Ashish Mukherjee [mailto:ashish.mukher...@gmail.com] 
Sent: Tuesday, March 10, 2015 7:05 AM
To: solr-user@lucene.apache.org
Subject: Solr phonetics with spelling

Hello,

Couple of questions related to phonetics -

1. If I enable the phonetic filter in managed-schema file for a particular
field, how does it affect the spell handler?

2. What is the meaning of the inject attribute within  in
managed-schema? The documentation is not very clear about it.

Regards,
Ashish


RE: Why collations are coming even I set the value of spellcheck.count to zero(0)

2015-02-18 Thread Dyer, James
I think when you set "count"/"alternativeTermCount" to zero, the defaults (10?) 
are used instead.  Rather than setting these to zero, just use 
"spellcheck=false".  These 2 parameters control suggestions, not collations.

To turn off collations, set "spellcheck.collate=false".  Also, I wouldn't set 
"maxCollationTries" as high as 100, as it could (sometimes) end up checking 
100 possible collations against the index, and that would be very slow.
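
As a rough illustration of the two options, the sketch below builds the request parameters in plain Java.  The parameter names (spellcheck, spellcheck.collate) are the ones discussed above; the class name, helper names and the sample query "delihgt" are made up for the example, and values are assumed to need no URL encoding.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class SpellcheckParamsDemo {
    // Joins parameters into a query string; the values used here need no URL encoding.
    static String toQueryString(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    // Keep per-term suggestions but switch collations off.
    static Map<String, String> suggestionsOnly(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("spellcheck", "true");
        params.put("spellcheck.collate", "false");
        return params;
    }

    // Switch the spellcheck component off entirely.
    static Map<String, String> noSpellcheck(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", query);
        params.put("spellcheck", "false");
        return params;
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(suggestionsOnly("delihgt")));
        System.out.println(toQueryString(noSpellcheck("delihgt")));
    }
}
```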

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Wednesday, February 18, 2015 2:37 AM
To: solr-user@lucene.apache.org
Subject: Why collations are coming even I set the value of spellcheck.count to 
zero(0)

Hi Everyone,
I have set spellcheck.count = 0 and
spellcheck.alternativeTermCount = 0. Even so, collations are returned when
I search for any misspelled query. Why is that?
I have also set spellcheck.maxCollations = 100 and
spellcheck.maxCollationTries = 100. As far as I know, collations are built
from suggestions. So do I misunderstand collations, or is there some
other configuration issue? Any help, please?


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-18 Thread Dyer, James
It will try to give you suggestions up to the number you specify, but if fewer 
are available it will not give you any more.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:40 PM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Thanks James,
  I tried the same thing
spellcheck.count=10&spellcheck.alternativeTermCount=5. And I got 5
suggestions of both "life" and "hope" but not like this * The spellchecker
will try to return you up to 10 suggestions for "hope", but only up to 5
suggestions for "life". *


On Wed, Feb 18, 2015 at 1:10 AM, Dyer, James 
wrote:

> Here is an example to illustrate what I mean...
>
> - query q=text:(life AND
> hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
> - suppose at least one document in your dictionary field has "life" in it
> - also suppose zero documents in your dictionary field have "hope" in them
> - The spellchecker will try to return you up to 10 suggestions for "hope",
> but only up to 5 suggestions for "life"
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 11:35 AM
> To: solr-user@lucene.apache.org
> Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hi James,
>     How can you say that "count" doesn't use
> index/dictionary then from where suggestions come.
>
> On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James <
> james.d...@ingramcontent.com>
> wrote:
>
> > See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> > the following section, for details.
> >
> > Briefly, "count" is the # of suggestions it will return for terms that
> > are *not* in your index/dictionary.  "alternativeTermCount" is the # of
> > alternatives you want returned for terms that *are* in your dictionary.
> > You can set them to the same value, unless you want fewer suggestions
> > when the term is in the dictionary.
> >
> > James Dyer
> > Ingram Content Group
> >
> > -Original Message-
> > From: Nitin Solanki [mailto:nitinml...@gmail.com]
> > Sent: Tuesday, February 17, 2015 5:27 AM
> > To: solr-user@lucene.apache.org
> > Subject: spellcheck.count v/s spellcheck.alternativeTermCount
> >
> > Hello Everyone,
> >   I got confusion between spellcheck.count and
> > spellcheck.alternativeTermCount in Solr. Any help in details?
> >
>


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
Here is an example to illustrate what I mean...

- query q=text:(life AND 
hope)&spellcheck.count=10&spellcheck.alternativeTermCount=5
- suppose at least one document in your dictionary field has "life" in it
- also suppose zero documents in your dictionary field have "hope" in them
- The spellchecker will try to return you up to 10 suggestions for "hope", but 
only up to 5 suggestions for "life"
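
The rule in this example can be written down as a tiny sketch.  This only models the behaviour as described here (it is not Solr's implementation), and the class and method names are invented for illustration.

```java
class SuggestionLimitDemo {
    // Terms found in the dictionary are capped by alternativeTermCount;
    // terms missing from the dictionary are capped by count.
    static int maxSuggestions(boolean termInDictionary, int count, int alternativeTermCount) {
        return termInDictionary ? alternativeTermCount : count;
    }

    public static void main(String[] args) {
        int count = 10;
        int alternativeTermCount = 5;
        // "life" is assumed to be in the dictionary field, "hope" is assumed not to be.
        System.out.println("life: up to " + maxSuggestions(true, count, alternativeTermCount));
        System.out.println("hope: up to " + maxSuggestions(false, count, alternativeTermCount));
    }
}
```

These are upper bounds only; fewer suggestions come back when fewer are available.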

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: spellcheck.count v/s spellcheck.alternativeTermCount

Hi James,
How can you say that "count" doesn't use the
index/dictionary? Where do the suggestions come from, then?

On Tue, Feb 17, 2015 at 10:29 PM, Dyer, James 
wrote:

> See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and
> the following section, for details.
>
> Briefly, "count" is the # of suggestions it will return for terms that are
> *not* in your index/dictionary.  "alternativeTermCount" is the # of
> alternatives you want returned for terms that *are* in your dictionary.
> You can set them to the same value, unless you want fewer suggestions when
> the term is in the dictionary.
>
> James Dyer
> Ingram Content Group
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Tuesday, February 17, 2015 5:27 AM
> To: solr-user@lucene.apache.org
> Subject: spellcheck.count v/s spellcheck.alternativeTermCount
>
> Hello Everyone,
>   I got confusion between spellcheck.count and
> spellcheck.alternativeTermCount in Solr. Any help in details?
>


RE: spellcheck.count v/s spellcheck.alternativeTermCount

2015-02-17 Thread Dyer, James
See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.count and the 
following section, for details.

Briefly, "count" is the # of suggestions it will return for terms that are 
*not* in your index/dictionary.  "alternativeTermCount" is the # of 
alternatives you want returned for terms that *are* in your dictionary.  You 
can set them to the same value, unless you want fewer suggestions when the 
term is in the dictionary.

James Dyer
Ingram Content Group

-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, February 17, 2015 5:27 AM
To: solr-user@lucene.apache.org
Subject: spellcheck.count v/s spellcheck.alternativeTermCount

Hello Everyone,
  I got confusion between spellcheck.count and
spellcheck.alternativeTermCount in Solr. Any help in details?


RE: Collations are not working fine.

2015-02-13 Thread Dyer, James
Nitin,

Can you post the full spellcheck response when you query:

q=gram_ci:"gone wthh thes wint"&wt=json&indent=true&shards.qt=/spell

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Friday, February 13, 2015 1:05 AM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi James Dyer,
  I did as you told me and used
WordBreakSolrSpellChecker instead of shingles, but collations are still not
coming back.
For instance, I tried to get the collation "gone with the wind" by searching
"gone wthh thes wint" on field=gram_ci, but didn't succeed, even though I am
getting the suggestions *with* for wthh, *the* for thes, and *wind* for wint.
I also have documents containing "gone with the wind"; it occurs 167 times
in the documents. I don't know whether I am missing something.
Please check my Solr configuration below:

*URL: *localhost:8983/solr/wikingram/spell?q=gram_ci:"gone wthh thes
wint"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*


textSpellCi

  default
  gram_ci
  solr.DirectSolrSpellChecker
  internal
  0.5
  2
  0
  5
  2
  0.9
  freq


  wordbreak
  solr.WordBreakSolrSpellChecker
  gram
  true
  true
  5





  gram_ci
  default
  on
  true
  25
  true
  1
  25
  true
  50
  50
  true


  spellcheck

  

*Schema.xml: *




   










RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
I opened LUCENE-6237 for this.  I can't promise when I or someone else will 
actually complete this, but it wouldn't be very difficult to do either.  Seeing 
your use-case, I think this would be a nice little improvement.

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 3:25 PM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Yeah that should work. Is this something you will change in the code?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185489.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Got it.  Took a quick look at the code and I see it uses the maximum frequency 
of the terms.  And in your case, one of these terms ("holy" or "wood") occurs 
71,000 times.  It wouldn't be too difficult to change this to use the average 
frequency of the terms or the minimum.  But currently the only options are to 
use the maximum or the sum of the frequencies.  Possibly the minimum is a 
better predictor of how relevant a suggestion is, though.
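
To compare the candidate strategies side by side, here is a small sketch in plain Java.  The frequency 71828 for one term is taken from the response quoted in this thread; the value 900 for the other term is purely hypothetical, and the class is illustrative rather than anything in Lucene.

```java
import java.util.Arrays;

class BreakFrequencyDemo {
    // Existing strategies described above: maximum and sum of the term frequencies.
    static int max(int... freqs) {
        return Arrays.stream(freqs).max().getAsInt();
    }

    static int sum(int... freqs) {
        return Arrays.stream(freqs).sum();
    }

    // Possible alternatives discussed above: minimum and average.
    static int min(int... freqs) {
        return Arrays.stream(freqs).min().getAsInt();
    }

    static double average(int... freqs) {
        return Arrays.stream(freqs).average().getAsDouble();
    }

    public static void main(String[] args) {
        int holy = 71828; // from the quoted response
        int wood = 900;   // hypothetical value for illustration
        System.out.println("max     = " + max(holy, wood));
        System.out.println("sum     = " + sum(holy, wood));
        System.out.println("min     = " + min(holy, wood));
        System.out.println("average = " + average(holy, wood));
    }
}
```

With these numbers only the minimum would rank "holy wood" below "hollywood" (freq 2669 in the quoted response).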

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 1:27 PM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

I did some testing and the order of dictionaries doesn't seem to have an
effect. They are sorted by frequency. So if mm was applied "holy wood" would
have a lower frequency and solve this problem.

  "suggestions":[
  "holywood",{
"numFound":4,
"startOffset":0,
"endOffset":8,
"origFreq":4,
"suggestion":[{
"word":"holy wood",
"freq":71828},
  {
"word":"hollywood",
"freq":2669},
  {
"word":"holyrood",
"freq":14},
  {
"word":"homewood",
"freq":737}]},
  "correctlySpelled",false,
  "collation","(holy wood)",
  "collation","hollywood"]}}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185461.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
I think the problem is when it combines suggestions from DirectSolrSpellChecker 
and WordBreakSolrSpellChecker, it gets two lists of possibilities in edit 
distance order.  And when it combines these lists, all it does is interleave 
the 2 lists: 1 from the first list, then 1 from the 2nd list, then 1 from the 
1st, etc.  

So I think if you ran the query with just Direct, you'd see 1 list for each 
potentially misspelled word, and then if you ran the query with just WordBreak, 
you'd see a different list for each potentially misspelled word.  And then 
when running with both spellcheckers, you'll see them interleaved 
every-other-one.
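
A minimal sketch of that interleaving, in plain Java.  This only models the behaviour as I described it; it is not the Solr code, and which checker's list goes first is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InterleaveDemo {
    // Takes one entry from the first list, then one from the second, and so on;
    // once one list is exhausted, the rest of the other list follows in order.
    static List<String> interleave(List<String> first, List<String> second) {
        List<String> merged = new ArrayList<>();
        int longest = Math.max(first.size(), second.size());
        for (int i = 0; i < longest; i++) {
            if (i < first.size()) {
                merged.add(first.get(i));
            }
            if (i < second.size()) {
                merged.add(second.get(i));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> direct = Arrays.asList("hollywood", "holyrood", "homewood");
        List<String> wordBreak = Arrays.asList("holy wood");
        System.out.println(interleave(direct, wordBreak));
    }
}
```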

It might (or might not) depend on the order you specify the 2 spellcheckers in 
solrconfig.xml.  Maybe (not sure here) the first one is guaranteed to provide 
the first suggestion, so long as it provides at least one.  You might want to 
see if you have WordBreak specified first, and if so, then switch them.  
Because when collations are tested, it just goes through the lists, top to 
bottom and tries the various combinations until either "maxCollationTries" or 
"maxCollations" is exhausted.  And it will give you the "good" collations it 
finds in the order it finds them.

Possibly, an easy workaround is to just increase "maxCollations" by 1 more and 
then use the suggestion with the most hits.  This will be a small performance 
penalty though every time it has to find collations, as testing the 
possibilities is expensive.
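
On the client side, picking the collation with the most hits could look roughly like the sketch below.  It assumes you have already pulled each collation query and its "hits" value out of the response into a map; the hit counts shown are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BestCollationDemo {
    // Returns the collation with the highest hit count, or null for an empty map.
    // On a tie, the collation that appeared first in the response wins.
    static String mostHits(Map<String, Integer> hitsByCollation) {
        String best = null;
        int bestHits = -1;
        for (Map.Entry<String, Integer> e : hitsByCollation.entrySet()) {
            if (e.getValue() > bestHits) {
                best = e.getKey();
                bestHits = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Integer> collations = new LinkedHashMap<>();
        collations.put("(holy wood)", 12);  // hypothetical hit count
        collations.put("hollywood", 2600);  // hypothetical hit count
        System.out.println(mostHits(collations));
    }
}
```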

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 11:55 AM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

James,

That is very useful information. I tested it and can confirm that disabling
spellcheck in warmer solves core reload problem. 

Now with my use case I'm not trying to spellcheck and correct a whitespace.
If "holy wood" was queried with a mm of 100% it would have fewer hits than
hollywood and this would then be the best correction.

Is there a way to do this?

 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185423.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Okke,

There is no way to have it correct both spelling and whitespace in the same 
correction.  So unfortunately there is no easy fix for your use-case.  The old 
shingle method of correcting whitespace might work for this, but it might also 
introduce other problems.

I saw your comments on SOLR-5386 and I appreciate your reminder about that 
issue.  The easiest workaround is to put "spellcheck.maxCollationTries=0" in 
all of your warming queries.  Better yet, just use "spellcheck=false" in the 
warming queries because having spellcheck enabled in the warming queries serves 
no purpose but to make searchers take longer to open.

James Dyer
Ingram Content Group

-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 9:51 AM
To: solr-user@lucene.apache.org
Subject: RE: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Thank you for that answer James.

Increasing spellcheck.count did the trick.

Funny result: for the query "holywood" the suggestion is "holy wood" instead of
"hollywood", even though I have a mm of 100%.

Any way to fix that?

BTW when using maxCollationTries Solr hangs on core reload. Apparently an
old bug, but hard to find as logs show nothing.

Below the results for "holywood": 

"suggestions":[
  "holywood",{
"numFound":4,
"startOffset":0,
"endOffset":8,
"origFreq":4,
"suggestion":[{
"word":"holy wood",
"freq":70559},
  {
"word":"hollywood",
"freq":2649},
  {
"word":"holyrood",
"freq":14},
  {
"word":"homewood",
"freq":737}]},
  "correctlySpelled",false,
  "collation","(holy wood)"]}}



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352p4185368.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: alternativeTermCount and WordBreakSolrSpellChecker combination not working

2015-02-10 Thread Dyer, James
Okke,

My first guess is that the additional results from the word break spellchecker 
are causing additional per-term results and the correct answer is not making the 
list.  So you might need to increase "spellcheck.count" and/or 
"spellcheck.alternativeTermCount".

My second guess is that the correct answer is still in the per-term results but 
low enough down now that wordbreak is producing additional results, that the 
correct answer never gets tested as a possible collation.  In this case, if 
you're already getting your maximum collations back, just not the one you 
wanted, then increase "spellcheck.maxCollations".  Otherwise, try increasing 
"spellcheck.maxCollationTries".

If this doesn't help, then go ahead and post the pertinent sections of 
solrconfig.xml, schema.xml, and show what you change when adding wordbreak.  
Then also include before & after query url's with the full spellcheck responses.

James Dyer
Ingram Content Group


-Original Message-
From: O. Klein [mailto:kl...@octoweb.nl] 
Sent: Tuesday, February 10, 2015 8:49 AM
To: solr-user@lucene.apache.org
Subject: alternativeTermCount and WordBreakSolrSpellChecker combination not 
working

Because of a lot of misspellings in content I am using alternativeTermCount
and maxResultsForSuggest to get suggestions even if terms are in index.
However when adding wordbreak dictionary the collation that was given before
is now empty.

Is there a way to make this work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/alternativeTermCount-and-WordBreakSolrSpellChecker-combination-not-working-tp4185352.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Collations are not working fine.

2015-02-10 Thread Dyer, James
Nitin,

I have not tested using shingles with collations but my guess here is the 
collation feature is not going to work as expected with a shingled index.  So 
try re-indexing without the shingles and see if it gives you more intuitive 
results.  If that helps, and if you want to still correct whitespace errors, 
then consider using WordBreakSolrSpellChecker instead of shingles (the main 
solr example demonstrates how).  

Beyond that, without some queries *and* the full spellcheck response, and an 
explanation as to why you feel the spellcheck response is wrong, I'm not sure 
you will get much more help with this.

Here is what "hits" in the collation response means:

> By "hits", it means if you replaced the "q" parameter on the original
> query but left everything else the same (filters, etc), this is how many
> results you would get.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Monday, February 09, 2015 11:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Collations are not working fine.

Hi *James Dyer*,
   I have not done stemming, and my
spellcheck.alternativeTermCount is set equal to spellcheck.count. Below, I
have pasted my solrconfig.xml and schema.xml configuration.


*URL: *
localhost:8983/solr/wikingram/spell?q=gram_ci:"deligh"&wt=json&indent=true&shards.qt=/spell

*solrconfig.xml:*


textSpellCi

  default
  gram_ci
  solr.DirectSolrSpellChecker
  internal
  0.5
  2
  0
  5
  2
  0.9
  freq





  gram_ci
  default
  on
  true
  25
  true
  1
  25
  true
  50
  50
  true


  spellcheck

  

*Schema.xml: *




   

    









On Tue, Feb 10, 2015 at 1:23 AM, Dyer, James 
wrote:

> Nitin,
>
> My guess here is that your spellcheck field is a field that has stemming.
> This might be why you get a collation that returns "wind" even though the
> user queried "wnd" and it does not get any suggestions.  Perhaps "wnd" is
> stemmed the same as "wind" ?  (Spellcheck usually works best if you
> "copyField" the query field to something that is tokenized but not heavily
> analyzed, and use the copy as the spellcheck dictionary.)
>
> The other problem might be because "wind" is in the index but you are not
> using "spellcheck.alternativeTermCount".  If you set this to the same value
> as "spellcheck.count", then it will give suggestions even when words exist
> in the index.
>
> By "hits", it means if you replaced the "q" parameter on the original
> query but left everything else the same (filters, etc), this is how many
> results you would get.
>
> If you need more help, please include in your message the pertinent
> sections of solrconfig.xml, schema.xml and also the full query url you are
> using and the full spellcheck response.
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: Nitin Solanki [mailto:nitinml...@gmail.com]
> Sent: Monday, February 09, 2015 7:47 AM
> To: solr-user@lucene.apache.org
> Subject: Collations are not working fine.
>
> I am working on spell checking in Solr. I have implemented Suggestions and
> collations in my spell checker component.
>
> Most of the time collations work fine but in few case it fails.
>
> *Working*:
> I tried query:*gone wthh thes wnd*: In this "wnd" doesn't give suggestion
> "wind" but collation is coming right = "gone with the wind", hits = 117
>
>
> *Not working:*
> But when I tried query: *gone wthh thes wint*: In this "wint" does give
> suggestion "wind" but collation is not coming right. Instead of gone with
> the wind it gives gone with the west, hits = 1.
>
> And I want to also know what is *hits* in collations.
>


RE: Collations are not working fine.

2015-02-09 Thread Dyer, James
Nitin,

My guess here is that your spellcheck field is a field that has stemming.  This 
might be why you get a collation that returns "wind" even though the user 
queried "wnd" and it does not get any suggestions.  Perhaps "wnd" is stemmed 
the same as "wind" ?  (Spellcheck usually works best if you "copyField" the 
query field to something that is tokenized but not heavily analyzed, and use 
the copy as the spellcheck dictionary.)

The other problem might be because "wind" is in the index but you are not using 
"spellcheck.alternativeTermCount".  If you set this to the same value as 
"spellcheck.count", then it will give suggestions even when words exist in the 
index.

By "hits", it means if you replaced the "q" parameter on the original query but 
left everything else the same (filters, etc), this is how many results you 
would get.
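
In other words, "hits" answers the question posed by a request like the one built in this sketch.  It is plain Java with single-valued parameters for simplicity; the class name and the sample fq value are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CollationRequeryDemo {
    // Copies the original request parameters and swaps in the collation as "q";
    // filters and every other parameter are left untouched.
    static Map<String, String> withCollation(Map<String, String> original, String collationQuery) {
        Map<String, String> copy = new LinkedHashMap<>(original);
        copy.put("q", collationQuery);
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> original = new LinkedHashMap<>();
        original.put("q", "gone wthh thes wnd");
        original.put("fq", "type:movie"); // hypothetical filter
        original.put("rows", "10");
        System.out.println(withCollation(original, "gone with the wind"));
    }
}
```

The number of results that rewritten request would return is what the collation reports as "hits".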

If you need more help, please include in your message the pertinent sections of 
solrconfig.xml, schema.xml and also the full query url you are using and the 
full spellcheck response.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Monday, February 09, 2015 7:47 AM
To: solr-user@lucene.apache.org
Subject: Collations are not working fine.

I am working on spell checking in Solr. I have implemented Suggestions and
collations in my spell checker component.

Most of the time collations work fine, but in a few cases they fail.

*Working*:
I tried query:*gone wthh thes wnd*: In this "wnd" doesn't give suggestion
"wind" but collation is coming right = "gone with the wind", hits = 117


*Not working:*
But when I tried query: *gone wthh thes wint*: In this "wint" does give
suggestion "wind" but collation is not coming right. Instead of gone with
the wind it gives gone with the west, hits = 1.

And I want to also know what is *hits* in collations.


RE: Solr 4.9 Calling DIH concurrently

2015-02-04 Thread Dyer, James
Yes, that is what I mean.  In my case, for each "/dataimport" in the "defaults" 
section, I also put something like this:

<str name="currentPartition">1</str>

...and then reference it in the data-config.xml with 
${dataimporter.request.currentPartition} .  This way the same data-config.xml 
can be used for each handler.

As I said before, while this works (and this is what I do in production), it 
seems generally preferable to write code for this use-case.

James Dyer
Ingram Content Group


-Original Message-
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com] 
Sent: Tuesday, February 03, 2015 4:24 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr 4.9 Calling DIH concurrently

Thanks James. After lots of search and reading now I think I understand a
little from your answer.
If I understand correctly my solrconfig.xml will have section like this



  db-data-config1.xml

  



  db-data-config1.xml

  

.
.
.
.
.


  db-data-config1.xml

  


Is this correct? If it's true, then I can call 8 such requests 
8

and solr will commit data when the 
100

of 100MB is reached per thread.

Thanks again for your time.

Thanks
Meena






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744p4183750.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Solr 4.9 Calling DIH concurrently

2015-02-03 Thread Dyer, James
DIH is single-threaded.  There was once a threaded option, but it was buggy and 
was subsequently removed.  

What I do is partition my data and run multiple DIH request handlers at the 
same time.  It means redundant sections in solrconfig.xml and it's not very 
elegant, but it works.

For instance, for a sql query, I add something like this: "where mod(id, 
${dataimporter.request.numPartitions})=${dataimporter.request.currentPartition}".

I think, though, most users who want to make the most out of multithreading 
write their own program and use the solrj api to send the updates.
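
To show how the mod-based partitioning splits the work, here is a small sketch in plain Java.  It only simulates which ids each handler would pick up and prints the where clause with the placeholders filled in; the class name is invented and ids are assumed to be non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;

class DihPartitionDemo {
    // The where clause from above with the two request parameters substituted.
    static String whereClause(int numPartitions, int currentPartition) {
        return "where mod(id, " + numPartitions + ")=" + currentPartition;
    }

    // Groups ids by (id mod numPartitions); every id lands in exactly one bucket.
    static List<List<Integer>> partition(int[] ids, int numPartitions) {
        List<List<Integer>> buckets = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            buckets.add(new ArrayList<>());
        }
        for (int id : ids) {
            buckets.get(id % numPartitions).add(id);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        List<List<Integer>> buckets = partition(ids, 4);
        for (int p = 0; p < buckets.size(); p++) {
            System.out.println(whereClause(4, p) + " -> " + buckets.get(p));
        }
    }
}
```

Each handler then runs the same import against its own slice, so no document is imported twice.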

James Dyer
Ingram Content Group


-Original Message-
From: meena.sri...@mathworks.com [mailto:meena.sri...@mathworks.com] 
Sent: Tuesday, February 03, 2015 3:43 PM
To: solr-user@lucene.apache.org
Subject: Solr 4.9 Calling DIH concurrently

Hi 

I am using Solr 4.9 and need to index millions of documents from a database. I
am using DIH and sending requests to fetch by ids. Is there a way to run
multiple indexing threads concurrently in DIH?
I want to take advantage of 

parameter. How do I do it? I am just invoking the DIH handler using the SolrJ
HttpSolrServer and issuing requests sequentially:
http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=100&minId=1

http://localhost:8983/solr/db/dataimport?command=full-import&clean=false&maxId=201&minId=101





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-4-9-Calling-DIH-concurrently-tp4183744.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-02-02 Thread Dyer, James
1 is not too small a value; in fact, it’s the default value.  Of course the 
more combinations it has to try, the slower it will run, but the penalty is 
small enough you're not going to notice.  The only problem you might have is if 
you use a lot of 1-character stop-words, you might get these stop-words back as 
nonsense suggestions (assuming you do not filter stop words for your spelling 
dictionary field, but do remove them on the query field).  But I'd try it if I 
were you.  It's probably the best option in your case.
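
Here is a simplified model of what the minimum break length allows, in plain Java.  It only enumerates single break points where both parts are at least the minimum length; the real spellchecker also checks the parts against the index, and the class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class WordBreakDemo {
    // Lists every way to split the word in two so that both parts have
    // at least minBreakLength characters (values below 1 are treated as 1).
    static List<String> candidateBreaks(String word, int minBreakLength) {
        int min = Math.max(1, minBreakLength);
        List<String> candidates = new ArrayList<>();
        for (int i = min; i <= word.length() - min; i++) {
            candidates.add(word.substring(0, i) + " " + word.substring(i));
        }
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println("gopro, min 3:  " + candidateBreaks("gopro", 3));
        System.out.println("gopro, min 2:  " + candidateBreaks("gopro", 2));
        System.out.println("iphone, min 1: " + candidateBreaks("iphone", 1));
    }
}
```

With a minimum of 3, "gopro" has no valid split at all; "go pro" only appears at 2, and "i phone" only at 1.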

James Dyer
Ingram Content Group

-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Friday, January 30, 2015 5:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

Nice! It works indeed!
Sorry I didn't notice that before.

But what if I want the same for the iPhone?
I mean suggesting "I phone" for users who searched "iphone". A minBreakLength
of 1 is just too small, isn't it?

On Saturday, January 31, 2015, Dyer, James-2 [via Lucene] <
ml-node+s472066n4183176...@n3.nabble.com> wrote:

> You need to decrease this to at least 2 because the length of "go" is <3.
>
> 3
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: fabio.bozzo [mailto:[hidden email]]
> Sent: Wednesday, January 28, 2015 4:55 PM
> To: [hidden email]
> Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> I tried increasing my alternativeTermCount to 5 and enable extended
> results.
> I also added a filter fq parameter to clarify what I mean:
>
> *Querying for "go pro" is good:*
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 2,
> "params": {
>   "q": "go pro",
>   "indent": "true",
>   "fq": "marchio:\"GO PRO\"",
>   "rows": "1",
>   "wt": "json",
>   "spellcheck.extendedResults": "true",
>   "_": "1422485581792"
> }
>   },
>   "response": {
> "numFound": 27,
> "start": 0,
> "docs": [
>   {
> "codice_produttore_s": "DK00150020",
> "codice_s": "5.BAT.27407",
> "id": "27407",
> "marchio": "GO PRO",
> "barcode_interno_s": "185323000958",
> "prezzo_acquisto_d": 16.12,
> "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
> "descrizione": "BATTERIA GO PRO HERO ",
> "prezzo_vendita_d": 39.9,
> "categoria": "Batterie",
> "_version_": 1491583424191791000
>   },
>
>  
>
> ]
>   },
>   "spellcheck": {
> "suggestions": [
>   "go pro",
>   {
> "numFound": 1,
> "startOffset": 0,
> "endOffset": 6,
> "origFreq": 433,
> "suggestion": [
>   {
> "word": "gopro",
> "freq": 2
>   }
> ]
>   },
>   "correctlySpelled",
>   false,
>   "collation",
>   [
> "collationQuery",
> "gopro",
> "hits",
> 3,
> "misspellingsAndCorrections",
> [
>   "go pro",
>   "gopro"
> ]
>   ]
> ]
>   }
> }
>
> While querying for "gopro" is not:
>
> {
>   "responseHeader": {
> "status": 0,
> "QTime": 6,
> "params": {
>   "q": "gopro",
>   "indent": "true",
>   "fq": "marchio:\"GO PRO\"",
>   "rows": "1",
>   "wt": "json",
>   "spellcheck.extendedResults": "true",
>   "_": "1422485629480"
> }
>   },
>   "response": {
> "numFound": 3,
> "start": 0,
> "docs": [
>   {
> "codice_produttore_s": "DK0030010",

RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-30 Thread Dyer, James
You need to decrease this to at least 2 because the length of "go" is <3.

<int name="minBreakLength">3</int>

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Wednesday, January 28, 2015 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I tried increasing my alternativeTermCount to 5 and enable extended results.
I also added a filter fq parameter to clarify what I mean:

*Querying for "go pro" is good:*

{
  "responseHeader": {
"status": 0,
"QTime": 2,
"params": {
  "q": "go pro",
  "indent": "true",
  "fq": "marchio:\"GO PRO\"",
  "rows": "1",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "_": "1422485581792"
}
  },
  "response": {
"numFound": 27,
"start": 0,
"docs": [
  {
"codice_produttore_s": "DK00150020",
"codice_s": "5.BAT.27407",
"id": "27407",
"marchio": "GO PRO",
"barcode_interno_s": "185323000958",
"prezzo_acquisto_d": 16.12,
"data_aggiornamento_dt": "2012-06-21T00:00:00Z",
"descrizione": "BATTERIA GO PRO HERO ",
"prezzo_vendita_d": 39.9,
"categoria": "Batterie",
"_version_": 1491583424191791000
  },

 

]
  },
  "spellcheck": {
"suggestions": [
  "go pro",
  {
"numFound": 1,
"startOffset": 0,
"endOffset": 6,
"origFreq": 433,
"suggestion": [
  {
"word": "gopro",
"freq": 2
  }
]
  },
  "correctlySpelled",
  false,
  "collation",
  [
"collationQuery",
"gopro",
"hits",
3,
"misspellingsAndCorrections",
[
  "go pro",
  "gopro"
]
  ]
]
  }
}

While querying for "gopro" is not:

{
  "responseHeader": {
"status": 0,
"QTime": 6,
"params": {
  "q": "gopro",
  "indent": "true",
  "fq": "marchio:\"GO PRO\"",
  "rows": "1",
  "wt": "json",
  "spellcheck.extendedResults": "true",
  "_": "1422485629480"
}
  },
  "response": {
"numFound": 3,
"start": 0,
"docs": [
  {
"codice_produttore_s": "DK0030010",
"codice_s": "5.VID.39163",
"id": "38814",
"marchio": "GO PRO",
"barcode_interno_s": "818279012477",
"prezzo_acquisto_d": 150.84,
"data_aggiornamento_dt": "2014-12-24T00:00:00Z",
"descrizione": "VIDEOCAMERA GO-PRO HERO 3 WHITE NUOVO SLIM",
"prezzo_vendita_d": 219,
"categoria": "Fotografia",
"_version_": 1491583425479442400
  },

]
  },
  "spellcheck": {
"suggestions": [
  "gopro",
  {
"numFound": 1,
"startOffset": 0,
"endOffset": 5,
"origFreq": 2,
"suggestion": [
  {
"word": "giro",
"freq": 6
  }
]
  },
  "correctlySpelled",
  false
]
  }
}

---

I'd like "go pro" as a suggestion for "gopro" too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182735.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-28 Thread Dyer, James
Try using something larger than 2 for alternativeTermCount.  5 is probably ok 
here.  If that doesn't work, then post the exact query you are using and the 
full extended spellcheck results.

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 3:59 PM
To: solr-user@lucene.apache.org
Subject: RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

I have this in my solrconfig:




explicit
10
catch_all

on
default
wordbreak
false
5
2
100
true
true
5
3



spellcheck




Although my spellchecker does work, suggesting for misspelled terms, it
doesn't work for the example above:
I mean terms which are both valid, ("gopro"=100 docs; "go pro"=150 'others'
docs).
I want to suggest "gopro" for "go pro" search term and vice-versa, even if
they're both perfectly valid terms in the index. Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172p4182398.html




RE: SpellingQueryConverter and query parsing

2015-01-27 Thread Dyer, James
Having worked with the spellchecking code for the last few years, I've often 
wondered the same thing, but I never looked seriously into it.  I'm sure 
there are probably some serious hurdles, hence the Query Converter.  The easy 
thing to do here is to use "spellcheck.q", and then pass in space-delimited 
keywords.  This bypasses the query converter entirely for custom situations 
like yours.
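
A minimal sketch of that workaround, assuming a client that assembles its own request parameters. The helper names and the stripping rules are hypothetical and would have to match whatever syntax the custom parser accepts; only the "spellcheck.q" parameter itself comes from Solr.

```python
import re
from urllib.parse import urlencode

# Words treated as query operators rather than terms (an assumed list).
OPERATORS = {"AND", "OR", "NOT", "TO"}

def keywords_for_spellcheck(user_query):
    """Reduce a query string to space-delimited keywords for spellcheck.q."""
    # Drop field prefixes such as "title:" so only the values remain.
    without_fields = re.sub(r"\b\w+:", " ", user_query)
    # Keep runs of letters; punctuation and digits fall away.
    words = re.findall(r"[^\W\d_]+", without_fields)
    return " ".join(w for w in words if w not in OPERATORS)

def build_params(user_query):
    """Send the original query as q and the bare keywords as spellcheck.q."""
    return urlencode({
        "q": user_query,
        "spellcheck": "true",
        "spellcheck.q": keywords_for_spellcheck(user_query),
    })
```

Because the keywords are passed separately, the main query in "q" can keep whatever operators and field names it needs.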

But please, if you find a way to plug the actual query parser into spellcheck, 
consider opening a jira & contributing the code, even if what you end up with 
isn't in a final polished state for general use.

James Dyer
Ingram Content Group


-Original Message-
From: Scott Stults [mailto:sstu...@opensourceconnections.com]
Sent: Tuesday, January 27, 2015 11:26 AM
To: solr-user@lucene.apache.org
Subject: SpellingQueryConverter and query parsing

Hello!

SpellingQueryConverter "parses" the incoming query in sort of a quick and
dirty way with a regular expression. Is there a reason the query string
isn't parsed with the _actual_ parser, if one was configured for that type
of request? Even better, could the parsed query object be added to the
response in some way so that the query wouldn't need to be parsed twice?
The individual terms could then be visited and substituted in-place without
needing to worry about preserving the meaning of operators in the query.

The motive behind my question is that I may need to implement a QueryConverter
because I'm using a custom parser, and using that parser in the
QueryConverter itself seems like the right thing to do. That wasn't done
in SpellingQueryConverter, though, so I want to find out why before I go
blundering into a known minefield.


Thanks!
-Scott


RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
You need to set "spellcheck.alternativeTermCount" to a value greater than zero. 
 Without it, spellcheck will never suggest for something in the index.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-Thespellcheck.alternativeTermCountParameter

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 9:57 AM
To: solr-user@lucene.apache.org
Subject: Re: Suggesting broken words with solr.WordBreakSolrSpellChecker

Good, I'll try.
But imagine I have 100 documents containing "go pro" and 150 documents
containing "gopro".
Suggestions of the "other" term do not come up in any case.

2015-01-27 16:21 GMT+01:00 Dyer, James-2 [via Lucene] <
ml-node+s472066n4182254...@n3.nabble.com>:

> I think the word break spellchecker will do what you want.  But, if I were
> you, I'd dial back "maxChanges" to 1 or 2.  You don't want it slicing a
> word into 10 parts or trying to combine 10 adjacent words.  You also need
> the "minBreakLength" to be no more than 2, if you want it to break "go"
> (length=2) off of "gopro".
>
> James Dyer
> Ingram Content Group
>
>
> -Original Message-
> From: fabio.bozzo [mailto:[hidden email]
> <http:///user/SendEmail.jtp?type=node&node=4182254&i=0>]
> Sent: Tuesday, January 27, 2015 2:58 AM
> To: [hidden email] <http:///user/SendEmail.jtp?type=node&node=4182254&i=1>
> Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker
>
> I indexed an electronics e-commerce product catalog.
>
> This is a typical document from my collection:
>
>
> "docs": [
>   {
> "prezzo_vendita_d": 39.9,
> "codice_produttore_s": "DK00150020",
> "codice_s": "5.BAT.27407",
> "descrizione": "BATTERIA GO PRO HERO ",
> "barcode_interno_s": "185323000958",
> "categoria": "Batterie",
> "prezzo_acquisto_d": 16.12,
> "marchio": "GO PRO",
> "data_aggiornamento_dt": "2012-06-21T00:00:00Z",
> "id": "27407",
> "_version_": 1491274123542790100
>   },
>   {
> "codice_produttore_s": "DK0052043",
> "codice_s": "05.SP.42760",
> "id": "42760",
> "marchio": "SP GADGETS",
> "barcode_interno_s": "4028017520430",
> "prezzo_acquisto_d": 34.4,
> "data_aggiornamento_dt": "2014-11-04T00:00:00Z",
> "descrizione": "SP POS CASE GOPRO OLIVE LARGE",
> "prezzo_vendita_d": 59.95,
> "_version_": 1491274406746390500
>   }
> ...]
> I want my spellchecker to suggest "go pro" to users searching "gopro"
> (without whitespace).
>
> I also want users searching "go pro" to find "gopro" products, too.
>
> Here's a little bit of my configuration:
>
> *schema.xml*
> 
>  stored="true"/>
>  stored="true"/>
>  stored="true"/>
>
>  indexed="true"
> stored="false" multiValued="true" />
>  stored="false"
> multiValued="true" />
>
> 
> 
> 
> 
>
> 
> 
> 
> 
> ...
>
>  positionIncrementGap="100">
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1" />
>
>  ignoreCase="true"
> articles="lang/contractions_it.txt"/>
> 
> 
>  words="lang/stopwords_it.txt" format="snowball" />
> 
> 
> 
> 
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1" />
>
>  ignoreCase="true"
> articles="lang/contractions_it.txt"/>
> 
> 
>  words="lang/stopwords_it.txt&qu

RE: Stop word suggestions are coming when I indexed sentence using ShingleFilterFactory

2015-01-27 Thread Dyer, James
Can you give a little more information as to how you have the spellchecker 
configured in solrconfig.xml?  Also, it would help if you showed a query and 
the spell check response and then explain what you wanted it to return vs what 
it actually returned.  

My guess is that the stop words you mention exist in your spelling index and 
you're not using the "alternativeTermCount" parameter, which tells it to 
suggest for terms that exist in the index.

I take it also you're using shingles to get word-break suggestions?  You might 
have better luck with this using WordBreakSolrSpellChecker instead of shingles.

James Dyer
Ingram Content Group


-Original Message-
From: Nitin Solanki [mailto:nitinml...@gmail.com] 
Sent: Tuesday, January 27, 2015 5:06 AM
To: solr-user@lucene.apache.org
Subject: Stop word suggestions are coming when I indexed sentence using 
ShingleFilterFactory

Hi,
  I am getting suggestions for both correct and misspelled words, but I am
not getting suggestions for stop words. Why? I am not even using
solr.StopFilterFactory.


Schema.xml :

**


   




 
 








RE: Suggesting broken words with solr.WordBreakSolrSpellChecker

2015-01-27 Thread Dyer, James
I think the word break spellchecker will do what you want.  But, if I were you, 
I'd dial back "maxChanges" to 1 or 2.  You don't want it slicing a word into 10 
parts or trying to combine 10 adjacent words.  You also need the 
"minBreakLength" to be no more than 2, if you want it to break "go" (length=2) 
off of "gopro".  

James Dyer
Ingram Content Group


-Original Message-
From: fabio.bozzo [mailto:f.bo...@3-w.it] 
Sent: Tuesday, January 27, 2015 2:58 AM
To: solr-user@lucene.apache.org
Subject: Suggesting broken words with solr.WordBreakSolrSpellChecker

I indexed an electronics e-commerce product catalog.

This is a typical document from my collection:


"docs": [
  {
"prezzo_vendita_d": 39.9,
"codice_produttore_s": "DK00150020",
"codice_s": "5.BAT.27407",
"descrizione": "BATTERIA GO PRO HERO ",
"barcode_interno_s": "185323000958",
"categoria": "Batterie",
"prezzo_acquisto_d": 16.12,
"marchio": "GO PRO",
"data_aggiornamento_dt": "2012-06-21T00:00:00Z",
"id": "27407",
"_version_": 1491274123542790100
  },
  {
"codice_produttore_s": "DK0052043",
"codice_s": "05.SP.42760",
"id": "42760",
"marchio": "SP GADGETS",
"barcode_interno_s": "4028017520430",
"prezzo_acquisto_d": 34.4,
"data_aggiornamento_dt": "2014-11-04T00:00:00Z",
"descrizione": "SP POS CASE GOPRO OLIVE LARGE",
"prezzo_vendita_d": 59.95,
"_version_": 1491274406746390500
  }
...]
I want my spellchecker to suggest "go pro" to users searching "gopro"
(without whitespace).

I also want users searching "go pro" to find "gopro" products, too.

Here's a little bit of my configuration:

*schema.xml*

















...




























*solr-config.xml*



explicit
10
catch_all

on
default
wordbreak
false
5
2
5
true
true
5
3



spellcheck



...


text_general


default
catch_all_original
solr.DirectSolrSpellChecker
internal
0.5
2  
1
5
4
0.01



wordbreak
solr.WordBreakSolrSpellChecker  
catch_all_original
true
true
10
3





*Is the spellchecker the right solution or is this the case for something
else, like the "more like this" feature?*

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Suggesting-broken-words-with-solr-WordBreakSolrSpellChecker-tp4182172.html




RE: can't make sense of spellchecker results when using techproducts example

2015-01-09 Thread Dyer, James
Chris,

- DirectSpellChecker has a setting for "minPrefix" which the techproducts 
example sets to 1 (also the default).  So it will never try to correct the 
first character.  I think this is both a performance optimization and is based 
on the assumption that we rarely misspell the first character.  This is why it 
will not  correct "hell" to "dell".  I think it will allow you to set this to 
0, if you want your sample query to work.

- The "maxCollationTries" feature re-writes "q" / "spellcheck.q", and then 
using all the other parameters, queries internally to see if there are any hits.  
This doesn't play very well when "q.op=OR" / "mm=1".  So when you see a 
collation like "here ultrasharp" / "heat ..." etc, you see it is indeed getting 
some hits.  So it considers it a valid query re-write, despite the absurdity.  
We could improve this example config by adding 
"spellcheck.collateParam.q.op=AND" to the defaults.  (When using dismax, you 
would add "spellcheck.collateParam.mm=100%")  Also, while the "collateParam" 
functionality is in the old Solr wiki, it doesn't seem to be in the reference 
manual, so we probably should add it as this would be pretty important for a 
lot of users.

- Unless using the legacy IndexBasedSpellChecker / FileBasedSpellChecker, you 
need not use "spellcheck.build".  It's a no-op for both Direct and WordBreak, as 
these do not use sidecar indexes.

So without changing the config, these queries illustrate the spellchecker 
pretty well, including the word-break functionality.

http://localhost:8983/solr/techproducts/spell?spellcheck.q=dzll+ultra%20sharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND
http://localhost:8983/solr/techproducts/spell?spellcheck.q=dellultrasharp&df=text&spellcheck=true&spellcheck.collateParam.q.op=AND

Spellcheck has a lot of gotchas, and I wish we could dream up a way to 
make it easy for people.  I remember it being a struggle for me when I was a 
new user, and I know we get lots of questions on the user-list about it.

My apologies to you for not answering this sooner.

James Dyer
Ingram Content Group


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Wednesday, December 17, 2014 6:49 PM
To: solr-user@lucene.apache.org
Subject: can't make sense of spellchecker results when using techproducts 
example


Ok, so i've been working on updating the ref guide to account for the new 
way to run the "examples" in 5.0.

The spell checking page...

https://cwiki.apache.org/confluence/display/solr/Spell+Checking

...has some examples that loosely correlate to the "techproducts" 
example, but even if you ignore the specifics of those examples, i need 
help understanding the basic behavior of the spellchecker as configured in 
the techproducts example.

Assuming you run this...

bin/solr -e techproducts

with that example running & those docs indexed, this URL gives me 
results i can't explain...

http://localhost:8983/solr/techproducts/spell?spellcheck.q=hell+ultrashar&df=text&spellcheck=true&spellcheck.build=true

(see below)

1) "dell" is not listed as a possible suggestion for "hell" (even if 
the dictionary thinks "hold" is a better suggestion, why isn't "dell" even 
included in the list of possibilities?)

2) in the "collation" section, i can't make any sense of what these 
results mean -- how is "hello ultrasharp" a suggested collationQuery when 
*none* of the example docs contain both "hello" and "ultrasharp" ?

http://localhost:8983/solr/techproducts/select?df=text&q=%2Bhello+%2Bultrasharp


So WTF is up with these spell check results?






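<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">15</int>
</lst>
<str name="command">build</str>
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="hell">
      <int name="numFound">6</int>
      <int name="startOffset">0</int>
      <int name="endOffset">4</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst><str name="word">hello</str><int name="freq">1</int></lst>
        <lst><str name="word">here</str><int name="freq">2</int></lst>
        <lst><str name="word">heat</str><int name="freq">1</int></lst>
        <lst><str name="word">hold</str><int name="freq">1</int></lst>
        <lst><str name="word">html</str><int name="freq">1</int></lst>
        <lst><str name="word">héllo</str><int name="freq">1</int></lst>
      </arr>
    </lst>
    <lst name="ultrashar">
      <int name="numFound">1</int>
      <int name="startOffset">5</int>
      <int name="endOffset">14</int>
      <int name="origFreq">0</int>
      <arr name="suggestion">
        <lst><str name="word">ultrasharp</str><int name="freq">1</int></lst>
      </arr>
    </lst>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <lst name="collations">
    <lst name="collation">
      <str name="collationQuery">hello ultrasharp</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hell">hello</str>
        <str name="ultrashar">ultrasharp</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">here ultrasharp</str>
      <int name="hits">3</int>
      <lst name="misspellingsAndCorrections">
        <str name="hell">here</str>
        <str name="ultrashar">ultrasharp</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">heat ultrasharp</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hell">heat</str>
        <str name="ultrashar">ultrasharp</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">hold ultrasharp</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hell">hold</str>
        <str name="ultrashar">ultrasharp</str>
      </lst>
    </lst>
    <lst name="collation">
      <str name="collationQuery">html ultrasharp</str>
      <int name="hits">2</int>
      <lst name="misspellingsAndCorrections">
        <str name="hell">html</str>
        <str name="ultrashar">ultrasharp</str>
      </lst>
    </lst>
  </lst>
</lst>
</response>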








-Hoss
http://www.lucidworks.com/



RE: Multiword mispellings

2014-12-18 Thread Dyer, James
Matt,

Unfortunately this kind of correction is not supported.  The word break spell 
checker works independently from the distance-based spellcheckers so it cannot 
correct both whitespace problems and other misspellings together.  

If you really need this, then you'll need to go with the shingle approach where 
you create your spellcheck field with both the base terms and also shingles 
(adjacent terms combined as 1 term).  In this case, "rock piont" would be 
considered a single term and the string difference would be 2, with one 
insertion (the space) and one transposition.  I believe there is a field 
analyzer out there that will do this for you.  I think you're supposed to set 
it up for both at index time (to catch when the user omits whitespace) and 
query time (to catch when the user adds whitespace).
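
To make the "string difference would be 2" arithmetic concrete, here is a small sketch, assuming a distance measure that counts the swap of two adjacent characters as one edit (optimal string alignment). It is a textbook implementation written for illustration, not code taken from Solr or Lucene.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: insert, delete, substitute,
    or transpose two adjacent characters, each costing one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "io" <-> "oi".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With the shingled term "rock piont" compared against "rockpoint", the space accounts for one edit and the swapped "io" for the other, so the two strings are within the usual maximum of 2 edits.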

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 18, 2014 2:40 PM
To: solr-user@lucene.apache.org
Subject: Multiword mispellings

Is it possible for Solr's SpellCheckComponent to suggest "Rockpoint" if the
user mistypes "Rock piont"? Currently I have it making the correct
suggestions when I have "Rockpiont" or "Rock point" but not the example I
gave. Here are the relevant parts of my config files:

https://gist.github.com/halogenandtoast/c7f9335f7fa94f7b03d8


RE: Spellchecker delivers far too few suggestions

2014-12-18 Thread Dyer, James
Martin,

If you would like to get suggestions even for terms occurring in the index, set 
"spellcheck.alternativeTermCount" to a value >0 .  You can use the same value 
as for "spellcheck.count", or a lower value if you want fewer results than for 
terms not in the index.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.alternativeTermCount}}Parameter

With this, you might also want to set "spellcheck.maxResultsForSuggest" to a 
value >0.  This will prevent the spellchecker from doing work even when enough 
results returned that you wouldn't want to suggest anything to the user.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxResultsForSuggest}}Parameter

Used with the "maxCollationTries" parameter, you should be getting fairly good 
"did-you-mean"-style suggestions.

See 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking#SpellChecking-The{{spellcheck.maxCollationTries}}Parameter
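
As a rough sketch, the three parameters above could be combined in a single request like this. The base URL, the function name and the numeric values are placeholders chosen for illustration, not recommendations from this thread.

```python
from urllib.parse import urlencode

def did_you_mean_url(base_url, query, count=10):
    """Build a select URL that asks for suggestions even for indexed terms."""
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.count", count),
        # Suggest for terms that exist in the index, too.
        ("spellcheck.alternativeTermCount", count),
        # Skip suggesting once the query already returns more hits than this.
        ("spellcheck.maxResultsForSuggest", 5),
        ("spellcheck.collate", "true"),
        # Test candidate re-written queries against the index before returning them.
        ("spellcheck.maxCollationTries", 10),
    ]
    return base_url + "?" + urlencode(params)
```

The same values can of course live in the request handler defaults in solrconfig.xml instead of being sent on every request.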

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Martin Dietze [mailto:mdie...@gmail.com] 
Sent: Thursday, December 18, 2014 3:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecker delivers far too few suggestions

On 17 December 2014 at 18:08, Erick Erickson  wrote:
> This is seeming like a puzzler...

I’ve got to the point that I do get suggestions if I find no document
at all. The problem was seemingly caused by the way I quoted my search
queries.

Still I don’t get suggestions for terms that are in the index. For
instance, if I create a document that contains the term “bnak”, I
would like to display a result like: “found one occurrence of ‘bnak’,
but did you mean: ”.

Is there a setting I’ve missed?


-- 
-- mdie...@gmail.com --/-- mar...@the-little-red-haired-girl.org 
- / http://herbert.the-little-red-haired-girl.org / -



RE: WordBreakSolrSpellChecker Usage

2014-12-16 Thread Dyer, James
Matt,

Seeing the response, my guess is you have "point" in your index, and that it 
has a higher frequency than "rockpoint".  By default the spellchecker will 
never try to correct something that exists in your index.  Adding 
"spellcheck.onlyMorePopular=true" might help, but only if the correction has a 
higher frequency than the original.  Try using 
"spellcheck.alternativeTermCount=n" instead of 
"spellcheck.onlyMorePopular=true".  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
for more information.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Monday, December 15, 2014 10:23 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

I think you were right about maxChanges, that does seem to get rid of the
ridiculous values. However, I don't seem to be getting anything reasonable.
Most variations look something like:

http://localhost:8982/solr/development/select?q=Rock+point&fq=type%3ACompany&wt=ruby&indent=true&defType=edismax&qf=name_text&stopwords=true&lowercaseOperators=true&spellcheck=true&spellcheck.count=20&spellcheck.onlyMorePopular=true&spellcheck.extendedResults=true&spellcheck.collate=true&spellcheck.maxCollations=1&spellcheck.maxCollationTries=10&spellcheck.accuracy=0.5

{
  'responseHeader'=>{
'status'=>0,
'QTime'=>20},
  'response'=>{'numFound'=>0,'start'=>0,'docs'=>[]
  },
  'spellcheck'=>{
'suggestions'=>[
  'rock',{
'numFound'=>5,
'startOffset'=>0,
'endOffset'=>4,
'origFreq'=>3,
'suggestion'=>[{
'word'=>'rocky',
'freq'=>3},
  {
'word'=>'brook',
'freq'=>6},
  {
'word'=>'york',
'freq'=>460},
  {
'word'=>'oak',
'freq'=>7},
  {
'word'=>'boca',
'freq'=>3}]},
  'correctlySpelled',false]}}


I'm going to post both my solrconfig.xml and schema.xml because maybe
I'm just doing something crazy. They can both be found here:
https://gist.github.com/halogenandtoast/76fd5dcfae1c4edeba30


On Thu, Dec 11, 2014 at 1:19 PM, Dyer, James 
wrote:
>
> Matt,
>
> There is no exact number here, but I would think most people would want
> "count" to be maybe 10-20.  Increasing this incurs a very small performance
> penalty for each term it generates suggestions for, but you probably won't
> notice a difference.  For "maxCollationTries", 5 is a reasonable number but
> you might see improved collations if this is also perhaps 10.  With this
> one, you get a much larger performance penalty, but only when it need to
> try more combinations to return the "maxCollations".  In your case you have
> this at 5 also, right?  I would reduce this to the maximum number of
> re-written queries your application or users is actually going to use.  In
> a lot of cases, 1 is the right number here.  This would improve performance
> for you in some cases.
>
> Possibly the reason “Rock point” > “Rockpoint” is failing is because you
> have "maxChanges" set to 10.  This tells it you are willing for it to break
> a word into 10 separate parts, or to combine up to 10 adjacent words into
> 1.  Having taken a quick glance at the code, I think what is happening is
> it is trying things like "r ock p oint" and "r o ck p o int", etc and never
> getting to your intended result.  In a typical scenario I would set
> "maxChanges" to 1-3, and often 1 is probably the most appropriate value
> here.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Thursday, December 11, 2014 11:34 AM
> To: solr-user@lucene.apache.org
> Subject: Re: WordBreakSolrSpellChecker Usage
>
> Is there a suggested value for this. I bumped them up to 20 and still
> nothing has seemed to change.
>
> On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James  >
> wrote:
>
> > My first guess here, is seeing it works some of the time but not others,
> > is that these values are too low:
> >
> > 5
> > 5
> >
> > You know spellcheck.count is too low if the suggestion you want is not in
> > the &qu

RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
Matt,

There is no exact number here, but I would think most people would want "count" 
to be maybe 10-20.  Increasing this incurs a very small performance penalty for 
each term it generates suggestions for, but you probably won't notice a 
difference.  For "maxCollationTries", 5 is a reasonable number but you might 
see improved collations if this is also perhaps 10.  With this one, you get a 
much larger performance penalty, but only when it needs to try more combinations 
to return the "maxCollations".  In your case you have "maxCollations" at 5 also, 
right?  I would reduce this to the maximum number of re-written queries your 
application or users are actually going to use.  In a lot of cases, 1 is the right number 
here.  This would improve performance for you in some cases.

Possibly the reason “Rock point” > “Rockpoint” is failing is because you have 
"maxChanges" set to 10.  This tells it you are willing for it to break a word 
into 10 separate parts, or to combine up to 10 adjacent words into 1.  Having 
taken a quick glance at the code, I think what is happening is it is trying 
things like "r ock p oint" and "r o ck p o int", etc and never getting to your 
intended result.  In a typical scenario I would set "maxChanges" to 1-3, and 
often 1 is probably the most appropriate value here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Thursday, December 11, 2014 11:34 AM
To: solr-user@lucene.apache.org
Subject: Re: WordBreakSolrSpellChecker Usage

Is there a suggested value for this? I bumped them up to 20 and still
nothing seems to have changed.

On Thu, Dec 11, 2014 at 9:42 AM, Dyer, James 
wrote:

> My first guess here, is seeing it works some of the time but not others,
> is that these values are too low:
>
> 5
> 5
>
> You know spellcheck.count is too low if the suggestion you want is not in
> the "suggestions" part of the response, but increasing it makes it get
> included.
>
> You know that spellcheck.maxCollationTries is too low if it exists in
> "suggestions" but it is not getting suggested in the "collation" section.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Matt Mongeau [mailto:halogenandto...@gmail.com]
> Sent: Wednesday, December 10, 2014 12:43 PM
> To: solr-user@lucene.apache.org
> Subject: Fwd: WordBreakSolrSpellChecker Usage
>
> If I have my search component setup like this
> https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
> entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?
>
> This doesn't seem to be the case, but it works for "Blackstone" with "Black
> stone". Any ideas on what I might be doing wrong?
>


RE: WordBreakSolrSpellChecker Usage

2014-12-11 Thread Dyer, James
My first guess here, seeing that it works some of the time but not others, is 
that these values are too low:

<str name="spellcheck.count">5</str>
<str name="spellcheck.maxCollationTries">5</str>

You know spellcheck.count is too low if the suggestion you want is not in the 
"suggestions" part of the response, but increasing it makes it get included.

You know that spellcheck.maxCollationTries is too low if it exists in 
"suggestions" but it is not getting suggested in the "collation" section.

James Dyer
Ingram Content Group
(615) 213-4311
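
The two checks described above can be sketched as a small helper, assuming the spellcheck section has been parsed from a JSON response laid out like the ones posted elsewhere in this archive (a flat name/value list under "suggestions", with "collation" entries in the same list). The function names and return strings are made up for illustration, and other causes for a missing suggestion are possible.

```python
def pair_up(flat):
    """Turn [name, value, name, value, ...] into [(name, value), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

def diagnose(spellcheck, wanted):
    """Say which parameter to look at when `wanted` is missing from the response."""
    in_suggestions = False
    in_collation = False
    for name, value in pair_up(spellcheck["suggestions"]):
        if name == "collation":
            # With extended collation results the value is itself a flat named list.
            if isinstance(value, list):
                query = dict(pair_up(value))["collationQuery"]
            else:
                query = value
            in_collation = in_collation or wanted in query
        elif isinstance(value, dict):
            for s in value.get("suggestion", []):
                # Extended results give {"word": ..., "freq": ...}, otherwise a plain string.
                word = s["word"] if isinstance(s, dict) else s
                in_suggestions = in_suggestions or word == wanted
    if not in_suggestions:
        return "raise spellcheck.count"
    if not in_collation:
        return "raise spellcheck.maxCollationTries"
    return "ok"
```

The helper only automates reading the response; whether raising either value actually helps still has to be confirmed by re-running the query.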


-Original Message-
From: Matt Mongeau [mailto:halogenandto...@gmail.com] 
Sent: Wednesday, December 10, 2014 12:43 PM
To: solr-user@lucene.apache.org
Subject: Fwd: WordBreakSolrSpellChecker Usage

If I have my search component setup like this
https://gist.github.com/halogenandtoast/cf9f296d01527080f18c and I have an
entry for “Rockpoint” shouldn’t “Rock point” generate suggestions?

This doesn't seem to be the case, but it works for "Blackstone" with "Black
stone". Any ideas on what I might be doing wrong?


RE: Word Break Spell Checker Implementation algorithm

2014-10-21 Thread Dyer, James
David,

I do not know of a published algorithm for this.  For terms with 0 frequency, 
it simply checks the document frequency of the various parts 
that can be made from the terms by breaking them and/or by combining adjacent 
terms. There are tuning parameters available that let you limit how much work 
it will do to try and find a suitable replacement.  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/WordBreakSpellChecker.html
 .

This of course is slower than indexing shingles as the work is done at query 
time vs index time.  But it saves the added index size and indexing time 
required to index the shingles separately.
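
The idea can be illustrated with a toy sketch over an in-memory term-frequency table. This is not the Lucene implementation: the function names, the `min_break_length` argument (mirroring the "minBreakLength" setting) and the ranking by the rarer part's frequency are assumptions made for the example, and only a single break or a single join is attempted.

```python
def break_suggestions(term, doc_freq, min_break_length=1):
    """Split one term into two parts that both occur in the index."""
    found = []
    for i in range(min_break_length, len(term) - min_break_length + 1):
        left, right = term[:i], term[i:]
        if doc_freq.get(left, 0) > 0 and doc_freq.get(right, 0) > 0:
            # Score by the rarer of the two parts (an arbitrary choice here).
            found.append((left + " " + right, min(doc_freq[left], doc_freq[right])))
    return sorted(found, key=lambda s: -s[1])

def combine_suggestions(terms, doc_freq):
    """Join each pair of adjacent terms when the joined form occurs in the index."""
    found = []
    for left, right in zip(terms, terms[1:]):
        joined = left + right
        if doc_freq.get(joined, 0) > 0:
            found.append((joined, doc_freq[joined]))
    return found
```

For example, with `{"go": 433, "pro": 433, "gopro": 2}` as the frequency table, breaking "gopro" only finds "go pro" when `min_break_length` is 2 or less, because "go" is two characters long.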

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: David Philip [mailto:davidphilipshe...@gmail.com] 
Sent: Monday, October 20, 2014 9:07 AM
To: solr-user@lucene.apache.org
Subject: Word Break Spell Checker Implementation algorithm

Hi,

Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what algorithm does the word break spell checker
component use? How does it detect where a space is needed if it
doesn't use shingles?


Thanks - David


RE: Data Import Handler for CSV file

2014-10-10 Thread Dyer, James
Nabil,

Unfortunately, the out-of-the box functionality for DIH lacks a lot of what the 
csv handler has to offer.  There is a LineEntityProcessor (see 
http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor), but this 
will just output each line in a field called "rawLine".  It is up to you to 
then write a Transformer that will split it on commas (or better, use a lib 
like commons-csv to process it).
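
The reason to prefer a CSV library over a plain split on commas shows up as soon as a field is quoted. The sketch below uses Python's csv module purely to illustrate the logic; a real DIH Transformer would be written in Java (or as a script transformer), and the function and column names here are invented.

```python
import csv

def split_raw_line(raw_line, column_names):
    """Parse one CSV line (as found in "rawLine") into named fields."""
    values = next(csv.reader([raw_line]))
    return dict(zip(column_names, values))

def naive_split(raw_line):
    """Plain comma split, which breaks quoted fields that contain commas."""
    return raw_line.split(",")
```

A line such as `27407,"BATTERIA, GO PRO",39.9` yields three fields with the CSV parser but four pieces with the naive split.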

There is an extension available as an old patch that will give 
LineEntityProcessor the ability to handle delimited and fixed-width files.  
However, you'll need to apply the patch yourself and build DIH from source.   
See https://issues.apache.org/jira/browse/SOLR-2549 .

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr] 
Sent: Thursday, October 09, 2014 4:26 PM
To: solr-user@lucene.apache.org; Ahmet Arslan
Subject: Re: Data Import Handler for CSV file

Hi Ahmet,
 
Thank you for this reply. I agree with you that the csv update handler is fast, but 
we always need to specify the columns in the http request. In addition, I can't 
find documentation on how to use csv update from SolrJ.

Could you please send me an example of a DIH configuration that loads a CSV file?

Regards,
Nabil.


On Thursday, October 9, 2014 at 9:05 PM, Ahmet Arslan wrote:
 


Hi Nabil,

What's wrong with the csv update handler? It is quite fast.

By the way, DIH has a line entity processor, so yes, it is doable with existing DIH 
components.

Ahmet



On Thursday, October 9, 2014 9:58 PM, nabil Kouici  wrote:





Hi All,

Is it possible to have a DIH in Solr that loads from a CSV file? Currently I'm using 
the update/csv handler, but it does not meet my needs.

Regards,
NKI.



RE: DIH - cacheImpl="SortedMapBackedCache" - empty rows from sub entity

2014-10-02 Thread Dyer, James
Try using the cacheKey/cacheLookup parameters instead:

   
  


James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: stockii [mailto:stock.jo...@googlemail.com] 
Sent: Thursday, October 02, 2014 9:19 AM
To: solr-user@lucene.apache.org
Subject: DIH - cacheImpl="SortedMapBackedCache" - empty rows from sub entity

Hello

I am fighting with cacheImpl="SortedMapBackedCache".

I want to refactor my ugly entities, so I am trying out sub-entities with
caching.
My problem is that my cached subquery does not return any values from the
select. Why?

This is my entity:






This is very fast, clear and nice... but it does not work. Nothing from table2
makes it into my index =(
BUT if I remove the line with cacheImpl="SortedMapBackedCache", all the data is
present, but every row is selected one by one.
I had hoped that this construct would replace my ugly big join query in a
single entity!?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-cacheImpl-SortedMapBackedCache-empty-rows-from-sub-entity-tp4162316.html




RE: Spellchecking and suggesting part numbers

2014-09-24 Thread Dyer, James
Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your 
application pick out the suggestions that make changes on the right side.
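
One way to do that filtering in the application is to order the returned suggestions by how many leading characters they share with the search term. This is only a sketch of the client-side step; the function names are made up.

```python
def common_prefix_length(a, b):
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def prefer_right_side_changes(term, suggestions):
    """Order suggestions so that those differing furthest to the right come first."""
    return sorted(suggestions, key=lambda s: common_prefix_length(term, s), reverse=True)
```

For the part numbers in the question, "ABCD1244" shares six leading characters with "ABCD1234" while "ABCE1234" shares only three, so it would be listed first.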

Another option is to use DirectSolrSpellChecker (usually a better choice 
anyhow) and set the "minPrefix" field.  This will require up to n characters on 
the left side to match before it will make suggestions.  Taking a quick look at 
the code, it seems to me it also won't try to correct anything in this prefix 
region.  So perhaps you can set this to 2-4 (default=1).  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] 
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

we are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is:



solr.IndexBasedSpellChecker
./spellchecker
did_you_mean_part




did_you_mean_part
on


spellcheck_part




















Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander




RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
You cannot use 100% because, as you say, 1 is interpreted as "1 document".  But 
you can do something like 99.9%.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 11:39 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Thank you, that works!

I'd already tried several values for maxQueryFrequency, but apparently 
without properly understanding it. I was confused by the line "A lower 
threshold is better for small indexes" when in fact I need a high value 
like 0.99, so every term returns suggestions. (Is it possible to set it 
to 100%? Because 1 gets interpreted as an absolute value.)

Nathaniel


RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
DirectSpellChecker defaults to not suggest anything for terms that occur in 1% 
or more of the total documents in the index.  You can set this higher in 
solrconfig.xml either with a fractional percent or a whole-number absolute 
number of documents.

See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMaxQueryFrequency%28float%29
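
The two ways of reading this setting can be illustrated with a small Python model. It is only a sketch of the behaviour described above, not Lucene's code; the function name and the sample numbers are made up for illustration.

```python
import math


def too_frequent_for_suggestions(doc_freq, max_doc, max_query_frequency):
    """Simplified model: True if the term occurs too often to get suggestions."""
    if max_query_frequency >= 1:
        # Whole numbers are read as an absolute number of documents.
        return doc_freq > max_query_frequency
    # Fractions are read as a share of all documents in the index.
    return doc_freq > math.ceil(max_query_frequency * max_doc)


# With a setting of 1, a term found in 2 documents is already "too frequent".
print(too_frequent_for_suggestions(2, 1000, 1))      # True
# With a setting of 0.5, a term found in 400 of 1000 documents is still checked.
print(too_frequent_for_suggestions(400, 1000, 0.5))  # False
```

This is why a value of exactly 1 cannot stand for "100%": under this reading it means "at most one document".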
 

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:41 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Yep, I tried it both as a default param in the request handler (as in 
the config I sent), and in the request, but with no effect... That's 
what surprised me, since it seems it should work.



RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Did you try "spellcheck.alternativeTermCount" with DirectSolrSpellChecker?  You 
can set it to whatever low value you actually want it to return back to you 
(perhaps 20 suggestions max?).

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 9:36 AM
To: solr-user@lucene.apache.org
Subject: RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hi James,

The request 
/spellcheck?spellcheck=true&spellcheck.q=quiam&spellcheck.dictionary=fuzzy2 
returns

quidam, quam, quia, quoniam, quidem, quadam, quodam, quoad, quedam, 
quis, quae, quas, quem, quid, quin, qui, qua

Replacing quiam (not in the index) by quidam (in the index) returns 
nothing at all, but I want it to return

quidam, quam, quia, quidem, quadam, quodam, quedam, ...

When I was using the same parameters with IndexBasedSpellChecker, by 
setting a high alternativeTermCount, I got results for both. But as I 
said, then I can't differentiate the different maxEdits.

The request handler is:

 

  fuzzy1
  20
  100


  fuzzyterms

  

Thanks!

Nathaniel



RE: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

2014-09-22 Thread Dyer, James
Nathaniel,

Can you show us all of the parameters you are sending to the spellchecker?  
When you specify "alternativeTermCount" with "spellcheck.q=quidam", what are 
the terms you expect to get back?  Also, are you getting any query results 
back?  If you are using a "q" that returns results, or more results than you 
specify for "spellcheck.maxResultsForSuggest", spellcheck won't give you 
anything regardless of what you put for "spellcheck.q".

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Nathaniel Rudavsky-Brody [mailto:nathaniel.rudav...@gmail.com] 
Sent: Monday, September 22, 2014 8:08 AM
To: solr-user@lucene.apache.org
Subject: fuzzy terms, DirectSolrSpellChecker and alternativeTermCount

Hello,

I'm trying to find the best way to "fake" the terms component for fuzzy 
queries. That is, I need the full set of index terms for each of the 
two queries "quidam~1" and "quidam~2".

I tried defining two suggesters with FuzzyLookupFactory, with 
maxEdits=1 and 2 respectively, but the results for "quidam~1" include 
suffixes like "quodammodo", which makes sense for a suggester but isn't 
what I want here.

Now I'm trying with the spell-checker. As far as I can see, 
IndexBasedSpellChecker doesn't let me set maxEdits, so I can't use it 
to distinguish between my two queries. DirectSolrSpellChecker seems 
like it should work, ie:

  

  fuzzy1
  solr.DirectSolrSpellChecker
 1
...


  fuzzy2
  solr.DirectSolrSpellChecker
 2
...

  

However the parameter spellcheck.alternativeTermCount has no effect, so 
the query "spellcheck.q=quidam" gives no results, but 
"spellcheck.q=quiam" (which doesn't exist in the index) gives the 
expected terms.

Am I missing something? Or is there a better way to do this?

Many thanks for any help and ideas,

Nathaniel


RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-11 Thread Dyer, James
Thomas,

Yes, you are right about the problem being with the beginning of the word 
needing correction.  If you are using DirectSolrSpellChecker, you need to set 
the "minPrefix" parameter to 0.  Otherwise the default (1) requires the first 
character to match for it to try and correct it.

See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
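
The effect of the prefix requirement can be pictured with a few lines of Python. This is a simplified model, not the Lucene implementation; the function name is hypothetical, and lowercasing both sides is an assumption about how the spellcheck field is analyzed.

```python
def passes_min_prefix(query_term, candidate, min_prefix=1):
    """True if candidate shares the first min_prefix characters with query_term."""
    q = query_term.lower()
    c = candidate.lower()
    return q[:min_prefix] == c[:min_prefix]


# With the default of 1, a missing first letter rules the candidate out.
print(passes_min_prefix("ichtscheiben", "Sichtscheiben", 1))  # False
# With 0, the same candidate is allowed through.
print(passes_min_prefix("ichtscheiben", "Sichtscheiben", 0))  # True
```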

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Thursday, September 11, 2014 3:46 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Spellcheck suggestions only return from /select handler when 
returning search results

 Hi James, hi list,

I can confirm the existence of data that's within 1 Levenshtein step from
"ichtscheiben":

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "name,spell",
      "indent": "true",
      "q": "name:Sichtscheiben",
      "_": "1410423419758",
      "wt": "json",
      "rows": "50"
    }
  },
  "response": {
    "numFound": 6,
    "start": 0,
    "docs": [
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" },
      { "name": "Sichtscheiben", "spell": "Sichtscheiben" }
    ]
  }
}

Multiple records exist that should match.

The note for alternativeTermCount is appreciated.

I've tried another term: "Transport". I get suggestions when I use "Transpor"
and "Transpo", even "Transpotr", but "ransport" doesn't yield any suggestions.
Maybe it's a question of the beginning of a word and has not really anything
to do with stemming.


RE: Solr Spellcheck suggestions only return from /select handler when returning search results

2014-09-10 Thread Dyer, James
Thomas,

It looks like you've set things up correctly in that while the user is 
searching against a stemmed field ("name"), spellcheck is checking against a 
lightly-analyzed copy of it ("spell").  This is the right way to do it as 
spellcheck against stemmed forms is usually undesirable.

But as you've experienced, you will sometimes get results (due to stemming) and 
also suggestions (because the spellchecker is looking at unstemmed forms).  If 
you do not want spellcheck to return anything when you get results, you can set 
"spellcheck.maxResultsForSuggest=0".

Now keeping in mind we're comparing unstemmed forms, can you verify you indeed 
have something in your index that is within 2 edits of "ichtscheiben" ?  My 
guess is you probably don't, which would be why you do not get spelling results 
in that case.

Also, even if you do have something within 2 edits, if "ichtscheiben" occurs in 
your index, by default it won't try to correct it at all (even if the query 
returns nothing, maybe because of filters or other required terms on the 
query).  In this case you need to set "spellcheck.alternativeTermCount" to a 
non-zero value (try maybe 5).

See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and following sections.
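
As a rough illustration, the two request parameters mentioned above can be sent along with a query like this. It is a sketch using only the Python standard library; the host, port and core name in the base URL are assumptions, and the values 5 and 0 are just the ones suggested in this message.

```python
from urllib.parse import urlencode


def spellcheck_url(base_url, query, alternative_term_count=5,
                   max_results_for_suggest=0):
    """Build a /select URL that carries the spellcheck parameters discussed above."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.alternativeTermCount": alternative_term_count,
        "spellcheck.maxResultsForSuggest": max_results_for_suggest,
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# The base URL is a placeholder for a local test instance.
print(spellcheck_url("http://localhost:8983/solr/collection1/select",
                     "ichtscheiben"))
```

The same parameters can of course be placed in the request handler defaults in solrconfig.xml instead of being sent on every request.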

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Wednesday, September 10, 2014 5:00 AM
To: Solr user
Subject: Solr Spellcheck suggestions only return from /select handler when 
returning search results

 Hi,

I'm experimenting with the Spellcheck component and have therefore
used the example configuration for spell checking to try things out. My
solrconfig.xml looks like this:

 
 spell
 
 
 
 default
 spell
 solr.DirectSolrSpellChecker
 
 internal
 
 
 
 
 wordbreak
 solr.WordBreakSolrSpellChecker
 spell
 true
 true
 10
 



And I've added the spellcheck component to my
/select request handler:

 
 ...
 

spellcheck
 
 

I have built up the
spellchecker source in the schema.xml from the name field:

 
 
 ...
 
 
 
 
 
 


 

As I'm querying the /select request handler, I should get spellcheck
suggestions with my results. However, I rarely get a suggestion. Examples:

query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
query: ichtscheiben, no spellcheck suggestions

As far as I can identify, I only get suggestions when I get real search
results. I get results for the first 2 examples, because the German
StemFilterFactory translates "Sichtscheibe" and "Sichtscheiben" into
"Sichtscheib", so there are matches found. However, the third query
should result in a suggestion, as the Levenshtein distance is less than
in the second example.

Suggestions, improvements, corrections?

 


RE: Solr spellcheck returns more than 1 word for a 1 word spellcheck

2014-09-02 Thread Dyer, James
This is the WordBreakSolrSpellChecker, which is there to correct spelling 
errors involving misplaced whitespace (or is it white space ??)  To disable it, 
remove this or a similar line from your requestHandler in solrconfig.xml:

<str name="spellcheck.dictionary">wordbreak</str>

Keep in mind, if you want the best of both worlds, you can keep this there and, 
using the "collation" feature, it will try to pick the combination of 
spelling corrections that best fixes your user's query. See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and 
following sections.
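
To see why split suggestions such as "coo le" show up at all, here is a toy Python model of word breaking. It is a sketch under stated assumptions, not Solr's algorithm: the function name, the limit of three pieces and the tiny term set are all made up for illustration.

```python
def word_break_candidates(term, index_terms, max_pieces=3):
    """Return every way to split term into 2..max_pieces known index terms."""
    results = []

    def split(rest, pieces):
        if not rest:
            if len(pieces) > 1:
                results.append(" ".join(pieces))
            return
        if len(pieces) == max_pieces:
            return
        for i in range(1, len(rest) + 1):
            head = rest[:i]
            if head in index_terms:
                split(rest[i:], pieces + [head])

    split(term, [])
    return results


# Truncated names in the source data leave fragments such as "coo" in the index.
terms = {"coo", "le", "co", "o", "cooler", "cable"}
print(word_break_candidates("coole", terms))
```

Any fragment that made it into the index is fair game for such splits, which is why truncated source data tends to produce odd-looking suggestions.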

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Thomas Michael Engelke [mailto:thomas.enge...@posteo.de] 
Sent: Monday, September 01, 2014 6:44 AM
To: Solr user
Subject: Solr spellcheck returns more than 1 word for a 1 word spellcheck

 I'm in the process of incorporating Solr spellchecking in our product.
For that, I've created a new field:

 



And in the
fieldType definitions:

 
 
 
 



Then I feed the names of products into the corresponding core. They can have
a lot of words (examples):

 door lock rear left

 Door brake, door in front + rear fitting.

However, the names get pretty long, and in the source data, they have been
truncated. This sometimes leaves parts of words at the end:

 The water pump can evacuate some coo

I have created a spellcheck component, feeding off the `spell` field
defined earlier. Now for the problem.

Sometimes, when I look up a slightly misspelled word, I get results I do not
expect. Example request:

 http://solr.url:8983/solr/en/spell?q=coole

This is (part of) the response:

 cooler   21
 coo le   2
 cable    334
 co o le  4
 [...]

Now, as you can see, the misspelled `coole` should have been `cooler`, and
it's the first suggestion. However, the second and fourth suggestions baffle
me. After a bit of research, I found these to be multiple words clunked
together. As I described above, `coo` was a part of a name that was
truncated. I found `co` the same way, and the source data contains a small
number of `o` characters on their own (product number names).

Now, my question is: Why is Solr suggesting `multiple words` pasted together
for a spellcheck of a single word? Is there a way to prevent Solr from
pasting together word parts to forge suggestions?
 


RE: Spellchecking suggestions won't collate

2014-08-20 Thread Dyer, James
Because "my" is the 7th suggestion down the list, it is going to need more than 
30 tries to find a combination that gives some hits.  You can increase 
"maxCollationTries" if you're willing to endure the performance penalty of 
trying so many replacement queries.  This case actually highlights why 
DirectSpellChecker by default doesn't even bother with short words like this.

Rather than letting the spellchecker check words this small, possibly you can 
just scan the user's input and make any words shorter than 4 characters 
optional?  Or even just use an mm below 100%? (65% ?)  I realize this will give 
you a small loss of precision but the recall will be better and you'll have to 
rely less on spellcheck.
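
To get a feel for how quickly the number of candidate queries grows, here is a back-of-the-envelope calculation in Python. It is only a sketch: the function name is hypothetical, and the order in which Solr actually tries combinations is not modelled.

```python
def candidate_combinations(suggestion_counts, include_original=True):
    """Multiply the choices per term; optionally count keeping the term as typed."""
    total = 1
    for count in suggestion_counts.values():
        total *= count + 1 if include_original else count
    return total


# Suggestion counts taken from the response quoted in this thread.
counts = {"mi": 10, "next": 3, "promo": 4}
print(candidate_combinations(counts, include_original=False))
print(candidate_combinations(counts))
```

Either way the total is well above the 30 tries configured in this request, so a correction that sits far down one term's list may never be reached.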

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 15, 2014 3:21 PM
To: Solr User List
Subject: Spellchecking suggestions won't collate

It must be Friday. I can't figure out why there is no collation value:

{
  "responseHeader":{
"status":0,
"QTime":31,
"params":{
  "spellcheck":"on",
  "spellcheck.collateParam.qf":"BUS_BUSINESS_NAME",
  "spellcheck.maxResultsForSuggest":"5",
  "spellcheck.maxCollations":"3",
  "spellcheck.maxCollationTries":"30",
  "qf":"BUS_BUSINESS_NAME_PHRASE",
  "q.alt":"*:*",
  "spellcheck.collate":"true",
  "spellcheck.onlyMorePopular":"false",
  "defType":"edismax",
  "debugQuery":"true",
  "echoParams":"all",
  "spellcheck.count":"10",
  "spellcheck.alternativeTermCount":"10",
  "indent":"true",
  "q":"Mi Next Promo",
  "wt":"json"}},
  "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
  },
  "spellcheck":{
"suggestions":[
  "mi",{
"numFound":10,
"startOffset":0,
"endOffset":2,
"suggestion":["mr",
  "mp",
  "mid",
  "mix",
  "mb",
  "mj",
  "my",
  "md",
  "mc",
  "ma"]},
  "next",{
"numFound":3,
"startOffset":3,
"endOffset":7,
"suggestion":["nest",
  "news",
  "neil"]},
  "promo",{
"numFound":4,
"startOffset":8,
"endOffset":13,
"suggestion":["photo",
  "prime",
  "pronto",
  "prof"]}]},

The actual business name is "My Next Promo" which I'm hoping would be the 
collation value.

Thanks,

Corey



RE: Spell check collation

2014-08-14 Thread Dyer, James
DirectSolrSpellChecker defaults to a minimum query term length of 4 
("minQueryLength").  So you'd need to bring this down to 1.

But you might not like the results from this.  See: 
http://lucene.apache.org/core/4_6_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinQueryLength%28int%29
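
A small Python model shows which terms of a query would even be considered under a given minimum length. This is a sketch, not Solr code: the function name is made up, and splitting on whitespace is an assumption that ignores whatever the field's analyzer really does.

```python
def terms_checked(query, min_query_length=4):
    """Split on whitespace and keep the terms long enough to be spellchecked."""
    return [t for t in query.split() if len(t) >= min_query_length]


# With the default of 4, the one-letter term that needs fixing is never looked at.
print(terms_checked("h G's collision centre"))
# Lowering the minimum to 1 makes every term eligible.
print(terms_checked("h G's collision centre", min_query_length=1))
```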

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Thursday, August 14, 2014 12:18 PM
To: Solr User List
Subject: Spell check collation

Solr 4.6

Current settings for my handler:
edismax
5
3
30
BUS_BUSINESS_NAME_PHRASE
10
10
true
false
0.2


  org.apache.solr.spelling.DirectSolrSpellChecker
  default
  spell
  internal
  0.5
  2
  1
  5
  0.01
  0.0001
  true

  

I'm querying:
h G's collision centre

hoping for a spell check suggestion of:
J G's collision centre

But there are no suggestions. Is there a term length limitation in 
spellchecking?

Thanks,

Corey



RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-12 Thread Dyer, James
Harun,

What do you mean by the "terminal console"?  Do you mean to say the admin gui 
freezes but you can still issue queries to solr directly through your browser?

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Tuesday, August 12, 2014 2:46 AM
To: solr-user@lucene.apache.org
Subject: Re: When I use minimum match and maxCollationTries parameters together 
in edismax, Solr gets stuck

I tried again to make sure. The server starts and I can see the web admin gui, 
but I can't navigate between tabs. It just says "loading". But on the terminal 
console everything seems normal.

Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com

On 12.08.2014 09:42, Harun Reşit Zafer wrote:
> It happens once the server is fully started. And when it gets stuck, 
> sometimes I have to restart the server, sometimes I'm able to edit the 
> solrconfig.xml and reload it.
>
> Harun Reşit Zafer
> TÜBİTAK BİLGEM BTE
> Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
> T +90 262 675 3268
> W  http://www.hrzafer.com



RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck

2014-08-11 Thread Dyer, James
Harun,

Just to clarify, is this happening during startup when a warmup query is 
running, or is this once the server is fully started?  This might be another 
instance of https://issues.apache.org/jira/browse/SOLR-5386 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] 
Sent: Monday, August 11, 2014 8:39 AM
To: solr-user@lucene.apache.org
Subject: When I use minimum match and maxCollationTries parameters together in 
edismax, Solr gets stuck

Hi,

In the following configuration, when I uncomment both the mm and 
maxCollationTries lines and run a query on |/select|, Solr gets stuck 
with no exception.

I tried different values for both parameters and found that values for 
mm of less than 40% still work.


|
 
  
explicit
edismax
1000
title^3 title_s^2 content
title content
id,title,content,score
0.1
true
true

10

on
default
wordbreak
true
5
5
false
2
true
true
5


3
  

  
spellcheck
  

 

Any idea? Thanks
|


-- 
Harun Reşit Zafer
TÜBİTAK BİLGEM BTE
Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü
T +90 262 675 3268
W  http://www.hrzafer.com


RE: SqlEntityProcessor

2014-08-11 Thread Dyer, James
I've heard of a user adding a separate <entity> section to the end of their 
data-config.xml with a SqlEntityProcessor and an UPDATE statement.  It would 
run after your main <entity> section.  I have not tried it myself, and surely 
DIH was not designed to do this, but it might work.

A better solution might be to write a class implementing EventListener that 
does the db update you want and put an "onImportEnd" listener in your 
configuration.  See 
http://wiki.apache.org/solr/DataImportHandler#EventListeners for details.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Christof Lorenz [mailto:loc...@web.de] 
Sent: Sunday, August 10, 2014 6:52 AM
To: solr-user@lucene.apache.org
Subject: SqlEntityProcessor

Hi folks,

I am searching for a way to update a certain column in the RDBMS for each
item as soon as the item has been indexed by Solr.
The column will be the indicator in the delta-query to select un-indexed
items.
We don't want to use the timestamp-based mechanism that is the default.

Any ideas how we could implement this?

Regards,
Lochri




RE: Change order of spell checker suggestions issue

2014-08-07 Thread Dyer, James
Corey,

Looking more carefully at your responses than I did last time I answered this 
question, it looks like every correction is 2 edits in this example.  

unie > unity (e>t , insert y)
unie > unger (i>g , insert r)
unie > unick (e>c , insert k)
unie > united (insert t , insert d)
unie > unique (insert q , insert u)
unie > unity (e>t , insert y)
unie > unser (i>s , insert r)
unie > unyi (i>y , e>i)

So both "score" and "freq" will give it to you by frequency.  Usually when I'm 
in doubt of something like this working like it should, I try to come up with 
more than 1 clear-cut example.
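
One way to double-check edit counts like the ones listed above is to compute them directly. The following is a minimal sketch using plain Levenshtein distance (insert, delete, substitute); it is an illustration only, and the distance measure Solr uses internally may count some changes differently. The function name is my own.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Suggestions taken from the thread; every one of them is two edits from "unie".
suggestions = ["unity", "unger", "unick", "united", "unique", "unser", "unyi"]
distances = {word: levenshtein("unie", word) for word in suggestions}
```

Since all the distances tie, a sort by distance leaves frequency as the only thing that can order the list.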

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Thursday, August 07, 2014 11:31 AM
To: Solr User List
Subject: Change order of spell checker suggestions issue

Solr Rev: 4.6 Lucidworks: 2.6.3

This is sort of a repeat question, sorry.

In the solrconfig.xml, will changing the value for the comparatorClass affect 
the sort of suggestions returned?

This is my spellcheck component:


false
true
5


textSpell


org.apache.solr.spelling.DirectSolrSpellChecker
default
spell
internal
0.5
2
1
5
score
1
4
0.01

  

Searching for unie produces the following suggestions. But the suggestions 
appear to me to be by frequency (I've indicated Levenshtein distance in []):



unity [ 3  ]

1200





unger [ 3  ]

119





unick [ 3 ]

16





united [ 4 ]

16





unique [ 4 ]

10





unity [ 3 ]

7





unser [ 3 ]

7





unyi [ 2 ]

7



Is something configured incorrectly or am I just needing more coffee?



RE: Data Import handler and join select

2014-08-07 Thread Dyer, James
Alejandro,

You can use a sub-entity with a cache using DIH.  This will solve the 
"n+1-select" problem and make it run quickly.  Unfortunately, the only built-in 
cache implementation is in-memory so it doesn't scale.  There is a fast, 
disk-backed cache using bdb-je, which I use in production.  See 
https://issues.apache.org/jira/browse/SOLR-2613 .  You will need to build this 
yourself and include it on the classpath, and obtain a copy of bdb-je from 
Oracle.  While bdb-je is open source, its license is incompatible with ASL so 
this will never officially be part of Solr.

Once you have a disk-backed cache, you can specify it on the child entity like 
this:




If you don't want to go down this path, you can achieve this all with one 
query, if you include an ORDER BY to sort by whatever field is used as Solr's 
uniqueKey, and add a dummy row at the end with a UNION:

SELECT p.uniqueKey, ..., 'A' as lastInd from PRODUCTS p 
INNER JOIN DESCRIPTIONS d ON p.uniqueKey = d.productKey
UNION SELECT 0 as uniqueKey, ... , 'B' as lastInd from dual 
ORDER BY uniqueKey, lastInd

Then your transformer would need to keep the "lastUniqueKey" in an instance 
variable and keep a running map of everything it has seen for that key.  When the 
key changes, or if on the last row, send that map as the document.  Otherwise, 
the transformer returns null.  This will collect data from each row seen onto 
one document.
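
The accumulate-until-the-key-changes logic can be sketched outside DIH as well. This is a rough Python illustration, not a DIH transformer: the function name and the second product row are made up, and the field names follow the example in the original question. Because a generator can flush at end of input, this version does not need the dummy UNION row; a DIH transformer, which only sees one row at a time, does.

```python
from typing import Iterable, Iterator


def merge_rows(rows: Iterable[dict], key: str = "id") -> Iterator[dict]:
    """Collapse consecutive rows sharing `key` into one document each.

    Assumes the rows arrive sorted by `key` (the ORDER BY in the query).
    """
    current = None
    for row in rows:
        if current is not None and row[key] != current[key]:
            yield current  # key changed: emit the finished document
            current = None
        if current is None:
            current = {key: row[key], "name": row["name"], "languages": []}
        current["languages"].append(row["language"])
        current["description_" + row["language"]] = row["description"]
    if current is not None:
        yield current  # flush the last document


rows = [
    {"id": 1, "name": "Product", "language": "es",
     "description": "Descripción en español"},
    {"id": 1, "name": "Product", "language": "en",
     "description": "English description"},
    {"id": 2, "name": "Other", "language": "en",  # hypothetical second product
     "description": "Another description"},
]
docs = list(merge_rows(rows))
```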

Keep in mind also, that in a lot of cases like this, it might just be easiest 
to write a program that uses solrj to send your documents rather than trying to 
make DIH's features fit your use-case.  

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com] 
Sent: Thursday, August 07, 2014 1:43 AM
To: solr-user@lucene.apache.org
Subject: Data Import handler and join select

Hi,

I have one problem while indexing with data import handler while doing a
join select. I have two tables, one with products and another one with
descriptions for each product in several languages.

So it would be:

Products: ID, NAME, BRAND, PRICE, ...
Descriptions: ID, LANGUAGE, DESCRIPTION

I would like to have every product indexed as a document with a multivalued
field "language" which contains every language that has an associated
description and several dynamic fields "description_" one for each language.

So it would be for example:

Id: 1
Name: Product
Brand: Brand
Price: 10
Languages: [es,en]
Description_es: Descripción en español
Description_en: English description

Our first approach was using sub-entities for the data import handler and
after implementing some transformers we had everything indexed as we
wanted. The sub-entity process added the descriptions for each language to
the solr document and then indexed them.

The problem was performance. I've read that using sub-entities affected
performance greatly, so we changed our process in order to use a join
instead.

Performance was greatly improved this way but now we have a problem. Each
time a row is processed a solr document is generated and indexed into solr,
but the data is not added to any previous data, but it replaces it.

If we had the previous example the query resulting from the join would be:

Id - Name - Brand - Price - Language - Description
1 - Product - Brand - 10 - es - Descripción en español
1 - Product - Brand - 10 - en - English description

So when indexing as both have the same id the only information I get is the
second row.

Is there any way for data import handler to manage this and allow the
documents to be indexed updating any previous data?

Thanks in advance



-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


RE: Debug DirectSolrSpellChecker Suggestion Sort Order

2014-08-01 Thread Dyer, James
Query results default to score.  But spelling suggestions sort by edit 
distance, with frequency as a secondary sort.  

unie => unger = 2 edits
unie => unick = 2 edits
unie => united = 3 edits
unie => unique = 3 edits
... etc ...
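
As a small illustration of that ordering (edit distance first, frequency as the tie-breaker), here is a sketch in Python. The words, edit counts and frequencies are made-up values, not output from Solr.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    word: str
    edits: int  # edit distance from the queried term
    freq: int   # how often the term occurs in the index


def rank(suggestions):
    """Fewest edits first; among equal edits, most frequent first."""
    return sorted(suggestions, key=lambda s: (s.edits, -s.freq))


# Hypothetical candidates, for illustration only.
candidates = [
    Suggestion("bravo", 2, 7),
    Suggestion("alpha", 1, 3),
    Suggestion("charlie", 2, 40),
    Suggestion("delta", 1, 9),
]
ranked_words = [s.word for s in rank(candidates)]
```

When every candidate has the same number of edits, this ordering looks exactly like a plain sort by frequency.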

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Corey Gerhardt [mailto:corey.gerha...@directwest.com] 
Sent: Friday, August 01, 2014 3:01 PM
To: 'solr-user@lucene.apache.org'
Subject: Debug DirectSolrSpellChecker Suggestion Sort Order

Everything that I read says that the default sort order is by Score, yet this 
appears to me to be sorted by frequency:



10
0
4
0




unger

119





unick

16





united

16





unique

10





unity

7





unser

7





unyi

7





utke

5





uribe

3





uthe

3






I've even set in solconfig.xml:
score
Is there a way that I can debug my issue? I'm searching people names so ideally 
I'm hoping to get unyi higher in the list of suggestions.

Thanks,

Corey



RE: Searching words with spaces for word without spaces in solr

2014-07-31 Thread Dyer, James
If a user is searching on "ice cream" but your index has "icecream", you can 
treat this like a spelling error.  WordBreakSolrSpellChecker would identify the 
fact that, while "ice cream" is not in your index, "icecream" is, and then you 
can re-query for the corrected version without the space.

The problem with solving this with analyzers is that you can analyze 
"ice-cream" as either "ice cream" or "icecream" (split or catenate on hyphen).  
You can even analyze "IceCream > Ice Cream" (catenate on case change).  But how 
is your analyzer going to know that "icecream" should index as two tokens: 
"ice" "cream" ?  You're asking analysis to do too much in this case.  This is 
where spellcheck can bridge the gap.

Of course, if you have a discrete list of words you want split like this, then 
you can do it with analysis using index-time synonyms.  In this case, you need 
to provide it with the list.  See 
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
 for more information.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: sunshine glass [mailto:sunshineglassof2...@gmail.com] 
Sent: Thursday, July 31, 2014 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

I am not clear with this. This link is related to spell check. Can you
elaborate it more ?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James 
wrote:

> In addition to the analyzer configuration you're using, you might want to
> also use WordBreakSolrSpellChecker to catch possible matches that can't
> easily be solved through analysis.  For more information, see the section
> for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
> Sent: Wednesday, July 30, 2014 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> This is the new configuration:
>
>  > positionIncrementGap="100">
> >   
> > 
> > 
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> >  > language="English" protected="protwords.txt"/>
> >> synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> >   
> >   
> > 
> > 
> >  > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> />
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >  > language="English" protected="protwords.txt"/>
> >   
> >
> >
> These are current docs in my index:
>
> 
> 
> 2
> Icecream
> 1475063961342705664
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 
> title:ice cream
> title:ice cream
> 
> (+(title:ice DisjunctionMaxQuery((title:cream/no_coord
> 
> +(title:ice (title:cream))
> 
> 
> 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
> [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
> termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
> in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
> termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
> 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.61871845 = fieldWeight in 0, product of: 1.41

RE: Searching words with spaces for word without spaces in solr

2014-07-30 Thread Dyer, James
In addition to the analyzer configuration you're using, you might want to also 
use WordBreakSolrSpellChecker to catch possible matches that can't easily be 
solved through analysis.  For more information, see the section for it at 
https://cwiki.apache.org/confluence/display/solr/Spell+Checking 

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: sunshine glass [mailto:sunshineglassof2...@gmail.com] 
Sent: Wednesday, July 30, 2014 9:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

This is the new configuration:

 positionIncrementGap="100">
>   
> 
> 
>  outputUnigrams="true" tokenSeparator=""/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> 
>  language="English" protected="protwords.txt"/>
>synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> expand="true"/>
>   
>   
> 
> 
>  words="stopwords_text_prime_search.txt" enablePositionIncrements="true" />
>  outputUnigrams="true" tokenSeparator=""/>
>  generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>  language="English" protected="protwords.txt"/>
>   
>
>
These are current docs in my index:



2
Icecream
1475063961342705664


3
Ice-cream
1475063961344802816


1
Ice Cream
1475063961203245056




Query:
http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true

Response:



1
Ice Cream
1475063961203245056


3
Ice-cream
1475063961344802816



title:ice cream
title:ice cream

(+(title:ice DisjunctionMaxQuery((title:cream/no_coord

+(title:ice (title:cream))


0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
[DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
maxDocs=3) 0.4375 = fieldNorm(doc=0)


0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
[DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
fieldNorm(doc=2)



Still not working 


On Fri, May 30, 2014 at 9:21 PM, Erick Erickson 
wrote:

> I'd spend some time with the admin/analysis page to understand the exact
> tokenization going on here. For instance, sequencing the
> shinglefilterfactory before worddelimiterfilterfactory may produce
> "interesting" resutls. And then throwing the Snowball factory at it and
> putting synonyms in front I suspect you're not indexing or searching
> what you think you are.
>
> Second, what happens when you query with &debug=query? That'll show you
> what the search string looks like.
>
> If that doesn't help, please post the results of looking at those things
> here, that'll provide some information for us to work with.
>
> Best,
> Erick
>
>
> On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> sunshineglassof2...@gmail.com> wrote:
>
> > Hi Folks,
> >
> > Any updates ??
> >
> >
> > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > sunshineglassof2...@gmail.com> wrote:
> >
> > > Dear Team,
> > >
> > > How can I handle compound word searches in solr ?.
> > > How can i search "hand bag" if I have "handbag" in my index. While
> using
> > > shingle in query analyzer, the query "ice cube" creates three tokens as
> > > "ice","cube", "icecube". Only ice and cubes are searched but not
> > > "icecubes".i.e not working for pair though I am using shingle filter.
> > >
> > > Here's the schema config.
> > >
> > >
> > >1.   > >positionIncrementGap="100">
> > >2.   
> > >3.  > >synonyms="synonyms_text_prime_index.txt" ignoreCase="true"

RE: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-16 Thread Dyer, James
Jia,

I agree that for the spellcheckers to work, you need   instead of .

But the "x-box" => "xbox" example ought to be solved by analyzing using 
WordDelimiterFilterFactory and "catenateWords=1" at query-time.  Did you 
re-index after changing your analysis chain (you need to)?  Perhaps you can 
show your full analyzer configuration, and someone here can help you find the 
problem. Also, the Analysis page on the solr Admin UI is invaluable for 
debugging text-field analyzer problems.

Getting "x box" to analyze to "xbox" is trickier (but possible).  The 
WordBreakSpellChecker is probably your best option if you have cases like this 
in your data & users' queries. 

Of course, if you have a finite number of products that have spelling variants 
like this, SynonymFilterFactory might be all you need.  I would recommend using 
index-time synonyms for your case rather than query-time synonyms.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Wednesday, July 16, 2014 7:42 AM
To: solr-user@lucene.apache.org; j...@ece.ubc.ca
Subject: Re: questions on Solr WordBreakSolrSpellChecker and 
WordDelimiterFilterFactory

Hi Jia,

What happens when you use 

 

instead of 

 

Ahmet


On Wednesday, July 16, 2014 3:07 AM, "j...@ece.ubc.ca"  wrote:



Hello everyone :)

I have a product called "xbox" indexed, and when the user search for
either "x-box" or "x box" i want the "xbox" product to be
returned.  I'm new to Solr, and from reading online, I thought I need
to use WordDelimiterFilterFactory for "x-box" case, and
WordBreakSolrSpellChecker for "x box" case. Is this correct?

(1) In my schema file, this is what I changed:


But I don't see the xbox product returned when the search term is
"x-box", so I must have missed something

(2) I tried to use  WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
never got used:


    wc_textSpell

    
      default
      spellCheck
      solr.DirectSolrSpellChecker
      internal
          0.3
            2
            1
            5
            3
            0.01
            0.004
    

    wordbreak
    solr.WordBreakSolrSpellChecker
    spellCheck
    true
    true
    10
  
  

  
    
        SpellCheck
        true
       default
        wordbreak
         true
       false
       10
       true
       false
    
    
      wc_spellcheck
    
  

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
but the response returned is this:


0
0

true
true


build



What's the correct way to build the dictionary?
Even though my requestHandler's name="/spellcheck", i wasn't able to
use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
.. is there something wrong with my definition above?

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:


  wc_textSpell
    
    default
    solr.WordBreakSolrSpellChecker
    spellCheck
    true
    true
    10
  
   

   
    
        SpellCheck
        true
       default
        
         true
       false
       10
       true
       false
    
    
      wc_spellcheck
    
  

And still unable to see WordBreakSolrSpellChecker being called anywhere.

Would someone kindly help me?

Many thanks,
Jia




RE: Endeca to Solr Migration

2014-07-02 Thread Dyer, James
We migrated a big application from Endeca (6.0, I think) several years ago.  
We were not using any of the business UI tools, but we found that Solr is a lot 
more flexible and performant than Endeca.  But with more flexibility comes more 
you need to know.

The hardest thing was to migrate the Endeca dimensions to Solr facets.  We had 
endeca-api specific dependencies throughout the application, even in the 
presentation layer.  We ended up writing a bridge api that allowed us to keep 
our endeca-specific code and translate the queries to solr queries.  We are 
storing a cross-reference between the "N" values from Endeca and key/value 
pairs to translate something like N=4000 to "fq=Language:English".  With solr, 
there is more you need to do in your app that the backend doesn't manage for 
you.  In the end, though, it lets you separate your concerns better.
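
To give a rough idea of what such a cross-reference can look like, here is a minimal Python sketch. Only the N=4000 to Language:English pair comes from the description above; the second entry, the table name and the function name are hypothetical, and a real bridge would load the mapping from wherever it is stored.

```python
from urllib.parse import urlencode

# Cross-reference from Endeca "N" values to Solr field/value pairs.
# 4000 is the example from the text; 4001 is a made-up entry.
N_TO_FILTER = {
    4000: ("Language", "English"),
    4001: ("Format", "Paperback"),
}


def to_solr_params(query, n_values):
    """Translate a query plus Endeca N values into Solr q/fq parameters."""
    params = [("q", query)]
    for n in n_values:
        field, value = N_TO_FILTER[n]
        params.append(("fq", f"{field}:{value}"))
    return params


params = to_solr_params("*:*", [4000])
query_string = urlencode(params)  # ready to append to a /select URL
```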

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mrg81 [mailto:maya...@gmail.com] 
Sent: Saturday, June 28, 2014 1:11 PM
To: solr-user@lucene.apache.org
Subject: Endeca to Solr Migration

Hello --

I wanted to get some details on Endeca to Solr Migration. I am
interested in few topics:

1. We would like to migrate the Faceted Navigation, Boosting individual
records and a few other items. 
2. But the biggest question is about the UI [Experience Manager] - I have
not found a tool that comes close to Experience Manager. I did read about
Hue [In response to Gareth's question on Migration], but it seems that we
will have to do a lot of customization to use that. 

Questions:

1. Is there a UI that we can use? Is it possible to un-hook the Experience
Manager UI and point to Solr?
2. How long does a typical migration take? Assuming that we have to migrate
the Faceted Navigation and Boosted records? 

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Endeca-to-Solr-Migration-tp4144582.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Spell checker - limit on number of misspelt words in a search term.

2014-06-23 Thread Dyer, James
I do not believe there is such a setting.  Most likely you will need to 
increase the value for "maxCollationTries" to get it to discover the "correct" 
combination. Just be sure not to set this too high as queries with a lot of 
misspelled words (or for something your index simply doesn't have) will take 
longer to complete.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Tuesday, June 17, 2014 4:49 PM
To: solr-user@lucene.apache.org
Subject: Spell checker - limit on number of misspelt words in a search term.

Hi All,

I am using the Direct Spell checker component and I have collate =true in
my solrconfig.xml.

The issue that I noticed is that, when I have a search term with up to two
words in it and if both of them are misspelled  I get a collation query  as
a suggestion in the spellchecker output, if I increase the search term
length to 3 words and spell all of them incorrectly then I do not get a
collation query as an output in the spell checker suggestions.

Is there a setting in solrconfig.xml file that's  controlling this behavior
by restricting the length of the search term to be up to two misspelt words
to suggest a collation query, if so I would need to change the property.

Can anyone please let me know how to do so ?

Thanks.

Sent from my mobile.


RE: Solr spellcheck - onlyMorePopular threshold?

2014-06-09 Thread Dyer, James
I believe it will return the terms that are most similar to the queried terms 
but have a greater term frequency than the queried terms.  It doesn't actually 
care what the term frequencies are, only that they are greater than the 
frequencies of the terms you queried on.

I do not know your use case, but you may want to consider using 
"spellcheck.alternativeTermCount" instead of "onlyMorePopular".  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount 
and 
https://issues.apache.org/jira/browse/SOLR-2585?focusedCommentId=13096153&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13096153
 for why.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alistair [mailto:ali...@gmail.com] 
Sent: Monday, June 09, 2014 3:06 AM
To: solr-user@lucene.apache.org
Subject: Solr spellcheck - onlyMorePopular threshold?

Hello all,

I was wondering what does the "onlyMorePopular" option for spellchecking use
as its threshold? Will it always pick the suggestion that returns the most
queries or does it base its result based off of some threshold that can be
configured? 

Thanks!

Ali.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-spellcheck-onlyMorePopular-threshold-tp4140727.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: DirectSpellChecker not returning expected suggestions.

2014-06-02 Thread Dyer, James
If "wrangle" is not in your index, and if it is within the max # of edits, then 
it should suggest it.

Are you getting anything back from spellcheck at all?  What is the exact query 
you are using?  How is the spellcheck field analyzed?  If you're using 
stemming, then "wrangle" and "wrangler" might be stemmed to the same word. (by 
the way, you shouldn't spellcheck against a stemmed or otherwise 
heavily-analyzed field).

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Monday, June 02, 2014 1:06 PM
To: solr-user@lucene.apache.org
Subject: Re: DirectSpellChecker not returning expected suggestions.

OK, I just realized that "wrangle" is a proper English word, probably that's
why I don't get a suggestion for "wrangler" in this case. However, in my
test index there is no "wrangle" present, so even though this is a proper
English word, since there is no occurrence of it in the index, shouldn't
Solr suggest "wrangler" to me?


On Mon, Jun 2, 2014 at 2:00 PM, S.L  wrote:

> I do not get any suggestion (when I search for "wrangle") , however I
> correctly get the suggestion wrangler when I search for wranglr , I am
> using the Direct and WordBreak spellcheckers in combination, I have not
> tried using anything else.
>
> Is the distance calculation of Solr different than what Levestien distance
> calculation ? I have set maxEdits to 1 , assuming that this corresponds to
> the maxDistance.
>
> Thanks for your help!
>
>
> On Mon, Jun 2, 2014 at 1:54 PM, david.w.smi...@gmail.com <
> david.w.smi...@gmail.com> wrote:
>
>> What do you get then?  Suggestions, but not the one you’re looking for, or
>> is it deemed correctly spelled?
>>
>> Have you tried another spellChecker impl, for troubleshooting purposes?
>>
>> ~ David Smiley
>> Freelance Apache Lucene/Solr Search Consultant/Developer
>> http://www.linkedin.com/in/davidwsmiley
>>
>>
>> On Sat, May 31, 2014 at 12:33 AM, S.L  wrote:
>>
>> > Hi All,
>> >
>> > I have a small test index of 400 documents , it happens to have an entry
>> > for  "wrangler", When I search for "wranglr", I correctly get the
>> collation
>> > suggestion as "wrangler", however when I search for "wrangle" , I do not
>> > get a suggestion for "wrangler".
>> >
>> > The Levenstien distance between wrangle --> wrangler is same as the
>> > Levestien distance between wranglr-->wrangler , I am just wondering why
>> I
>> > do not get a suggestion for wrangle.
>> >
>> > Below is my Direct spell checker configuration.
>> >
>> > 
>> >   direct
>> >   suggestAggregate
>> >   solr.DirectSolrSpellChecker
>> >   
>> >   internal
>> >   score
>> >
>> >   
>> >   0.7
>> >   
>> >   1
>> >   
>> >   3
>> >   
>> >   5
>> >   
>> >   4
>> >   
>> >   0.01
>> >   
>> >   
>> > 
>> >
>>
>
>


RE: Wordbreak spellchecker excessive breaking.

2014-05-30 Thread Dyer, James
I am not sure why changing spellcheck parameters would prevent your server from 
restarting.  One thing to check is to see if you have warming queries running 
that involve spellcheck.  I think I remember from long ago there was (maybe 
still is) an obscure bug where sometimes it will lock up in rare cases when 
spellcheck is used in warming queries.  I do not remember exactly what caused 
this or if it was ever fixed.

Besides that, you might want to post a stack trace or describe what happens 
when it doesn't restart.  Perhaps someone here will know what the problem is.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Friday, May 30, 2014 12:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Wordbreak spellchecker excessive breaking.

James,

Thanks for clearly stating this , I was not able to find this documented
anywhere, yes I am using it with another spell checker (Direct) with the
collation on. I will try the maxChanges and let you know.

On a side note , whenever I change the spellchecker parameter , I need to
rebuild the index  and delete the solr data directory before that  as my
Tomcat instance would not even start, can you let me know why ?

Thanks.




On Tue, May 27, 2014 at 12:21 PM, Dyer, James 
wrote:

> You can do this if you set it up like in the mail Solr example:
>
> 
> wordbreak
> solr.WordBreakSolrSpellChecker
> name
> true
> true
> 10
> 
>
> The "combineWords" and "breakWords" flags let you tell it which kind of
> workbreak correction you want.  "maxChanges" controls the maximum number of
> words it can break 1 word into, or the maximum number of words it can
> combine.  It is reasonable to set this to 1 or 2.
>
> The best way to use this is in conjunction with a "regular" spellchecker
> like DirectSolrSpellChecker.  When used together with the collation
> functionality, it should take a query like "mob ile" and depending on what
> actually returns results from your data, suggest either "mobile" or perhaps
> "mob lie" or both.  The one thing is cannot do is fix a transposition or
> misspelling and combine or break words in one shot.  That is, it cannot
> detect that "mob lie" should become "mobile".
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: S.L [mailto:simpleliving...@gmail.com]
> Sent: Saturday, May 24, 2014 4:21 PM
> To: solr-user@lucene.apache.org
> Subject: Wordbreak spellchecker excessive breaking.
>
> I am using Solr wordbreak spellchecker and the issue is that when I search
> for a term like "mob ile" expecting that the wordbreak spellchecker would
> actually resutn a suggestion for "mobile" it breaks the search term into
> letters like "m o b"  I have two issues with this behavior.
>
>  1. How can I make Solr combine "mob ile" to mobile?
>  2. Not withstanding the fact that my search term "mob ile" is being broken
> incorrectly into individual letters , I realize that the wordbreak is
> needed in certain cases, how do I control the wordbreak so that it does not
> break it into letters like "m o b" which seems like excessive breaking to
> me ?
>
> Thanks.
>


RE: Wordbreak spellchecker excessive breaking.

2014-05-27 Thread Dyer, James
You can do this if you set it up like in the main Solr example:


wordbreak
solr.WordBreakSolrSpellChecker  
name
true
true
10


The "combineWords" and "breakWords" flags let you tell it which kind of 
wordbreak correction you want.  "maxChanges" controls the maximum number of 
words it can break 1 word into, or the maximum number of words it can combine.  
It is reasonable to set this to 1 or 2.

The best way to use this is in conjunction with a "regular" spellchecker like 
DirectSolrSpellChecker.  When used together with the collation functionality, 
it should take a query like "mob ile" and depending on what actually returns 
results from your data, suggest either "mobile" or perhaps "mob lie" or both.  
The one thing it cannot do is fix a transposition or misspelling and combine or 
break words in one shot.  That is, it cannot detect that "mob lie" should 
become "mobile".
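
To make the combine/break idea concrete, here is a simplified Python sketch. It is not how WordBreakSolrSpellChecker is implemented: the function names and the vocabulary are made up, it only joins adjacent pairs and splits a word in two, and it ignores "maxChanges" and term frequencies entirely.

```python
def combine_pairs(words, vocabulary):
    """Suggest joining two adjacent words when the joined form is indexed."""
    suggestions = []
    for left, right in zip(words, words[1:]):
        joined = left + right
        if joined in vocabulary:
            suggestions.append((left, right, joined))
    return suggestions


def split_in_two(word, vocabulary):
    """Suggest splitting one word when both halves are indexed."""
    return [(word[:i], word[i:])
            for i in range(1, len(word))
            if word[:i] in vocabulary and word[i:] in vocabulary]


# Hypothetical index vocabulary.
vocabulary = {"mobile", "mob", "lie", "ice", "cream"}

combined = combine_pairs(["mob", "ile"], vocabulary)     # joins to "mobile"
broken = split_in_two("icecream", vocabulary)            # splits to ice + cream
not_fixable = combine_pairs(["mob", "lie"], vocabulary)  # "moblie" is not indexed
```

The last call shows the limitation described above: joining "mob lie" gives "moblie", which would still need a separate spelling correction to become "mobile".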

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: S.L [mailto:simpleliving...@gmail.com] 
Sent: Saturday, May 24, 2014 4:21 PM
To: solr-user@lucene.apache.org
Subject: Wordbreak spellchecker excessive breaking.

I am using Solr wordbreak spellchecker and the issue is that when I search
for a term like "mob ile" expecting that the wordbreak spellchecker would
actually return a suggestion for "mobile" it breaks the search term into
letters like "m o b"  I have two issues with this behavior.

 1. How can I make Solr combine "mob ile" to mobile?
 2. Notwithstanding the fact that my search term "mob ile" is being broken
incorrectly into individual letters , I realize that the wordbreak is
needed in certain cases, how do I control the wordbreak so that it does not
break it into letters like "m o b" which seems like excessive breaking to
me ?

Thanks.


RE: spellcheck if docsfound below threshold

2014-05-16 Thread Dyer, James
It's "spellcheck.maxResultsForSuggest".

http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest
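
As a quick sketch of how the parameter could be passed on a request, assuming a local core named collection1 and a threshold of 5 hits (the URL, the query and the threshold are all placeholders to adjust for your setup):

```python
from urllib.parse import urlencode

params = {
    "q": "title:ice cream",                  # placeholder query
    "spellcheck": "true",
    # Only suggest when the query itself returns at most this many hits.
    "spellcheck.maxResultsForSuggest": 5,
}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
```

With this in place there should be no need to re-run the request based on the number of documents found.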

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Jan Verweij - Reeleez [mailto:j...@reeleez.nl] 
Sent: Monday, May 12, 2014 2:12 AM
To: solr-user@lucene.apache.org
Subject: spellcheck if docsfound below threshold

Hi,

Is there a setting to only include spellcheck if the number of documents
found is below a certain threshold?

Or would we need to rerun the request with the spellcheck parameters based
on the docs found?

Kind regards,

Jan Verweij


RE: Spell check [or] Did you mean this with Phrase suggestion

2014-05-16 Thread Dyer, James
Have you looked at "spellcheck.collate", which re-writes the entire query with 
one or more corrected words?  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate .  There are 
several options shown at this link that control how the "collate" feature 
works.

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: vanitha venkatachalam [mailto:venkatachalam.vani...@gmail.com] 
Sent: Thursday, May 08, 2014 4:14 AM
To: solr-user@lucene.apache.org
Subject: Spell check [or] Did you mean this with Phrase suggestion

Hi,
We need a spell check component that suggests the actual full phrase, not just
words.

Say we have a list of brands: "Nike corporation", "Samsung electronics".

When I search for "tamsong", I would like to get the suggestion "samsung
electronics" (full phrase), not just "samsung" (word).
Please help.
-- 
regards,
Vanitha


RE: solr 4.2.1 spellcheck strange results

2014-05-16 Thread Dyer, James
To achieve what you want, you need to specify a lightly analyzed field (no 
stemming) for spellcheck.  For instance, if your "solr.SpellCheckComponent" in 
solrconfig.xml is set up with "field" of "title_full", then try using 
"title_full_unstemmed".  Also, if you are specifying a 
"queryAnalyzerFieldType", it should be the same as your unstemmed text field.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: HL [mailto:freemail.grha...@gmail.com] 
Sent: Saturday, May 10, 2014 9:12 AM
To: solr-user@lucene.apache.org
Subject: solr 4.2.1 spellcheck strange results

Hi

I am querying the Solr server spellcheck and the results I get back 
look OK at first glance, but
it seems like Solr is replying as if it made the search with the 
wrong key.

So while I query the server with the word
"καρδυα"
Solr responds as if it had queried the database with the word 
"καρδυ", eliminating the last character.
---



---

Ideally, Solr should properly indicate that the suggestions correspond 
with "καρδυα" rather than "καρδυ".

Is there a way to make Solr respond with the original search word from 
the query in its response, instead of the one that is getting the hits?

Regards,
Harry



Here is the complete Solr response:
---


0
23

true
*,score
0
καρδυα
καρδυα

title_short^750 title_full_unstemmed^600 title_full^400 title^500 
title_alt^200 title_new^100 series^50 series2^30 author^300 
author_fuller^150 contents^10 topic_unstemmed^550 topic^500 
geographic^300 genre^300 allfields_unstemmed^10 fulltext_unstemmed^10 
allfields fulltext isbn issn

basicSpell
arrarr
dismax
xml
0






3
0
6
0


καρδ
5


καρδι
3


καρυ
1



false







RE: spellcheck.q and local parameters

2014-04-28 Thread Dyer, James
spellcheck.q is supposed to take a list of raw query terms, so what you're 
trying to do in your example won't work.  What you should do instead is 
space-delimit the actual query terms that exist in "qq" (and nothing else) and 
use that as your value of spellcheck.q.
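
One way the client could derive that value is sketched below. This is only an assumption about what "raw terms" means for a typical user query: the class name, the operator list, and the rule of dropping every character that is not a letter, digit, or whitespace are illustrative choices, not anything Solr prescribes.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical client-side helper that reduces a user query (the "qq"
// parameter) to space-delimited raw terms for spellcheck.q.
final class SpellcheckQBuilder {

    // Boolean operators to drop; assumed list, adjust to your query parser.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    static String toSpellcheckQ(String qq) {
        // Replace query syntax (+, -, parentheses, quotes, ...) with spaces.
        String stripped = qq.replaceAll("[^\\p{L}\\p{N}\\s]", " ");
        return Arrays.stream(stripped.trim().split("\\s+"))
                .filter(t -> !t.isEmpty())
                .filter(t -> !OPERATORS.contains(t))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(toSpellcheckQ("+solr AND (spel chek)")); // solr spel chek
    }
}
```

The client would then send the result as spellcheck.q alongside the unchanged q and qq parameters.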

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] 
Sent: Monday, April 28, 2014 3:01 PM
To: solr-user@lucene.apache.org
Subject: spellcheck.q and local parameters

Hi,

I'm having some trouble using the spellcheck.q parameter. The user's query is 
defined in the qq parameter and q parameter contains several other parameters 
for boosting.
I would like to use the qq parameter as a default for spellcheck.q.
I tried several ways of adding the qq parameter in the spellcheck.q parameter, 
but it doesn't seem to work. Is this at all possible or do I need to write a 
custom QueryConverter?

This is the configuration:

 _query_:"{!edismax qf=$qfQuery pf=$pfQuery bq=$boostQuery 
bf=$boostFunction v=$qq}"
{!v=$qq}

I haven't included all the variables, because they seem unnecessary.

Regards,
Jeroen



RE: Volatile spellcheck index

2014-02-05 Thread Dyer, James
Alejandro,

Assuming you're using Solr 3.x, under:


 
 ...
 


...you can add:

<str name="spellcheckIndexDir">./spellchecker</str>

...then the spell check index will be created on-disk and not in memory.

But in Solr 4.0, the default spellcheck implementation changed to 
org.apache.solr.spelling.DirectSolrSpellChecker, which does not create a 
separate index for spellchecking, "build" does nothing, and you need not 
worry at all about these things.  The wiki still says "experimental" here but 
that is woefully out-of-date.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Alejandro Marqués Rodríguez [mailto:amarq...@paradigmatecnologico.com] 
Sent: Wednesday, February 05, 2014 3:41 AM
To: solr-user@lucene.apache.org
Subject: Volatile spellcheck index

Hi,

I'm having a problem with the spell check index building. I've configured
the spell checker component to have the index built on optimize.

*  *
*  *
*  spell*

*  *
*  spellchecker*
*  spell*
*  0.7*
*  true*
*  *
*  *

*  *
*  *
*  *
*  spellchecker*
*  on*
*  false*
*  false*
*  1*
**
**
*  spellcheck*
**
*  *

After the index process I launch an optimize request and the spellcheck
index is generated and everything is working fine. However, if I restart
Solr the spell check is not working anymore until I execute another
optimize request.

So, is this the expected way of working? Is the spell check index deleted
after every server restart? Is there any way to make it persistent?

And just one more question, I remember in previous Solr versions the
spellcheck had even its own folder under the data folder, so, for example I
could see if the spell check index had been generated just listing the
files under that folder. Does that folder still exist? Is there any way of
knowing if the spell check index has been generated without executing a
query that is supposed to return a correction?

Thanks in advance




-- 
Alejandro Marqués Rodríguez

Paradigma Tecnológico
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42



RE: How to override rollback behavior in DIH

2014-01-17 Thread Dyer, James
Peter,

I think you can override org.apache.solr.handler.dataimport.SolrWriter to have 
a custom (no-op) rollback method.  Your new writer should implement 
org.apache.solr.handler.dataimport.DIHWriter.  You can specify the "writerImpl" 
request parameter to specify the new class.

Unfortunately, it isn't actually this easy because your new writer is going to 
have to know what to do for all the other methods.  That is, there is no easy 
way to tell it how to write/commit/etc to Solr.  The default SolrWriter has a 
lot of hardcoded parameters it gets sent on construction in 
DataImportHandler#handleRequestBody.  You would have to somehow duplicate this 
construction on your own custom class.  See SOLR-3671 for an explanation of 
this dilemma.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: pkeegan01...@gmail.com [mailto:pkeegan01...@gmail.com] On Behalf Of Peter 
Keegan
Sent: Friday, January 17, 2014 7:51 AM
To: solr-user@lucene.apache.org
Subject: Re: How to override rollback behavior in DIH

Following up on this a bit - my main index is updated by a SolrJ client in
another process. If the DIH fails, the SolrJ client is never informed of
the index rollback, and any pending updates are lost. For now, I've made
sure that the DIH processor never throws an exception, but this makes it a
bit harder to detect the failure via the admin interface.

Thanks,
Peter


On Tue, Jan 14, 2014 at 11:12 AM, Peter Keegan wrote:

> I have a custom data import handler that creates an ExternalFileField from
> a source that is different from the main index. If the import fails (in my
> case, a connection refused in URLDataSource), I don't want to roll back any
> uncommitted changes to the main index. However, this seems to be the
> default behavior. Is there a way to override the IndexWriter rollback?
>
> Thanks,
> Peter
>



RE: Spellchecking problem

2013-12-20 Thread Dyer, James
Gastone,

You may, at least while developing, specify 
"spellcheck.collateExtendedResults=true" so you can see for sure it has 
verified how many hits each collation would return.

But my guess is that your "mm" parameter makes pretty much anything return some 
hits.  You might want to specify "spellcheck.collateParam.mm=100%" or something 
like that to restrict collations to only those queries that return hits if all 
the terms were required.

See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .
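
For reference, a small sketch of how these extra parameters might be appended to the request. The class and method names are made up for the example, and maxCollationTries=10 is simply the value from the query string quoted below; note that the "%" in "100%" has to be URL-encoded as "%25".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds the spellcheck part of a request in which
// collations are verified with all query terms required.
final class CollateParamQuery {

    // URL-encode a single name=value pair.
    static String param(String name, String value) {
        return URLEncoder.encode(name, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String spellcheckParams() {
        return String.join("&",
                param("spellcheck", "true"),
                param("spellcheck.collate", "true"),
                param("spellcheck.maxCollationTries", "10"),
                // Report the hit count for each collation (useful while developing).
                param("spellcheck.collateExtendedResults", "true"),
                // Override mm only while collations are being tested.
                param("spellcheck.collateParam.mm", "100%"));
    }

    public static void main(String[] args) {
        System.out.println(spellcheckParams());
    }
}
```

The main query keeps its own "mm"; the override applies only to the test queries the spellchecker runs for each candidate collation.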

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Friday, December 20, 2013 8:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Spellchecking problem

Thank you for your answer.

this is the querystring

http://seshat:9000/solr/browse/?q=otto+maialotto&fq=shelf:GIO&qf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0 category_label^0 &pf=ean^0
title^0.0035 authors^0 publisher^0 series^0 contributors^0 characters^0
manufacturer^0 actors^0 directors^0 tags^0
category_label^0&spellcheck=true&spellcheck.collate=true&spellcheck.maxCollationTries=10&spellcheck.q=otto+il+maialotto&mm=2%3C-1+5%3C80%25&

shelf is the field that represents the product typology, and GIO is the
typology (games).

The problem is the collation:
the result it gives (Otto il polpo) is the name of a product of another typology
(book).
Why?

the result is this.




5
0
17
0


otto il polpo
2


gigetto il maialetto  vol.0
2


sotto il mare  vol.0
2


sotto il mare
2


otto il rinoceronte
2



true
(otto il polpo)



this is the conf:


textSpell


  default
  spellcheckdef
  spellchecker
  on
  false
  true
  6
  true
  .001


  

Thanks






2013/12/20 Dyer, James 

> If you are using "spellcheck.maxCollationTries" with a value greater than 0
> the *collation* section of your spellcheck response will give query
> corrections that are proven to produce hits.  Possibly you were looking at
> the first section where it gives individual word suggestions?  Or maybe one
> of your query parameters is misspelled (check case and that you have
> "spellcheck." in front of all of them)?  If you can't figure it out,
> provide us the entire query string you're using, the spellcheck response
> you get back and also the relevant portions of solrconfig.xml.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: Gastone Penzo [mailto:gastone.pe...@gmail.com]
> Sent: Friday, December 20, 2013 7:43 AM
> To: solr-user@lucene.apache.org
> Subject: Spellchecking problem
>
> Hello,
>
> i have problem with spellchecking.
> i use solr to index an ecommerce products (dvd, cd, books ecc)
> the collation is only one but in the index there'is the field: typology (of
> product)
> When i build spellchecking indexes, they are build together.
> How can i have only suggestsions of one typology?
>
> i read that if i user spellcheck.collate=true and i maxcollatetries > 0,
> solr evaluates every suggestion with fq parameter of the query. In my query
> i have for example fq=typology:book
> but it doesn't works. why?
>
> i also tried collationparameter.fq=typology:book
> the same
>
> i use solr 4.3
> thank you
>
>
> --
> *Gastone Penzo*
>
>


-- 
*Gastone Penzo*



RE: Spellchecking problem

2013-12-20 Thread Dyer, James
If you are using "spellcheck.maxCollationTries" with a value greater than 0 the 
*collation* section of your spellcheck response will give query corrections 
that are proven to produce hits.  Possibly you were looking at the first 
section where it gives individual word suggestions?  Or maybe one of your query 
parameters is misspelled (check case and that you have "spellcheck." in front 
of all of them)?  If you can't figure it out, provide us the entire query 
string you're using, the spellcheck response you get back and also the relevant 
portions of solrconfig.xml.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Gastone Penzo [mailto:gastone.pe...@gmail.com] 
Sent: Friday, December 20, 2013 7:43 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking problem

Hello,

I have a problem with spellchecking.
I use Solr to index ecommerce products (DVDs, CDs, books, etc.).
The collation is only one, but in the index there is the field: typology (of
product).
When I build the spellchecking indexes, they are built together.
How can I get suggestions for only one typology?

I read that if I use spellcheck.collate=true and maxCollationTries > 0,
Solr evaluates every suggestion with the fq parameter of the query. In my query
I have, for example, fq=typology:book
but it doesn't work. Why?

I also tried collationparameter.fq=typology:book
with the same result.

i use solr 4.3
thank you


-- 
*Gastone Penzo*



RE: DataImport Handler, writing a new EntityProcessor

2013-12-18 Thread Dyer, James
The first thing I would suggest is to try and run it not in debug mode.  DIH's 
debug mode limits the number of documents it will take in, so that might be all 
that is wrong here.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: mathias@gmail.com [mailto:mathias@gmail.com] On Behalf Of Mathias 
Lux
Sent: Wednesday, December 18, 2013 4:04 AM
To: solr-user@lucene.apache.org
Subject: DataImport Handler, writing a new EntityProcessor

Hi all!

I've got a question regarding writing a new EntityProcessor, in the
same sense as the Tika one. My EntityProcessor should analyze jpg
images and create document fields to be used with the LIRE Solr plugin
(https://bitbucket.org/dermotte/liresolr). Basically I've taken the
same approach as the TikaEntityProcessor, but my setup just indexes
the first of 1000 images. I'm using a FileListEntityProcessor to get
all JPEGs from a directory and then I'm handing them over (see [2]).
My code for the EntityProcessor is at [1]. I've tried to use the
DataSource as well as the filePath attribute, but it ends up all the
same. However, the FileListEntityProcessor is able to read all the
files according to the debug output, but I'm missing the link from the
FileListEntityProcessor to the LireEntityProcessor.

I'd appreciate any pointer or help :)

cheers,
  Mathias

[1] LireEntityProcessor http://pastebin.com/JFajkNtf
[2] dataConfig http://pastebin.com/vSHucatJ

-- 
Dr. Mathias Lux
Klagenfurt University, Austria
http://tinyurl.com/mlux-itec



RE: SOLR DIH - Sub Entity with different datasource not working

2013-12-13 Thread Dyer, James
Without more of the stacktrace I don't think you'll get much help.  However, 
it's my experience that exceptions that begin with "Unable to execute query" 
mean the db didn't like something about one or both queries.  I think it would 
have listed in there somewhere the actual query it didn't like, depending on 
your db driver.  If memory serves, I think the Oracle driver also lists 
out why it didn't like the query in the exception.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lokn [mailto:nlokesh...@gmail.com] 
Sent: Friday, December 13, 2013 3:40 AM
To: solr-user@lucene.apache.org
Subject: SOLR DIH - Sub Entity with different datasource not working

Hi,
I have a data-config.xml with 2 data sources, with the entity and its
sub-entity connecting to datasource1 and datasource2 respectively.
When I do the full import, it gives an error:
Exception :  org.apache.solr.handler.dataimport.DataImportHandlerException:
Unable to execute query: 

This is my db-data-config.xml  configuration file:

  

  
   
 
  

 
 
 



 



Let me know if there is anything wrong in this.

Thanks,
Lokesh



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-DIH-Sub-Entity-with-different-datasource-not-working-tp4106550.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Data Import Handler

2013-11-13 Thread Dyer, James
In solrcore.properties, put:

datasource.url=jdbc:xxx:yyy
datasource.driver=com.some.driver

In solrconfig.xml, put:



... 
${datasource.driver}
${datasource.url}
...



In data-config.xml, put:


Hope this works for you.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 13, 2013 9:00 AM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

James, can you elaborate on how to process driver="${dataimporter.request.driver}" 
url="${dataimporter.request.url}" and where to define these?
My purpose is to configure my DB details (url, uname, password) in a properties file.

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, November 06, 2013 7:42 PM
To: solr-user@lucene.apache.org
Subject: RE: Data Import Handler

If you prepend the variable name with "dataimporter.request", you can
include variables like these as request parameters:



/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally
add each property to solrconfig.xml like this:



${dih.driver}
${dih.url}



Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com]
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the dataconfig.xml file?

I want to provide database details like (db_url, uname, password) from my own
properties file instead of the dataconfig.xml file.







RE: [Spellcheck] NullPointerException on QueryComponent.mergeIds

2013-11-12 Thread Dyer, James
Jean-Marc,

This might not solve the particular problem you're having, but to get 
spellcheck to work properly in a distributed environment, be sure to set the 
"shards.qt" parameter to the name of your request handler.  See 
http://wiki.apache.org/solr/SpellCheckComponent#Distributed_Search_Support .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Jean-Marc Desprez [mailto:jm.desp...@gmail.com] 
Sent: Tuesday, November 12, 2013 8:57 AM
To: solr-user@lucene.apache.org
Subject: [Spellcheck] NullPointerException on QueryComponent.mergeIds

Hello,

I'm following this tutorial : http://wiki.apache.org/solr/SolrCloud with a
SolR 4.5.0

I'm at the very first step, with only two replicas and two shards, and I have only
*one* document in the index.

When I try to get a spellcheck, I have this error :
java.lang.NullPointerException
at
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:843)

I do not understand what I'm doing wrong and how I can get an error on
mergeIds with only one document in the index (merge this doc with ... ??)

Some technical details :
URL :
http://127.0.0.1:8983/solr/bench/select?shards.qt=ri_spell_fr_FR&q=sistem&distrib=true
If I set "distrib" to false, no error.

My uniqueKey is indexed and stored :


ref


My conf :

  
true
true
true
true
3
true
5
ri_spell_fr_FR
false
  

  
spellcheck_fr_FR
  



  suggest_fr_FR

  
ri_spell_fr_FR
spell_fr_FR
./spellchecker_fr_FR
org.apache.lucene.search.spell.JaroWinklerDistance
  

  ...



With this URL :
http://127.0.0.1:8983/solr/bench/select?qt=ri_spell_fr_FR&q=sistem

I have no error but the response is empty :
01


Thanks
Jean-Marc


RE: spellcheck solr 4.3.1

2013-11-11 Thread Dyer, James
There are 2 parameters you want to consider:

First is "spellcheck.maxResultsForSuggest".  Because you have an "OR" query, 
you'll get hits if only 1 query term is in the index.  This parameter lets you 
tune it to make it suggest if the query returns n or fewer hits.  My memory 
tells me, however, that if you leave this parameter out entirely, it will still 
return suggestions for "OR" queries with some misspelled words (false memory on 
my part?).  Possibly you have this set to 1?  Omitting it might be a better 
option.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.maxResultsForSuggest 
.

Second is "collateParam", which lets you override certain query parameters when 
the spellchecker is testing collations against the index.  For instance, if you 
have "q.op=OR", the spellchecker will return collations that possibly only have 
1 correct term.  The reason is it simply checks if a collation will return any 
hits.  So you can override this with "spellcheck.collateParam.q.op=AND".  The 
same can be done for "mm" if using edismax.  See 
http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collateParam.XX .

James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Daniel Borup [mailto:d...@alpha-solutions.dk] 
Sent: Monday, November 11, 2013 7:38 AM
To: solr-user@lucene.apache.org
Subject: spellcheck solr 4.3.1

Hey

I am running Solr 4.3.1 and am implementing spellcheck using 
solr.DirectSolrSpellChecker. Everything seems to be working fine, but I have 
one issue.

If I search for
http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsen

the result is some hits and the spell component return the following structure.



true


I would have liked any suggestions that were found to be returned.

If I do a search for
http://localhost:8765/solr/MainIndex/spell?q=kim%20AND%20larsenn

with larsen spelled wrong (larsenn) the spell component return the following:




1
8
15
0


larsen
12



false

kim AND larsen
12

kim
larsen





In my point of view this is correct but, if I do the same search as above just 
as an OR search http://localhost:8765/solr/MainIndex/spell?q=kim%20OR%20larsenn
The spell component return some result and:



true



According to Solr, larsenn is now spelled correctly. I cannot understand this 
behavior. Is there a setting to adjust the spell component so it always returns 
suggestions, or a way to get suggestions in an OR search with one misspelled word?






Med venlig hilsen / Best regards

Daniel Borup
Tel: (+45) 28 87 69 18
E-mail: d...@alpha-solutions.dk

Alpha Solutions A/S
Sølvgade 10, 1.sal, DK-1307 Copenhagen K
Tel: (+45) 70 20 65 38
Web: www.alpha-solutions.dk


** This message including any attachments may contain confidential and/or 
privileged information
intended only for the person or entity to which it is addressed. If you are not 
the intended recipient
you should delete this message. Any printing, copying, distribution or other 
use of this message is strictly prohibited.
If you have received this message in error, please notify the sender 
immediately by telephone
or e-mail and delete all copies of this message and any attachments from your 
system. Thank you.



RE: Data Import Handler

2013-11-06 Thread Dyer, James
If you prepend the variable name with "dataimporter.request", you can include 
variables like these as request parameters:



/dih?driver=some.driver.class&url=jdbc:url:something

If you want to include these in solrcore.properties, you can additionally add 
each property to solrconfig.xml like this:



${dih.driver}
${dih.url}



Then in solrcore.properties:
 dih.driver=some.driver.class
 dih.url=jdbc:url:something

See http://wiki.apache.org/solr/SolrConfigXml?#System_property_substitution
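
If it helps to picture what the substitution does, here is a small Java sketch of the idea. Solr performs this itself when it loads its configuration; the class below is a made-up stand-in that only handles the plain ${name} form and leaves unknown placeholders untouched, and the sample text it substitutes into is invented for the demonstration.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Conceptual sketch of ${name} property substitution; not Solr's own code.
final class PropertySubstitutionSketch {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String substitute(String text, Properties props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // Fall back to the placeholder itself when the property is unknown.
            String value = props.getProperty(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Same keys and values as the solrcore.properties example above.
        Properties props = new Properties();
        props.setProperty("dih.driver", "some.driver.class");
        props.setProperty("dih.url", "jdbc:url:something");
        System.out.println(substitute("driver=${dih.driver} url=${dih.url}", props));
    }
}
```

The point is that the property file is the only place the real values live; the config files just carry the placeholders.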


James Dyer
Ingram Content Group
(615) 213-4311

-Original Message-
From: Ramesh [mailto:ramesh.po...@vensaiinc.com] 
Sent: Wednesday, November 06, 2013 7:25 AM
To: solr-user@lucene.apache.org
Subject: Data Import Handler

Hi Folks,

 

Can anyone suggest how I can customize the dataconfig.xml file?

I want to provide database details like (db_url, uname, password) from my own
properties file instead of the dataconfig.xml file.


