Different queries for same meaning searches

2012-08-21 Thread Dalius Sidlauskas

Hello, here is my index and index analyzer configuration:

<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="’|'" replacement=" "/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ICUFoldingFilterFactory"/>

Searching for "d Osona" and for "d’Osona" produces the same tokens, d and 
osona, in both cases. But the ParsedQuery is different:


#1 d Osona

+((
DisjunctionMaxQuery((search_definitions:d | search_title:d))
DisjunctionMaxQuery((search_definitions:osona | search_title:osona))
)~2)
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))


#2 d’Osona

+DisjunctionMaxQuery((
(search_definitions:d search_definitions:osona) |
(search_title:d search_title:osona)
))
DisjunctionMaxQuery((search_definitions:"d osona" | search_title:"d osona"^3.0))



And the results are different as well. Where can I find an explanation for 
this?
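
For anyone who wants to reproduce the comparison, the parsed queries above come from Solr's debug output, which is requested with the debugQuery parameter. The following is only a minimal sketch of building the two request URLs; the base URL is an assumption (a local default install), and the handler is assumed to already carry the edismax defaults.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr select handler; adjust to your deployment.
SOLR_SELECT = "http://localhost:8983/solr/select"


def debug_url(query: str) -> str:
    """Build a select URL that asks Solr to include its debug section."""
    params = {
        "q": query,
        "debugQuery": "true",  # adds parsedquery and explain to the response
        "rows": "0",           # only the debug output matters here
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)


# The two searches from this message: a space versus a typographic apostrophe.
for q in ("d Osona", "d\u2019Osona"):
    print(debug_url(q))
```

Comparing the "parsedquery" entries of the two responses side by side shows the structural difference directly.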


--
Regards!
Dalius Sidlauskas



Re: Different queries for same meaning searches

2012-08-21 Thread Dalius Sidlauskas

Yes, the mm is 100%. Thank you for a detailed answer.

Regards!
Dalius Sidlauskas

On 21/08/12 15:21, Jack Krupansky wrote:
Solr doesn't actually know any natural language, so it has no way of 
assessing whether two token streams have the same meaning. In your 
case, the surface forms/syntax are subtly different - two separate 
terms vs. a single source term with embedded punctuation.


It appears that you are probably using the edismax query parser and 
probably have mm set to 100% or q.op set to AND (the "~2" 
indicates a BooleanQuery with a minMatch of 2 terms). mm of 100% is 
equivalent to the AND operator, some/most of the time.


For the second query you have a split term, which is treated as a 
single term/token until the fieldType analyzer splits it into two 
terms and then does an OR of the sub-terms. Unfortunately, mm and 
q.op are not passed down to the analyzer, so you have no way of 
changing that OR to an AND, which is why you get different 
results. What you can do is set autoGeneratePhraseQueries="true" 
on your field type(s) to cause the query parser to generate a phrase 
query for "d osona" rather than the OR. That's not the same as 
AND, but depending on your application it may be sufficient or even 
preferable.
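
The difference between the two parsed queries can be made concrete with a toy model. This is only a sketch in Python, not Lucene code: documents are plain dicts of token sets, the two functions mimic the boolean structure of parsed queries #1 and #2, and the sample documents are hypothetical.

```python
# Toy model: a document maps a field name to the set of tokens indexed for it.
FIELDS = ("search_definitions", "search_title")
TERMS = ("d", "osona")


def matches_query_1(doc: dict) -> bool:
    """'d Osona': one DisjunctionMaxQuery per term, and ~2 requires both."""
    per_term = [any(term in doc.get(f, set()) for f in FIELDS) for term in TERMS]
    return sum(per_term) >= 2


def matches_query_2(doc: dict) -> bool:
    """'d’Osona': per field, an OR of the sub-terms; one hit is enough."""
    return any(bool(doc.get(f, set()) & set(TERMS)) for f in FIELDS)


# Hypothetical documents, for illustration only.
only_d = {"search_title": {"pont", "d", "armentera"}}
both = {"search_title": {"d", "osona"}}

print(matches_query_1(only_d), matches_query_2(only_d))
print(matches_query_1(both), matches_query_2(both))
```

A document containing only "d" is rejected by the first structure but accepted by the second, which is one way the result sets can diverge.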


-- Jack Krupansky






Debug Query returns different debug explanation output for documents

2012-07-09 Thread Dalius Sidlauskas
          (termFreq(search_dc_title:jaume)=1)
          7.6105256 = idf(docFreq=584, maxDocs=434581)
          0.5 = fieldNorm(field=search_dc_title, doc=8120)
      0.46420816 = (MATCH) weight(search_title:jaume in 8120), product of:
        0.122290425 = queryWeight(search_title:jaume), product of:
          7.591897 = idf(docFreq=595, maxDocs=434581)
          0.01610802 = queryNorm
        3.7959485 = (MATCH) fieldWeight(search_title:jaume in 8120), product of:
          1.0 = tf(termFreq(search_title:jaume)=1)
          7.591897 = idf(docFreq=595, maxDocs=434581)
          0.5 = fieldNorm(field=search_title, doc=8120)
    3.698813 = (MATCH) max of:
      3.698813 = (MATCH) weight(search_dc_title:destorrent^3.0 in 8120), product of:
        0.5978991 = queryWeight(search_dc_title:destorrent^3.0), product of:
          3.0 = boost
          12.3727 = idf(docFreq=4, maxDocs=434581)
          0.01610802 = queryNorm
        6.18635 = (MATCH) fieldWeight(search_dc_title:destorrent in 8120), product of:
          1.0 = tf(termFreq(search_dc_title:destorrent)=1)
          12.3727 = idf(docFreq=4, maxDocs=434581)
          0.5 = fieldNorm(field=search_dc_title, doc=8120)
      1.2329376 = (MATCH) weight(search_title:destorrent in 8120), product of:
        0.1992997 = queryWeight(search_title:destorrent), product of:
          12.3727 = idf(docFreq=4, maxDocs=434581)
          0.01610802 = queryNorm
        6.18635 = (MATCH) fieldWeight(search_title:destorrent in 8120), product of:
          1.0 = tf(termFreq(search_title:destorrent)=1)
          12.3727 = idf(docFreq=4, maxDocs=434581)
          0.5 = fieldNorm(field=search_title, doc=8120)
    0.5152115 = (MATCH) max of:
      0.019129172 = (MATCH) weight(search_full_definitions:i in 8120), product of:
        0.041998293 = queryWeight(search_full_definitions:i), product of:
          2.607291 = idf(docFreq=87102, maxDocs=434581)
          0.01610802 = queryNorm
        0.455475 = (MATCH) fieldWeight(search_full_definitions:i in 8120), product of:
          2.236068 = tf(termFreq(search_full_definitions:i)=5)
          2.607291 = idf(docFreq=87102, maxDocs=434581)
          0.078125 = fieldNorm(field=search_full_definitions, doc=8120)
      0.5152115 = (MATCH) weight(search_dc_title:i^3.0 in 8120), product of:
        0.22314619 = queryWeight(search_dc_title:i^3.0), product of:
          3.0 = boost
          4.617704 = idf(docFreq=11665, maxDocs=434581)
          0.01610802 = queryNorm
        2.308852 = (MATCH) fieldWeight(search_dc_title:i in 8120), product of:
          1.0 = tf(termFreq(search_dc_title:i)=1)
          4.617704 = idf(docFreq=11665, maxDocs=434581)
          0.5 = fieldNorm(field=search_dc_title, doc=8120)
      0.1702569 = (MATCH) weight(search_title:i in 8120), product of:
        0.074060805 = queryWeight(search_title:i), product of:
          4.5977597 = idf(docFreq=11900, maxDocs=434581)
          0.01610802 = queryNorm
        2.2988799 = (MATCH) fieldWeight(search_title:i in 8120), product of:
          1.0 = tf(termFreq(search_title:i)=1)
          4.5977597 = idf(docFreq=11900, maxDocs=434581)
          0.5 = fieldNorm(field=search_title, doc=8120)
  0.75 = coord(3/4)

Can anyone explain what happens here? Thank you in advance!
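
One possible reading, offered as a sketch rather than a confirmed answer: the numbers follow the classic Lucene TF-IDF formula, and the extra top-level "product of:" seems to appear only when a coord factor below 1 is applied. The Python below recomputes a few values using figures copied from the two explain outputs in this thread; the function name term_weight is made up for illustration, and the assumption that Lucene omits the coord line when every clause matched is mine.

```python
import math

QUERY_NORM = 0.01610802  # queryNorm as printed in the explain output


def term_weight(idf: float, term_freq: int, field_norm: float, boost: float = 1.0) -> float:
    """queryWeight * fieldWeight for one term, as laid out in the explain tree."""
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * idf * field_norm  # tf = sqrt(termFreq)
    return query_weight * field_weight


# weight(search_title:jaume in 8120)
jaume_title = term_weight(idf=7.591897, term_freq=1, field_norm=0.5)

# Doc #2 (8120): three of the four top-level clauses matched, so the sum of
# the three "max of" values is multiplied by coord(3/4).
doc2 = (1.3994671 + 3.698813 + 0.5152115) * (3 / 4)

# Doc #1 (5265): all four clauses matched; coord would be 4/4 = 1, so the
# score is just the sum (assumption: the coord line is left out in that case).
doc1 = 0.035260722 + 3.698813 + 0.5152115 + 0.015650144

print(jaume_title, doc2, doc1)
```

Under that reading, Doc #1 ranks first because it matches every clause, if only weakly, while Doc #2 loses a quarter of its otherwise larger sum to the coord penalty.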

--
Regards!
Dalius Sidlauskas



Re: Debug Query returns different debug explanation output for documents

2012-07-09 Thread Dalius Sidlauskas

Are these settings somehow involved in this?

   <str name="defType">edismax</str>
   <str name="mm">2&lt;-25%</str>
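
As a rough illustration of what that mm value asks for: "2<-25%" (written 2&lt;-25% in XML) means that up to 2 optional clauses must all match, and above that 25% of them may be missing. The function below is a sketch based on my reading of the documented mm rules, not a copy of Solr's code; the name min_should_match is made up for this example.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Number of optional clauses that must match for an mm spec.

    Handles plain integers ("3", "-1"), percentages ("75%", "-25%") and
    conditional forms such as "2<-25%".
    """
    result = clause_count
    if "<" in spec:
        # "N<value" only applies when there are more than N clauses.
        for part in spec.split():
            bound, value = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(value, clause_count)
        return result
    if spec.endswith("%"):
        raw = clause_count * int(spec[:-1]) / 100
        # Negative values say how many clauses may be missing; the fraction
        # is truncated toward zero.
        result = clause_count + int(raw) if raw < 0 else int(raw)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


for n in (1, 2, 3, 4, 8):
    print(n, min_should_match("2<-25%", n))
```

With four clauses this yields 3, which is consistent with a document matching only three of the four clauses still being returned, with a coord(3/4) factor.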

Regards!
Dalius Sidlauskas

On 09/07/12 11:23, Dalius Sidlauskas wrote:
Hello, I have a query whose debug explanation output differs between 
documents. The first document has no top-level "(MATCH) product of:" line, 
which the others have, and it is ranked first although it should not be.


Here is my debug explain output:

Doc #1

4.2649355 = (MATCH) sum of:
  0.035260722 = (MATCH) max of:
    0.035260722 = (MATCH) weight(search_full_definitions:jaume in 5265), product of:
      0.09532936 = queryWeight(search_full_definitions:jaume), product of:
        5.9181304 = idf(docFreq=3177, maxDocs=434581)
        0.01610802 = queryNorm
      0.36988315 = (MATCH) fieldWeight(search_full_definitions:jaume in 5265), product of:
        1.0 = tf(termFreq(search_full_definitions:jaume)=1)
        5.9181304 = idf(docFreq=3177, maxDocs=434581)
        0.0625 = fieldNorm(field=search_full_definitions, doc=5265)
  3.698813 = (MATCH) max of:
    0.122653306 = (MATCH) weight(search_full_definitions:destorrent in 5265), product of:
      0.17779547 = queryWeight(search_full_definitions:destorrent), product of:
        11.037699 = idf(docFreq=18, maxDocs=434581)
        0.01610802 = queryNorm
      0.6898562 = (MATCH) fieldWeight(search_full_definitions:destorrent in 5265), product of:
        1.0 = tf(termFreq(search_full_definitions:destorrent)=1)
        11.037699 = idf(docFreq=18, maxDocs=434581)
        0.0625 = fieldNorm(field=search_full_definitions, doc=5265)
    3.698813 = (MATCH) weight(search_dc_title:destorrent^3.0 in 5265), product of:
      0.5978991 = queryWeight(search_dc_title:destorrent^3.0), product of:
        3.0 = boost
        12.3727 = idf(docFreq=4, maxDocs=434581)
        0.01610802 = queryNorm
      6.18635 = (MATCH) fieldWeight(search_dc_title:destorrent in 5265), product of:
        1.0 = tf(termFreq(search_dc_title:destorrent)=1)
        12.3727 = idf(docFreq=4, maxDocs=434581)
        0.5 = fieldNorm(field=search_dc_title, doc=5265)
    1.2329376 = (MATCH) weight(search_title:destorrent in 5265), product of:
      0.1992997 = queryWeight(search_title:destorrent), product of:
        12.3727 = idf(docFreq=4, maxDocs=434581)
        0.01610802 = queryNorm
      6.18635 = (MATCH) fieldWeight(search_title:destorrent in 5265), product of:
        1.0 = tf(termFreq(search_title:destorrent)=1)
        12.3727 = idf(docFreq=4, maxDocs=434581)
        0.5 = fieldNorm(field=search_title, doc=5265)
  0.5152115 = (MATCH) max of:
    0.020531582 = (MATCH) weight(search_full_definitions:i in 5265), product of:
      0.041998293 = queryWeight(search_full_definitions:i), product of:
        2.607291 = idf(docFreq=87102, maxDocs=434581)
        0.01610802 = queryNorm
      0.48886704 = (MATCH) fieldWeight(search_full_definitions:i in 5265), product of:
        3.0 = tf(termFreq(search_full_definitions:i)=9)
        2.607291 = idf(docFreq=87102, maxDocs=434581)
        0.0625 = fieldNorm(field=search_full_definitions, doc=5265)
    0.5152115 = (MATCH) weight(search_dc_title:i^3.0 in 5265), product of:
      0.22314619 = queryWeight(search_dc_title:i^3.0), product of:
        3.0 = boost
        4.617704 = idf(docFreq=11665, maxDocs=434581)
        0.01610802 = queryNorm
      2.308852 = (MATCH) fieldWeight(search_dc_title:i in 5265), product of:
        1.0 = tf(termFreq(search_dc_title:i)=1)
        4.617704 = idf(docFreq=11665, maxDocs=434581)
        0.5 = fieldNorm(field=search_dc_title, doc=5265)
    0.1702569 = (MATCH) weight(search_title:i in 5265), product of:
      0.074060805 = queryWeight(search_title:i), product of:
        4.5977597 = idf(docFreq=11900, maxDocs=434581)
        0.01610802 = queryNorm
      2.2988799 = (MATCH) fieldWeight(search_title:i in 5265), product of:
        1.0 = tf(termFreq(search_title:i)=1)
        4.5977597 = idf(docFreq=11900, maxDocs=434581)
        0.5 = fieldNorm(field=search_title, doc=5265)
  0.015650144 = (MATCH) max of:
    0.015650144 = (MATCH) product of:
      0.031300288 = (MATCH) sum of:
        0.031300288 = (MATCH) weight(search_full_definitions:casa in 5265), product of:
          0.08981632 = queryWeight(search_full_definitions:casa), product of:
            5.5758758 = idf(docFreq=4474, maxDocs=434581)
            0.01610802 = queryNorm
          0.34849223 = (MATCH) fieldWeight(search_full_definitions:casa in 5265), product of:
            1.0 = tf(termFreq(search_full_definitions:casa)=1)
            5.5758758 = idf(docFreq=4474, maxDocs=434581)
            0.0625 = fieldNorm(field=search_full_definitions, doc=5265)
      0.5 = coord(1/2)


Doc #2

4.210119 = (MATCH) product of:
  5.6134915 = (MATCH) sum of:
    1.3994671 = (MATCH) max of:
      1.3994671 = (MATCH) weight(search_dc_title:jaume^3.0 in 8120), product of:
        0.36777148 = queryWeight(search_dc_title:jaume^3.0), product of:
          3.0

Re: Hi

2012-02-10 Thread Dalius Sidlauskas
Hi, I don't think this is the right place for this question. You should 
follow the samples of Solr client API integration in Java and work out your 
own approach in KonaKart.


Regards!
Dalius Sidlauskas


On 10/02/12 08:25, sumal wrote:

I am Sumal, and I work as a software engineer. Currently I am
developing web-based e-commerce applications using Java, and I am also using
the KonaKart e-commerce shopping cart (community edition).

I am kindly requesting some information about how to integrate Solr into my
KonaKart installation.

If that is not possible, can you send me a sample application that uses the
Solr search engine? By application I mean a small JSP page that performs a
search using Solr.



Thank you!







Best Regards,

  -Sumal Wattegedara-




Re: Wildcard ? issue?

2012-02-09 Thread Dalius Sidlauskas

It seems this applies to Solr 3.6 and 4.0. My version is 3.5.

Regards!
Dalius Sidlauskas


On 08/02/12 17:26, Ahmet Arslan wrote:

I have already tried this and it did not help, because it does not
highlight matches if a wildcard is used. The field configuration turns
the data into:

This writeup should explain your scenario :
http://wiki.apache.org/solr/MultitermQueryAnalysis


Re: Wildcard ? issue?

2012-02-09 Thread Dalius Sidlauskas

Okay, I get it, 3.6 is not released yet. Thanks for help fellas!

Regards!
Dalius Sidlauskas




Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas

Sorry for inaccurate title.

I have 3 fields (dc_title, dc_title_unicode, dc_unicode_full) 
containing the same value:

<title xmlns="http://www.tei-c.org/ns/1.0">cal.lígraf</title>

and these fields are configured accordingly:

<fieldType name="xml" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="xml_unicode" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="xml_unicode_full" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

And finally my search configuration:

<requestHandler name="dictionary" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="mm">2&lt;-25%</str>
    <str name="qf">dc_title_unicode_full^2 dc_title_unicode^2 dc_title</str>
    <int name="rows">10</int>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I am trying to match the field with various (valid) search phrases. 
These are the results:

#   search phrase   match?  comment
1   cal.lígra?      yes
2   cal.ligra?      no      changed í to i
3   cal.ligraf      yes
4   calligra?       no


The problem is attempt #2, which fails to match the data. Attempt #3 works 
once ? is replaced with f.

One more thing: if * is used instead of ?, other data such as 
cal.lígrafia is matched, but not cal.lígraf.


Also, I have spotted a logic mismatch in the debug parsedQuery field:

cal·lígraf: +DisjunctionMaxQuery((dc_title:calligraf^2.0 | 
dc_title_unicode:cal·lígraf^3.0 | dc_title_unicode_full:cal·lígraf^3.0))
cal·lígra?: +DisjunctionMaxQuery((dc_title:cal·lígra?^2.0 | 
dc_title_unicode:cal·lígra?^3.0 | dc_title_unicode_full:cal·lígra?^3.0))

Should the second be calligra? instead?

Environment:
Tomcat 7.0.25 (request encoding UTF-8)
Solr 3.5.0
Java 7 Oracle
Ubuntu 11.10

--
Regards!
Dalius Sidlauskas



Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas
If you cannot read this mail easily, check this ticket, which is a copy: 
https://issues.apache.org/jira/browse/SOLR-3106


Regards!
Dalius Sidlauskas





Re: Wildcard ? issue?

2012-02-08 Thread Dalius Sidlauskas
I have already tried this and it did not help, because it does not 
highlight matches if a wildcard is used. The field configuration turns 
the data into:


dc_title: calligraf
dc_title_unicode: cal·lígraf
dc_title_unicode_full: cal·lígraf

Debug parsedquery says:

[Search for cal·ligraf]

+DisjunctionMaxQuery((dc_title:calligraf | 
dc_title_unicode:cal·ligraf^2.0 | dc_title_unicode_full:cal·ligraf^2.0))


[Search for cal·ligra?]

+DisjunctionMaxQuery((dc_title:cal·ligra? | 
dc_title_unicode:cal·ligra?^2.0 | dc_title_unicode_full:cal·ligra?^2.0))


Why is the dc_title field handled differently? The analysis looks fine:


 Index Analyzer

   org.apache.solr.analysis.HTMLStripCharFilterFactory
   {luceneMatchVersion=LUCENE_34}

   text         cal·lígraf

   org.apache.solr.analysis.PatternReplaceCharFilterFactory
   {replacement=, pattern=-, maxBlockChars=1,
   luceneMatchVersion=LUCENE_34, blockDelimiters=}

   text         cal·lígraf

   org.apache.solr.analysis.WhitespaceTokenizerFactory
   {luceneMatchVersion=LUCENE_34}

   position     1
   term text    cal·lígraf
   startOffset  43
   endOffset    53

   org.apache.solr.analysis.ICUFoldingFilterFactory
   {luceneMatchVersion=LUCENE_34}

   position     1
   term text    calligraf
   startOffset  43
   endOffset    53


 Query Analyzer

   org.apache.solr.analysis.WhitespaceTokenizerFactory
   {luceneMatchVersion=LUCENE_34}

   position     1
   term text    cal·ligra?
   startOffset  0
   endOffset    10

   org.apache.solr.analysis.ICUFoldingFilterFactory
   {luceneMatchVersion=LUCENE_34}

   position     1
   term text    calligra?
   startOffset  0
   endOffset    10


Is this a Solr or Lucene bug?
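
The parsed queries above suggest that the wildcard term reaches dc_title without passing through the query analyzer, so the pattern is compared unfolded against the folded index term. A possible client-side workaround is to fold the pattern before sending it. The Python below is a minimal sketch of that idea only: fold() is a hypothetical, rough stand-in for ICUFoldingFilter (it lowercases, strips accents and drops the middle dot), and fnmatch merely imitates ? and * matching for the demonstration.

```python
import fnmatch
import unicodedata


def fold(text: str) -> str:
    """Rough approximation of the folding seen in the analysis output."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.replace("\u00b7", "")  # drop the middle dot


indexed_term = fold("cal\u00b7l\u00edgraf")   # what dc_title holds after index analysis

raw_pattern = "cal\u00b7l\u00edgra?"          # sent as typed, wildcard term left unanalyzed
folded_pattern = fold(raw_pattern)            # folded on the client before querying

print(indexed_term)
print(fnmatch.fnmatchcase(indexed_term, raw_pattern))
print(fnmatch.fnmatchcase(indexed_term, folded_pattern))
```

Only the folded pattern lines up with the indexed term. Whether this is acceptable depends on the other fields in qf, which keep the unfolded form.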

Regards!
Dalius Sidlauskas


On 08/02/12 16:03, Sethi, Parampreet wrote:

Hi Dalius,

If you have not tried it already, check http://localhost:8983/solr/admin/analysis.jsp
for your queries (enable verbose output for both the index and the query field
values) and see which filters/tokenizers are being applied.

Hope it helps!

-param
