Handling space variations in queries - matching 'thunderbolt' for query 'thunder bolt'

2011-07-30 Thread Prasanna R
We use a dismax handler with mm 1 in our Solr installation. I have a
fieldType defined that creates shingles to handle space variations in the
input strings and user queries. This fieldType successfully handles cases
where the query is 'thunderbolt' and the document contains the string
'thunder bolt' (the shingle filter creates the token 'thunderbolt' during
indexing). However, because the Lucene query parser splits the query on
whitespace before analysis, the reverse is not handled well: a document
containing 'thunderbolt' is not matched by the query 'thunder bolt'.

I find that in our dismax handler the shingle field records a match and
scores on the 'pf' but the document is not returned as none of the fields in
'qf' record a match (mm is 1). I am looking for suggestions on how to handle
this scenario. Using a synonym will obviously work but it seems a rather
hackish solution. Is there a more elegant way of achieving a similar effect?


Alternatively, is there a way to get the 'mm' parameter to factor in matches
on 'pf' also?

Kindly help.

Regards,

Prasanna
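
The asymmetry described above can be illustrated with a small sketch. This is a toy model in plain Python, not Solr's analysis chain: the function names are made up for illustration, and it assumes bigram shingles joined with no separator and a query that is split on whitespace before each chunk is analyzed.

```python
def shingles(tokens, size=2, sep=""):
    """Join each run of `size` adjacent tokens into one shingle token."""
    return [sep.join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


def index_terms(text):
    """Index-time analysis: the whole string is tokenized, then shingled."""
    tokens = text.lower().split()
    return set(tokens) | set(shingles(tokens))


def query_terms_presplit(query):
    """Query-time analysis when the parser splits on whitespace first.

    Each chunk is analyzed on its own, so the shingle step never sees two
    adjacent tokens and cannot produce a joined term.
    """
    terms = set()
    for chunk in query.lower().split():
        terms |= index_terms(chunk)
    return terms


# Document 'thunder bolt', query 'thunderbolt': the index-time shingle matches.
doc_a = index_terms("thunder bolt")
match_a = bool(doc_a & query_terms_presplit("thunderbolt"))

# Document 'thunderbolt', query 'thunder bolt': no shared term, so no match.
doc_b = index_terms("thunderbolt")
match_b = bool(doc_b & query_terms_presplit("thunder bolt"))
```

In this model `match_a` is true and `match_b` is false, which mirrors the behaviour reported in the post.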


Re: slow highlighting because of stemming

2011-07-30 Thread Orosz György
Hi,

Thanks for the answer!
I am doing some logging about stemming, and what I can see is that a lot of
tokens are stemmed for the highlighting. This is the strange part, since I
don't understand why any highlighter needs to stem again.
Anyway, my documents are not really large, just a few kilobytes, but thanks
for the suggestion.

If you could help me work out how to skip stemming during
highlighting, that would be great!

Thanks,
Gyuri

2011/7/29 Mike Sokolov soko...@ifactory.com

 I'm not sure I would identify stemming as the culprit here.

 Do you have very large documents?  If so, there is a patch for FVH
 committed to limit the number of phrases it looks at; see hl.phraseLimit,
 but this won't be available until 3.4 is released.


 You can also limit the amount of each document that is analyzed by the
 regular Highlighter using maxDocCharsToAnalyze (and maybe this applies to
 FVH? not sure)

 Using RegexFragmenter is also probably slower than something like
 SimpleFragmenter.

 There is work to implement faster highlighting for Solr/Lucene, but it
 depends on some basic changes to the search architecture so it might be a
 while before that becomes available.  See
 https://issues.apache.org/jira/browse/LUCENE-3318 if you're interested in
 following that development.

 -Mike


 On 07/29/2011 04:55 AM, Orosz György wrote:

 Dear all,

 I am quite new to Solr and would like to ask for your help.
 I am developing an application which should be able to highlight the
 results of a query. For this I am using the regex fragmenter:
 <highlighting>
   <fragmenter name="regex"
               class="org.apache.solr.highlight.RegexFragmenter">
     <lst name="defaults">
       <int name="hl.fragsize">500</int>
       <float name="hl.regex.slop">0.5</float>
       <str name="hl.pre"><![CDATA[<b>]]></str>
       <str name="hl.post"><![CDATA[</b>]]></str>
       <str name="hl.useFastVectorHighlighter">true</str>
       <str name="hl.regex.pattern">[-\w ,/\n\']{20,300}[.?!]</str>
       <str name="hl.fl">dokumentum_syn_query</str>
     </lst>
   </fragmenter>
 </highlighting>
 The field is indexed with term vectors and offsets:
 <field name="dokumentum_syn_query" type="huntext_syn" indexed="true"
        stored="true" multiValued="true" termVectors="on" termPositions="on"
        termOffsets="on"/>
 <fieldType name="huntext_syn" class="solr.TextField" stored="true"
            indexed="true" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="com.morphologic.solr.huntoken.HunTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_query.txt" enablePositionIncrements="true"/>
     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
             lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
             cache="alma"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.StandardTokenizerFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
             words="stopwords_query.txt" enablePositionIncrements="true"/>
     <filter class="com.morphologic.solr.hunstem.HumorStemFilterFactory"
             lex="/home/oroszgy/workspace/morpho/solrplugins/data/lex"
             cache="alma"/>
     <filter class="solr.SynonymFilterFactory"
             synonyms="synonyms_query.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 The highlighting works well, except that it's really slow. I realized that
 this is because the highlighter/fragmenter stems all the result
 documents again.

 Could you please help me understand why this happens and how I can avoid it? (I
 thought that using FastVectorHighlighter would solve my problem, but it
 didn't.)

 Thanks in advance!
 Gyuri Orosz






fragsize for highlighting

2011-07-30 Thread Frank Chiu
Hi, I'm setting hl.fragsize = 10 in all my highlighting fragmenters but I'm
still getting snippets returned with more than 10 characters (I think I'm
getting the full text back).  I also tried specifying hl.fragsize in the
querystring, but the same thing happens.  Any idea why fragsize is not
getting picked up?
Thanks!
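
For reference, a request of the kind described above could be assembled as in the sketch below. The base URL, query, and field name are placeholders, not values confirmed by the post; hl, hl.fl, hl.fragsize and echoParams are standard Solr request parameters. Building the query string with urlencode keeps the & separators and escaping correct.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_highlight_url(base, query, field, fragsize):
    """Build a dismax select URL that asks for highlighting on one field."""
    params = [
        ("q", query),
        ("defType", "dismax"),
        ("qf", field),
        ("hl", "on"),
        ("hl.fl", field),                  # field to highlight
        ("hl.fragsize", str(fragsize)),    # requested snippet size in characters
        ("echoParams", "all"),             # echo the parameters Solr actually used
    ]
    return base + "?" + urlencode(params)


# Placeholder host, query and field name, chosen only for illustration.
url = build_highlight_url("http://localhost:8983/solr/select", "gofish", "my_text_field", 10)
sent = parse_qs(urlsplit(url).query)
```

Comparing `sent` with the params block echoed in the response header is a quick way to see whether hl.fragsize reached Solr with the intended value.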


Re: slow highlighting because of stemming

2011-07-30 Thread Ahmet Arslan
 I am doing some logging about stemming, and what I can see is that a lot of
 tokens are stemmed for the highlighting. It is the strange part, since I
 don't understand why does any highlighter need stemming again.

Highlighting does re-analyze the text being highlighted.

 Anyway my documents are not really large, just a few kilobytes, but thanks
 for this suggestion.

 If you could help me in how could I just ignore the stemming for
 highlighting thing it would be very great!

If you store term vectors, this re-analysis is skipped.
http://wiki.apache.org/solr/FieldOptionsByUseCase


Re: fragsize for highlighting

2011-07-30 Thread Ahmet Arslan
 Hi, I'm setting hl.fragsize = 10 in all my highlighting fragmenters but I'm
 still getting snippets returned with more than 10 characters (I think I'm
 getting the full text back).  I also tried specifying hl.fragsize in the
 querystring, but the same thing happens.  Any idea why fragsize is not
 getting picked up?

Maybe you are setting it twice? What is the output of echoParams=all?


Re: Autocomplete with Solr 3.1

2011-07-30 Thread O. Klein
According to
http://www.lucidimagination.com/blog/2011/04/08/solr-powered-isfdb-part-9/
it should be possible to set spellcheck.maxCollations to 5.

This doesn't work for me in 4.0, nor does it work with the regular
spellchecker, unless I set spellcheck.maxCollationTries to a value like 10.

Then I get a list of collations.

However adding these parameters to the suggester doesn't do anything.

Is this common behavior? Or is my Solr borked?
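
The parameter combination described above can be written out as a small sketch. spellcheck.collate, spellcheck.maxCollations and spellcheck.maxCollationTries are standard spellcheck parameters; the helper name and the query value are made up for illustration, and the sketch says nothing about whether the suggester honours them.

```python
from urllib.parse import urlencode


def spellcheck_params(query, max_collations=5, max_collation_tries=10):
    """Request parameters asking the spellchecker for several collations."""
    return {
        "q": query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        "spellcheck.maxCollations": str(max_collations),
        # Per the post, no list of collations came back until this was set.
        "spellcheck.maxCollationTries": str(max_collation_tries),
    }


params = spellcheck_params("thundr bolt")
query_string = urlencode(params)
```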

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocomplete-with-Solr-3-1-tp3202214p3211775.html


Re: slow highlighting because of stemming

2011-07-30 Thread Michael Sokolov

On 7/30/2011 3:46 AM, Orosz György wrote:

 Hi,

 Thanks for the answer!
 I am doing some logging about stemming, and what I can see is that a lot of
 tokens are stemmed for the highlighting. It is the strange part, since I
 don't understand why does any highlighter need stemming again.

Consider that the highlighter needs to match terms from the query with
terms from the document, just like search. If the indexed document has
been stemmed, then the query also needs to be stemmed, or you won't see
matches.


-Mike
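
The point about matching query terms against document terms can be shown with a toy highlighter. This is only an illustration: toy_stem is a naive suffix stripper invented for the example, not the stemmer from the thread, and the real highlighter works on analyzed token streams rather than regular expressions.

```python
import re


def toy_stem(token):
    """Naive stand-in for a real stemmer: lowercase and strip a few suffixes."""
    token = token.lower()
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def highlight(text, query, stem=toy_stem, pre="<b>", post="</b>"):
    """Wrap every word of `text` whose stem equals the stem of a query word."""
    query_stems = {stem(term) for term in query.split()}

    def mark(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if stem(word) in query_stems else word

    return re.sub(r"\w+", mark, text)


# With stemming on both sides, 'dogs' in the text matches the query 'dog'.
with_stemming = highlight("The dogs barked", "dog")

# With lowercasing only, the same query finds nothing to highlight.
without_stemming = highlight("The dogs barked", "dog", stem=str.lower)
```

The text has to be analyzed again at highlight time in this model because only the stored original words are available, not their stems.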


Re: fragsize for highlighting

2011-07-30 Thread Frank Chiu
I'm a bit of a newbie- adding echoParams=all to my querystring isn't
yielding additional info (does solr 1.4 support it?).  Here's a query (also
tried adding hl.fragsize=10):

http://localhost:8982/solr/select/?fl=*+score&start=0&q=gofish&qf=description_texts&hl.simple.pre=@@@hl@@@&hl.simple.post=@@@endhl@@@&fq=type:(Task)&hl=on&defType=dismax&rows=30&echoParams=all

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">3</int>
<lst name="params">
<str name="hl.fragsize">10</str>
<str name="fl">* score</str>
<str name="start">0</str>
<str name="q">immanu</str>
<str name="qf">description_texts</str>
<str name="hl.simple.pre">@@@hl@@@</str>
<str name="hl.simple.post">@@@endhl@@@</str>
<str name="fq">type:(Task)</str>
<str name="hl">on</str>
<str name="defType">dismax</str>
<str name="rows">30</str>
</lst>
</lst>

<lst name="highlighting">
...
<str>
@@@hl@@@some s@@@endhl@@@uper long piece of text. long interesting stuff and
text gofish found
</str>
</arr>
...
</response>








Re: Solr Incremental Indexing

2011-07-30 Thread Alexei Martchenko
I always have a field in my databases called datelastmodified, so whenever I
update a record, I set it to getdate() (an MSSQL function) and then get all
the latest records ordered by that field.
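
A minimal sketch of that approach follows. It uses an in-memory SQLite database instead of MSSQL so that it runs anywhere; the table name, the columns other than datelastmodified, and the sample rows are invented for illustration. The rows returned would then be converted to Solr documents and sent to the index.

```python
import sqlite3


def fetch_modified_since(conn, last_indexed):
    """Return (id, title) rows changed after the previous indexing run."""
    cur = conn.execute(
        "SELECT id, title FROM records "
        "WHERE datelastmodified > ? ORDER BY datelastmodified",
        (last_indexed,),
    )
    return cur.fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records ("
        "id INTEGER PRIMARY KEY, title TEXT, datelastmodified TEXT)"
    )
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            (1, "old", "2011-07-01 10:00:00"),
            (2, "changed", "2011-07-30 09:00:00"),
            (3, "new", "2011-07-30 12:00:00"),
        ],
    )
    # ISO-style timestamps stored as text compare correctly as strings.
    return fetch_modified_since(conn, "2011-07-29 00:00:00")


changed_rows = demo()
```

The timestamp of the last successful run has to be stored somewhere between runs; where to keep it is left open here.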

2011/7/29 Mohammed Lateef Hussain mohammedlateefh...@gmail.com

 Hi

 Need some help with a Solr incremental indexing approach.

 I have built my Solr index using the SolrJ API and now want to update the index
 whenever any changes have been made in the
 database. My requirement is not to use DB triggers to call any update
 events.

 I want to update my index on the fly whenever my application updates any
 record in the database.

 Note: My indexing logic to get the required data from the DB is somewhat
 complex and involves many tables.

 Please suggest how I can proceed here.

 Thanks
 Lateef




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: fragsize for highlighting

2011-07-30 Thread Ahmet Arslan

I suspected that you were setting fragsize twice, e.g.
f.description_texts.hl.fragsize=100&hl.fragsize=10, but from what you pasted
that's not the case.

However, the response you pasted did not come from that URL. It would be better
to see a URL and response that match.

echoParams=all displays all parameters used: both the defaults defined in
solrconfig.xml and the ones in the URL.

http://wiki.apache.org/solr/CoreQueryParameters#echoParams
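
The "set twice" case can be modelled in a few lines. The sketch assumes Solr's per-field override convention, in which f.<field>.<param> takes precedence over the plain parameter for that field; effective_param is a made-up helper, not a Solr API.

```python
def effective_param(params, field, name):
    """Resolve a parameter for one field, preferring the per-field override."""
    per_field = f"f.{field}.{name}"
    if per_field in params:
        return params[per_field]
    return params.get(name)


# Parameters as they might appear in an echoParams=all dump.
echoed = {
    "hl.fragsize": "10",
    "f.description_texts.hl.fragsize": "100",
}

for_description = effective_param(echoed, "description_texts", "hl.fragsize")
for_other_field = effective_param(echoed, "title", "hl.fragsize")
```

Here description_texts resolves to 100 and any other field falls back to 10, so a per-field default in solrconfig.xml could hide a value passed in the URL.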





Re: fragsize for highlighting

2011-07-30 Thread Frank Chiu
I ended up removing the EdgeNGramFilterFactory and the highlighting seems to
work okay.  Thanks for your help, echoParams is useful.




Solr request filter and indexing process

2011-07-30 Thread 于浩
Hello, dear friends,
 I have a problem developing with Solr.
 My application must send multiple queries to the Solr server after the
page is loaded. I found that some requests return only statusCode:0 and
QTime:0: Solr has accepted the request, but it does not return any result
documents. If I send each request manually, one by one, the results come
back. But if I send the requests in quick succession, I get nothing except
statusCode:0 and QTime:0.
I suspect this may be a deliberate strategy in Solr, but I can't find any
documentation or discussion of it on the internet.
I hope you can help me. (edited on 2011-07-28)

I now have a new problem as well. I am developing in PHP, so I connect to
Solr through SolrPhpClient (an open-source project on Google Code). Adding
many documents is very slow: adding ten documents to a Solr index takes
more than 5 minutes (because of the commit process).
Can anybody help me?
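
If the slowness does come from committing after every document (an assumption based on the remark about the commit process), one common remedy is to send all documents in one add message and commit once at the end. The sketch below only builds the XML update messages in Solr's add/doc/field format; it is written in Python rather than PHP, the field names are placeholders, and posting the messages to the update handler is left out.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Build one <add> message containing every document in `docs`."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_commit_message():
    """Build the single commit message sent after the whole batch."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


# Ten placeholder documents, sent as one batch followed by one commit.
docs = [{"id": i, "title": f"document {i}"} for i in range(10)]
add_xml = build_add_message(docs)
commit_xml = build_commit_message()
```

With this layout the ten documents cost one commit instead of ten.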