Re: Is there a way to retrieve the a term's position/offset in Solr

2017-04-07 Thread forest_soup
Thanks Rick. Unfortunately we don't have such a converter, so we have to count characters in the rich text.

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-30 Thread Bjarke Buur Mortensen
OK, that complicates things a bit. I would still try to go for a solution where you store the rich text in Solr, but make sure you tokenize it correctly. If the format is relatively simple, you could use either a regexp pattern tokenizer

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-30 Thread Rick Leir
Hi forest, do you have an HTML-to-rich-text converter? You could use it on the highlighter's output. Otherwise you could count characters in the HTML. That might only be useful if your rich-text font is fixed width. Cheers -- Rick
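
A minimal sketch of the "count characters in the highlighter's output" idea, in Python. It assumes the highlighter wraps matches in the default `<em>`/`</em>` markers and that those markers never occur in the document text itself; the function name `highlight_offsets` is hypothetical. The offsets it returns are character offsets into the plain text with the markers removed, so mapping them onto the rich-text format would still be a separate step.

```python
import re


def highlight_offsets(highlighted, pre="<em>", post="</em>"):
    """Strip highlight markers and return (plain_text, [(start, end), ...]).

    Each (start, end) pair is a character offset range into plain_text.
    """
    pattern = re.compile(re.escape(pre) + "(.*?)" + re.escape(post), re.DOTALL)
    plain_parts = []
    spans = []
    cursor = 0      # read position in the highlighted string
    plain_len = 0   # number of plain-text characters emitted so far
    for match in pattern.finditer(highlighted):
        # Copy the unhighlighted text that precedes this match.
        before = highlighted[cursor:match.start()]
        plain_parts.append(before)
        plain_len += len(before)
        # Record where the highlighted term lands in the plain text.
        term = match.group(1)
        spans.append((plain_len, plain_len + len(term)))
        plain_parts.append(term)
        plain_len += len(term)
        cursor = match.end()
    # Copy whatever follows the last match.
    plain_parts.append(highlighted[cursor:])
    return "".join(plain_parts), spans


text, spans = highlight_offsets("The <em>quick</em> fox <em>jumps</em>.")
for start, end in spans:
    print(start, end, text[start:end])
```

This only works on the whole field if the highlighter returns the whole field, and the offsets are counted in characters, not in tokens.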

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-30 Thread forest_soup
Unfortunately the rich text is not HTML/XML/DOC/PDF or any other popular rich text format. And we would like to show the highlighted text in the doc's own specific viewer. That's why I eagerly want the offset. The /tvrh (term vector component) and tv.offsets/tv.positions can give us such
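
For reference, a request to the term vector component might be built as in the Python sketch below. The host, the collection name `docs`, the field name `body`, and the helper `build_tvrh_url` are all assumptions for illustration; it also assumes a `/tvrh` handler is configured and that the field is indexed with `termVectors`, `termPositions` and `termOffsets` enabled.

```python
from urllib.parse import urlencode


def build_tvrh_url(base_url, collection, doc_query, field):
    """Build a term vector request asking for positions and offsets."""
    params = {
        "q": doc_query,          # select the document(s) of interest
        "fl": "id",
        "tv": "true",            # switch the term vector component on
        "tv.fl": field,          # field whose term vectors are returned
        "tv.positions": "true",  # token positions of each term
        "tv.offsets": "true",    # start/end character offsets of each term
        "wt": "json",
    }
    return f"{base_url}/{collection}/tvrh?{urlencode(params)}"


# Hypothetical host, collection and document id.
print(build_tvrh_url("http://localhost:8983/solr", "docs", "id:42", "body"))
```

Note that the response lists every term in the field for the selected documents, not just the terms matching a query, so the client would still have to pick out the terms it cares about.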

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-30 Thread Bjarke Buur Mortensen
OK, so the next thing to do would be to index and store the rich text ... is it HTML? Because then you can use HTMLStripCharFilterFactory in your analyzer, and still get the correct highlight back with hl.fragsize=0. I would think that you will have a hard time using the term positions, if what

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-28 Thread forest_soup
Thanks All! Actually we are going to show the highlighted words in a rich text format instead of the plain text which was indexed. So hl.fragsize=0 does not seem to work for me. As for the patch (SOLR-4722), I haven't tried it. Hope it can return the position/offset info. Thanks!

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-28 Thread simon
You might want to take a look at the patch in https://issues.apache.org/jira/browse/SOLR-4722 - 'Highlighter which generates a list of query term position(s) for each item in a list of documents, or returns null if highlighting is disabled.' I've used it for retrieving the term positions with no

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-28 Thread Bjarke Buur Mortensen
Well, you can get Solr to highlight the entire field if that's what you are after by setting: hl.fragsize=0 From https://cwiki.apache.org/confluence/display/solr/Highlighting#Highlighting-Usage : Specifies the approximate size, in characters, of fragments to consider for highlighting. *0*
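
As a rough illustration of the setting above, here is a Python sketch that builds a select request with highlighting over the entire field. The host, the collection name `docs`, the field name `body`, and the helper `build_highlight_url` are assumptions, not part of the original message.

```python
from urllib.parse import urlencode


def build_highlight_url(base_url, collection, query, field):
    """Build a select request that highlights the whole field."""
    params = {
        "q": query,
        "hl": "true",        # enable highlighting
        "hl.fl": field,      # field to highlight
        "hl.fragsize": "0",  # 0 = do not fragment, use the whole field
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection and query.
print(build_highlight_url("http://localhost:8983/solr", "docs", "body:solr", "body"))
```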

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-28 Thread forest_soup
Thanks Eric. Actually, Solr's highlighting function does not meet my requirement. My requirement is not to show the highlighted words in snippets, but to show them in the whole opened document. So I would like to get the term's position/offset info from Solr. I went through the highlight feature, but

Re: Is there a way to retrieve the a term's position/offset in Solr

2017-03-27 Thread Emir Arnautovic
It seems to me that you are looking for Solr's highlighting functionality: https://cwiki.apache.org/confluence/display/solr/Highlighting HTH, Emir On 27.03.2017 09:09, forest_soup wrote: We are going to implement a feature: When opening a document whose body field is already indexed in Solr,