Martin,
Over the last few years I've gone over some of the same thoughts you
present here. The path of least resistance ended up being to deal with
the highlighting portion of OCRed images outside of Solr. That's not to
say it couldn't or shouldn't be done differently. I even briefly pursued
a similar course of action, as you can see in
https://issues.apache.org/jira/browse/SOLR-386. That issue would make it
easier if you wanted to write your own highlighter.
I'm interested to see what others think of your suggestions. I've
forwarded this to the solr-user list.
Tricia
-------- Original Message --------
Subject: Highlighting Output
Date: Mon, 11 Aug 2008 17:21:55 -0400
From: Martin Owens <[EMAIL PROTECTED]>
To: Tricia Williams <[EMAIL PROTECTED]>,
[EMAIL PROTECTED]
Hello Solr Users,
I've been thinking about the highlighting functionality in Solr. I
recently had the good fortune to be helped by Tricia Williams with
payload issues relating to highlighting.
What I see, though, is that the highlighting functionality is heavily
tied to the fragment (highlight context) functionality. This actually
makes it interesting to write a plain highlight method that just returns
metadata (so some other process can do the actual highlighting in some
custom fashion).
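To make the idea concrete, here is a rough sketch of a highlighter that
returns only metadata and leaves the markup to another process. It is
plain Python for illustration, not Solr code: the function name
highlight_metadata, the simple word tokenizer, and the shape of the
returned records are all assumptions made up for this example.

```python
import re


def highlight_metadata(text, terms):
    """Return positions of matching terms instead of marked-up fragments.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    wanted = {t.lower() for t in terms}
    hits = []
    # Tokenize on word characters; the token index serves as the term offset.
    for term_offset, match in enumerate(re.finditer(r"\w+", text)):
        if match.group().lower() in wanted:
            hits.append({
                "term": match.group(),
                "term_offset": term_offset,
                "start_offset": match.start(),  # character offset, inclusive
                "end_offset": match.end(),      # character offset, exclusive
            })
    return hits
```

A caller could then map these offsets onto whatever it needs to draw,
for example page coordinates stored alongside each term for an OCRed
image.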
So is it worthwhile to make sure that Solr is able to do multiple
different kinds of highlighting, even if it means passing metadata back
in the response? Should we have standard ways to index and read back
payload information if we're dealing with pages, books, co-ordinates
(for highlighting images) and other metadata which is used for
highlights (character offset, term offset, et cetera)? I also noticed
that much of the highlighting code to do with fragments is duplicated in
custom code.
Other thoughts? Does this make things more complex for normal
highlighting?
Best Regards, Martin Owens