RE: Highlighting PDF file after the search
With some work this is possible with PDFBox. PDFBox extracts text along with its position and size. Once the text has been found, you could add the drawing of a highlight box to the page content stream.

PDFBox has an open RFE for this functionality; please monitor it for progress:
http://sourceforge.net/tracker/index.php?func=detail&aid=1035635&group_id=78314&atid=552835

Ben

On Mon, 27 Sep 2004 [EMAIL PROTECTED] wrote:

> Bruce,
>
> You are right. I tried this morning, and when I streamed the highlighter output as PDF, Acrobat was not able to read or open it! Which project do you recommend for PDF highlighting?
>
> Thanks,
> Vijay Balasubramanian
> DPRA Inc.
>
> Bruce Ritchie [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Subject: RE: Highlighting PDF file after the search
> 09/20/2004 05:35 PM
> Please respond to Lucene Users List
>
>> From: [EMAIL PROTECTED]
>>
>> I can successfully index and search the PDF documents; however, I am not able to highlight the searched text in my original PDF file (i.e. the way dtSearch highlights on the original file). I took a look at the highlighter in the sandbox, compiled it, and have it ready. I am wondering whether this highlighter is only for highlighting indexed documents or whether it can be used on PDF files as is. Please enlighten!
>
> The highlighter code in the sandbox can facilitate highlighting of text *extracted* from the PDF; however, it does nothing to highlight search terms *inside* the PDF. For that you will need some sort of tool that can modify the PDF on the fly as the user views it. I know of no quick and dirty tool that allows you to do this, though there are quite a few projects and products that allow you to manipulate PDF files and can likely be used to obtain the behavior you are looking for (with some effort on your part).
>
> Regards,
> Bruce Ritchie

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
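As a rough illustration of what "adding the drawing of a highlight box to the page content stream" could mean at the PDF level, here is a minimal sketch in plain Java. It does not use PDFBox at all: the class `HighlightOps` and its method are hypothetical names made up for this example, and the coordinates are assumed to come from whatever text-position data the extraction step provides. The operators themselves (`q`, `rg`, `re`, `f`, `Q`) are standard PDF content-stream operators. Note that an opaque fill painted after the text would cover it, so in practice the box would have to be drawn before the text or with transparency.

```java
import java.util.Locale;

/** Hypothetical helper: builds raw PDF content-stream operators for a filled box. */
final class HighlightOps {
    private HighlightOps() {}

    /**
     * Returns operators that fill a yellow rectangle.
     * x and y are the lower-left corner in PDF user-space units;
     * width and height are the size of the matched text run.
     */
    static String highlightBox(double x, double y, double width, double height) {
        return "q\n"          // save the graphics state
             + "1 1 0 rg\n"   // set the non-stroking (fill) colour to yellow
             + String.format(Locale.ROOT, "%.2f %.2f %.2f %.2f re\n", x, y, width, height)
             + "f\n"          // fill the rectangle path
             + "Q\n";         // restore the graphics state
    }

    public static void main(String[] args) {
        // Assumed position of a search hit: 100 x 12 units at (72, 700).
        System.out.print(highlightBox(72, 700, 100, 12));
    }
}
```

The resulting string would still need to be written into the page's content stream by a PDF library; that step depends on the library version and is not shown here.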
WordListLoader's whereabouts
Hello,

I am trying to compile the analyzers from the Lucene sandbox contributions. Many of them seem to import org.apache.lucene.analysis.WordlistLoader, which is not currently in my classpath.

Does anyone know where I can find this class? It does not appear to be in Lucene 1.4, so I am assuming it is another contribution, perhaps? Any help in tracking it down would be appreciated.

Also, some of the analyzers appear to have their own copy of this class (e.g. org.apache.lucene.analysis.nl.WordlistLoader). Could I just relocate that one to the shared package, perhaps?

Thanks,
Tate
Re: WordListLoader's whereabouts
Hi Tate,

From the commit:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06510.html
I'd say you can use the German WordlistLoader (either renaming it or using a nightly CVS version of the refactored class).

I think there might be a versioning issue here, as
http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard
mentions:

    DONE: Move language-specific analyzers into separate downloads. Also move analysis/de/WordlistLoader.java one level upwards, as it's not specific to German at all.

That should apply only to Lucene 1.9...

The last revision comment for BrazilianAnalyzer:

    move the word list loader from analysis.de to analysis, as it is not specific to German at all; update the references to it

HTH,
sv

On Mon, 27 Sep 2004, Tate Avery wrote:

> Hello,
>
> I am trying to compile the analyzers from the Lucene sandbox contributions. Many of them seem to import org.apache.lucene.analysis.WordlistLoader, which is not currently in my classpath.
>
> Does anyone know where I can find this class? It does not appear to be in Lucene 1.4, so I am assuming it is another contribution, perhaps? Any help in tracking it down would be appreciated.
>
> Also, some of the analyzers appear to have their own copy of this class (e.g. org.apache.lucene.analysis.nl.WordlistLoader). Could I just relocate that one to the shared package, perhaps?
>
> Thanks,
> Tate
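If neither renaming the German class nor picking up a nightly build is convenient, a word list loader is small enough to stand in for temporarily. The sketch below is not the Lucene class and does not claim to match its method signatures: `SimpleWordlistLoader` and `load` are hypothetical names, and the one-word-per-line file format is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a shared word list loader; not the Lucene class. */
final class SimpleWordlistLoader {
    private SimpleWordlistLoader() {}

    /** Reads one word per line (assumed format), trimming whitespace and skipping blank lines. */
    static Set<String> load(Reader source) {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // In real use the Reader would wrap a stop word file instead of a string.
        Set<String> stopWords = load(new StringReader("der\ndie\n\n das \n"));
        System.out.println(stopWords);
    }
}
```

Any analyzer that imports the shared class would still need its import and call sites adjusted to whatever loader is actually used.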
Sorting Info
I'm interested in doing sorting in Lucene. Is there a FAQ or an article that will show me how to do this? I already have my indexing and searching working.

Thanks!
Shouldn't IndexWriter.flushRamSegments() be public, or at least protected?
Hello,

I am trying to use transactions with the Lucene + BDB package. I want to be able to open a directory and an IndexWriter and then do things like:

    open IndexWriter
    start transaction 1
    write something to the index
    commit transaction 1 (or abort it)
    start transaction 2
    write something else to the index
    commit transaction 2
    etc...
    close IndexWriter and everything else that needs to be closed

Now the problem I have is that I don't have a way to force a flush of the IndexWriter without closing it, and I need to do that before committing a transaction or I get random errors.

Shouldn't that function be public, in case the user wants to force a flush at some point other than when the IndexWriter is closed? If not, I am forced to create a new IndexWriter and close it EVERY TIME I commit a transaction (which in my application is very often).

I thought about creating a subclass of IndexWriter (something like DbIndexWriter) that implements a flush function, but everything I need from IndexWriter is private (not even protected!), so I can't do this.

Any pointers or solutions to this problem? (Of course I would prefer not to touch Lucene's code and make flushRamSegments() public myself, since I don't want to break my code every time I update Lucene, although I don't see why the user shouldn't be allowed to flush segments to the directory if they decide to... if it ruins the performance, that's their call.)

Thanks!
Xtian
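To make the ordering problem concrete, here is a toy model in plain Java. It uses neither Lucene nor Berkeley DB: `TxStore` and `BufferingWriter` are hypothetical stand-ins for a transactional directory and for a writer that buffers documents in RAM, and the `flush()` method is exactly the public hook that the message above is asking for, not an existing API. The sketch only shows why a commit issued before the buffer is flushed misses the buffered writes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "flush before commit"; all names here are hypothetical. */
final class FlushBeforeCommit {
    private FlushBeforeCommit() {}

    /** Stand-in for a transactional store: writes become visible only on commit. */
    static final class TxStore {
        private final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        void write(String doc) { pending.add(doc); }
        void commit() { committed.addAll(pending); pending.clear(); }
        void abort() { pending.clear(); }
        List<String> committed() { return committed; }
    }

    /** Stand-in for a writer that keeps documents in RAM until flushed. */
    static final class BufferingWriter {
        private final TxStore store;
        private final List<String> ram = new ArrayList<>();

        BufferingWriter(TxStore store) { this.store = store; }
        void addDocument(String doc) { ram.add(doc); }
        void flush() { ram.forEach(store::write); ram.clear(); }
    }

    /** Runs one transaction and returns what the store holds after commit. */
    static List<String> run(boolean flushFirst) {
        TxStore store = new TxStore();
        BufferingWriter writer = new BufferingWriter(store);
        writer.addDocument("doc-1");
        if (flushFirst) {
            writer.flush();   // push buffered documents into the transaction
        }
        store.commit();       // without the flush, nothing has reached the store yet
        return store.committed();
    }

    public static void main(String[] args) {
        System.out.println("with flush:    " + run(true));
        System.out.println("without flush: " + run(false));
    }
}
```

In this model the workaround described above (closing and reopening the writer for every transaction) corresponds to calling `flush()` implicitly on close, which works but pays the setup cost each time.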