RE: Highlighting PDF file after the search

2004-09-27 Thread Ben Litchfield

With some work this is possible with PDFBox.  PDFBox extracts text along with
its position and size.  Once the text is found, you could append drawing
commands for a highlight box to the page's content stream.
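A rough sketch of the "find the text, then draw a box" idea, assuming the extracted text is already available as a list of glyphs with their boxes. Glyph and PdfHighlightSketch are hypothetical names, not PDFBox classes, and the sketch assumes coordinates in PDF user space (origin at the bottom-left of the page, y at the bottom of each glyph); positions coming out of a real text extractor may use a different origin and need converting first. The operators it emits (q, rg, re, f, Q) are standard PDF content-stream operators. Note that an opaque fill drawn after the page's existing content would cover the text, so these operators would need to go before the existing content (or use a stroked outline instead of a fill).

```java
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for one extracted glyph (NOT a PDFBox class): the
// character and its box in PDF user space, origin at the page's bottom-left,
// with y at the bottom edge of the glyph.
final class Glyph {
    final char ch;
    final float x, y, width, height;

    Glyph(char ch, float x, float y, float width, float height) {
        this.ch = ch;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }
}

final class PdfHighlightSketch {
    // Returns content-stream operators that fill a yellow box over every
    // case-insensitive match of `term` in the glyph sequence.
    static String highlightOps(List<Glyph> glyphs, String term) {
        StringBuilder text = new StringBuilder();
        for (Glyph g : glyphs) {
            text.append(Character.toLowerCase(g.ch));
        }
        StringBuilder needleBuf = new StringBuilder();
        for (int i = 0; i < term.length(); i++) {
            needleBuf.append(Character.toLowerCase(term.charAt(i)));
        }
        String needle = needleBuf.toString();

        StringBuilder ops = new StringBuilder();
        int from = 0;
        while (needle.length() > 0) {
            int hit = text.indexOf(needle, from);
            if (hit < 0) {
                break;
            }
            // Bounding box = union of the matched glyphs' boxes.
            float minX = Float.MAX_VALUE, minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE, maxY = -Float.MAX_VALUE;
            for (int i = hit; i < hit + needle.length(); i++) {
                Glyph g = glyphs.get(i);
                minX = Math.min(minX, g.x);
                minY = Math.min(minY, g.y);
                maxX = Math.max(maxX, g.x + g.width);
                maxY = Math.max(maxY, g.y + g.height);
            }
            // q/Q: save/restore graphics state; "1 1 0 rg": yellow fill colour;
            // "x y w h re": append a rectangle; "f": fill it.
            ops.append(String.format(Locale.ROOT,
                    "q 1 1 0 rg %.2f %.2f %.2f %.2f re f Q\n",
                    minX, minY, maxX - minX, maxY - minY));
            from = hit + needle.length();
        }
        return ops.toString();
    }
}
```

Each match gets its own save/fill/restore group, so the highlight colour does not leak into whatever drawing follows it in the stream.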

PDFBox has an open RFE for this functionality; please monitor it for
progress.

http://sourceforge.net/tracker/index.php?func=detail&aid=1035635&group_id=78314&atid=552835

Ben

On Mon, 27 Sep 2004 [EMAIL PROTECTED] wrote:

 Bruce,
 You are right. I tried this morning, and when I tried to stream the
 highlighter output as PDF, Acrobat was not able to read or open it!
 Which project would you recommend for PDF highlighting?

 Thanks,
 Vijay Balasubramanian
 DPRA Inc.,




 From:     Bruce Ritchie [EMAIL PROTECTED]
 To:       Lucene Users List [EMAIL PROTECTED]
 Subject:  RE: Highlighting PDF file after the search
 Date:     09/20/2004 05:35 PM
 Please respond to Lucene Users List

  From: [EMAIL PROTECTED]

  I can successfully index and search the PDF documents;
  however, I am not able to highlight the searched text in my
  original PDF file (i.e., the way dtSearch highlights the original file).

  I took a look at the highlighter in the sandbox, compiled it, and
  have it ready.  I am wondering whether this highlighter is only for
  highlighting indexed documents or whether it can be used on PDF
  files as is.  Please enlighten!

 The highlighter code in the sandbox can help you highlight text
 *extracted* from the PDF, but it does nothing to highlight search
 terms *inside* the PDF itself. For that you will need some sort of
 tool that can modify the PDF on the fly as the user views it. I know
 of no quick-and-dirty tool that does this, though there are quite a
 few projects and products for manipulating PDF files that could likely
 be used to get the behavior you are looking for (with some effort on
 your part).
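To make the distinction concrete, here is a plain-Java sketch of what highlighting *extracted* text amounts to: the output is marked-up text, and the PDF file itself is never touched. TextHighlightSketch is a hypothetical name and this is not the sandbox highlighter's API (which works from a query and an analyzer); it simply wraps whole words found in a given term set with <B>...</B> tags, comparing in lower case.

```java
import java.util.Locale;
import java.util.Set;

final class TextHighlightSketch {
    // Wraps each whole word of `text` whose lower-cased form is in `terms`
    // with <B>...</B>; all other characters are copied through unchanged.
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                String word = text.substring(start, i);
                if (terms.contains(word.toLowerCase(Locale.ROOT))) {
                    out.append("<B>").append(word).append("</B>");
                } else {
                    out.append(word);
                }
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

The result is suitable for showing a snippet in an HTML results page, which is exactly why it cannot change what Acrobat displays.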


 Regards,

 Bruce Ritchie




 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]





WordListLoader's whereabouts

2004-09-27 Thread Tate Avery
Hello,

I am trying to compile the analyzers from the Lucene sandbox contributions.  Many of
them seem to import org.apache.lucene.analysis.WordlistLoader, which is not currently
in my classpath.

Does anyone know where I can find this class?  It does not appear to be in Lucene 1.4,
so I assume it comes from another contribution?  Any help in tracking it down
would be appreciated.

Also, some of the analyzers appear to have their own copy of this class (e.g.
org.apache.lucene.analysis.nl.WordlistLoader).  Could I just relocate that one to the
shared package?

Thanks,
Tate




Re: WordListLoader's whereabouts

2004-09-27 Thread Stephane James Vaucher
Hi Tate,

From the commit:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg06510.html

I'd say you can use the German WordlistLoader (either moving/renaming it or
using a nightly CVS version of the refactored class). I think there might be a
versioning issue here, as from:

http://wiki.apache.org/jakarta-lucene/Lucene2Whiteboard

it is mentioned that:
DONE: Move language-specific analyzers into separate downloads. Also move
analysis/de/WordlistLoader.java one level upwards, as it's not specific to
German at all.

That should only apply to Lucene 1.9... The last version comment for
BrazilianAnalyzer:

move the word list loader from analysis.de to analysis, as it is not
specific to German at all; update the references to it
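As the commit note says, nothing in such a loader is German-specific. A rough sketch of what a word-list loader does, assuming a plain-text list with one word per line; WordListSketch.loadWords is a hypothetical stand-in, not the actual Lucene class or its method signatures.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashSet;
import java.util.Set;

// Hypothetical, language-neutral stand-in for a word-list loader: reads one
// word per line (trimmed, blank lines skipped) and returns them as a set,
// e.g. for use as a stop-word list.
final class WordListSketch {
    static Set<String> loadWords(Reader source) throws IOException {
        Set<String> words = new HashSet<String>();
        BufferedReader in = new BufferedReader(source);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                String word = line.trim();
                if (word.length() > 0) {
                    words.add(word);
                }
            }
        } finally {
            in.close();
        }
        return words;
    }
}
```

The same routine serves a German, Dutch, or Brazilian stop list equally well, which is the argument for keeping one shared copy in the common analysis package.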

HTH,
sv






Sorting Info

2004-09-27 Thread yahootintin . 1247688
I'm interested in doing sorting in Lucene.  Is there a FAQ or an article that
will show me how to do this?  I already have my indexing and searching working.



Thanks!




Shouldn't IndexWriter.flushRamSegments() be public? Or at least protected?

2004-09-27 Thread Christian Rodriguez
Hello,

I am trying to use transactions with the Lucene + BDB package. I want
to be able to open a directory and an IndexWriter and then do things
like:

open IndexWriter
start transaction 1
write something to the index
commit transaction 1 (or abort it)
start transaction 2
write something else to the index
commit transaction 2
etc...
close IndexWriter and everything else that needs to be closed

Now the problem I have is that I don't have a way to force a flush of
the IndexWriter without closing it, and I need to do that before
committing a transaction or I get random errors. Shouldn't that
method be public, in case the user wants to force a flush at some
point other than when the IndexWriter is closed? Otherwise I am forced
to create a new IndexWriter and close it EVERY TIME I commit a
transaction (which in my application is very often).

I thought about creating a subclass of IndexWriter (something like
DbIndexWriter) that implements a flush method, but everything I need
from IndexWriter is private (not even protected!), so I can't do this.

Any pointers or solutions to this problem? (Of course I would prefer
not to touch Lucene's code and make flushRamSegments() public myself,
since I don't want to break my code every time I update Lucene,
although I don't see why the user shouldn't be allowed to flush segments
to the directory if they decide to... if it ruins the performance,
that's their call.)
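The workaround described above (a fresh writer per transaction, closed before the commit) comes down to an ordering rule: flush by closing, then commit; on failure, abort. A minimal sketch of that ordering, assuming hypothetical interfaces TxnWriter, Txn, TxnFactory and WriterFactory that stand in for the index writer and the BDB transaction; these are NOT Lucene or Berkeley DB APIs, and error handling is deliberately minimal.

```java
import java.util.List;

// Hypothetical stand-ins, NOT Lucene or Berkeley DB APIs.
interface TxnWriter {
    void write(String doc);
    void close(); // assumed to be the only way to flush buffered data
}

interface Txn {
    void commit();
    void abort();
}

interface TxnFactory { Txn begin(); }

interface WriterFactory { TxnWriter open(Txn txn); }

final class PerTransactionWriter {
    // Opens a fresh writer for one transaction and closes it (forcing the
    // flush) before committing; any runtime failure aborts the transaction.
    static void runInTransaction(TxnFactory txns, WriterFactory writers, List<String> docs) {
        Txn txn = txns.begin();
        TxnWriter writer = writers.open(txn);
        try {
            for (String doc : docs) {
                writer.write(doc);
            }
            writer.close(); // flush everything into the transaction first
        } catch (RuntimeException e) {
            txn.abort(); // discard whatever the writer already pushed
            throw e;
        }
        txn.commit();
    }
}
```

The cost Christian describes is visible here: every commit pays for opening and closing a writer, which is what a public flush method would avoid.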

Thanks!
Xtian
