RE: Slide 2 and Search in Binary Files

Martin.Wallmer Tue, 30 Mar 2004 01:05:28 -0800

Hi,

The "correct" way would be as follows:

Write an "extractor" that extracts the text out of binary files (doc, pdf, ...).
Use an "indexer", for example Lucene, to index the extracted text.
Implement a "contains" query that runs against the Lucene index.
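
To make the three steps concrete, here is a minimal sketch in Python. It is
not Slide or Lucene code: TinyIndex is a hypothetical in-memory stand-in for
Lucene, extract_text is a deliberately crude extractor that only keeps runs
of printable ASCII, and the URIs and file contents are made up for
illustration. A real extractor would have to parse the doc/pdf formats
properly.

```python
import re
from collections import defaultdict


def extract_text(raw: bytes) -> str:
    """Hypothetical extractor: keep runs of 3+ printable ASCII bytes.

    This is only a placeholder for a real doc/pdf text extractor.
    """
    runs = re.findall(rb"[\x20-\x7e]{3,}", raw)
    return " ".join(run.decode("ascii") for run in runs)


def tokenize(text: str) -> list:
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Stand-in for Lucene: an inverted index mapping term -> set of URIs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, uri: str, raw: bytes) -> None:
        # Step 1 (extract) and step 2 (index) happen once, at store time.
        for term in tokenize(extract_text(raw)):
            self.postings[term].add(uri)

    def contains(self, word: str) -> list:
        # Step 3: the query is a single lookup, no document is re-read.
        return sorted(self.postings.get(word.lower(), set()))


index = TinyIndex()
index.add("/files/report.doc", b"\x00\x01Quarterly report\x00\xff budget 2004\x02")
index.add("/files/notes.txt", b"meeting notes, no budget talk")

print(index.contains("budget"))
print(index.contains("Quarterly"))
```

The point of the design is that the expensive work (extracting and
tokenizing) is done when a document is stored, so the query itself only
touches the index.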

Currently Christophe Lombart and Daniel Florey are working on that. I was
working on it as well, but at the moment I have no time to spend on Slide.

The current implementation of CONTAINS is quite slow: it really searches
through all documents without using an index, so a Lucene-based CONTAINS
would be very useful.
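
For comparison, a scan without an index looks roughly like the sketch below.
This is an assumption about the general shape of such a search, not the
actual Slide code; scan_contains and the sample store are hypothetical. Every
query has to read every document, and a hit on a binary file only happens
when the word is stored in the file as plain bytes.

```python
def scan_contains(store: dict, word: str):
    """Naive CONTAINS: read every document and look for the raw bytes."""
    needle = word.encode("ascii")
    hits = []
    docs_read = 0
    for uri, raw in store.items():
        docs_read += 1  # every document is touched on every query
        if needle in raw:
            hits.append(uri)
    return hits, docs_read


# Made-up sample content, for illustration only.
store = {
    "/files/report.doc": b"\x00\x01Quarterly report\x00\xff budget 2004\x02",
    "/files/notes.txt": b"meeting notes only",
    "/files/plan.pdf": b"%PDF-1.4 \x9c\x8b compressed stream",
}

hits, docs_read = scan_contains(store, "budget")
print(hits, docs_read)
```

The cost grows with the number and size of the stored documents, which is
what an index avoids.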

Regards,
Martin Wallmer

> -----Original Message-----
> From: news [mailto:[EMAIL PROTECTED] Behalf Of Martin Holz
> Sent: Tuesday, 30 March 2004 09:44
> To: [EMAIL PROTECTED]
> Subject: Re: Slide 2 and Search in Binary Files
> 
> 
> Muhammad Asif <[EMAIL PROTECTED]> writes:
> 
> > Thanks, Martin.
> > 
> > I haven't jumped into DASL yet, but can I summarize it as: currently
> > Slide treats a search in binary files as a search in simple text?
> > 
> > I ask because when I opened a Word file in Notepad, took a word from
> > it, and built a CONTAINS query, it returned the search results
> > successfully.
> 
> Sorry, I should really look into the DASL stuff. 
> 
>  
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 
