Re: which way to index pdf,word,excel

2006-09-06 Thread Christiaan Fluit
Have a look at Aperture: http://aperture.sourceforge.net/ It provides components for crawling and text and metadata extraction. It's still in alpha stage though. The development code in CVS has already improved a lot over the last official alpha release. Chris -- James liu wrote: i wanna fin

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
Thanks, Cohen and Lin. 2006/9/6, Doron Cohen <[EMAIL PROTECTED]>: I think that Nutch would crawl and search all three of these types. Not sure that Nutch would provide the framework you seem to be looking for, but perhaps it is worth taking a look - http://lucene.apache.org/nutch/ "James liu" <[EMAIL PROT

Re: which way to index pdf,word,excel

2006-09-05 Thread Doron Cohen
I think that Nutch would crawl and search all three of these types. Not sure that Nutch would provide the framework you seem to be looking for, but perhaps it is worth taking a look - http://lucene.apache.org/nutch/ "James liu" <[EMAIL PROTECTED]> wrote on 05/09/2006 23:10:16: > I want to find a framework which can

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
I want to find a framework which can index XML, Word, Excel, and PDF, not just one of them. I just want to know whether anyone knows of a framework like that. 2006/9/6, yueyu lin <[EMAIL PROTECTED]>: First, Lucene is just an indexing toolkit; you have to USE it to implement your application. If you want to index something, you must

Re: which way to index pdf,word,excel

2006-09-05 Thread yueyu lin
First, Lucene is just an indexing toolkit; you have to USE it to implement your application. If you want to index something, you must know how to extract the information from it and which keys need to be set for it. Then you can do what you want. On 9/5/06, James liu <[EMAIL PROTECTE
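
The point in the reply above (extract the text yourself, decide which keys to set, then hand the result to the indexer) could be sketched roughly as follows. This is only an illustration in plain Java with no Lucene dependency: the class `ExtractorRegistry`, its methods, and the key names `path` and `contents` are all hypothetical, and only a plain-text extractor is registered. Real PDF, Word, or Excel extractors would have to be supplied by a separate parsing library and registered the same way.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper, not part of Lucene: picks a text extractor by file
// extension and returns the key/value pairs an indexer would store.
class ExtractorRegistry {
    private static final Map<String, Function<byte[], String>> EXTRACTORS = new HashMap<>();

    static {
        // Plain text needs no parsing. Extractors for other formats are
        // assumed to come from third-party parsers and are not shown here.
        EXTRACTORS.put("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    // Register an extractor for one more file extension (case-insensitive).
    static void register(String extension, Function<byte[], String> extractor) {
        EXTRACTORS.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Return the lower-cased extension, or "" when the name has no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Step 1: extract the text. Step 2: decide which keys to set.
    // The key names "path" and "contents" are assumptions for illustration.
    static Map<String, String> toFields(String fileName, byte[] raw) {
        Function<byte[], String> extractor = EXTRACTORS.get(extensionOf(fileName));
        if (extractor == null) {
            throw new IllegalArgumentException("No extractor registered for: " + fileName);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("path", fileName);
        fields.put("contents", extractor.apply(raw));
        return fields;
    }
}
```

In an actual application, the returned map would then be turned into an index document by whatever indexing code the application uses; that step is deliberately left out here.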

Re: which way to index pdf,word,excel

2006-09-05 Thread James liu
I want to find a framework which can index XML, Word, Excel, and PDF, not just one of them. 2006/9/6, Doron Cohen <[EMAIL PROTECTED]>: Lucene FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ - has a few entries just for this: How can I index HTML documents? How can I index XML documents? How can I index Open

Re: which way to index pdf,word,excel

2006-09-05 Thread Doron Cohen
Lucene FAQ - http://wiki.apache.org/jakarta-lucene/LuceneFAQ - has a few entries just for this: How can I index HTML documents? How can I index XML documents? How can I index OpenOffice.org files? How can I index MS-Word documents? How can I index MS-Excel documents? How can I index MS

which way to index pdf,word,excel

2006-09-05 Thread James liu
I find that LIUS has many problems, so I want to give it up and find something new. Can anyone recommend an alternative?