integrationofLucene and PDF box

2004-08-24 Thread Santosh
Has anybody integrated Lucene with PDFBox? Can we do it by changing the code in IndexFiles.java or IndexHTML.java? Regards, Santosh Kumar

Re: integration of lucene with pdfbox

2004-08-24 Thread Santosh
I don't know how to add a Lucene document to an index; I only know how to add a given directory. Can anybody please tell me how to add a Lucene document to an index? - Original Message - From: Ben Litchfield [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, August 23, 2004 8:13 PM Subject:

RE: integrationofLucene and PDF box

2004-08-24 Thread Karthik N S
Hi Santosh, many people have worked in this area. Look through the forums one by one and you may come across some example code that does something similar. Karthik -Original Message- From: Santosh [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 11:40 AM To: Lucene Users List

Re: integration of lucene with pdfbox

2004-08-24 Thread Bernhard Messer
Santosh, please have a look at the Lucene demo package. There are several samples (IndexFiles.java) showing how to add a document to a writer. Regards, Bernhard Santosh wrote: I dont know how to add lucene document to index, i know how to add given directory. any body please tell me how to add

term frequency data of terms of all documents

2004-08-24 Thread Serkan Oktar
I want to build a list of the terms in all documents, together with their frequency data. It seems the information I need is in the .tis and .tii files, but I haven't found a way to read them so far. How can I get the term frequency data? Thanks, Serkan

Re: Lucene for Indian Languages

2004-08-24 Thread srinivasa raghavan
Hi Satish, Morphological analyzers for Hindi, Marathi, Telugu and Kannada are available. Please visit http://ltrc.iiit.net/showfile.php?filename=onlineServices/morph/index.htm I think you need not develop one from scratch. I hope this will solve your problem for Marathi to some

worddoucments search

2004-08-24 Thread Santosh
Can Lucene search Word documents? If so, please give me information about it. Regards, Santosh Kumar

Re: worddoucments search

2004-08-24 Thread Don Vaillancourt
I could be wrong, but I don't think that there is an indexer for Word documents. There's a Python version of Lucene called Lupy with a Python indexer for all sorts of document types (http://www.methods.co.nz/docindexer/). Would anyone be willing to port those over? Although the MSWord

RE: worddoucments search

2004-08-24 Thread David Townsend
Is this a wind-up? -Original Message- From: Santosh [mailto:[EMAIL PROTECTED] Sent: 24 August 2004 13:16 To: Lucene Users List Subject: worddoucments search Can lucene be able to search word documents? if so please give me information about it regards Santosh kumar

Re: worddoucments search

2004-08-24 Thread Chandan Tamrakar
Please look at the Apache POI project: http://jakarta.apache.org Text can be extracted from Word documents using the POI APIs and then indexed. Regards - Original Message - From: Santosh [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 6:00 PM Subject:

Re: worddoucments search

2004-08-24 Thread Don Vaillancourt
Lucene isn't a doll made by Hasbro. :-) David Townsend wrote: Is this a wind-up? -Original Message- From: Santosh [mailto:[EMAIL PROTECTED]] Sent: 24 August 2004 13:16 To: Lucene Users List Subject: worddoucments search Can lucene be able to search word documents? if so please

Textmining.org IS NOT POI (was Re: worddoucments search)

2004-08-24 Thread Ryan Ackley
Go to http://www.textmining.org for a platform independent library to extract text from Word documents. I wrote 99.99% of the Word component of POI and all of the textmining.org library. I have seen several discussions and web pages that point to textmining.org that say I simply wrap POI classes

Re: worddoucments search

2004-08-24 Thread Ryan Ackley
Otis, Why didn't you use the textmining.org library? You even asked me to fix a bug for the book, which I did. Also, the code would have been about three lines. -Ryan - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday,

Re: worddoucments search

2004-08-24 Thread Ryan Ackley
Code example for the textmining.org library: FileInputStream in = new FileInputStream("test.doc"); WordExtractor extractor = new WordExtractor(); String str = extractor.extractText(in); - Original Message - From: Natarajan.T [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent:

Searching MySql index using lucene

2004-08-24 Thread sivalingam T
Hi, 1. MySQL creates an index by default. How can I search this index using Lucene? 2. How do I create an index on a database using Lucene? Please share suggestions if anybody knows. Thanks. With Warm Regards, Sivalingam.T Sai Eswar Innovations (P) Ltd, Chennai-92

Re: Custom filter

2004-08-24 Thread roy-lucene-user
On Fri, 20 Aug 2004 20:01:36 -0400, Erik Hatcher wrote On Aug 20, 2004, at 6:48 PM, [EMAIL PROTECTED] wrote: We're currently in lucene 1.2... haven't moved to 1.3 yet. Skip 1.3 and go straight to 1.4.1 :) Upgrade - why not? Well we have some MASSIVE indexes so updating needs to be

Re: term frequency data of terms of all documents

2004-08-24 Thread Bernhard Messer
Serkan, it's easier to use the IndexReader class to get the information you need. If you just need the doc frequency of each term, you could use this sample: IndexReader ir = null; try { if (!IndexReader.indexExists("tmp/index")) return; ir =
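
The "doc frequency" mentioned in the reply above is the number of documents that contain a term at least once, as distinct from how often the term occurs inside a single document. A minimal plain-Java sketch of that quantity follows. It does not use Lucene and does not read the .tis/.tii files; the class name DocFreqSketch, the docFreq helper, and the naive tokenizer are hypothetical choices made only for illustration.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java illustration of "document frequency"; this is NOT Lucene code.
class DocFreqSketch {

    /** Counts, for every term, how many of the given documents contain it at least once. */
    static Map<String, Integer> docFreq(String[] docs) {
        Map<String, Integer> df = new TreeMap<>();
        for (String doc : docs) {
            // Naive tokenizer (assumption): lower-case, split on runs of non [a-z0-9].
            Set<String> seen = new HashSet<>();
            for (String token : doc.toLowerCase().split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    seen.add(token);
                }
            }
            // Each distinct term in this document adds exactly one to its count.
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    public static void main(String[] args) {
        String[] docs = {
            "Lucene indexes text",
            "PDFBox extracts text from PDF",
            "text text text"
        };
        for (Map.Entry<String, Integer> e : docFreq(docs).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

In the third sample document "text" occurs three times but still adds only one to its document frequency, which is the distinction the original question turns on.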

Sort Search Result

2004-08-24 Thread Natarajan.T
How can I get the search results in ascending order (Sort API)? Thanks, Natarajan.

PDF indexing

2004-08-24 Thread sivalingam T
Hi, I have written a file for PDF indexing, as follows. This is my IndexPDF file: import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.document.Document; import org.apache.lucene.index.IndexReader; import

RE: Sort Search Result

2004-08-24 Thread Aviran
Look at SortField: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/SortField.html -Original Message- From: Natarajan.T [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 11:35 AM To: 'Lucene Users List' Subject: Sort Search Result FYI, How can I get the

RE: Searching MySql index using lucene

2004-08-24 Thread Aviran
Just read your data from the database and create a Lucene index for the columns you want to search. -Original Message- From: sivalingam T [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 24, 2004 9:52 AM To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Searching MySql index using

Re: Lucene PDF indexing

2004-08-24 Thread Stephane James Vaucher
You need to add log4j to your classpath: http://logging.apache.org/log4j/docs/ sv On 24 Aug 2004, sivalingam T wrote: Hi I have written one files for PDF Indexing. Here I have written as follows ..   This is my IndexPDF file. import org.apache.lucene.analysis.standard.StandardAnalyzer;

Re: worddoucments search

2004-08-24 Thread Otis Gospodnetic
As I just answered in a separate email to Ryan - we used textmining.org library, too, as an example of something that is easier to use than POI. It's been a while since I wrote that chapter, so it slipped my mind when I replied. Yes, use textmining.org first, you'll be able to include it in your

How to implement KWIC (KeyWord In Context) display

2004-08-24 Thread yinjin
Hello all, Does anyone know how to implement KWIC display using Lucene? I'd like to display the result similar to google search. Thanks for any help, Ying

Re: How to implement KWIC (KeyWord In Context) display

2004-08-24 Thread Otis Gospodnetic
Hello Ying, Take a look at Lucene Highlighter in Lucene Sandbox: http://jakarta.apache.org/lucene/docs/lucene-sandbox/ Otis --- yinjin [EMAIL PROTECTED] wrote: Hello all, Does anyone know how to implement KWIC display using Lucene? I'd like to display the result similar to google search.
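
For readers who want to see the basic idea behind a KWIC display before looking at the Highlighter, here is a minimal plain-Java sketch. It is not the Lucene Highlighter API and does not use Lucene at all; the class name KwicSketch, the snippet helper, the character-based window and the <b> markup are all hypothetical choices made for illustration. It also assumes that lower-casing the text does not change its length, which holds for ASCII input.

```java
// Minimal keyword-in-context sketch in plain Java; NOT the Lucene Highlighter API.
class KwicSketch {

    /**
     * Returns the first occurrence of keyword in text with up to `window`
     * characters of context on each side, or null if the keyword is absent.
     * Matching is case-insensitive; the matched text is wrapped in <b>...</b>.
     */
    static String snippet(String text, String keyword, int window) {
        int hit = text.toLowerCase().indexOf(keyword.toLowerCase());
        if (hit < 0) {
            return null;
        }
        int end = hit + keyword.length();
        int from = Math.max(0, hit - window);
        int to = Math.min(text.length(), end + window);

        StringBuilder sb = new StringBuilder();
        if (from > 0) {
            sb.append("...");                 // context was cut on the left
        }
        sb.append(text, from, hit);           // left context
        sb.append("<b>").append(text, hit, end).append("</b>");
        sb.append(text, end, to);             // right context
        if (to < text.length()) {
            sb.append("...");                 // context was cut on the right
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene is a text search engine library written in Java.";
        System.out.println(snippet(text, "search", 10));
    }
}
```

A real implementation would normally work on analyzed tokens rather than raw characters, so that the highlighted terms match what the query actually hit; that is what the Highlighter mentioned above is for.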