PDFBox handles the PDFs, and Textmining.org is supposed to work for the doc/xls
files, but I don't know anything about PPT.
Eric Anderson
LanRx Network Solutions
815-505-6132
Quoting Daniel Hunziker <[EMAIL PROTECTED]>:
> Are there any parsers for the following formats:
> - doc
> - xls
> - ppt
> - pdf
The most recent article about Lucene published on
http://www.onjava.com/ covers exactly this topic. It
should answer the questions in your email.
Otis
--- Vince Taluskie <[EMAIL PROTECTED]> wrote:
> Howdy All,
>
> I am interested in several things to improve the speed of my
> indexing.
Are there any parsers for the following formats:
- doc
- xls
- ppt
- pdf
Thanks for your help,
Daniel
Howdy All,
I am interested in several things to improve the speed of my indexing.
First, I would like to find out whether (and how) it is possible to merge
Lucene indexes of similarly structured documents (same number and types of
fields), or to coordinate several machines updating the same index.
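A minimal sketch of what merging involves, assuming a toy in-memory inverted index (term to sorted document numbers) rather than Lucene's real file format: each source index's document numbers are shifted by the number of documents already merged, so they stay unique and every postings list stays in order. The class ToyIndex and its methods are hypothetical. In Lucene itself, IndexWriter.addIndexes(Directory[]) is, as far as I know, the call for merging whole indexes; check the javadoc for your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy inverted index: term -> sorted list of document numbers.
// This is NOT Lucene's on-disk format; it only models the idea of merging.
class ToyIndex {
    final Map<String, List<Integer>> postings = new TreeMap<>();
    int docCount = 0;

    // Adds one already-tokenized document and returns its document number.
    int addDocument(String... terms) {
        int docId = docCount++;
        for (String term : terms) {
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    // Merges indexes with the same field structure into a new index.
    // Each source's document numbers are offset by the documents merged so far,
    // so numbers stay unique and each postings list stays sorted.
    static ToyIndex merge(ToyIndex... sources) {
        ToyIndex merged = new ToyIndex();
        for (ToyIndex src : sources) {
            int offset = merged.docCount;
            for (Map.Entry<String, List<Integer>> e : src.postings.entrySet()) {
                List<Integer> target =
                        merged.postings.computeIfAbsent(e.getKey(), t -> new ArrayList<>());
                for (int docId : e.getValue()) {
                    target.add(docId + offset);
                }
            }
            merged.docCount += src.docCount;
        }
        return merged;
    }
}
```

For example, merging an index holding two documents with one holding a single document yields three documents, and the second index's document 0 becomes document 2.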
Hi There,
I wonder if anyone can help me. I am submitting a simple Boolean query to
the Lucene search engine and receiving a set of 'Hits' in return. I now want to
determine the frequency of the terms as they occur in the returned documents. How do I
do this?
Help greatly appreciated.
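One way to get per-document term counts, sketched under assumptions: the text of each hit is available (e.g. stored), and a naive lowercase split stands in for the Analyzer used at index time, so counts may differ from what Lucene actually indexed. HitTermFrequencies and count() are hypothetical names. Within Lucene, I believe IndexReader.termDocs(Term) returns a TermDocs whose doc() and freq() give the same numbers straight from the index, which you could match against Hits.id(i); verify against your version's javadoc.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: counts query-term occurrences in each hit's text.
class HitTermFrequencies {
    // Returns one map per hit, from each (lowercase) query term to its count.
    // Naive tokenization: lowercase, split on anything not a letter or digit.
    static List<Map<String, Integer>> count(List<String> hitTexts, Set<String> queryTerms) {
        List<Map<String, Integer>> result = new ArrayList<>();
        for (String text : hitTexts) {
            Map<String, Integer> freqs = new LinkedHashMap<>();
            for (String term : queryTerms) {
                freqs.put(term, 0); // terms that never occur still report 0
            }
            for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
                if (freqs.containsKey(token)) {
                    freqs.merge(token, 1, Integer::sum);
                }
            }
            result.add(freqs);
        }
        return result;
    }
}
```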
I ran a little test where I did:
doc.add(new Field("name","value"));
doc.add(new Field("name","value"));
Then I got the list of fields for that doc, and sure enough the field is in
there twice. So add() appends the value to the field, even if that value
already exists.
Thanks,
Rob
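The behavior Rob describes can be modelled with a multimap, sketched below with a hypothetical ToyDocument class (not Lucene's Document): add() always appends, so the same name/value pair shows up twice, while a hypothetical addIfAbsent() checks the existing values first when duplicates are unwanted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document: each field name maps to every value
// added under it, in insertion order. Not Lucene's Document class.
class ToyDocument {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Mirrors the observed behavior: always appends, never overwrites.
    void add(String name, String value) {
        fields.computeIfAbsent(name, n -> new ArrayList<>()).add(value);
    }

    // Hypothetical helper: appends only if this exact pair is not present yet.
    // Returns true if the value was added.
    boolean addIfAbsent(String name, String value) {
        List<String> values = fields.computeIfAbsent(name, n -> new ArrayList<>());
        if (values.contains(value)) {
            return false;
        }
        values.add(value);
        return true;
    }

    // Returns all values stored under the name (empty list if none).
    List<String> getValues(String name) {
        return fields.getOrDefault(name, new ArrayList<>());
    }
}
```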
-----Original Message-----
Rob Outar wrote:
What happens if I add the same name/value pair to a Lucene Document? Does
it override it? Does it append it so you have duplicates?
I believe it 'appends', in the sense that if you add two fields with the same
name, the Document contains the union of the content of both fields.
What happens if I add the same name/value pair to a Lucene Document? Does
it override it? Does it append it so you have duplicates?
Let me know,
Rob
Tatu Saloranta writes:
> On Wednesday 19 March 2003 01:44, Morus Walter wrote:
>
> This might still be a feasible thing to do, unless the number of collections
> changes very frequently (as you would need to reindex all docs, not just
> add new ones incrementally).
>
Well, the number is growing slowly.
> Another po
Hi,
thanks for all your answers. I think I'll collect some of the hints and ideas
here rather than commenting on each of them separately.
Doug Cutting writes:
> Morus Walter wrote:
> > Searches must be possible on any combination of collections.
> > A typical search includes ~ 40 collections.
> >
> > Now the questio
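The thread is cut off here, so the following is only one possible approach, not what was proposed: restrict each search to the selected collections with a bit set over document numbers, then intersect it with the matching documents. It assumes each document carries a single collection id (e.g. from a hypothetical "collection" field); CollectionFilter and its methods are hypothetical. As far as I know, Lucene 1.x's Filter class works along these lines, returning a BitSet from bits(IndexReader) that IndexSearcher.search(Query, Filter) applies. Because the bit set is built at search time, adding a collection would not by itself require reindexing existing documents.

```java
import java.util.BitSet;
import java.util.Set;

// Hypothetical sketch: restrict results to a chosen subset of collections.
// Assumes collectionOfDoc[doc] holds the single collection id of each document.
class CollectionFilter {
    // One bit per document, set when its collection is among the selected
    // ones (e.g. the ~40 collections of a typical search).
    static BitSet allowedDocs(int[] collectionOfDoc, Set<Integer> selected) {
        BitSet bits = new BitSet(collectionOfDoc.length);
        for (int doc = 0; doc < collectionOfDoc.length; doc++) {
            if (selected.contains(collectionOfDoc[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Keeps only matching documents whose bit is set; inputs are not modified.
    static BitSet restrict(BitSet matches, BitSet allowed) {
        BitSet result = (BitSet) matches.clone();
        result.and(allowed);
        return result;
    }
}
```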