, is there anything we can or need to do to
optimize Lucene to handle lots of little Lucene documents?
Thanks,
Eliot
--
. . . . . . . . . . . . . . . . . . . . . . . .
W. Eliot Kimber | Lead Brain
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | [EMAIL PROTECTED]
hope
it performs adequately.
Cheers,
E.
--
. . . . . . . . . . . . . . . . . . . . . . . .
W. Eliot Kimber | Lead Brain
1016 La Posada Dr. | Suite 240 | Austin TX 78752
T 512.656.4139 | F 512.419.1860 | [EMAIL PROTECTED]
w w w . d a t a c h a n n e l . c o m
.
--
).
Cheers,
Eliot
--
, is there a description of the algorithm ~ uses?
Thanks,
E.
--
I have put together a hopefully useful package that demonstrates our
current experiments with using Lucene for XML indexing. You can get the
files by anonymous ftp from che.isogen.com, /outgoing/lucene. There are
two zip files:
- lucene_xml_indexing.zip
This is the core indexing code and a
Ogren, Philip V. wrote:
We are indexing a large corpus of XML documents (~10M). One thing that
Verity does with XML notes is that it indexes each XML tag as a zone.*
What's cool about it is that the zones are nested so that it mirrors the
schema of your XML document. You can limit your
You can now find our package for doing XML indexing with Lucene on the
ISOGEN web site:
http://www.isogen.com/papers/lucene_xml_indexing.html
The package (lucene_xml_indexing.zip) includes all the 3rd-party
libraries it depends on (Lucene, Xerces 1.4.4, junit).
This package is provided as-is
this functionality in order to correlate PDF
annotations (links, bookmarks, notes) to the page objects they relate
to--it's all done with bounding boxes.
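
A minimal sketch of the bounding-box idea, not the actual implementation: it assumes every annotation and page object carries an axis-aligned rectangle in the same page coordinate space, and it treats any overlap as a match. The `Box` type and the `objectsUnder` helper are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class BoundingBoxCorrelation {

    /** Axis-aligned rectangle in page coordinates (hypothetical type). */
    static final class Box {
        final String name;
        final double left, bottom, right, top;

        Box(String name, double left, double bottom, double right, double top) {
            this.name = name;
            this.left = left;
            this.bottom = bottom;
            this.right = right;
            this.top = top;
        }

        /** True if the two rectangles overlap or touch along an edge. */
        boolean intersects(Box other) {
            return left <= other.right && other.left <= right
                && bottom <= other.top && other.bottom <= top;
        }
    }

    /** Returns the names of the page objects whose boxes overlap the annotation's box. */
    static List<String> objectsUnder(Box annotation, List<Box> pageObjects) {
        List<String> hits = new ArrayList<>();
        for (Box obj : pageObjects) {
            if (annotation.intersects(obj)) {
                hits.add(obj.name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Box> objects = new ArrayList<>();
        objects.add(new Box("paragraph-1", 72, 600, 540, 700));
        objects.add(new Box("figure-1", 72, 300, 540, 550));

        // A link annotation drawn over part of the first paragraph.
        Box link = new Box("link", 100, 620, 200, 640);
        System.out.println(objectsUnder(link, objects)); // prints [paragraph-1]
    }
}
```

A stricter variant could require the annotation to lie fully inside the object's box; which rule fits depends on how the annotations were authored.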
Cheers,
Eliot
--
W. Eliot Kimber, [EMAIL PROTECTED]
Consultant, ISOGEN International
1016 La Posada Dr., Suite 240
Austin, TX 78752 Phone
main writing use case is the
rewriting of existing PDFs following some amount of manipulation through
our API.
A caution: I am still waiting to get approval from my employers to do
this work as open source--it may be a while before I can even start on
the coding.
Cheers,
Eliot
--
W. Eliot Kimber
/runLuceneClient.bat script (on Windows) and it should just
work. If it doesn't, let me know.
Cheers,
Eliot
--
W. Eliot Kimber, [EMAIL PROTECTED]
Consultant, ISOGEN International
1016 La Posada Dr., Suite 240
Austin, TX 78752 Phone: 512.656.4139