RE: Can I retrieve token offsets from Hits?

2004-07-23 Thread Stepan Mik
Although storing metadata would be very useful, I believe that token (term?) offsets should be an integral part of the index. Storing such information could be optional (the same as term frequencies) so users could make their own decision on index size. But this discussion probably belongs on the developer

Re: authentication support in lucene

2004-07-23 Thread Kelvin Tan
If you don't have low-level access to the framework that can retrieve a batch list of accessible IDs, document-by-document checking of ACL will be _painful_. I implemented ACL checking via Filters. Caching filters definitely helps, but may not be applicable in every situation. I stored the UUID

Re: authentication support in lucene

2004-07-23 Thread Dave Spencer
Kelvin Tan wrote: If you don't have low-level access to the framework that can retrieve a batch list of accessible IDs, document-by-document checking of ACL will be _painful_. I implemented ACL checking via Filters. Caching filters definitely helps, but may not be applicable in every situation.

Re: authentication support in lucene

2004-07-23 Thread Kelvin Tan
On Fri, 23 Jul 2004 10:09:25 +0100, Dave Spencer said: I implemented ACL checking via Filters. Caching filters definitely helps, but may not be applicable in every situation. I stored the UUID of each document in the database as well as in Lucene. That way, by retrieving a list of accessible
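Kelvin's batch-ACL idea can be sketched without any Lucene classes (the `docUuids` array standing in for the index's stored uuid field, and the method names, are hypothetical). In Lucene 1.4 the same logic would live inside a `Filter` subclass's `bits(IndexReader)` method, with the resulting `BitSet` handed to `Searcher.search(query, filter)`:

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class AclBits {
    // Build a BitSet with one bit per document: bit i is set iff document
    // i's UUID is in the caller's set of accessible UUIDs. In a real
    // Lucene Filter this loop would read the stored "uuid" field from the
    // IndexReader; here the index is mocked as a plain array so the logic
    // is self-contained.
    static BitSet accessibleDocs(String[] docUuids, Set<String> accessible) {
        BitSet bits = new BitSet(docUuids.length);
        for (int i = 0; i < docUuids.length; i++) {
            if (accessible.contains(docUuids[i])) bits.set(i);
        }
        return bits;
    }

    public static void main(String[] args) {
        String[] index = {"u1", "u2", "u3", "u4"};  // doc id -> uuid
        Set<String> allowed = new HashSet<>(Arrays.asList("u2", "u4"));
        System.out.println(accessibleDocs(index, allowed)); // {1, 3}
    }
}
```

The point of fetching the accessible-UUID list in one batch query is that the per-document cost is reduced to a hash lookup, which is why the filter (especially a cached one) beats document-by-document ACL checks.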

Large index files

2004-07-23 Thread Rupinder Singh Mazara
Hi all, I am using Lucene to index a large dataset. It so happens that 10% of this data yields indexes of 400MB, so in all likelihood the index may grow up to 7GB. My deployment will be on a Linux/Tomcat system. Which will be the better solution: a) create one large index and hope Linux

RE: Large index files

2004-07-23 Thread Karthik N S
Hi, I think (a) would be the better choice [I have done it on Linux up to 7GB; it's considerably faster than doing the same on win2000 PF]. With regards, Karthik

RE: Large index files

2004-07-23 Thread John Moylan
As long as your kernel has Large File Support, you should be fine. Most modern distros now support files larger than 2GB out of the box. John On Fri, 2004-07-23 at 13:44, Karthik N S wrote: Hi I think (a) would be a better choice [I have done it on Linux upt to 7GB , it's pretty faster

Re: Large index files

2004-07-23 Thread Joel Shellman
I'm a little confused by this. I thought Lucene keeps creating new files as the index gets bigger and any single file doesn't ever get all that big. Is that not the case? Thanks, Joel Shellman John Moylan wrote: As long as your kernel has Large File Support, then you should be fine. Most modern

RE: Large index files

2004-07-23 Thread Rupinder Singh Mazara
By optimizing the created index(es) you can reduce multiple files into a smaller set of files; on some file systems it might be a good idea to optimize once in a while.

Merging indexes

2004-07-23 Thread Rupinder Singh Mazara
Hi all, I have a problem with merging indexes. I had to split up the indexing of my data into 20 different indexes (based on a primary key), and I want to merge them all into one master index. For example, I have /xxx/lucene/tmp/1001-1000 /xxx/lucene/tmp/1001-2000
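A minimal sketch of the standard answer to this question, assuming Lucene 1.4 on the classpath and that each sub-directory holds a valid index; the master path and the analyzer choice are illustrative, and only the first two of the twenty sub-index paths from the message are shown:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // Open the existing sub-indexes read-only (create = false).
        Directory[] parts = new Directory[] {
            FSDirectory.getDirectory("/xxx/lucene/tmp/1001-1000", false),
            FSDirectory.getDirectory("/xxx/lucene/tmp/1001-2000", false),
            // ... the remaining sub-indexes
        };
        // Create the master index (create = true) and pull everything in;
        // addIndexes also optimizes the result down to a single segment.
        IndexWriter writer =
            new IndexWriter("/xxx/lucene/master", new StandardAnalyzer(), true);
        writer.addIndexes(parts);
        writer.close();
    }
}
```

`IndexWriter.addIndexes(Directory[])` is the supported way to merge complete indexes; copying segment files by hand does not work because segment names collide.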

PDFBox problem.

2004-07-23 Thread Natarajan.T
FYI, I am using PDFBox.jar to convert PDF to text. The problem is that at runtime it prints a lot of object messages. How can I avoid this? import java.io.InputStream; import java.io.BufferedWriter; import java.io.IOException; import

Re: PDFBox problem.

2004-07-23 Thread Zilverline info
Natarajan.T wrote: FYI, I am using PDFBox.jar to Convert PDF to Text. Problem is in the runtime its printing lot of object messages How can I avoid this one??? How can I go with this one. import java.io.InputStream; import java.io.BufferedWriter; import java.io.IOException; import

Re: PDFBox problem.

2004-07-23 Thread Christiaan Fluit
We invoke the following code in a static initializer; it simply disables log4j's output entirely. static { Properties props = new Properties(); props.put("log4j.threshold", "OFF"); org.apache.log4j.PropertyConfigurator.configure(props);

Re: PDFBox problem.

2004-07-23 Thread Ben Litchfield
I usually use -Dlog4j.configuration=log4j.xml when invoking java from the command line, but I believe this depends on your environment. E.g. java -Dlog4j.configuration=log4j.xml org.pdfbox.ExtractText input.pdf Ben On Fri, 23 Jul 2004, Christiaan Fluit wrote: We invoke the following

Re: RangeQuery on Numeric values

2004-07-23 Thread Daniel Naber
On Friday 23 July 2004 16:58, Terence Lai wrote: I am currently using Lucene 1.4 final. I want to construct a query that matches a numeric range. I believe that the RangeQuery defined in the Lucene API uses string comparison, so it does not work for numeric contents. Does anyone know how to
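The reply is truncated, but the usual workaround for string-based range comparison is to index numbers as fixed-width, zero-padded strings so that lexicographic order agrees with numeric order; the width of 10 below is an arbitrary choice that must cover the largest value you index:

```java
public class NumericPad {
    // Zero-pad a non-negative number to a fixed width. As plain strings
    // "2" sorts after "10", which is why RangeQuery misbehaves on numbers;
    // "0000000002" correctly sorts before "0000000010".
    static String pad(long n, int width) {
        String s = Long.toString(n);
        StringBuilder sb = new StringBuilder();
        for (int i = s.length(); i < width; i++) sb.append('0');
        return sb.append(s).toString();
    }

    public static void main(String[] args) {
        System.out.println("2".compareTo("10") > 0);               // true: wrong order
        System.out.println(pad(2, 10).compareTo(pad(10, 10)) < 0); // true: fixed
    }
}
```

Both the indexed field values and the range-query endpoints must be padded the same way, or the comparison is still wrong.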

merge factor and minMergeDocs

2004-07-23 Thread Praveen Peddi
Has anything changed in Lucene 1.4 regarding mergeFactor? I recently ported to Lucene 1.4 final and my indexing time does not change when I change the merge factor. Increasing minMergeDocs improves my indexing speed as expected, but changing mergeFactor makes no difference. If this is the case, I
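One possible explanation, sketched as a pure-Java simulation (this is a simplification of the merge scheme, not Lucene code): minMergeDocs controls how many documents are buffered before a segment is flushed, and mergeFactor only kicks in once that many same-sized segments exist on disk. If the corpus is smaller than minMergeDocs * mergeFactor, no merge ever fires and mergeFactor has no observable effect:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class MergeSim {
    // Count segment merges while adding numDocs documents: a new segment of
    // size minMergeDocs is flushed every minMergeDocs documents, and any run
    // of mergeFactor equal-sized segments is merged into one larger segment.
    static int countMerges(int numDocs, int minMergeDocs, int mergeFactor) {
        Deque<Integer> segments = new ArrayDeque<>();
        int merges = 0;
        for (int flushed = minMergeDocs; flushed <= numDocs; flushed += minMergeDocs) {
            segments.push(minMergeDocs);
            // Cascade: merging may itself create mergeFactor equal segments.
            while (topRun(segments) >= mergeFactor) {
                int size = segments.peek();
                for (int i = 0; i < mergeFactor; i++) segments.pop();
                segments.push(size * mergeFactor);
                merges++;
            }
        }
        return merges;
    }

    // Length of the run of equal-sized segments at the top of the stack.
    static int topRun(Deque<Integer> segs) {
        if (segs.isEmpty()) return 0;
        int top = segs.peek(), n = 0;
        for (int s : segs) { if (s == top) n++; else break; }
        return n;
    }

    public static void main(String[] args) {
        // 5000 docs with minMergeDocs=1000: only 5 segments ever exist, so
        // raising mergeFactor from 10 to 50 changes nothing (0 merges both).
        System.out.println(countMerges(5000, 1000, 10)); // 0
        System.out.println(countMerges(5000, 1000, 50)); // 0
        // With minMergeDocs=100 the same corpus makes 50 base segments,
        // and mergeFactor now visibly matters.
        System.out.println(countMerges(5000, 100, 10));  // 5
        System.out.println(countMerges(5000, 100, 50));  // 1
    }
}
```

In Lucene 1.4 both knobs are public int fields on IndexWriter (writer.mergeFactor, writer.minMergeDocs), so the observed behavior is consistent with an index too small for mergeFactor to matter at the chosen minMergeDocs.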