Although storing metadata would be very useful, I believe that
token (term?) offsets should be an integral part of the index. Storing such
information would be optional (as with term frequencies), so users could
make their own decision about index size. But this discussion probably
belongs to the developer
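For comparison, term vectors in Lucene 1.4 already follow this per-field
opt-in pattern; a minimal sketch, assuming the 1.4-era Field constructor
whose last flag is storeTermVector (the field name and text are made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
// store=true, index=true, token=true, storeTermVector=true: the final
// flag opts this one field into term-vector storage, so the index-size
// cost is only paid where the extra information is wanted.
doc.add(new Field("contents", "some body text", true, true, true, true));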
If you don't have low-level access to the framework that can retrieve a batch
list of accessible IDs, document-by-document checking of ACL will be _painful_.
I implemented ACL checking via Filters. Caching filters definitely helps, but
may not be applicable in every situation. I stored the UUID
On Fri, 23 Jul 2004 10:09:25 +0100, Dave Spencer said:
I implemented ACL checking via Filters. Caching filters definitely helps, but
may not be applicable in every situation. I stored the UUID of each document
in the database as well as in Lucene. That way, by retrieving a list of accessible
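A filter of the kind described above is straightforward in the 1.4 API,
where a Filter produces a BitSet over the reader's documents. A minimal
sketch, assuming an untokenized uuid field and a batch of accessible IDs
fetched from the database (class and field names are illustrative, not
from the thread):

import java.io.IOException;
import java.util.BitSet;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class AclFilter extends Filter {
    private final String[] accessibleUuids; // fetched from the database in one batch

    public AclFilter(String[] accessibleUuids) {
        this.accessibleUuids = accessibleUuids;
    }

    // Turn on the bit for every document whose uuid term matches an accessible ID.
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        for (int i = 0; i < accessibleUuids.length; i++) {
            TermDocs docs = reader.termDocs(new Term("uuid", accessibleUuids[i]));
            try {
                while (docs.next()) {
                    bits.set(docs.doc());
                }
            } finally {
                docs.close();
            }
        }
        return bits;
    }
}

Wrapping an instance in CachingWrapperFilter keeps the BitSet from being
rebuilt on every search, which is the caching the posters mention.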
Hi all
I am using Lucene to index a large dataset; as it happens, 10% of this data
already yields indexes of 400MB, so in all likelihood the index may grow to 7GB.
My deployment will be on a Linux/Tomcat system. Which would be the better
solution:
a) create one large index and hope Linux
Hi
I think (a) would be the better choice [I have done it on Linux up to
7GB, and it's considerably faster than doing the same on win2000 PF].
with regards
Karthik
-----Original Message-----
From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]
Sent: Friday, July 23, 2004 5:55 PM
To: Lucene Users
As long as your kernel has Large File Support, you should be fine.
Most modern distros support files larger than 2GB out of the box now.
John
On Fri, 2004-07-23 at 13:44, Karthik N S wrote:
Hi
I think (a) would be the better choice [I have done it on Linux up to
7GB, and it's considerably faster
I'm a little confused by this. I thought Lucene keeps creating new files
as the index gets bigger and any single file doesn't ever get all that
big. Is that not the case?
Thanks,
Joel Shellman
John Moylan wrote:
As long as your kernel has Large File Support, then you should be
fine. Most modern
By optimizing the created index(es) you can reduce the many files to
a smaller set of files, and on some file systems it might be a good idea
to optimize once in a while.
-----Original Message-----
From: Joel Shellman [mailto:[EMAIL PROTECTED]
Sent: 23 July 2004 14:38
To: Lucene Users List
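For reference, the optimization mentioned above is essentially a one-liner;
a sketch assuming a 1.4-style IndexWriter opened on an existing index (the
path is illustrative):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Open the existing index (create=false) and merge its segments down
// to one, which also collapses the index into far fewer files.
IndexWriter writer = new IndexWriter("/path/to/index",
                                     new StandardAnalyzer(), false);
writer.optimize();
writer.close();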
Hi all
I have a problem with merging indexes.
I had to split up the indexing of my data into 20 different indexes (based
on a primary key), and I want to merge them all into one master index.
For example, I have
/xxx/lucene/tmp/1001-1000
/xxx/lucene/tmp/1001-2000
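Assuming the 1.4 API, IndexWriter.addIndexes(Directory[]) does exactly this
merge; a minimal sketch (the master path and analyzer choice are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

// Create the master index, then merge the partial indexes into it.
IndexWriter writer = new IndexWriter("/xxx/lucene/master",
                                     new StandardAnalyzer(), true);
Directory[] parts = new Directory[] {
    FSDirectory.getDirectory("/xxx/lucene/tmp/1001-1000", false),
    FSDirectory.getDirectory("/xxx/lucene/tmp/1001-2000", false),
    // ... and so on for the remaining partial indexes
};
writer.addIndexes(parts);
writer.close();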
FYI,
I am using PDFBox.jar to convert PDF to text.
The problem is that at runtime it prints a lot of log messages.
How can I avoid this? How should I proceed?
import java.io.InputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import
We invoke the following code in a static initializer that simply
disables log4j's output entirely:
static {
    // requires java.util.Properties and log4j on the classpath
    Properties props = new Properties();
    props.put("log4j.threshold", "OFF");
    org.apache.log4j.PropertyConfigurator.configure(props);
}
I usually use -Dlog4j.configuration=log4j.xml when invoking java from
the command line, but I believe this depends on your environment.
For example:
java -Dlog4j.configuration=log4j.xml org.pdfbox.ExtractText input.pdf
Ben
On Fri, 23 Jul 2004, Christiaan Fluit wrote:
We invoke the following
On Friday 23 July 2004 16:58, Terence Lai wrote:
I am currently using Lucene 1.4 Final. I want to construct a query that
matches a numeric range. I believe that the RangeQuery defined in the Lucene
API uses string comparison, so it does not work for numeric contents.
Does anyone know how to
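The usual 1.4-era workaround is to index numbers zero-padded to a fixed
width, so that the lexicographic order RangeQuery relies on agrees with
numeric order. A sketch under that assumption (field name and width are
made up; works for non-negative values only):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

public class PaddedRange {
    static final int WIDTH = 12; // wider than the largest expected value

    // 42 -> "000000000042", so string order of terms matches numeric order.
    public static String pad(long n) {
        String s = Long.toString(n);
        StringBuffer buf = new StringBuffer();
        for (int i = s.length(); i < WIDTH; i++) {
            buf.append('0');
        }
        return buf.append(s).toString();
    }

    // Index time: store the padded value as an untokenized keyword field.
    public static void addPrice(Document doc, long price) {
        doc.add(Field.Keyword("price", pad(price)));
    }

    // Query time: build the range from padded bounds (inclusive at both ends).
    public static Query priceRange(long lo, long hi) {
        return new RangeQuery(new Term("price", pad(lo)),
                              new Term("price", pad(hi)), true);
    }
}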
Has anything changed in Lucene 1.4 regarding mergeFactor?
I recently ported to Lucene 1.4 final, and my indexing time does not change
with the merge factor. Increasing minMergeDocs improves my indexing as
expected, but changing mergeFactor makes no difference.
If this is the case, I
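For context, both knobs are public fields on IndexWriter in 1.4; a sketch
of the tuning being described (path and values are arbitrary examples):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter("/path/to/index",
                                     new StandardAnalyzer(), true);
// How many segments may accumulate before they are merged.
writer.mergeFactor = 50;
// How many documents are buffered in RAM before a segment is written;
// this is the setting the poster found effective.
writer.minMergeDocs = 1000;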