Storing documents (especially large ones) outside the index is better than
storing them in the index: every segment merge or optimize copies that data.
Storing them in the index is possible, but it requires 1-4x more space
(depending on the read/write speed of the fs), and merges and optimizes take
longer.
Karel
The problem is in StandardTokenizer, so use an Analyzer with this method:

public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream result = new LowerCaseTokenizer(reader);
    result = new StopFilter(result, stopSet);
    return result;
}

if you need everything the standard analyzer does
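If it helps, here is a minimal, self-contained sketch of such an analyzer,
assuming the Lucene 1.9/2.0-era analysis API; the class name and the use of
the default English stop words are my own choices, not from the original post.

import java.io.Reader;
import java.util.Set;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseTokenizer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;

// Lowercases and removes stop words, skipping StandardTokenizer entirely
// (and with it the apostrophe/acronym handling that causes the problem).
public class LowerCaseStopAnalyzer extends Analyzer {
    private final Set stopSet = StopFilter.makeStopSet(StopAnalyzer.ENGLISH_STOP_WORDS);

    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new LowerCaseTokenizer(reader);
        return new StopFilter(result, stopSet);
    }
}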
Fr
The apostrophe is recognized as part of a word - the standard analyzer is
mostly English oriented.
One way around it is to swap apostrophes - the "normal" one for an unusual
character (a sketch follows the grammar excerpt below).
StandardAnalyzer.java line 40-44
APOSTROPHE:
token = jj_consume_token(APOSTROPHE);
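A small sketch of the swap as I understand it (the replacement character and
variable names are just examples, not from the original post): replace the
ASCII apostrophe before analysis so the APOSTROPHE rule never fires, and swap
it back for display.

// Before indexing/searching: hide the ASCII apostrophe from StandardTokenizer
// by replacing it with an uncommon character (example: U+02BC).
String forIndexing = text.replace('\'', '\u02BC');
// When displaying stored text, swap it back.
String forDisplay = stored.replace('\u02BC', '\'');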
-
Nope. An IndexReader obtains a snapshot of the index - not closing and
re-opening the IndexReader means old files are not deleted (Windows throws an
exception, Linux will not free the space).
Is it possible to get all the matching documents in the result without
restarting the Searcher program?
One thing: generally, using an RDBMS for the STORED fields is a good idea,
because every segment merge / optimize copies that data once or twice (cfs).
I'm thinking about putting STORED fields in an extra file and putting pointers
in the cfs. Delete would just mark the document as deleted. And a new operation
optimize_
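A rough sketch of that split, assuming the Lucene 2.0-era Field flags and a
hypothetical docs(id, body) table; only a small id is stored in the index,
the large content lives in the database.

// Index: store only the id; index the body unstored so merges never copy it.
Document doc = new Document();
doc.add(new Field("id", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.add(new Field("body", body, Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc);

// Database: keep the large content keyed by the same id.
PreparedStatement ps = con.prepareStatement("INSERT INTO docs (id, body) VALUES (?, ?)");
ps.setString(1, id);
ps.setString(2, body);
ps.executeUpdate();
ps.close();

// At search time, read the small "id" field from the hit and load the body
// from the database.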
I once got the same problem and, judging from Jira, not alone. I deleted the
index and rebuilt it from the source data, and the problem was gone. I'm unable
to reproduce it. Are you able to reproduce the problem?
Karel
java.io.FileNotFoundException: /lucene-indexes/mediafragments/_8km.fnm (No
---
Discussed before; it's more a relational-db task than a Lucene one.
A simple approach is to get a list of terms from your queries and store the
document - query - term relation.
I have around 1.6e10 query-terms in PostgreSQL, and with a proper index a
select takes around 0.6 ms (clustered, vacuumed, analyzed), 300
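For illustration only (table, column, and index names are made up, not from
the original post), the relation and its index could look roughly like this,
created over JDBC:

Statement st = con.createStatement();
// One row per (document, query, term) triple.
st.executeUpdate("CREATE TABLE query_terms (doc_id INTEGER, query_id INTEGER, term VARCHAR(64))");
// Index the term-lookup select can use; CLUSTER and VACUUM ANALYZE afterwards.
st.executeUpdate("CREATE INDEX query_terms_term_idx ON query_terms (term, doc_id)");
st.close();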
Not for now, but I'd like to contribute span support soon.
Karel
An alternative highlighter implementation was recently contributed here:
http://issues.apache.org/jira/browse/LUCENE-644?page=all
I've not had the time to study this alternative in detail (I hope to soon) so I can't say if it wi
Yes, it is possible. Only UNSTORED fields become UNSTORED again, and you
cannot change the terms in them.
If you have an SQL db I have neat code for doing this.
Depends.
0) optimize the big index
1) on the big index, delete all documents except those belonging to one part
of the index
2) use addIndexes on an IndexWriter opened on the (empty) destination dir
3) delete segments.del in the big index directory (segments.del is just a
serialized BitVector)
4) repeat for another set (a sketch of steps 1-2 is below)
do not mak
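Hedged sketch of steps 1-2, assuming each document has a stored "part" field
naming its partition; paths, field name and value are examples only.

IndexReader reader = IndexReader.open("/path/to/big-index");
for (int i = 0; i < reader.maxDoc(); i++) {
    if (reader.isDeleted(i)) continue;
    Document d = reader.document(i);
    if (!"part-1".equals(d.get("part"))) {
        reader.deleteDocument(i);   // mark everything outside this part as deleted
    }
}
reader.close();

IndexWriter writer = new IndexWriter("/path/to/dest-index", new StandardAnalyzer(), true);
writer.addIndexes(new Directory[] { FSDirectory.getDirectory("/path/to/big-index", false) });
writer.close();
// Then remove the deletions file in the big index directory (step 3) to
// restore it, and repeat with the next part value.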
I'm sending a snippet of code showing how to reconstruct UNSTORED fields.
It has two parts:
DB + terms

Class.forName("org.postgresql.Driver").newInstance();
con = DriverManager.getConnection("jdbc:postgresql:lucene",
        "lucene", "lucene");
PreparedStatement psCompany = con.prepareStatement(
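The "terms" part of the original snippet is cut off; what follows is my own
hedged sketch of the usual trick for rebuilding an UNSTORED field from its
term positions (the field name "body" and the document id are assumptions).
Note it recovers the analyzed tokens, not the original text.

IndexReader reader = IndexReader.open("/path/to/index");
int docId = 42;                                 // the document to reconstruct (example)
TreeMap byPosition = new TreeMap();             // position -> term text
TermEnum terms = reader.terms(new Term("body", ""));
do {
    Term t = terms.term();
    if (t == null || !"body".equals(t.field())) break;   // past this field's terms
    TermPositions tp = reader.termPositions(t);
    if (tp.skipTo(docId) && tp.doc() == docId) {
        for (int i = 0; i < tp.freq(); i++) {
            byPosition.put(new Integer(tp.nextPosition()), t.text());
        }
    }
    tp.close();
} while (terms.next());
terms.close();
// byPosition.values() now yields the field's tokens in their original order.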
Well, you can have it! :-) Though I have not tested it, it's just an idea.
You can get the document id after the add - numDocs() - and if the DB insert
fails, you can delete the document from the RAMDirectory.
Or, in my case of batches - I'm adding documents to the DB with a savepoint,
then create a clean index (create=true) and at the end if
Jason is right. I think, though I'm not an expert on Lucene either, that your
newly added document can't recreate the terms for a field with an analyzer,
because the field text is empty.
There is a very hairy solution - hack an IndexReader and FieldInfosWriter and
use addIndexes.
Lucene is "only" a fulltext search library, n
Hi,
I'm facing a similar problem. I found a possible way to copy a part of an
index (without copying the whole index, deleting, and optimizing), but I don't
know how to change/add/remove a field (or add a term vector, in my case) in an
existing index.
To copy a part of an index, override methods in IndexReader
/** Returns
Depends on the document type; look at the setOmitNorms method in the Field class.
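For example (a small sketch assuming Lucene 1.9/2.0's Field API; the field
name is illustrative), norms can be dropped on fields that don't need length
normalization, saving one byte per such field per document:

Field id = new Field("id", value, Field.Store.YES, Field.Index.UN_TOKENIZED);
id.setOmitNorms(true);   // no length norm stored for this field
doc.add(id);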
heritrix.lucene wrote:
Hi,
Approx 50 million I have processed up to now. I kept maxMergeFactor and
maxBufferedDocs at 1000. This value I got after several rounds of test runs.
The indexing rate for each document in the 50 M is
The singleton pattern is better. Then you can extend it to a proxy pattern.
existing IndexReader really isn't that expensive and does get around
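A minimal sketch of such a singleton (class name and path are examples, and
this simplest form never re-opens the searcher):

import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;

public final class SearcherHolder {
    private static IndexSearcher searcher;

    private SearcherHolder() {}

    // Lazily opens one shared IndexSearcher for the whole application.
    public static synchronized IndexSearcher getSearcher() throws IOException {
        if (searcher == null) {
            searcher = new IndexSearcher("/path/to/index");
        }
        return searcher;
    }
}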
Hi,
there are two ways. The first is to use MultiFieldQueryParser
http://lucene.apache.org/java/docs/api/org/apache/lucene/queryParser/MultiFieldQueryParser.html
or do an extra step at indexing time to build a new field as a join of those
fields (e.g. StringBuffer: append f1, append f2, ...) - see the sketch below.
Benefits of the
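A short sketch of both options (field names, analyzer, and the query string
are examples; assuming the Lucene 2.0-era API):

// 1) Query several fields at once with MultiFieldQueryParser.
MultiFieldQueryParser parser =
        new MultiFieldQueryParser(new String[] { "title", "body" }, new StandardAnalyzer());
Query q = parser.parse("some text");   // throws ParseException

// 2) Or build a combined "all" field at indexing time.
StringBuffer all = new StringBuffer();
all.append(title).append(' ').append(body);
doc.add(new Field("all", all.toString(), Field.Store.NO, Field.Index.TOKENIZED));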
Not closing explicitly - especially when the JVM is allowed a lot of memory
but only a small amount is used, so GC (which would eventually close the
abandoned readers) rarely runs - can leave old files on the disk on Linux.
The solution is to use a ReentrantReadWriteLock, where the re-open method
(one variant is sketched below):
opens the new IndexReader at a ThreadLocal,
acquires the write lock,
saves the old reference
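One possible shape of this (my own sketch with a single shared searcher rather
than the ThreadLocal variant mentioned above; names are illustrative): searches
hold the read lock while they use the searcher, re-open takes the write lock
and closes the old reader explicitly.

private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
private IndexSearcher searcher;

public void search(Query q, HitCollector collector) throws IOException {
    lock.readLock().lock();
    try {
        searcher.search(q, collector);   // use the searcher only under the read lock
    } finally {
        lock.readLock().unlock();
    }
}

public void reopen(String indexDir) throws IOException {
    lock.writeLock().lock();
    try {
        IndexSearcher old = searcher;
        searcher = new IndexSearcher(indexDir);   // new snapshot of the index
        if (old != null) {
            old.close();                          // frees the old files right away
        }
    } finally {
        lock.writeLock().unlock();
    }
}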
You can use jdbm.sf.net for holding the your_id to lucene_id relation in a
transactional hashtable on disk.
Also, Yonik will say that Solr at incubator.apache.org/solr has this
constraint check implemented.
Or you can use ssh -X for X11 forwarding. I don't know how it works on Windows
(you need some X client application), but it's great on Linux(es) when you
have plenty of bandwidth.