Hi,
Recently an index I've been building passed the 2 GB mark, and after I
optimize()ed it into one segment over 2 GB, it stopped working.
Apparently, this is a known problem on 32-bit JVMs and is mentioned in the FAQ,
http://wiki.apache.org/lucene-java/LuceneFAQ question Is there a way to limit
On Wednesday 25 June 2008 07:03:59, John Wang wrote:
Hi guys:
Perhaps I should have posted this to this list in the first
place.
I am working on a patch to expose minDoc and maxDoc for each
term. These values can be retrieved while constructing the
TermInfo.
Knowing
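A minimal sketch of the per-term minDoc/maxDoc idea, assuming doc IDs are visible one posting at a time while the term data is built. TermDocRange and its methods are hypothetical names for illustration; this is not the patch under discussion and does not touch Lucene's TermInfo.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Lucene: tracks the smallest and largest doc ID per term. */
final class TermDocRange {
    /** Inclusive doc ID bounds for one term. */
    static final class Range {
        int minDoc = Integer.MAX_VALUE;
        int maxDoc = Integer.MIN_VALUE;

        void add(int docId) {
            minDoc = Math.min(minDoc, docId);
            maxDoc = Math.max(maxDoc, docId);
        }
    }

    private final Map<String, Range> ranges = new TreeMap<>();

    /** Records that the term occurs in docId; call once per posting. */
    void addPosting(String term, int docId) {
        ranges.computeIfAbsent(term, t -> new Range()).add(docId);
    }

    /** Returns the bounds for the term, or null if the term was never seen. */
    Range rangeFor(String term) {
        return ranges.get(term);
    }
}
```

Keeping only two integers per term means the bounds cost constant space regardless of how many documents contain the term.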
Nadav Har'El wrote:
Recently an index I've been building passed the 2 GB mark, and after I
optimize()ed it into one segment over 2 GB, it stopped working.
Nadav, which platform did you hit this on? I think I've created a 2
GB index on 32-bit WinXP just fine. How many platforms are really
Jason Rutherglen wrote:
For Ocean I created a workaround where the IndexCommits from
IndexDeletionPolicy are saved in a map in order to achieve deleting
based on the IndexReader. It would be more straightforward to
delete from the IndexCommit in IndexReader.
It seems like we are mixing
Jason Rutherglen wrote:
One of the bottlenecks I have noticed while testing Ocean realtime search
is the delete process, which involves writing several files for what may
be a single document delete in SegmentReader. The best way
to handle the deletes is to simply keep them in memory
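A rough sketch of keeping deletes in memory, assuming a fixed maxDoc and that the caller decides when to persist. InMemoryDeletes is a hypothetical class for illustration, not Ocean's or SegmentReader's actual code.

```java
import java.util.BitSet;

/** Hypothetical sketch: buffers deletes in memory and hands out a snapshot on flush. */
final class InMemoryDeletes {
    private final BitSet deleted;
    private int pending; // deletes recorded since the last flush

    InMemoryDeletes(int maxDoc) {
        deleted = new BitSet(maxDoc);
    }

    /** Marks a document deleted without touching the disk. */
    synchronized void delete(int docId) {
        if (!deleted.get(docId)) {
            deleted.set(docId);
            pending++;
        }
    }

    synchronized boolean isDeleted(int docId) {
        return deleted.get(docId);
    }

    synchronized int pendingCount() {
        return pending;
    }

    /** Returns a copy of the current state to persist and resets the pending counter. */
    synchronized BitSet flush() {
        pending = 0;
        return (BitSet) deleted.clone();
    }
}
```

With this shape, many single-document deletes cost one bit flip each, and file writes happen only when flush() is called.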
John Wang wrote:
The problem I am having is stated below: I don't know how to
add the minDoc and maxDoc values to the index while keeping backward
compatibility.
Unfortunately, TermInfo file format just isn't extensible at the
moment, so I think for now you'll have to break
[
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607954#action_12607954
]
Michael McCandless commented on LUCENE-1314:
bq. In my SegmentReader subclass
I understand what you are saying. I am not sure it is worth what is clearly
quite a bit more work, given how easy it is to simply have more control
over the IndexReader deletedDocs BitVector, which seems like a feature that
should be in there anyway, perhaps even allowing SortedVIntList to be
On Wed, Jun 25, 2008 at 6:29 AM, Michael McCandless
[EMAIL PROTECTED] wrote:
We've also discussed at one point creating an IndexReader impl that searches
the RAM buffer that DocumentsWriter writes to when adding documents. I
think it's easier than it sounds, on first glance, because
[
https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608039#action_12608039
]
Jason Rutherglen commented on LUCENE-1314:
--
Here is the code of the SegmentReader
+1
On 24 Jun 2008 at 22:28, Yonik Seeley wrote:
Something to consider for Lucene 3 is to have something to retrieve
Similarity per-field rather than passing the field name into some
functions...
benefits:
- Would allow customizing most Similarity functions per-field
- Performance: Similarity for
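One way to picture the per-field proposal is a lookup that resolves a field name to its own scoring object once, so the scoring functions no longer need a field-name parameter. The sketch below assumes a hypothetical FieldSimilarity interface and PerFieldSimilarities registry; neither is Lucene's Similarity API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a scoring strategy; not Lucene's Similarity class. */
interface FieldSimilarity {
    float lengthNorm(int numTerms);
}

/** Hypothetical registry: resolves a field name to its similarity, with a default fallback. */
final class PerFieldSimilarities {
    private final FieldSimilarity defaultSimilarity;
    private final Map<String, FieldSimilarity> byField = new HashMap<>();

    PerFieldSimilarities(FieldSimilarity defaultSimilarity) {
        this.defaultSimilarity = defaultSimilarity;
    }

    /** Registers a custom similarity for one field. */
    void set(String field, FieldSimilarity similarity) {
        byField.put(field, similarity);
    }

    /** Looked up once per field; the returned object needs no field-name argument. */
    FieldSimilarity get(String field) {
        return byField.getOrDefault(field, defaultSimilarity);
    }
}
```

A caller would fetch the similarity for a field up front and then reuse it for every document, which is where a performance benefit could come from.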
It seems like it could, it even has serialVersionUID defined.
Thanks Paul and Mike for the feedback.
Paul, for us, the sparsity of the docIds determines which data structure to use.
While cardinality gives some of that, min/max docId would also help.
For example:
say maxdoc=100, cardinality = 7, docids: {0,1,...,6} or
{3,4,...,9}, using arrayDocIdSet
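To illustrate how min/max docId could add information beyond cardinality, here is a sketch that picks a representation from all three numbers. The class, the three kinds, and the 32-bits-per-int cost comparison are assumptions made for illustration, not the rule used in the code being discussed.

```java
/** Hypothetical chooser: picks a doc ID set representation from cardinality and bounds. */
final class DocIdSetChooser {
    enum Kind { RANGE, BITSET, INT_ARRAY }

    static Kind choose(int cardinality, int minDoc, int maxDoc) {
        long span = (long) maxDoc - minDoc + 1;
        if (cardinality == span) {
            return Kind.RANGE; // every ID in [minDoc, maxDoc] is present
        }
        // Assumed cost model: a bitset over the span costs span bits,
        // an int array costs 32 bits per stored ID.
        if ((long) cardinality * 32 >= span) {
            return Kind.BITSET;
        }
        return Kind.INT_ARRAY;
    }
}
```

Under these assumptions both {0,...,6} and {3,...,9} come out as RANGE, which cardinality and maxdoc alone could not reveal.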
No reason done!
Erik
On Jun 25, 2008, at 11:05 AM, Jason Rutherglen wrote:
It seems like it could, it even has serialVersionUID defined.
I read other parts of the email but glanced over this part. Would terms be
automatically sorted as they came in? If implemented it would be nice to be
able to get an encoded representation (probably byte array) of the document
and postings which could be written to a log, and then reentered in
Avoidable synchronization bottleneck in MatchAllDocsQuery$MatchAllScorer
Key: LUCENE-1316
URL: https://issues.apache.org/jira/browse/LUCENE-1316
Project: Lucene - Java
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Feak updated LUCENE-1316:
--
Further investigation indicates that the ValueSourceQuery$ValueSourceScorer may
suffer from the same
Hi Paul:
Regarding your comment on adding required/prohibited to BooleanQuery:
Based on the new DocIdSet and DocIdSetIterator abstractions, we
also developed decorators such as AndDocIdSet, OrDocIdSet and NotDocIdSet,
as well as a DocIdSetQuery class that honors the Query API
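The decorator idea can be sketched with a membership-only interface. SimpleDocSet and SimpleDocSets below are hypothetical and much simpler than the iterator-based DocIdSet abstractions mentioned above; they only show how and/or/not compose into required and prohibited sets.

```java
import java.util.BitSet;

/** Hypothetical membership-only view of a set of doc IDs. */
interface SimpleDocSet {
    boolean contains(int docId);
}

/** Hypothetical decorators that combine sets without copying them. */
final class SimpleDocSets {
    private SimpleDocSets() {}

    /** Builds a set from explicit non-negative doc IDs. */
    static SimpleDocSet of(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits::get;
    }

    static SimpleDocSet and(SimpleDocSet a, SimpleDocSet b) {
        return id -> a.contains(id) && b.contains(id);
    }

    static SimpleDocSet or(SimpleDocSet a, SimpleDocSet b) {
        return id -> a.contains(id) || b.contains(id);
    }

    static SimpleDocSet not(SimpleDocSet a) {
        return id -> !a.contains(id);
    }
}
```

A required set combined with a prohibited set is then and(required, not(prohibited)).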
On Wed, Jun 25, 2008 at 11:30 AM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
I read other parts of the email but glanced over this part. Would terms be
automatically sorted as they came in? If implemented it would be nice to be
able to get an encoded representation (probably byte array) of the
On Wednesday 25 June 2008 18:45:16, John Wang wrote:
Hi Paul:
Regarding your comment on adding required/prohibited to
BooleanQuery:
Based on the new DocIdSet and DocIdSetIterator
abstractions, we also developed decorators such as
AndDocIdSet, OrDocIdSet and NotDocIdSet,
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608128#action_12608128
]
Yonik Seeley commented on LUCENE-1316:
--
Although this doesn't solve the general
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608129#action_12608129
]
Hoss Man commented on LUCENE-1316:
--
rather than attempting localized optimizations of
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Feak updated LUCENE-1316:
--
I like Hoss' suggestion better. I'll try that fix locally and if it provides
the same improvement, I
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608134#action_12608134
]
Yonik Seeley commented on LUCENE-1316:
--
a more generalized improvement would
: Might also consider passing in more optional context when retrieving
: the similarity for a field (such as a Query, if searching).
: Something like Similarity.getSimilarity(String field, Query q).
i assume you mean Searcher.getSimilarity(String fieldName, Query q) to
replace the current
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608137#action_12608137
]
Hoss Man commented on LUCENE-1316:
--
bq. Code that depended on deletes being instantly
On Wed, Jun 25, 2008 at 2:19 PM, Chris Hostetter
[EMAIL PROTECTED] wrote:
: Might also consider passing in more optional context when retrieving
: the similarity for a field (such as a Query, if searching).
: Something like Similarity.getSimilarity(String field, Query q).
i assume you mean
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608146#action_12608146
]
robert engels commented on LUCENE-1316:
---
According to the java memory model,
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608147#action_12608147
]
Yonik Seeley commented on LUCENE-1316:
--
bq. why would deletes stop being instantly
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608149#action_12608149
]
robert engels commented on LUCENE-1316:
---
The Pattern#5 referenced (cheap read-write
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608160#action_12608160
]
Yonik Seeley commented on LUCENE-1316:
--
bq. declaring the deletedDocs volatile should
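For readers following the volatile discussion, here is one possible shape of the idea: readers do a single volatile load and never lock, while writers serialize among themselves and publish a fresh copy. VolatileDeletedDocs is a hypothetical class written for illustration. It is not the fix proposed on LUCENE-1316, and the copy-on-write step is an assumption that trades slower deletes for lock-free reads.

```java
import java.util.BitSet;

/** Hypothetical sketch: lock-free isDeleted() backed by a volatile, never-mutated snapshot. */
final class VolatileDeletedDocs {
    // Readers see either the old or the new snapshot, never a partially updated one.
    private volatile BitSet deletedDocs = new BitSet();

    /** One volatile load, then a read of a snapshot that is not modified after publication. */
    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    /** Writers take the lock, copy the current snapshot, modify the copy, and publish it. */
    synchronized void delete(int docId) {
        BitSet copy = (BitSet) deletedDocs.clone();
        copy.set(docId);
        deletedDocs = copy; // volatile write makes the new snapshot visible to readers
    }
}
```

Each delete copies the whole bit set, so this shape only pays off when reads greatly outnumber deletes.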
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608162#action_12608162
]
Mark Miller commented on LUCENE-1316:
-
If I remember correctly, volatile does not work
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608162#action_12608162
]
[EMAIL PROTECTED] edited comment on LUCENE-1316 at 6/25/08 12:40 PM:
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608183#action_12608183
]
Hoss Man commented on LUCENE-1316:
--
bq. if thread A deleted a document, and then thread B
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608187#action_12608187
]
robert engels commented on LUCENE-1316:
---
Hoss, that is indeed the case, another
I am not sure. BooleanQuery takes something that can score, e.g. a
Clause or a Query; the contract requires some sort of scoring functionality.
We use DocIdSetQuery for some of the scoring capabilities, such as constant
score (with boosting), age decay, and the new scoring API in 2.3.
[
https://issues.apache.org/jira/browse/LUCENE-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608189#action_12608189
]
Yonik Seeley commented on LUCENE-1316:
--
bq. is your point that without
: and how to use them? For a concrete example I'm looking to do a query
: on a date field to find documents earlier than a specified date or
: later than a specified date. Ex: date:( 20070101) or date:
: (20070101). I looked at the range query feature but it didn't appear
: to cover this
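As background for the date question: when dates are stored as fixed-width yyyyMMdd strings, plain string comparison orders them chronologically, so "earlier than" and "later than" reduce to one-sided comparisons. The sketch below uses a hypothetical DateBounds helper over an in-memory list. It does not use Lucene's query parser or range query classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one-sided date filters over fixed-width yyyyMMdd strings. */
final class DateBounds {
    private DateBounds() {}

    /** Returns the dates strictly earlier than bound. */
    static List<String> before(List<String> dates, String bound) {
        List<String> out = new ArrayList<>();
        for (String d : dates) {
            if (d.compareTo(bound) < 0) {
                out.add(d);
            }
        }
        return out;
    }

    /** Returns the dates strictly later than bound. */
    static List<String> after(List<String> dates, String bound) {
        List<String> out = new ArrayList<>();
        for (String d : dates) {
            if (d.compareTo(bound) > 0) {
                out.add(d);
            }
        }
        return out;
    }
}
```

The comparison only matches calendar order if every value has the same width and zero padding.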
On Wed, Jun 25, 2008 at 5:06 PM, Chris Hostetter
[EMAIL PROTECTED] wrote:
Hmmm... that seems like it would be confusing: particularly since in the
IndexWriter case the Query param would never make sense. changing
IndexWriter.getSimilarity to take a String fieldName and changing
On 24-Jun-08, at 1:28 PM, Yonik Seeley wrote:
Something to consider for Lucene 3 is to have something to retrieve
Similarity per-field rather than passing the field name into some
functions...
+1
I've felt that this was the proper (and more useful) way to do
things for a long time
Chris,
That's exactly what I was looking for. Thanks for the info and the
clarification on where to post my questions.
Regards,
Kyle
On Wed, Jun 25, 2008 at 5:12 PM, Chris Hostetter [EMAIL PROTECTED]
wrote:
: and how to use them? For a concrete example I'm looking to do a query
: on a