One more thing I missed. I don't quite get your point about skip() vs
next().
With OR queries, skipping does not help as much as it does with AND queries.
-John
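To make the AND side of the skip()/next() point concrete, here is a minimal sketch of a leapfrog intersection. It assumes a simplified iterator contract modeled loosely on the doc()/next()/skipTo() methods of the DocIdSetIterator API of this era; the `DocIter` and `ArrayIter` types and the `calls` counter are hypothetical, not Lucene classes. The dense clause is only asked to jump to doc ids proposed by the sparse clause, which is where skipping pays off. In a plain OR, every doc id of every clause is a hit, so each clause has to be stepped through with next() anyway.

```java
import java.util.ArrayList;
import java.util.List;

final class LeapfrogAndSketch {

    /** Hypothetical, simplified stand-in for a doc-id iterator (not Lucene's class). */
    interface DocIter {
        int doc();                   // current doc id; valid only after next()/skipTo() returned true
        boolean next();              // advance by one
        boolean skipTo(int target);  // advance at least once, then to the first doc >= target
    }

    /** Iterates a sorted int[] and counts how often it is asked to move. */
    static final class ArrayIter implements DocIter {
        private final int[] docs;
        private int i = -1;
        int calls = 0;

        ArrayIter(int... docs) { this.docs = docs; }

        public int doc() { return docs[i]; }

        public boolean next() {
            calls++;
            return ++i < docs.length;
        }

        public boolean skipTo(int target) {
            calls++;
            // A real postings list would consult skip data here; the linear scan only models the contract.
            do { i++; } while (i < docs.length && docs[i] < target);
            return i < docs.length;
        }
    }

    /** Returns the doc ids present in every iterator, in increasing order. */
    static List<Integer> intersect(DocIter[] its) {
        List<Integer> hits = new ArrayList<>();
        for (DocIter it : its) {
            if (!it.next()) return hits;                   // an empty clause means no hits at all
        }
        outer:
        while (true) {
            int target = its[0].doc();
            for (DocIter it : its) target = Math.max(target, it.doc());
            for (DocIter it : its) {
                if (it.doc() < target) {
                    if (!it.skipTo(target)) return hits;   // one clause exhausted
                    if (it.doc() > target) continue outer; // overshot: raise the target
                }
            }
            hits.add(target);                              // every clause sits on target
            if (!its[0].next()) return hits;
        }
    }

    public static void main(String[] args) {
        int[] dense = new int[1_000_000];
        for (int d = 0; d < dense.length; d++) dense[d] = d;
        ArrayIter a = new ArrayIter(dense);
        ArrayIter b = new ArrayIter(500_000, 999_999);
        System.out.println(intersect(new DocIter[] {a, b}) + ", dense clause moved " + a.calls + " times");
    }
}
```

Running `main` should print `[500000, 999999], dense clause moved 5 times`: the million-entry clause is moved five times instead of being walked doc by doc.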
On Tue, Jan 6, 2009 at 11:55 PM, John Wang wrote:
Paul:
Our very simple/naive testing methodology for OrDocIdSetIterator:
Five sub-iterators, each of which simply iterates from 0 to 1,000,000.
The test iterates the OrDocIdSetIterator until next() is false.
Do you want me to run the same test against DisjunctDisi?
-John
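For reference, the benchmark described above (five sub-iterators, each walking 0 to 1,000,000, drained with next()) might look like the following minimal sketch. The `DocIter`, `RangeIter` and `OrIter` types are hypothetical simplifications, not Lucene's OrDocIdSetIterator or DisjunctionMaxScorer; the union is kept in a `java.util.PriorityQueue` ordered by current doc id, and the upper bound of each range is assumed to be exclusive.

```java
import java.util.PriorityQueue;

final class OrIteratorSketch {

    /** Hypothetical, simplified doc-id iterator (not Lucene's DocIdSetIterator). */
    interface DocIter {
        int doc();
        boolean next();
    }

    /** Iterates the doc ids from..to-1. */
    static final class RangeIter implements DocIter {
        private int cur;
        private final int to;

        RangeIter(int from, int to) { this.cur = from - 1; this.to = to; }

        public int doc() { return cur; }

        public boolean next() { return ++cur < to; }
    }

    /** Union of the sub-iterators: each matching doc id is returned once, in increasing order. */
    static final class OrIter implements DocIter {
        private final PriorityQueue<DocIter> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(x.doc(), y.doc()));
        private final DocIter[] subs;
        private boolean started = false;
        private int doc = -1;

        OrIter(DocIter... subs) { this.subs = subs; }

        public int doc() { return doc; }

        public boolean next() {
            if (!started) {
                started = true;
                for (DocIter s : subs) {
                    if (s.next()) heap.add(s);        // seed the heap with each non-empty sub-iterator
                }
            } else {
                // Move every sub-iterator that is still on the doc returned last time.
                while (!heap.isEmpty() && heap.peek().doc() == doc) {
                    DocIter top = heap.poll();        // remove before advancing so the heap order stays valid
                    if (top.next()) heap.add(top);
                }
            }
            if (heap.isEmpty()) return false;
            doc = heap.peek().doc();
            return true;
        }
    }

    public static void main(String[] args) {
        DocIter[] subs = new DocIter[5];
        for (int i = 0; i < subs.length; i++) subs[i] = new RangeIter(0, 1_000_000);
        OrIter or = new OrIter(subs);
        long start = System.nanoTime();
        int count = 0;
        while (or.next()) count++;
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(count + " docs in " + ms + " ms");
    }
}
```

Each `next()` pops every sub-iterator sitting on the current doc, advances it and pushes it back, so with five identical ranges every returned doc id costs five poll/offer pairs. A hand-rolled heap that re-sifts the top entry in place avoids that churn and is a natural place for the kind of tuning discussed in this thread; timings from a single run of `main` are only indicative without JIT warm-up.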
On Tue, Jan
On Wednesday 07 January 2009 07:36:06 John Wang wrote:
Hi guys:
We have been building a suite of boolean-operator DocIdSets (e.g.
AndDocIdSet/Iterator, OrDocIdSet/Iterator, NotDocIdSet/Iterator). We
compared our implementation of the OrDocIdSetIterator (based on
DisjunctionMaxScorer code) with some code tuning, and we see the performance
doubled
Michael McCandless wrote:
> I'll remove those 2 test cases.
The build now works perfectly. Thanks Mike!
--
Sami Siren
-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Rutherglen updated LUCENE-1314:
Attachment: LUCENE-1314.patch
Everything in the previous post should be working and comp
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Miller updated LUCENE-1483:
Attachment: LUCENE-1483.patch
Merged everything and put Sort.ORD back the way it was (using ORD_SU
for the k=1 case, in my mind your last comment might not really be that much
slower than storing the additional data... sounds worth investigating.
On Tue, Jan 6, 2009 at 8:04 PM, robert engels wrote:
> I think you would need to store the position in the stream using position
> == to the k factor.
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661394#action_12661394 ]
Mark Miller commented on LUCENE-1483:
bq. I think we should fix TestSort so that
I think you would need to store the position in the stream, using a
position equal to the k factor. Pretty straightforward, both for
indexing and for searching.
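One reading of this suggestion, sketched under that assumption: index every deletion variant of a term and record, as its token position, the number of deletions k that produced it, so a hit on a variant also tells the searcher how many deletions separate it from the indexed term. The `variantsWithK` helper is hypothetical; it only computes the (variant, k) pairs and does not touch Lucene's token stream API. A breadth-first pass keeps the smallest k when the same variant can be reached in several ways.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class DeletionPositionsSketch {

    /** Every string reachable from word by at most maxK single-character deletions,
     *  mapped to the smallest number of deletions that reaches it (the would-be position). */
    static Map<String, Integer> variantsWithK(String word, int maxK) {
        Map<String, Integer> out = new LinkedHashMap<>();
        out.put(word, 0);
        Set<String> level = new LinkedHashSet<>();
        level.add(word);
        for (int k = 1; k <= maxK; k++) {
            Set<String> nextLevel = new LinkedHashSet<>();
            for (String s : level) {
                for (int i = 0; i < s.length(); i++) {
                    String v = s.substring(0, i) + s.substring(i + 1);
                    if (!out.containsKey(v)) {   // first visit = fewest deletions
                        out.put(v, k);
                        nextLevel.add(v);
                    }
                }
            }
            level = nextLevel;
        }
        return out;
    }

    public static void main(String[] args) {
        // Print the tokens one might emit for "robert" with k up to 1, as term@position.
        variantsWithK("robert", 1).forEach((term, k) -> System.out.println(term + "@" + k));
    }
}
```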
I think if you want the utmost in performance this is the way to go.
If you don't want to store all of the additional data, I still think
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661390#action_12661390 ]
Mark Miller commented on LUCENE-1483:
Can't seem to use the partial patch, but I'll t
robert, there's only one problem i see: i don't see how you can do a single
search, since fastssWC returns some false positives (with k=1 it will still
return some things with an ED of 2). maybe if you store the deletion position
information as a payload (thus using original fastss where there are no
fal
I understand now.
The index in my case would definitely be MUCH larger, but I think it
would perform better, as you only need to do a single search - for
obert (if you assume it was a misspelling).
In your case you would eventually do an OR search in the lucene index
for all possible matc
i see what you are saying here. this is different from fastss but sounds
nice for spelling correction.
i suppose one reason why i like fastss is that for my application i need the true
complete edit distance; i'm actually not using it for spelling correction,
but as a first step for other tasks.
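Because candidates from a deletion-neighborhood lookup can include false positives (as noted above, k=1 can still return terms at edit distance 2), one straightforward way to get the true edit distance is a final Levenshtein check on each candidate. A minimal sketch using the standard two-row dynamic program; `verify` is a hypothetical helper. For example, 'robert' and 'orbert' share the single-deletion variant 'obert', yet their edit distance is 2.

```java
import java.util.ArrayList;
import java.util.List;

final class EditDistanceSketch {

    /** Classic Levenshtein distance (insert, delete, substitute all cost 1), O(|a|*|b|) time, O(|b|) space. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // distance from "" to b[0..j)
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;                                     // distance from a[0..i) to ""
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;        // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Keeps only the candidates truly within maxDistance of the query. */
    static List<String> verify(String query, List<String> candidates, int maxDistance) {
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            if (levenshtein(query, c) <= maxDistance) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(verify("robert", List.of("robrt", "orbert", "robert"), 1)); // [robrt, robert]
    }
}
```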
but ma
To clarify a statement in the last email.
Generating the 'possible source words' in real time is not as
difficult as it first seems, if you assume some sort of first-character
prefix (which is what it appears Google does).
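A minimal sketch of one way to read this, assuming 'possible source words' means every string one edit (insertion, deletion or substitution over a-z) away from the typed word, with the first character held fixed as the prefix assumption suggests. The `candidates` helper and the lowercase alphabet are assumptions for illustration. For a five-letter input such as 'robrt' this gives at most 130 insertions, 4 deletions and 104 substitutions, i.e. no more than 238 strings to look up, and 'robert' is among them.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class SourceWordCandidatesSketch {
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz"; // assumption: lowercase a-z only

    /** Strings one insertion, deletion or substitution away from typed, never changing typed.charAt(0). */
    static Set<String> candidates(String typed) {
        Set<String> out = new LinkedHashSet<>();
        if (typed.isEmpty()) return out;
        for (int i = 1; i <= typed.length(); i++) {
            String head = typed.substring(0, i);
            // Insert a letter at position i (i >= 1 keeps the first character).
            for (char c : ALPHABET.toCharArray()) out.add(head + c + typed.substring(i));
            if (i < typed.length()) {
                String rest = typed.substring(i + 1);
                out.add(head + rest);                                            // delete the char at i
                for (char c : ALPHABET.toCharArray()) out.add(head + c + rest); // substitute the char at i
            }
        }
        out.remove(typed); // substituting a char with itself reproduces the input
        return out;
    }

    public static void main(String[] args) {
        Set<String> c = candidates("robrt");
        System.out.println(c.size() + " candidates, contains robert: " + c.contains("robert"));
    }
}
```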
For example, assume the user typed 'robrt' instead of 'robert'. You
s
On Tue, Jan 6, 2009 at 5:15 PM, robert engels wrote:
It is definitely going to increase the index size, but not any more
than the external one would (if my understanding is correct).
The nice thing is that you don't have to try and keep documents
numbers in sync - it will be automatic.
Maybe I don't understand what your external index is
[ https://issues.apache.org/jira/browse/LUCENE-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless resolved LUCENE-1502.
Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [Patch Availa
i see, your idea would definitely simplify some things.
What about the index size difference between this approach and using a
separate index? Would this separate field increase the index size?
I guess my line of thinking is that if you have 10 docs with robert, with a
separate index you just have robert, and
I don't think that is the case. You will have a single deletion
neighborhood. The unique terms in the field are going to be
the union of the deletion dictionaries of each source term.
For example, given document A, which has field 'X'
with value best, and document B wi
a deletion neighborhood can be pretty large (for example, robert's is something
like robert obert rbert robrt robet ...)
so if you have 100 million docs with 1 billion words, but only 100k unique
terms, it would definitely be wasteful to have 1 billion deletion
neighborhoods when you only need 100k.
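The arithmetic behind this point can be sketched by building the neighborhood dictionary from the unique terms only. The class and helper names are hypothetical, and only k=1 is handled. Ten documents containing 'robert' still contribute just the seven entries of its neighborhood (robert plus its six single-deletion variants), whereas expanding per occurrence would produce seventy.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class UniqueTermNeighborhoodSketch {

    /** The term itself plus every single-deletion variant (k = 1). */
    static Set<String> neighborhood(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        for (int i = 0; i < term.length(); i++) {
            out.add(term.substring(0, i) + term.substring(i + 1));
        }
        return out;
    }

    /** Variant -> source terms, built from the distinct terms only, however many tokens repeat them. */
    static Map<String, Set<String>> buildDictionary(List<String> tokens) {
        Map<String, Set<String>> dict = new HashMap<>();
        for (String term : new LinkedHashSet<>(tokens)) {
            for (String v : neighborhood(term)) {
                dict.computeIfAbsent(v, x -> new TreeSet<>()).add(term);
            }
        }
        return dict;
    }

    public static void main(String[] args) {
        List<String> tokens = Collections.nCopies(10, "robert"); // 10 docs containing "robert"
        System.out.println(buildDictionary(tokens).size() + " dictionary entries for " + tokens.size() + " tokens");
    }
}
```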
Why not just create a new field for this? That is, if you have
FieldA, create field FieldAFuzzy and put the various permutations there.
The fuzzy scorer/parser can be changed to automatically use the
Fuzzy field when required.
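A toy sketch of the separate-field idea, using a hypothetical in-memory inverted index rather than Lucene's API: each document's value goes into FieldA and its k=1 deletion neighborhood into FieldAFuzzy under the same doc id, so the two fields cannot drift out of sync, and a fuzzy lookup expands the query term and unions the FieldAFuzzy postings. The result is a candidate set that may still contain false positives needing an exact edit-distance check.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

final class FuzzyFieldSketch {
    // field -> term -> doc ids: a toy inverted index, not Lucene's
    private final Map<String, Map<String, SortedSet<Integer>>> index = new HashMap<>();
    private int nextDoc = 0;

    /** The term plus all single-deletion variants (k = 1). */
    static Set<String> deletions1(String term) {
        Set<String> out = new LinkedHashSet<>();
        out.add(term);
        for (int i = 0; i < term.length(); i++) out.add(term.substring(0, i) + term.substring(i + 1));
        return out;
    }

    private void post(String field, String term, int doc) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(term, t -> new TreeSet<>())
             .add(doc);
    }

    /** Indexes one value; the exact field and its "Fuzzy" twin share the doc id. */
    int addDocument(String field, String value) {
        int doc = nextDoc++;
        post(field, value, doc);
        for (String v : deletions1(value)) post(field + "Fuzzy", v, doc);
        return doc;
    }

    /** Candidate docs whose value shares a k=1 variant with the query (may include false positives). */
    SortedSet<Integer> fuzzyCandidates(String field, String query) {
        Map<String, SortedSet<Integer>> fuzzy = index.getOrDefault(field + "Fuzzy", Collections.emptyMap());
        SortedSet<Integer> hits = new TreeSet<>();
        for (String v : deletions1(query)) hits.addAll(fuzzy.getOrDefault(v, Collections.emptySortedSet()));
        return hits;
    }

    public static void main(String[] args) {
        FuzzyFieldSketch idx = new FuzzyFieldSketch();
        idx.addDocument("FieldA", "robert");
        idx.addDocument("FieldA", "best");
        idx.addDocument("FieldA", "robin");
        System.out.println(idx.fuzzyCandidates("FieldA", "robrt")); // [0]
    }
}
```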
You could also store positions, and allow that the first ter
[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661314#action_12661314 ]
Robert Muir commented on LUCENE-1513:
otis, discussion was on java-user.
again, I ap
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661306#action_12661306 ]
Mark Miller commented on LUCENE-1483:
bq. Could we just make ctors on each comparator
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661304#action_12661304 ]
Michael McCandless commented on LUCENE-1483:
{quote}
> I'm trying to get loca
[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661302#action_12661302 ]
Otis Gospodnetic commented on LUCENE-1513:
I feel like I missed some FastSS disc
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661295#action_12661295 ]
Michael McCandless commented on LUCENE-1483:
{quote}
> Not sure about new cons
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-1483:
Attachment: LUCENE-1483-partial.patch
Attached prototype changes to switch to "setBo
[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661286#action_12661286 ]
Otis Gospodnetic commented on LUCENE-1513:
References provided by Glen Newton:
[ https://issues.apache.org/jira/browse/LUCENE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661269#action_12661269 ]
Mark Miller commented on LUCENE-1304:
The main impact is that most of that code will
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661264#action_12661264 ]
Mark Miller commented on LUCENE-1483:
I think we are wrapping up, but it may make sen
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661260#action_12661260 ]
Ryan McKinley commented on LUCENE-1483:
Any estimates on how far along this is?
Is
[ https://issues.apache.org/jira/browse/LUCENE-1504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661249#action_12661249 ]
Mark Miller commented on LUCENE-1504:
I think there are contrib dependency examples in
[ https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661241#action_12661241 ]
Ryan McKinley commented on LUCENE-1512:
Any chance you could make a new patch witho
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661238#action_12661238 ]
Mark Miller commented on LUCENE-1483:
Here is what that example policy has to be esse
[ https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661223#action_12661223 ]
Ryan McKinley commented on LUCENE-1512:
This is awesome. thanks patrick!
> Incorp
[ https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated LUCENE-1513:
Attachment: fastSSfuzzy.zip
fastss fuzzyquery
Key: LUCENE-1513
URL: https://issues.apache.org/jira/browse/LUCENE-1513
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/*
Reporter: Robert Muir
Priority
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661214#action_12661214 ]
Michael McCandless commented on LUCENE-1314:
{quote}
> The problem is the use
[ https://issues.apache.org/jira/browse/LUCENE-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
patrick o'leary updated LUCENE-1512:
Attachment: LUCENE-1512.patch
spatial-lucene GeoHash implementation based on
http://en.wi
Incorporate GeoHash in contrib/spatial
Key: LUCENE-1512
URL: https://issues.apache.org/jira/browse/LUCENE-1512
Project: Lucene - Java
Issue Type: New Feature
Components: contrib/spatial
[ https://issues.apache.org/jira/browse/LUCENE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661199#action_12661199 ]
patrick o'leary commented on LUCENE-1304:
How will LUCENE-1483 impact this immedi
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661165#action_12661165 ]
Mark Miller commented on LUCENE-1483:
There are other little conversion steps that ha
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661160#action_12661160 ]
markrmil...@gmail.com edited comment on LUCENE-1483 at 1/6/09 6:57 AM:
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661160#action_12661160 ]
Mark Miller commented on LUCENE-1483:
bq. Mark, I see 3 testcase failures in TestSort
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661149#action_12661149 ]
Michael McCandless commented on LUCENE-1483:
On what ComparatorPolicy to use b
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661148#action_12661148 ]
Michael McCandless commented on LUCENE-1483:
I prototyped a rough change to th
[ https://issues.apache.org/jira/browse/LUCENE-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661145#action_12661145 ]
Michael McCandless commented on LUCENE-1483:
Mark, I see 3 testcase failures i
[ https://issues.apache.org/jira/browse/LUCENE-1509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661143#action_12661143 ]
Shalin Shekhar Mangar commented on LUCENE-1509:
Thanks Michael!
> IndexCom
[ https://issues.apache.org/jira/browse/LUCENE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661125#action_12661125 ]
Grant Ingersoll commented on LUCENE-1227:
Yes, please do have a look and let us k