Thanks!
On Nov 21, 2007, at 1:35 AM, Michael Busch wrote:
robert engels wrote:
We are still using Lucene 1.9.1+, and I am wondering if there have
been any improvements in searching on AND clauses when some of the
terms are very infrequent...
multi-level skipping should help when an AND
robert engels wrote:
>
> We are still using Lucene 1.9.1+, and I am wondering if there have been
> any improvements in searching on AND clauses when some of the terms are
> very infrequent...
>
multi-level skipping should help when an AND query has frequent and
infrequent terms. See LUCENE-866 fo
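The effect of skipping on a conjunction can be sketched as a leapfrog intersection: the rarest term's postings drive the loop, and every other list is advanced with a skipTo(target) call instead of being read entry by entry. This is a minimal illustration only, assuming plain sorted int[] doc-ID lists and using Arrays.binarySearch as a stand-in for skip lists; the class and method names are hypothetical and are not Lucene's TermDocs or ConjunctionScorer API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical leapfrog AND over sorted doc-ID arrays; not Lucene's API. */
public class LeapfrogAnd {

    /** Index of the first element >= target at or after 'from', or postings.length. */
    static int skipTo(int[] postings, int from, int target) {
        int idx = Arrays.binarySearch(postings, from, postings.length, target);
        return idx >= 0 ? idx : -idx - 1; // binary search stands in for a skip list
    }

    /** Intersects sorted doc-ID lists, driving from the shortest one. */
    static int[] intersect(int[]... lists) {
        List<Integer> hits = new ArrayList<>();
        if (lists.length == 0) return new int[0];
        int[][] sorted = lists.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a.length, b.length));
        int[] pos = new int[sorted.length];
        outer:
        while (pos[0] < sorted[0].length) {
            int candidate = sorted[0][pos[0]]; // only the rarest list proposes candidates
            for (int i = 1; i < sorted.length; i++) {
                pos[i] = skipTo(sorted[i], pos[i], candidate);
                if (pos[i] == sorted[i].length) break outer; // a list is exhausted
                int doc = sorted[i][pos[i]];
                if (doc > candidate) {
                    // Overshot: jump the rare list forward to at least 'doc' and retry.
                    pos[0] = skipTo(sorted[0], pos[0], doc);
                    continue outer;
                }
            }
            hits.add(candidate); // every list contains the candidate
            pos[0]++;
        }
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[] rare = {7, 500, 90000};
        int[] common = new int[100000];
        for (int d = 0; d < common.length; d++) common[d] = d * 2 + 1; // all odd doc IDs
        System.out.println(Arrays.toString(intersect(common, rare))); // [7]
    }
}
```

Each pass of the outer loop either records a hit or moves the rare list forward, so the number of skipTo calls per list is bounded by the length of the rarest list, which is the case multi-level skipping targets.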
Sorry if this is somewhat off topic, but it seems at least marginally
related to this...
We are still using Lucene 1.9.1+, and I am wondering if there have
been any improvements in searching on AND clauses when some of the
terms are very infrequent...
This change seems appropriate. Are th
[
https://issues.apache.org/jira/browse/LUCENE-693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yonik Seeley updated LUCENE-693:
Attachment: conjunction.patch
Whew... I'd forgotten about this issue. I brushed up one of the last
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544175
]
Doron Cohen commented on LUCENE-1063:
-
{quote}
So I don't think we need to change anything.
{quote}
(y) sounds g
[
https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated LUCENE-1058:
Attachment: LUCENE-1058.patch
Here's a patch that modifies the DocumentsWriter to not thro
[
https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544145
]
Grant Ingersoll commented on LUCENE-1058:
-
Some javadoc comments for the modifyToken method in BufferingToke
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544136
]
Chuck Williams commented on LUCENE-1052:
I can report that in our application having a formula is critical.
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544125
]
Doug Cutting commented on LUCENE-1052:
--
What class would we put TermInfosReader-specific setters & getters on,
[
https://issues.apache.org/jira/browse/LUCENE-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544115
]
Michael McCandless commented on LUCENE-1044:
OK, I tested calling command-line "sync", after writing eac
[
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Paul Elschot updated LUCENE-1001:
-
Comment: was deleted
> Add Payload retrieval to Spans
> --
>
>
[
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544108
]
Paul Elschot commented on LUCENE-1001:
--
Grant,
You asked:
... how do I get access to the position payloads in
[
https://issues.apache.org/jira/browse/LUCENE-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544107
]
Paul Elschot commented on LUCENE-1055:
--
Hoss,
That must have been the cause. After removing the gdata-server d
[
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544105
]
Grant Ingersoll commented on LUCENE-1001:
-
Sure, but how do I get access to the position payloads in the ord
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-1063.
Resolution: Invalid
> Token re-use API breaks back compatibility in certain TokenS
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544103
]
Michael McCandless commented on LUCENE-1063:
OK it sounds like this was a false alarm on my part -- sorr
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544098
]
Yonik Seeley commented on LUCENE-1063:
--
> CachingTokenFilter actually does this (caching references to the toke
[
https://issues.apache.org/jira/browse/LUCENE-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544095
]
Michael Busch commented on LUCENE-1062:
---
We want to add the following methods to Payload:
{code:java}
public
[
https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544096
]
Michael McCandless commented on LUCENE-1058:
I think the discussion in LUCENE-1063 is relevant to this i
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544093
]
Doron Cohen commented on LUCENE-1063:
-
{quote}
> TokenStreams that cache tokens without "protecting" their priva
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544091
]
Michael Busch commented on LUCENE-1063:
---
{quote}
I think it should put a cloned copy into the cache.
{quote}
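The failure mode under discussion for CachingTokenFilter can be shown in a few lines: if a producer refills one token instance for every term, a cache that stores references ends up holding N pointers to the last token, while a cache that stores clones keeps every term. ReusableToken below is a hypothetical stand-in, not org.apache.lucene.analysis.Token; its clone deep-copies the char[] buffer, because a shallow copy would still share it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reusable token; not Lucene's Token class. */
class ReusableToken implements Cloneable {
    char[] termBuffer = new char[16];
    int termLength;

    void setTerm(String s) {
        if (s.length() > termBuffer.length) termBuffer = new char[s.length()];
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() { return new String(termBuffer, 0, termLength); }

    @Override
    public ReusableToken clone() {
        try {
            ReusableToken t = (ReusableToken) super.clone();
            t.termBuffer = termBuffer.clone(); // deep-copy the mutable buffer
            return t;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

public class CacheReuseDemo {
    /** Simulates a producer that refills one shared token for every term. */
    static List<String> cache(String[] terms, boolean cloneBeforeCaching) {
        ReusableToken shared = new ReusableToken();
        List<ReusableToken> cached = new ArrayList<>();
        for (String term : terms) {
            shared.setTerm(term); // the producer reuses the same instance
            cached.add(cloneBeforeCaching ? shared.clone() : shared);
        }
        List<String> out = new ArrayList<>();
        for (ReusableToken t : cached) out.add(t.term());
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"quick", "brown", "fox"};
        System.out.println(cache(terms, false)); // [fox, fox, fox]
        System.out.println(cache(terms, true));  // [quick, brown, fox]
    }
}
```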
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544088
]
Michael Busch commented on LUCENE-1063:
---
{quote}
That would be a bug in the filter (both in the past and now).
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544077
]
Yonik Seeley commented on LUCENE-1063:
--
In the past, the semantics were simple... Tokenizer generated tokens, a
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544076
]
Michael McCandless commented on LUCENE-1052:
Maybe, instead, we should simply make it "easy" to subclas
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544075
]
Doron Cohen commented on LUCENE-1063:
-
Oh, I was locked on that calling next(null) means do-not-reuse but guess
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544063
]
Michael McCandless commented on LUCENE-1063:
{quote}
I checked next(Token res) implementations of CharTo
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544059
]
Doron Cohen commented on LUCENE-1063:
-
{quote}
and even with old style Tokens w/o Token reuse, one could always
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544055
]
Chuck Williams commented on LUCENE-1052:
I agree a general configuration system would be much better. Doug.
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544054
]
Michael McCandless commented on LUCENE-1063:
{quote}
Looking at the test, this would not have worked bef
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544048
]
Yonik Seeley commented on LUCENE-1063:
--
> it is the addition of Token.termBuffer() that allowed this to happen
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544045
]
Doron Cohen commented on LUCENE-1063:
-
Yes, that's what I meant - it is the addition of Token.termBuffer()
that
On Nov 20, 2007, at 20:28, Doug Cutting wrote:
karl wettin wrote:
On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]>
wrote:
it is always good to have query logs
http://thepiratebay.org/tor/3783572
It doesn't look as though there's click data, so we can't use this
for relevance exp
Grant Ingersoll wrote:
> Scratch my last comment. I was thinking it only pertained to payloads.
>
> In that light, I think we should modify the scorePayload method for the
> time being, then we can deprecate it when we go to per field sim.
>
> -Grant
>
OK sounds good. Will make the change with
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544034
]
Yonik Seeley commented on LUCENE-1063:
--
Looking at the test, this would not have worked before token-reuse eith
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544032
]
Doug Cutting commented on LUCENE-1052:
--
I think we should be cautious about adding a new public interface or ab
Scratch my last comment. I was thinking it only pertained to payloads.
In that light, I think we should modify the scorePayload method for
the time being, then we can deprecate it when we go to per field sim.
-Grant
On Nov 20, 2007, at 2:34 PM, Michael Busch wrote:
Yonik Seeley wrote:
P
Well, we are making an awful lot of improvements to Payloads. I think
we should try to get it in now and make 2.3 wait a bit more, since we
all have more or less agreed that 2.9 (next after 2.3) is going to be
a deprecation release before moving to 3.0.
-Grant
On Nov 20, 2007, at 2:34 PM,
[
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544029
]
Doug Cutting commented on LUCENE-1001:
--
> Would it be simpler to just use a SortedSet?
TreeMap is slower than
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-1063:
---
Attachment: LUCENE-1063.patch
Attached patch w/ unit test showing the issue, plus th
Yonik Seeley wrote:
>
> Per field similarity would certainly be more efficient since it moves
> the field->similarity lookup from the inner loop to the outer loop.
>
I agree. Then I'll leave the scorePayload() API as is for now. And I
don't think the per-field similarity should block 2.3, so let
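The efficiency argument for per-field similarity is loop-invariant hoisting: if scorePayload() receives the field name, the field-to-scorer lookup runs once per payload, whereas a per-field similarity is resolved once before the loop. PayloadScorer and the map-based registry below are hypothetical and do not correspond to Lucene's Similarity API; both methods compute the same score and differ only in where the lookup happens.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical scorer interface; not Lucene's Similarity API. */
interface PayloadScorer {
    float scorePayload(byte[] payload);
}

public class PerFieldLookup {
    // Hypothetical registry of per-field scorers; unknown fields use DEFAULT.
    static final PayloadScorer DEFAULT = p -> 1.0f;
    static final Map<String, PayloadScorer> BY_FIELD = new HashMap<>();
    static {
        BY_FIELD.put("title", p -> p.length == 0 ? 1.0f : p[0] / 10.0f);
    }

    /** Field name passed per call: the map lookup sits in the inner loop. */
    static float scoreWithFieldName(String field, byte[][] payloads) {
        float sum = 0f;
        for (byte[] p : payloads) {
            sum += BY_FIELD.getOrDefault(field, DEFAULT).scorePayload(p);
        }
        return sum;
    }

    /** Per-field scorer: resolved once in the outer loop, then reused. */
    static float scorePerField(String field, byte[][] payloads) {
        PayloadScorer scorer = BY_FIELD.getOrDefault(field, DEFAULT);
        float sum = 0f;
        for (byte[] p : payloads) {
            sum += scorer.scorePayload(p);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[][] payloads = {{5}, {10}};
        System.out.println(scoreWithFieldName("title", payloads)); // 1.5
        System.out.println(scorePerField("title", payloads));      // 1.5
    }
}
```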
karl wettin wrote:
On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
it is always good to have query logs
I realize that it is not that politically correct, but the TPB
collection is released to the public domain and contains 3.2 million
user queries with session id, timesta
On Nov 20, 2007 2:17 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> Grant Ingersoll wrote:
> > +1 for adding the field name.
> >
>
> The question is whether we should add the field name to the
> Similarity#scorePayload() method or if we should support a per-field
> similarity in the future?
Per fi
Grant Ingersoll wrote:
> +1 for adding the field name.
>
>
The question is whether we should add the field name to the
Similarity#scorePayload() method or if we should support a per-field
similarity in the future?
-Michael
-
T
OK, thanks. I'll put mine in there too.
Mike
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 20, 2007 1:49 PM, Michael McCandless <[EMAIL PROTECTED]>
> wrote:
> >
> > Will do ...
> >
> > Mike
> >
> > "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:
> > > Could we make this a little more conc
[
https://issues.apache.org/jira/browse/LUCENE-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
John Wang updated LUCENE-1061:
--
Fix Version/s: 2.3
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])
On Nov 20, 2007 1:49 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
>
> Will do ...
>
> Mike
>
> "Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:
> > Could we make this a little more concrete by creating a simple test case
> > that fails?
FWIW, I recently added mine to TestAnalyzers to check fo
Will do ...
Mike
"Yonik Seeley (JIRA)" <[EMAIL PROTECTED]> wrote:
>
> [
>
> https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544005
> ]
>
> Yonik Seeley commented on LUCENE-1063:
> ---
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544005
]
Yonik Seeley commented on LUCENE-1063:
--
Could we make this a little more concrete by creating a simple test cas
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543991
]
Michael McCandless commented on LUCENE-1063:
{quote}
{code}
// Filter F is calling TokenStream ts:
F.nex
[
https://issues.apache.org/jira/browse/LUCENE-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543982
]
Hoss Man commented on LUCENE-1055:
--
contrib/gdata-server is recorded as deleted (so an "svn status" will show that
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> > If we used a Payload object, it would save 8 bytes per Token for
> > fields not using payloads.
Of course, with Token reuse, saving 8 bytes isn't important anymore
either, since it's only allocated once per field.
-Yonik
--
[
https://issues.apache.org/jira/browse/LUCENE-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543979
]
Michael Busch commented on LUCENE-1055:
---
{quote}
After svn update, contrib/gdata-server is still in my working
Michael McCandless wrote:
> "Yonik Seeley" <[EMAIL PROTECTED]> wrote:
>> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
>>> Yonik Seeley wrote:
So I think we all agree to do payloads by reference (do not make a
copy of byte[] like termBuffer does), and to allow payload
: I think the safest path is simply to not publish any queries, but rather to,
: e.g., permit committers to run experiments using them and publish the results
: of the experiments. But no queries would be made available to the general
: public on a website.
that would eliminate the goal of havin
[
https://issues.apache.org/jira/browse/LUCENE-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543939
]
Doron Cohen commented on LUCENE-1063:
-
In "code words":
{code}
// Filter F is calling TokenStream ts:
F.next(To
[
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543915
]
Grant Ingersoll commented on LUCENE-1001:
-
{quote}
Off the top of my head: the priority queue is used to mak
Token re-use API breaks back compatibility in certain TokenStream chains
Key: LUCENE-1063
URL: https://issues.apache.org/jira/browse/LUCENE-1063
Project: Lucene - Java
[
https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543903
]
Yonik Seeley commented on LUCENE-1040:
--
Indeed... thanks for catching that!
> Can't quickly create StopFilter
This may be worth asking legal-discuss about. I am not sure if there
is an issue or not.
-Grant
On Nov 20, 2007, at 4:54 AM, karl wettin wrote:
On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
it is always good to have query logs
I realize that it is not that politic
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> On Nov 19, 2007 6:52 PM, Michael Busch <[EMAIL PROTECTED]> wrote:
> > Yonik Seeley wrote:
> > >
> > > So I think we all agree to do payloads by reference (do not make a
> > > copy of byte[] like termBuffer does), and to allow payload reuse.
> > >
> > > So
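The two ownership conventions agreed on here can be contrasted with a hypothetical token class (not Lucene's Token): the term is copied into a buffer the token owns, while the payload array is stored by reference. The sketch shows the consequence: a producer that mutates or reuses its byte[] after handing it over changes what the consumer sees, which is the price of avoiding a per-token copy.

```java
/** Hypothetical token contrasting copy vs reference ownership; not Lucene's Token. */
public class PayloadByReference {
    static class SketchToken {
        private char[] termBuffer = new char[0];
        private int termLength;
        private byte[] payload;

        /** Copies the caller's chars into a buffer the token owns. */
        void setTermBuffer(char[] buf, int offset, int length) {
            if (termBuffer.length < length) termBuffer = new char[length];
            System.arraycopy(buf, offset, termBuffer, 0, length);
            termLength = length;
        }

        /** Keeps the caller's array by reference: no copy is made. */
        void setPayload(byte[] data) { payload = data; }

        String term() { return new String(termBuffer, 0, termLength); }
        byte[] payload() { return payload; }
    }

    static String demo() {
        char[] chars = {'c', 'a', 't'};
        byte[] bytes = {1, 2, 3};
        SketchToken t = new SketchToken();
        t.setTermBuffer(chars, 0, chars.length);
        t.setPayload(bytes);
        chars[0] = 'b'; // no effect on the token: the term was copied
        bytes[0] = 9;   // visible through the token: the payload is shared
        return t.term() + " " + t.payload()[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // cat 9
    }
}
```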
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-1052:
> Add an "termInfosIndexDivisor" to IndexReader
>
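A minimal sketch of the trade-off an index divisor makes, assuming it works by keeping only every Nth entry of the term index held in memory: the in-memory index shrinks by roughly a factor of N, and a lookup scans up to N times as many dictionary entries after the binary search. DivisorIndex is hypothetical and is not TermInfosReader; the full String[] stands in for the on-disk term dictionary, and the interval of 128 is assumed to match IndexWriter's default term index interval.

```java
import java.util.Arrays;

/** Hypothetical sampled term index; not Lucene's TermInfosReader. */
public class DivisorIndex {
    final String[] terms;       // full sorted dictionary (stand-in for the on-disk file)
    final String[] indexTerms;  // sampled entries kept in memory
    final int[] indexPositions; // position of each sampled entry in 'terms'

    DivisorIndex(String[] sortedTerms, int indexInterval, int divisor) {
        terms = sortedTerms;
        int step = indexInterval * divisor; // keep every divisor-th index entry
        int n = (sortedTerms.length + step - 1) / step;
        indexTerms = new String[n];
        indexPositions = new int[n];
        for (int i = 0; i < n; i++) {
            indexPositions[i] = i * step;
            indexTerms[i] = sortedTerms[i * step];
        }
    }

    /** Returns the term's position in the dictionary, or -1 if absent. */
    int find(String term) {
        int idx = Arrays.binarySearch(indexTerms, term);
        int slot = idx >= 0 ? idx : -idx - 2; // last sampled entry <= term
        if (slot < 0) return -1;              // term sorts before every entry
        int end = slot + 1 < indexPositions.length ? indexPositions[slot + 1] : terms.length;
        for (int p = indexPositions[slot]; p < end; p++) { // sequential scan of up to 'step' entries
            int c = terms[p].compareTo(term);
            if (c == 0) return p;
            if (c > 0) break;
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) terms[i] = String.format("t%03d", i);
        DivisorIndex full = new DivisorIndex(terms, 128, 1);
        DivisorIndex sparse = new DivisorIndex(terms, 128, 4);
        System.out.println(full.indexTerms.length + " vs " + sparse.indexTerms.length); // 8 vs 2
        System.out.println(sparse.find("t700")); // 700
    }
}
```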
[
https://issues.apache.org/jira/browse/LUCENE-1052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543854
]
Michael McCandless commented on LUCENE-1052:
Thanks Chuck for such a wonderfully thorough patch & unit t
[
https://issues.apache.org/jira/browse/LUCENE-1040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543847
]
Michael McCandless commented on LUCENE-1040:
Yonik, I think you missed my proposed update to your origin
On Nov 15, 2007 10:09 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> it is always good to have query logs
I realize that it is not that politically correct, but the TPB
collection is released to the public domain and contains 3.2 million
user queries with session id, timestamp, category etc to g
[
https://issues.apache.org/jira/browse/LUCENE-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12543807
]
Paul Elschot commented on LUCENE-1055:
--
After svn update, contrib/gdata-server is still in my working copy.
Is