[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2111.
Resolution: Fixed
Wrapup flexible indexing
terms; I also
strengthened CheckIndex to verify that .ord() of the TermsEnum always returns
the right result, for codecs that implement .ord. I'll commit shortly...
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
-- attached patch creates such an
index. I'm still getting to the bottom of it...
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type
enough
that we don't need to spend any time optimizing that emulation layer...
I also ran an indexing test (index first 10M docs of wikipedia) and
flex and trunk had similar times.
I think net/net we are good to land flex!
Wrapup flexible indexing
Key
!
+1. The tests have been passing for some time now, and Solr tests pass too.
It would be nice to look at merging flex into the trunk soon so that it gets
more exposure.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
exc if it's run on a
field that omitTFAPs (matches PhraseQuery), fixes all jdoc warnings, spells out
back compat breaks in changes.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
to merge with trunk now.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions
this being said, I think flex is a great move forward for multitermqueries,
at least
we have a seeking-friendly API! One step at a time.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
-- that can't be a terms dict thing (just one lookup); i'm
not sure offhand why it's faster. That code is not very different than trunk.
bq. How's the indexing performance?
Unchanged -- I indexed first 10M docs of wikipedia and the times were nearly
identical.
Wrapup flexible indexing
pass a DFA to the codec
and it does the intersection and enums the result), and we used byte-based DFAs,
I think we'd get a good speedup.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
25.87 QPS [-58.6% worse]
run flex on flex index...
cd /root/src/flex.clean/contrib/benchmark
log: /root/src/flex.clean/contrib/benchmark/logs/flexOnFlex.2
39.30 QPS [-37.1% worse]
124623 hits
{code}
Other queries I've tested look OK so far...
Wrapup flexible indexing
on flex branch! I turned
most of them into TODOs :)
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2111.
Resolution: Fixed
Thanks Shai!
Wrapup flexible indexing
Subject: [jira] Resolved: (LUCENE-2111) Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless resolved LUCENE-2111.
Resolution
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless reopened LUCENE-2111:
Duh -- wrong issue! I only wish ;)
Wrapup flexible indexing
codecs provide direct access to the int[] they have
(saves extra copy).
* Down to 9 nocommits!!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2111:
---
Attachment: LUCENE-2111.patch
Forgot to add SegmentReadState.java
Wrapup flexible
TermsEnum.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
such a
large patch...
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects
is one.
I think it looks kinda dumb but if it's useful, I'll commit it.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
, MTQ.getTermsEnum() to never return
null, but IR.fields() and Fields.terms(String field), and
.docs/.docsAndPositions can return null.
Also whittled down more nocommits -- down to 53 now!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
good Robert... thanks!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Michael McCandless updated LUCENE-2111:
---
Attachment: LUCENE-2111.patch
Down to 15 nocommits!
Wrapup flexible indexing
(adding @Override to
interface).
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects
to determine that a codec does not store positions.
Thinking more about this... I think we should switch back to a null return from
.docsAndPositionsEnum if the codec doesn't support positions. We only return
.EMPTY if the enum is really just empty.
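
A minimal sketch of the null-versus-EMPTY convention described above. The types here (`PositionsEnum`, `ToyCodec`, `NullVersusEmpty`) are hypothetical stand-ins made up for illustration, not the flex branch classes: a null return is taken to mean "positions are not stored at all", while a shared EMPTY instance means "positions are supported, there just are none".

```java
// Hypothetical stand-in for a positions enum; NOT the Lucene class.
interface PositionsEnum {
    int NO_MORE = Integer.MAX_VALUE;

    // Returns the next position, or NO_MORE when exhausted.
    int nextPosition();

    // Shared instance for "supported, but really just empty".
    PositionsEnum EMPTY = () -> NO_MORE;
}

// Hypothetical codec: returns null when it does not store positions.
interface ToyCodec {
    PositionsEnum positions(String term);
}

final class NullVersusEmpty {
    // Callers must check for null before iterating.
    // Note: this probe consumes the first position of the enum.
    static String describe(ToyCodec codec, String term) {
        PositionsEnum e = codec.positions(term);
        if (e == null) {
            return "positions not supported";
        }
        if (e.nextPosition() == PositionsEnum.NO_MORE) {
            return "no positions for this term";
        }
        return "has positions";
    }
}
```

With this split, a caller can tell "the codec cannot answer" apart from "the answer is empty" without a separate capability check.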
Wrapup flexible indexing
compat,
so null can have some other meaning. Instead it uses VirtualMethod,
with the default implementations throwing UOE.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2111:
Attachment: flex_merge_916543.patch
patch for review of flex merge
Wrapup flexible indexing
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Muir updated LUCENE-2111:
Attachment: LUCENE-2111.patch
a few more easy nocommits
Wrapup flexible indexing
(instead return .EMPTY
objects).
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
to use
@lucene.experimental
I didn't mess with IndexFileNames as there is an open issue about it right now.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene
.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
, and renaming BytesRef.toString ->
BytesRef.utf8ToString.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components
at the backwards tests now
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects
a StringBuilder if you want)
there are some breaks (e.g. binary api compat), but it's an internal api.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
:)
Why not just remove UTF8Result altogether? (ie don't bother deprecating).
It's an internal API...
The new method to compute hash is great, saving the extra pass in THPF.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
in revision 915511
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions
.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
flex trunk).
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex
before because the previous impl of Multi*Enums
was using the same Docs/AndPositionsEnums before. This patch fixes that.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
a singleton in TermsEnum class
itself.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
-- thanks for re-merging!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions
was faster before because the previous impl of
Multi*Enums was using the same Docs/AndPositionsEnums before. This patch fixes
that.
Ahh, and also because your patch switches from String to char[], which should
improve perf.
Your patch looks good Robert! Thanks.
Wrapup flexible indexing
from String to char[], which
should improve perf.
Actually I didn't apply your LUCENE-2111 when running the benchmark (the
improvement is simply the char[]).
The test is actually slightly slower now with the rest of LUCENE-2111
Wrapup flexible indexing
on
MultiFields (eg, MultiFields.getFields(IndexReader)) to easily do
this, and cutover places in Lucene that may need direct postings from
a multi-reader to use this method.
I've updated the javadocs explaining this.
Wrapup flexible indexing
Key: LUCENE
:
* remove synchronization (not necessary, history here: LUCENE-296)
* reuse char[] rather than create Strings
* remove unused ctors
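
A minimal sketch of the "reuse char[] rather than create Strings" idea from the list above. The class `ReusableChars` is hypothetical and only illustrates the general pattern (one growable buffer overwritten per term); it is not the code from the patch.

```java
// Hypothetical helper: holds the current term text in a reusable buffer
// instead of allocating a new String for every term.
final class ReusableChars {
    private char[] buffer = new char[16];
    private int length;

    // Overwrites the buffer with the given text, growing only when needed.
    void copyFrom(CharSequence text) {
        int n = text.length();
        if (n > buffer.length) {
            buffer = new char[Math.max(n, buffer.length * 2)];
        }
        for (int i = 0; i < n; i++) {
            buffer[i] = text.charAt(i);
        }
        length = n;
    }

    // The backing array; only the first length() chars are valid.
    char[] chars() {
        return buffer;
    }

    int length() {
        return length;
    }
}
```

The instance is not thread-safe, which is acceptable under the assumption that each enum is used by a single thread (consistent with dropping the synchronization).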
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
benchmark for LUCENE-2089, weird
that flex was slower than trunk before.
numbers are stable across many iterations.
||unpatched flex||patched flex||trunk||
|4362ms|3239ms|3459ms|
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
the 1024 sized cache)
I cutover all codecs to the new API... all tests pass if you switch
the default codec (in oal.index.codec.Codecs.getWriter) to any of the
four.
Wrapup flexible indexing
Key: LUCENE-2111
URL: https
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
- oal.util.BytesRef.
I think, eventually, we should fix the various places that refer to byte
slices, eg Field.get/setBinary*, Payload, UnicodeUtil.UTF8Result,
IndexOutput.writeBytes, IndexInput.readBytes, to use BytesRef instead.
Wrapup flexible indexing
Key
LUCENE-2111.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
,
and indexing using flex, so this is a good test of mixed pre/post flex
segments, with no problems.
Getting closer...
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene
of the back compat layers.
We still need more tests but this is a good step forward...
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue
in the flex API on
non-flex layer (the preflex codec) -- exposed with a new test case in
TestBackCompat, and fixed. Also cleaned up some nocommits, added
indexDivisor to the loadTermsIndex API, and fixed preflex to actually
implement it.
Wrapup flexible indexing
java.io.Closeable, so the underlying old enums are never
closed by client code. Shouldn't all the enum classes also be Closeable,
even if the new Codec API currently would implement these as a no-op for core
classes? Maybe someone creates a codec that needs close.
Wrapup flexible
for flex indexing ?
It's https://svn.apache.org/repos/asf/lucene/java/branches/flex_1458
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue
/TermDocs). I'd rather
not add a close that for all core impls is a no-op and so Lucene
doesn't have to call close.
Fourth, because it complicates our impls if we really must close
whenever we pull an enum -- eg our Scorers pull enums today, but never
close them.
Wrapup flexible indexing
, we lost the fixes for LUCENE-1558 (defaulting
readOnly=true for IndexReader)... IndexSearcher looks like it didn't lose the
change though.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https
from what I can tell :) Which is
why I had to go line by line a merge or two ago to catch everything that had
been dropped.
I expected I'd have to do it again, but it's a lot of effort to do every time.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene
looks good now.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components
that we should re-run perf tests of the MTQs -- LinkedHashMap caused serious
GC problems when I was testing automaton query.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse
if you had done that yet last night
(unrelatedly)
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New
be an interface it would work (this
is one of the cases where interfaces for really simple patterns can be used,
like iterators).
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse
or two and for some reason thought this was
one of them.
was (Author: markrmil...@gmail.com):
Cool - was actually thinking about looking if you had done that yet last
night (unrelatedly)
Further steps towards flexible indexing
about looking if you had done that yet last
night (unrelatedly)
Feel free to fix it!
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project
multiple inheritance, we cannot also extend
attributesource :-( Would DocIdSetIterator be an interface it would work (this
is one of the cases where interfaces for really simple patterns can be used,
like iterators).
Further steps towards flexible indexing
flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.9
it and dynamically instantiating would be an
idea. The same applies for TermsEnum, it should be separated for lazy init.
That's a good point (avoid cost of creating the AttributeSource) -- that makes
complete sense.
Further steps towards flexible indexing
cache
Should we still try and do the reuse stuff, or should we just drop it and use
the cache as it is now? (eg reusing the object that is removed, if one is
removed) Looks like that would be harder to get done now.
Further steps towards flexible indexing
and do the reuse stuff, or should we just drop it and use
the cache as it is now? (eg reusing the object that is removed, if one is
removed) Looks like that would be harder to get done now.
Further steps towards flexible indexing
---
Key: LUCENE
flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.9
it from the beginning as DocIdSetIterator?
In my opinion, as pointed out above, the AttributeSource stuff should go in as
a lazy-init member behind getAttributes() / attributes().
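
A minimal sketch of the lazy-init idea suggested above, assuming hypothetical classes `ToyAttributes` and `ToyEnum` (neither is Lucene's AttributeSource or DocsEnum). The attribute container is only created on the first call to `attributes()`, so enums that never use attributes do not pay for it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an attribute container.
final class ToyAttributes {
    // Counts instances so the laziness can be observed.
    static int created = 0;

    private final Map<String, Object> values = new HashMap<>();

    ToyAttributes() {
        created++;
    }

    void put(String key, Object value) {
        values.put(key, value);
    }

    Object get(String key) {
        return values.get(key);
    }
}

// Hypothetical enum base class with a lazily created attribute member.
abstract class ToyEnum {
    private ToyAttributes attributes; // stays null until first requested

    // final, so it is safe to call from a subclass constructor.
    public final ToyAttributes attributes() {
        if (attributes == null) {
            attributes = new ToyAttributes();
        }
        return attributes;
    }

    // Returns the next value, or -1 when exhausted.
    abstract int next();
}
```

This version is not thread-safe; it assumes an enum instance is confined to one thread.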
Further steps towards flexible indexing
---
Key: LUCENE-1458
towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.9
.
With this patch MatchAllDocsQuery is very simple to implement now as a
ConstantScoreQuery on top of a Filter that returns the DocsEnum of the supplied
IndexReader as iterator. Really cool.
Further steps towards flexible indexing
---
Key: LUCENE
(XYZ.class) in its ctor and store the reference
locally. attributes() is final (to be safe, when called by ctor).
Eventually add an Interface AttributeAble *g* that is implemented by all these
enums and anywhere else using AttributeSource that may need to be lazy init.
Further steps towards flexible
the reuse stuff, or should we just drop it and
use the cache as it is now?
How about starting w/o reuse but leave a TODO saying we could/should
investigate?
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL
. MatchAllDocsQuery is very simple to implement now as a ConstantScoreQuery
on top of a Filter that returns the DocsEnum of the supplied IndexReader as
iterator. Really cool.
Sweet! Wait, using AllDocsEnum you mean?
Further steps towards flexible indexing
but leave a TODO saying we could/should
investigate?
Actually, scratch that -- reuse is too hard in DBLRU -- I would say just no
reuse now. Trunk doesn't reuse either...
Further steps towards flexible indexing
---
Key: LUCENE-1458
?
Yes, but this class is package private and unused! AllTermDocs is used by
SegmentReader to support termDocs(null), but not AllDocsEnum. There is no
method in IndexReader that returns all docs?
The matchAllDocs was just an example, there are more use cases.
Further steps towards flexible
: Flex Branch
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Reporter: Michael McCandless
[
https://issues.apache.org/jira/browse/LUCENE-2111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Uwe Schindler updated LUCENE-2111:
--
Affects Version/s: Flex Branch
Wrapup flexible indexing
of unchecked warnings... The
Generics policeman will visit them and will hopefully help fixing!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
will visit them and will hopefully help fixing!
Uh-oh... I sense heavy committing in flex branch's future!
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
?
Wrapup flexible indexing
Key: LUCENE-2111
URL: https://issues.apache.org/jira/browse/LUCENE-2111
Project: Lucene - Java
Issue Type: Improvement
Components: Index
Affects Versions: Flex Branch
)
at
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:208)
{code}
Pipe in if you know. Hard to debug or run this test on its own in Eclipse
(because of how BW compat tests work), so it's a slow slog to troubleshoot and
I haven't had time yet.
Further steps towards flexible indexing
was from my fix of BooleanQuery to take coord into
account in equals hashCode (LUCENE-2092)? I hit exactly that same failure,
and it required a fix on back-compat branch to just pass in true to the new
BooleanQuery() done just before the assert. Does that explain it?
Further steps towards flexible
on merging trunk down! I'm especially looking
forward to getting the faster unit tests (LUCENE-1844).
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
. The test just checks that no clauses
are generated. In my opinion, it should not compare to an empty BQ instance,
instead just assert bq.clauses().size()==0.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL
not compare to an empty BQ instance, instead just
assert bq.clauses().size()==0.
+1, that'd be a good improvement -- I'll do that.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira
(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
assertMatches(searcher, wq, 0);
Query q = searcher.rewrite(wq);
assertTrue(q instanceof BooleanQuery);
assertEquals(0, ((BooleanQuery) q).clauses().size());
}
{code}
Further steps towards flexible indexing
?
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
guess and try - but neither true nor false fixed it.
Looks like Uwe's fix will sidestep the issue though? Sounds good to me :)
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira
with a new issue and patch... Just
joking :-)
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New
seeing that testreopen gc
overhead limit exceeded, I just hit it again randomly.
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene
steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
Affects Versions: 2.9
. I could also put it into 3.0 and
2.9, but I do not think that is needed :)
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene
).
Opinions?
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project: Lucene - Java
Issue Type: New Feature
Components: Index
to go back to returning the Enum? But I'm not sure why this
change was made, so ...
Further steps towards flexible indexing
---
Key: LUCENE-1458
URL: https://issues.apache.org/jira/browse/LUCENE-1458
Project