Re: AutomatonTermsEnum and fixed-string automata

2017-01-11 Thread Alan Woodward
I opened LUCENE-7627

> On 11 Jan 2017, at 10:52, Michael McCandless wrote:
>
> Thanks Alan, I agree we should open an issue and iterate.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Wed, Jan 11, 2017 at 5:41 AM, Alan Woodward wrote:
>> The problem here is mainly with t

Re: AutomatonTermsEnum and fixed-string automata

2017-01-11 Thread Michael McCandless
Thanks Alan, I agree we should open an issue and iterate.

Mike McCandless
http://blog.mikemccandless.com

On Wed, Jan 11, 2017 at 5:41 AM, Alan Woodward wrote:
> The problem here is mainly with the Sorted*DocValues APIs, which return a
> TermsEnum but don’t have a Terms instance to call interse

Re: AutomatonTermsEnum and fixed-string automata

2017-01-11 Thread Alan Woodward
The problem here is mainly with the Sorted*DocValues APIs, which return a TermsEnum but don’t have a Terms instance to call intersect on. So maybe the thing to do is to add a termsEnum(CompiledAutomaton) method to SortedDocValues and SortedSetDocValues? That should avoid the trap of bypassing

Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Michael McCandless
Unfortunately I think that's somewhat dangerous because it creates an ambiguous API with a nasty performance trap? I.e. this new method won't invoke the fast Terms.intersect in the default terms dict?

Mike McCandless
http://blog.mikemccandless.com

On Fri, Jan 6, 2017 at 3:20 PM, Alan Woodward

Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Alan Woodward
Hm, how about something like this, on CompiledAutomaton:

    public TermsEnum getTermsEnum(TermsEnum te) throws IOException {
      switch (type) {
        case NONE:
          return TermsEnum.EMPTY;
        case ALL:
          return te;
        case SINGLE:
          return new SingleTermsEnum(te, term);
        case NORMAL:
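
For readers skimming the archive, the dispatch idea in this message can be illustrated with a small stand-alone model. This is only a sketch under loud assumptions: it uses plain Java collections rather than Lucene, every name in it (AutomatonDispatchSketch, filter, collect) is hypothetical, and the NORMAL branch is a naive linear filter chosen for illustration, not whatever the original snippet went on to do.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Toy stand-in for "pick an enum strategy by automaton type".
// None of these types are Lucene classes; all names are hypothetical.
class AutomatonDispatchSketch {
    enum Type { NONE, ALL, SINGLE, NORMAL }

    final Type type;
    final String term;               // used only when type == SINGLE
    final Predicate<String> accepts; // used only when type == NORMAL

    private AutomatonDispatchSketch(Type type, String term, Predicate<String> accepts) {
        this.type = type;
        this.term = term;
        this.accepts = accepts;
    }

    static AutomatonDispatchSketch none() { return new AutomatonDispatchSketch(Type.NONE, null, null); }
    static AutomatonDispatchSketch all() { return new AutomatonDispatchSketch(Type.ALL, null, null); }
    static AutomatonDispatchSketch single(String term) { return new AutomatonDispatchSketch(Type.SINGLE, term, null); }
    static AutomatonDispatchSketch normal(Predicate<String> accepts) { return new AutomatonDispatchSketch(Type.NORMAL, null, accepts); }

    // Filters an iterator over terms that are assumed to be in sorted order.
    Iterator<String> filter(Iterator<String> sortedTerms) {
        switch (type) {
            case NONE:
                // Matches nothing: no need to touch the input at all.
                return Collections.emptyIterator();
            case ALL:
                // Matches everything: hand back the input unchanged.
                return sortedTerms;
            case SINGLE:
                // Fixed string: scan until the term is found or passed.
                while (sortedTerms.hasNext()) {
                    String t = sortedTerms.next();
                    int cmp = t.compareTo(term);
                    if (cmp == 0) {
                        return Collections.singletonList(t).iterator();
                    }
                    if (cmp > 0) {
                        break; // input is sorted, so the term cannot appear later
                    }
                }
                return Collections.emptyIterator();
            default:
                // NORMAL: naive linear filter (an assumption made for this sketch).
                List<String> out = new ArrayList<>();
                while (sortedTerms.hasNext()) {
                    String t = sortedTerms.next();
                    if (accepts.test(t)) {
                        out.add(t);
                    }
                }
                return out.iterator();
        }
    }

    // Small helper that drains an iterator into a list.
    static List<String> collect(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

In this model, AutomatonDispatchSketch.single("banana").filter(terms.iterator()) yields only "banana" from a sorted list containing it. The relevant point for the thread is that the fixed-string case never needs a general automaton walk.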

Re: AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Michael McCandless
These automaton intersection APIs are frustrating with all the special case handling... Ideas welcome!

We've had similar challenges with them in the past, when a user invoked Terms.intersect directly instead of via CompiledAutomaton:

https://issues.apache.org/jira/browse/LUCENE-7576

The problem i

AutomatonTermsEnum and fixed-string automata

2017-01-06 Thread Alan Woodward
We’ve hit an issue while developing marple, where we want to have the ability to filter the values from a SortedDocValues terms dictionary. Normally you’d create a CompiledAutomaton from the filter string, and then call #getTermsEnum(Terms) on it; but for docvalues, we don’t have a Terms instan