[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14747320#comment-14747320 ] Arcadius Ahouansou commented on LUCENE-3807: Hi [~simonw] Sorry for coming late... {quote} weight (ie freq) now returns a long instead of a float discouraging floating point number as weights {quote} There are many cases like with the {{AnalyzingInfixSuggester}} where the weight is pre-computed outside of Lucene before ingestion. Having a {{double}} (floating point) makes sense as it would help avoid all sort of rounding/data-loss. > Cleanup suggester API > - > > Key: LUCENE-3807 > URL: https://issues.apache.org/jira/browse/LUCENE-3807 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/other >Affects Versions: 3.6, 4.0-ALPHA >Reporter: Simon Willnauer >Assignee: Simon Willnauer > Fix For: 3.6, 4.0-ALPHA > > Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, > LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, > LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch > > > Currently the suggester api and especially TermFreqIterator don't play that > nice with BytesRef and other paradigms we use in lucene, further the java > iterator pattern isn't that useful when it gets to work with TermsEnum, > BytesRef etc. We should try to clean up this api step by step moving over to > BytesRef including the Lookup class and its interface... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224242#comment-13224242 ] Simon Willnauer commented on LUCENE-3807: - backported committed patches to 3.x in rev 1297946 Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13224693#comment-13224693 ] Robert Muir commented on LUCENE-3807: - Thanks Simon! hard work but I expect there is more. Hopefully this module will grow into something much bigger... Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.6, 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807_3x.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220926#comment-13220926 ] Dawid Weiss commented on LUCENE-3807: - I looked at your patch briefly, Simon. Random notes: - I'd change currentElement into lastElement or something like that. Otherwise it looks odd to me in the code, as in: {code} throw new IndexOutOfBoundsException(index + pos + must be less than the size: + currentElement); {code} - typo in orderdEntries. - I'm very likely paranoid but I'd stick to just one class for storing these: {code} protected Number weightAsNumber(long weight) { {code} Since these are objects the memory gain will most likely be obscured by object alignments and object overhead itself and the downside is that you're using an interface with all call sites that will very likely become megamorphic (so no chances to inline anything). I don't know if it's worth the effort. I didn't have time to think much about changes to the functional logic; I don't think there were any (and if there were, they should be covered by tests?). Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221000#comment-13221000 ] Robert Muir commented on LUCENE-3807: - i did a quick review: looks good to me Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221012#comment-13221012 ] Simon Willnauer commented on LUCENE-3807: - committed to trunk.. I think we are ready to backport. I will do that in the next days Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221207#comment-13221207 ] Robert Muir commented on LUCENE-3807: - Somewhere along the line we got load(File), load(InputStream), store(File), store(InputStream)... can we improve this? I think its confusing when writing a new suggester to have all these hooks. load(File) takes a directory name, so a suggester is free to use multiple files (I am working on one that might use 2 files). so then what is load(InputStream) ?! Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221224#comment-13221224 ] Dawid Weiss commented on LUCENE-3807: - I added methods that used streams. They stored the underlying FST. I personally didn't like the must point to a directory approach. The File(dir) methods got inherited from the original Lookup interface I think. Those saving/reading from streams were not in the interface so they were specific to a concrete implementation. Don't know how it looks now. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221233#comment-13221233 ] Robert Muir commented on LUCENE-3807: - Right I'm not concerned about concrete impls. Concrete can do whatever it wants :) In the actual lookup interface, load/store from stream was added underneath this issue: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/suggest/src/java/org/apache/lucene/search/suggest/Lookup.java?r1=1236012r2=1291418diff_format=h I'm not opposed to this change: but if we are going to require that implementations only use a single file, then we should remove the File ones and just let caller deal with the inputstream... does that make sense? Currently its impossible to tell which one will be called! Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221239#comment-13221239 ] Dawid Weiss commented on LUCENE-3807: - If you say you already have an implementation that could use more than one file then it'd be silly to enforce a single file format. I agree too many options is not good, in particular it's painful for potential implementors. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221243#comment-13221243 ] Robert Muir commented on LUCENE-3807: - Well I switched to a single file already, so ignore that implementation :) But my concern is mostly about API confusion for implementing suggesters: if we require a single file then lets just have InputStream. Otherwise, lets remove InputStream from Lookup, (of course a specific concrete Impl that only uses one file is still free to provide that) Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221249#comment-13221249 ] Dawid Weiss commented on LUCENE-3807: - +1 for streams. They're a lot more flexible than Files. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13221332#comment-13221332 ] Simon Willnauer commented on LUCENE-3807: - the main reason why I moved this to the interface was that I needed to do some extra processing in the stream ie. calc a checksum and I wanted to write stuff directly to HDFS etc. I think we should go for streams or we pass in a lucene directory while I think unless really needed we should stick with a stream. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213477#comment-13213477 ] Dawid Weiss commented on LUCENE-3807: - Sorry for being late, work. I like the patch. Comments: +public ComparatorBytesRef getComparator() { + return null; +} This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor. BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order. I also wonder float - long = 4 - 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore -- BufferingTermFreqIteratorWrapper again)? + if (l1 l2) { +aStop = l1; + } else { +aStop = l2; + } if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit ;) Why not a specific covariant here? - public Float get(String key) { + public Object get(CharSequence key) { @@ -199,7 +199,7 @@ public class LookupBenchmarkTest extends LuceneTestCase { public Integer call() throws Exception { int v = 0; for (String term : input) { -v += lookup.lookup(term, onlyMorePopular, num).size(); +v += lookup.lookup(new CharsRef(term), onlyMorePopular, num).size(); This doesn't seem necessary (lookup accepts a CharSequence?). I like the rest, including the CharSequenceish evilness of bytesToCharSequence :) Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213519#comment-13213519 ] Simon Willnauer commented on LUCENE-3807: - bq. This shows up in a number of places. I have mixed feelings about certain types having a comparator and others not having it, but it's minor. well this an indicator if we know something about the order or not. if you get null there is not order specified... bq. BufferingTermFreqIteratorWrapper is a nuisance (buffers in memory). It would be nicer to have a sort on disk if something doesn't support sorted iteration order. this is the ultimate goal... see my comment above ({noformat} next steps would be cleaning up the in-memory sorting stuff and use the sorter which goes to disk to do the actual sorting internally if needed {noformat}) bq. I also wonder float - long = 4 - 8 bytes... would this count as an incompatible API change (because what used to work for a given amount of RAM won't work anymore – BufferingTermFreqIteratorWrapper again)? see above bq. if I remember correctly Math.min/max are intrinsics, so you can afford to be explicit I will upload a new patch - we use this in BytesRefComp too, I think its safe to fix bq. This doesn't seem necessary (lookup accepts a CharSequence?). right - leftover from an old iteration bq. Why not a specific covariant here? that would be Float get(String) or Float get(CharSequence) or... ;) Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212551#comment-13212551 ] Robert Muir commented on LUCENE-3807: - I agree with removing IOException from getComparator. I agree also with moving weight to a long... though currently the e.g. solr integrations take floats as input, so this really needs to be listed as a backwards break since it will directly affect users. I agree with removing the sorted marker interface: its not useful since you don't know the order. However, I don't think we should add the charsref methods... I think Bytes/Ints/CharsRef should have parallel apis and someone can just call unicodeutil: in general these are reference classes not stringbuffers and we shouldn't encourage abuse via sugar apis. I already have an issue open for fixing, cleaning up, and making those APIs consistent. I don't think we should add a generics parameter V to Lookup, especially if LookupResult itself is still wired to float. I do think suggesters should be able to return additional data but this needs more thought: its necessary to actually get the additional data to them. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212636#comment-13212636 ] Robert Muir commented on LUCENE-3807: - I found another bug (added test and committed fix in r1291826). This would cause an exception in solr if someone called build but the field was empty. I think before proceeding with refactoring, we really need to beef up tests. I'll help with this. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212719#comment-13212719 ] Simon Willnauer commented on LUCENE-3807: - bq. I think before proceeding with refactoring, we really need to beef up tests. I will help too while I think the patch is ready... we can go with iterations? Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212720#comment-13212720 ] Simon Willnauer commented on LUCENE-3807: - bq. so this really needs to be listed as a backwards break since it will directly affect users. I will note this in the CHANGES once we are done Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212740#comment-13212740 ] Robert Muir commented on LUCENE-3807: - I like this latest patch! I didnt test it, but i think its fine to move forward. I just mainly want to address the issues of whole suggesters that are not tested at all in this module (LUCENE-3813) before we did too much more. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch, LUCENE-3807.patch, LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211890#comment-13211890 ] Dawid Weiss commented on LUCENE-3807: - +1. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211921#comment-13211921 ] Robert Muir commented on LUCENE-3807: - I like the patch, but only one thing (its fine to commit it as-is though, we can solve this on another issue, i just couldnt help but notice) I don't think we should have the BufferedTermFreqIteratorWrapper/etc and the SortedTermFreqIterator marker interface needs to be fixed. Here are the problems: * Marker interface SortedTermFreqIterator doesn't tell you if its UTF-8 or UTF-16 order. Its implemented by two classes: SortedTermFreqIteratorWrapper, which sorts in UTF-16 order, and HighFrequencyDictionary, which returns terms from the index (so UTF-8 order). The problem is that classes that rely upon sorted order like JaSpell/TST are likely broken already. Fortunately FST/WFST always do their own sort. * Buffering in RAM is not ideal. Instead I think all of these classes should be using our Sort anyway which can spill to disk. For now could we put the BytesRefList in the suggest package since its only used there? we might not need it after we clean up this sorting stuff in some future issue. Also I don't think we should factor out the BytesRefIterator. I seriously think its a bad idea to tie our core index Terms enumeration API with the spellcheck API at this time, it would make it hard to change in the future if we need, especially with spellcheck being... needing work :) Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211926#comment-13211926 ] Robert Muir commented on LUCENE-3807: - Again i just wanted to mention, i think its fine to commit as-is. Consider my comments thinking-out-loud. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212036#comment-13212036 ] Simon Willnauer commented on LUCENE-3807: - I committed the patch to trunk and moved the BytesRefList to the suggest package (pkg private) for now. I keep this issue open as I have more iterations / patches. I will backport before I close this issue ie. batch port to 3.x all commits related to this issue. commit revision is 1291418 Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212090#comment-13212090 ] Robert Muir commented on LUCENE-3807: - I think we should go with a simpler solution for the Buffering/SortedIterator... this has caused a few recent test fails. (latest: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12476/) If we are going to replace it anyway with the Sort that spills to disk, cant we just use a simple treemap? Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212130#comment-13212130 ] Simon Willnauer commented on LUCENE-3807: - just drop it I will replace that stuff with on disk sort tomorrow anyway Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3807) Cleanup suggester API
[ https://issues.apache.org/jira/browse/LUCENE-3807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212430#comment-13212430 ] Simon Willnauer commented on LUCENE-3807: - bq. I think we should go with a simpler solution for the Buffering/SortedIterator... this has caused a few recent test fails. those failures have been fixed.. Cleanup suggester API - Key: LUCENE-3807 URL: https://issues.apache.org/jira/browse/LUCENE-3807 Project: Lucene - Java Issue Type: Improvement Components: modules/other Affects Versions: 3.6, 4.0 Reporter: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3807.patch Currently the suggester api and especially TermFreqIterator don't play that nice with BytesRef and other paradigms we use in lucene, further the java iterator pattern isn't that useful when it gets to work with TermsEnum, BytesRef etc. We should try to clean up this api step by step moving over to BytesRef including the Lookup class and its interface... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org