[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121779#comment-13121779 ] Uwe Schindler commented on LUCENE-1536: --- Hi Chris, hi Male, I was going to bed after my last post. I had a crisis with two facts in the new API that do not play nicely together. I thought about it again the whole night, and I also started to recode some details last evening, but not everything worked out (though I found lots of problems, so it's a good thing that I started to code, especially on several filters that are not as basic as those which only use FixedBitSet/OpenBitSet): # the hidden implementation of Bits is a nice idea, but it has one big problem: Java is a strongly-typed language. If a DocIdSet implements Bits but you wrap it using FilteredDocIdSet, this interface implementation suddenly goes away, because the wrapper class does not implement Bits. If we make FilteredDocIdSet implement Bits, that is also wrong, as it might wrap another DocIdSet that is not random access. So I tend to keep DocIdSet abstract and let it only expose methods that return a Bits interface. This mirrors how DocIdSet does not implement DocIdSetIterator directly but just returns one. So I would strongly recommend adding a method like iterator() that returns an impl, rather than relying on marker interfaces. I would favor Bits DocIdSet.bits(), which would be in line with the iterator method. Whether the implementing class (like FixedBitSet) implements Bits itself and returns this is an implementation detail. If a DocIdSet does not allow random access, it should signal that either by throwing an exception from bits() or by returning null; it does not really matter to me. - In general, a wrapper like FilteredDocIdSet can handle both in one class: its bits() would check whether the wrapped bits() returns non-null, and if so wrap it in another wrapper that uses match() to filter. The impl of this class is fast and supports both (iterator and bits, if available).
# the other thing I don't like is the setContainsOnlyLiveDocs setter on DocIdSet. It allows anybody to change the DocIdSet (which should have an API that exposes only read access). Only classes like FixedBitSet that implement this read-only interface might be able to change it through their own API (meaning the setter might live in the various DocIdSet implementations in oal.util). A consumer of the filter should not be able to change the DocIdSet's behaviour from outside through a public API. I started to rewrite this yesterday and left only the getter in DocIdSet, but added the setter to FixedBitSet, OpenBitSet, DocIdBitSet, ... The setter in the abstract base class also violates the unmodifiability of EMPTY_DOCIDSET: that impl should have containsOnlyLiveDocs=true, and this must be fixed and unchangeable. # Also, DocIdSet is a class not solely related to Filters, e.g. Scorer extends DocIdSetIterator, DocsEnum extends DocIdSetIterator, and Solr faceting uses DocIdSet. DocIdSet is just a holder class for a bunch of documents exposing an iterator (and a Bits API; this is why I want two getter methods and no interface magic). The existence of live docs is outside its scope. I would therefore like an API similar to the one for scorers, so IndexSearcher can ask the Filter for a DocIdSet based on the given liveDocs (like the scorer method in Weights). The returned DocIdSet would not know whether it contains only live docs or not (just as the Scorer itself does not expose this information). CachingWrapperFilter is a little bit special: it would always ask the wrapped Filter for a DocIdSet without deletions and cache that one, but always return a FilteredDocIdSet that brings in the liveDocs passed from IndexSearcher. The cache would then never contain liveDocs and would be easier to maintain. Reopening segments would never require reloading the cache.
CachingWrapperFilter would decide, based on whether IndexSearcher passes a liveDocs BitSet or not, whether it needs to apply it (in its own getDocIdSet method). If we have a query and only filter some documents, IndexSearcher already knows about liveDocs from the main scorer and would pass null to the filter. This would remove lots of additional liveDocs checks. Only the main scorer would know about them; the filter would ignore them (so there is no overhead in CachingWrapperFilter, as it can return the cached DocIdSet directly to IndexSearcher without wrapping). QueryWrapperFilter could pass the liveDocs through to the wrapped query, too. I may have time today to implement some parts of this; it should not be too difficult. if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement
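The bits()/iterator() split proposed above can be sketched in plain Java. The types below (Bits, DocIdIterator, DocIdSet, BitSetDocIdSet, FilteredDocIdSet) are simplified, hypothetical stand-ins written for illustration, not Lucene's actual classes. The point is only that one wrapper class can offer iteration unconditionally and random access exactly when the wrapped set provides it, which a marker interface on the wrapped class cannot express.

```java
import java.util.BitSet;

// Simplified, hypothetical stand-ins for the types discussed above (not Lucene's classes).
interface Bits {
  boolean get(int index);
  int length();
}

abstract class DocIdIterator {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;
  abstract int nextDoc();
}

abstract class DocIdSet {
  /** Sequential access: every DocIdSet supports it. */
  abstract DocIdIterator iterator();

  /** Random access, or null if unsupported; replaces a marker interface. */
  Bits bits() { return null; }
}

/** BitSet-backed set offering both access styles (FixedBitSet would play this role). */
class BitSetDocIdSet extends DocIdSet {
  private final BitSet set;
  private final int maxDoc;

  BitSetDocIdSet(BitSet set, int maxDoc) { this.set = set; this.maxDoc = maxDoc; }

  @Override
  DocIdIterator iterator() {
    return new DocIdIterator() {
      private int doc = -1;
      @Override
      int nextDoc() {
        if (doc == NO_MORE_DOCS) return doc; // already exhausted
        int next = set.nextSetBit(doc + 1);
        doc = (next < 0 || next >= maxDoc) ? NO_MORE_DOCS : next;
        return doc;
      }
    };
  }

  @Override
  Bits bits() {
    return new Bits() {
      public boolean get(int index) { return set.get(index); }
      public int length() { return maxDoc; }
    };
  }
}

/** One wrapper serves both APIs; it offers bits() only if the wrapped set does. */
abstract class FilteredDocIdSet extends DocIdSet {
  private final DocIdSet inner;

  FilteredDocIdSet(DocIdSet inner) { this.inner = inner; }

  protected abstract boolean match(int doc);

  @Override
  DocIdIterator iterator() {
    final DocIdIterator it = inner.iterator();
    return new DocIdIterator() {
      @Override
      int nextDoc() {
        int doc = it.nextDoc();
        while (doc != NO_MORE_DOCS && !match(doc)) doc = it.nextDoc();
        return doc;
      }
    };
  }

  @Override
  Bits bits() {
    final Bits innerBits = inner.bits();
    if (innerBits == null) return null; // wrapped set is iterator-only, so this one is too
    return new Bits() {
      public boolean get(int index) { return innerBits.get(index) && match(index); }
      public int length() { return innerBits.length(); }
    };
  }
}

class DocIdSetDemo {
  public static void main(String[] args) {
    BitSet bs = new BitSet();
    bs.set(1); bs.set(4); bs.set(7);
    DocIdSet evens = new FilteredDocIdSet(new BitSetDocIdSet(bs, 10)) {
      @Override
      protected boolean match(int doc) { return doc % 2 == 0; }
    };
    System.out.println(evens.bits().get(4));        // true
    System.out.println(evens.iterator().nextDoc()); // 4
  }
}
```

Returning null from bits() lets callers test for random access with a plain null check; the comment above leaves the choice between null and an exception open, so that detail is an assumption of this sketch.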
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121786#comment-13121786 ] Chris Male commented on LUCENE-1536: Okay thats alot to take in again. You've made a good case for dropping setContainsOnlyLiveDocs, I totally agree. I really do like the idea of adding the acceptDocs to Filter.getDocIdSet. I'm also comfortable with adding .bits() to DocIdSet to address the typing problem. Should we bash out a quick patch making these changes and see how it looks? if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). 
* I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
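The acceptDocs idea applied to CachingWrapperFilter can be sketched as follows, assuming a hypothetical Filter.getDocIdSet(segment, maxDoc, acceptDocs) signature and modeling a DocIdSet as random-access Bits only, for brevity. These classes are illustrative stand-ins, not Lucene's. The cache holds the deletion-free result per segment, and live docs are applied on the way out only when the caller passes them, so new deletions never invalidate a cached entry.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-ins; a filter's result is modeled as random-access Bits.
interface Bits {
  boolean get(int doc);
  int length();
}

abstract class Filter {
  /**
   * acceptDocs: documents the caller still accepts (e.g. live docs), or null if the
   * caller applies deletions itself. The segment string is a hypothetical cache key.
   */
  abstract Bits getDocIdSet(String segment, int maxDoc, Bits acceptDocs);
}

/** Example filter: matches even doc ids and honors acceptDocs itself. */
class EvenDocsFilter extends Filter {
  @Override
  Bits getDocIdSet(String segment, int maxDoc, Bits acceptDocs) {
    return new Bits() {
      public boolean get(int doc) {
        return doc % 2 == 0 && (acceptDocs == null || acceptDocs.get(doc));
      }
      public int length() { return maxDoc; }
    };
  }
}

/** Caches the deletion-free result per segment and applies acceptDocs on the way out. */
class CachingWrapperFilter extends Filter {
  private final Filter inner;
  private final Map<String, Bits> cache = new HashMap<>(); // not thread-safe
  int misses = 0; // exposed only for the demo

  CachingWrapperFilter(Filter inner) { this.inner = inner; }

  @Override
  Bits getDocIdSet(String segment, int maxDoc, Bits acceptDocs) {
    Bits cached = cache.get(segment);
    if (cached == null) {
      misses++;
      cached = inner.getDocIdSet(segment, maxDoc, null); // never cache deletions
      cache.put(segment, cached);
    }
    if (acceptDocs == null) return cached; // caller handles deletions: no wrapping needed
    final Bits base = cached;
    return new Bits() {
      public boolean get(int doc) { return base.get(doc) && acceptDocs.get(doc); }
      public int length() { return base.length(); }
    };
  }
}

class CachingDemo {
  public static void main(String[] args) {
    CachingWrapperFilter f = new CachingWrapperFilter(new EvenDocsFilter());
    Bits live = new Bits() { // doc 4 is deleted
      public boolean get(int doc) { return doc != 4; }
      public int length() { return 10; }
    };
    System.out.println(f.getDocIdSet("seg0", 10, live).get(4)); // false: deleted
    System.out.println(f.getDocIdSet("seg0", 10, null).get(4)); // true: cache has no deletions
    System.out.println(f.misses);                               // 1
  }
}
```

The String segment key and the unsynchronized HashMap keep the sketch short; a real cache would key on the segment reader and would need to be safe for concurrent searches.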
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121790#comment-13121790 ] Uwe Schindler commented on LUCENE-1536: --- +1. I have to revert a lot here again, because I was trying to move setLiveDocsOnly/liveDocsOnly down to FixedBitSet & Co, but this is too complicated. Should I start to hack something together? The biggest change will be adding the parameter to getDocIdSet() in all filter impls.
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121807#comment-13121807 ] Chris Male commented on LUCENE-1536: Yes please put something together and then we'll review / iterate.
[jira] [Assigned] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-3433: --- Assignee: Simon Willnauer Random access non RAM resident IndexDocValues (CSF) --- Key: LUCENE-3433 URL: https://issues.apache.org/jira/browse/LUCENE-3433 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Yonik Seeley Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, sorted_source.patch There should be a way to get specific IndexDocValues by going through the Directory rather than loading all of the values into memory.
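The idea in the issue description, reading a specific document's value through storage instead of loading every value into memory, can be sketched with java.nio positional reads. This assumes a hypothetical fixed-width layout (one 8-byte long per document, document N at byte offset N * 8) and uses a plain FileChannel rather than Lucene's Directory/IndexInput; DiskLongValues is an illustrative name, not a Lucene class.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical on-disk layout: one 8-byte long per document, doc N at offset N * 8. */
final class DiskLongValues implements AutoCloseable {
  private final FileChannel channel;
  private final ByteBuffer buf = ByteBuffer.allocate(Long.BYTES); // reused; not thread-safe

  DiskLongValues(Path file) throws IOException {
    channel = FileChannel.open(file, StandardOpenOption.READ);
  }

  /** Writes all values once (standing in for what indexing would produce). */
  static void write(Path file, long[] values) throws IOException {
    ByteBuffer out = ByteBuffer.allocate(values.length * Long.BYTES);
    for (long v : values) out.putLong(v);
    out.flip();
    try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
        StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
      while (out.hasRemaining()) ch.write(out);
    }
  }

  /** Reads a single document's value with a positional read; nothing else is loaded. */
  long get(int doc) throws IOException {
    buf.clear();
    long pos = (long) doc * Long.BYTES;
    while (buf.hasRemaining()) {
      if (channel.read(buf, pos + buf.position()) < 0) {
        throw new IOException("no value stored for doc " + doc);
      }
    }
    buf.flip();
    return buf.getLong();
  }

  @Override
  public void close() throws IOException { channel.close(); }
}

class DiskValuesDemo {
  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("docvalues", ".bin");
    try {
      DiskLongValues.write(file, new long[] {5, 7, 42});
      try (DiskLongValues values = new DiskLongValues(file)) {
        System.out.println(values.get(2)); // 42
      }
    } finally {
      Files.delete(file);
    }
  }
}
```

Each get() costs one positional read instead of keeping the whole array on the heap; the trade-off is I/O per lookup, which the OS page cache usually softens for hot values.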
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121830#comment-13121830 ] Michael McCandless commented on LUCENE-3433: bq. I agree with simon here, can't we spin off a different issue for these? +1: I agree removal of SortedSource is unrelated to this issue. We should discuss it under a new issue (it's obviously contentious), and for this issue do the nice cleanups we all agree on. It shouldn't be removed under this one. One shouldn't have to pay such a high price (uninversion on searcher startup) to sort or group by a string field, which we do today. It's silly to re-invert on every searcher startup when we can sort once during indexing and record that in the doc values, and SortedSource gives us that. Besides the merge RAM usage (which I think is minor), is there a technical/code complexity reason that SortedSource should be removed? Does it somehow require the enums or something? I'm trying to understand how/why it suddenly got coupled into this issue... I think sorting and grouping by string fields are first class functions for Lucene.
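The argument for sorting once at indexing time can be illustrated with a small, hypothetical SortedValues class (not the SortedSource API): the unique values are sorted once and every document stores an ordinal, so ordering documents at search time reduces to integer comparisons, with no uninversion at searcher startup. Here the structure is built from a String[] for illustration; under the proposal that step would happen during indexing.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Hypothetical sorted per-document values: built once, then queried by doc or ord. */
final class SortedValues {
  private final String[] sortedUnique; // ord -> value, ascending
  private final int[] docToOrd;        // doc -> ord

  SortedValues(String[] perDocValues) {
    // In the proposal this work happens once at indexing time, not at searcher startup.
    sortedUnique = new TreeSet<>(Arrays.asList(perDocValues)).toArray(new String[0]);
    docToOrd = new int[perDocValues.length];
    for (int doc = 0; doc < perDocValues.length; doc++) {
      docToOrd[doc] = Arrays.binarySearch(sortedUnique, perDocValues[doc]);
    }
  }

  int ord(int doc) { return docToOrd[doc]; }

  String value(int ord) { return sortedUnique[ord]; }
}

class SortByOrdDemo {
  /** Orders doc ids by their value using only int comparisons on ords. */
  static Integer[] sortDocs(SortedValues sv, int maxDoc) {
    Integer[] docs = new Integer[maxDoc];
    for (int i = 0; i < maxDoc; i++) docs[i] = i;
    Arrays.sort(docs, (a, b) -> Integer.compare(sv.ord(a), sv.ord(b))); // stable sort
    return docs;
  }

  public static void main(String[] args) {
    SortedValues sv = new SortedValues(new String[] {"pear", "apple", "fig", "apple"});
    System.out.println(Arrays.toString(sortDocs(sv, 4))); // [1, 3, 2, 0]
  }
}
```

Because ords preserve the sorted order of the values, comparing two documents' ords gives the same result as comparing their strings, which is what makes the precomputation pay off for sorting and grouping.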
[jira] [Created] (LUCENE-3492) Extract a generic framework for running randomized tests.
Extract a generic framework for running randomized tests. - Key: LUCENE-3492 URL: https://issues.apache.org/jira/browse/LUCENE-3492 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 I love the idea of randomized testing. Everyone (we at CarrotSearch, and the Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes.
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121831#comment-13121831 ] Michael McCandless commented on LUCENE-3433: bq. I also attached a patch that adds back the sorted source so we can spin off a new issue and make them efficient without writing it from the scratch. Simon, can you invert this patch, and open a new issue for removing SortedSource?
[jira] [Updated] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3492: Attachment: Screen Shot 2011-10-06 at 12.58.02 PM.png Static fixtures couldn't be handled with a rule, so I've decided to rewrite the JUnit Runner instead of subclassing it. Lots of frustration so far, but I like the result :)
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121840#comment-13121840 ] Martijn van Groningen commented on LUCENE-3433: --- bq. I think sorting and grouping by string fields are first class functions for Lucene. And faceting too! Maybe we should have a DocTermIndex that is independent of its source, with impls for DV and impls for indexed values. Maybe the name DocTermIndex doesn't make sense then, because it suggests that the values come from the inverted index, which might not be the case.
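The suggestion of a source-independent DocTermIndex can be sketched as an interface with two interchangeable backings; the interface and both classes below are hypothetical and only illustrate the shape of the idea. Facet counting, like sorting and grouping, then only needs ord(doc) and lookup(ord), whether the ords were recorded in doc values or computed by uninverting indexed terms.

```java
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical source-independent view of per-document terms. */
interface DocTermIndex {
  int ord(int doc);        // -1 if the document has no value
  int numOrds();
  String lookup(int ord);  // ords are assigned in sorted term order
}

/** Stand-in for a doc-values-backed impl: ords were recorded at index time. */
final class DocValuesTermIndex implements DocTermIndex {
  private final String[] sortedTerms;
  private final int[] docToOrd;

  DocValuesTermIndex(String[] sortedTerms, int[] docToOrd) {
    this.sortedTerms = sortedTerms;
    this.docToOrd = docToOrd;
  }

  public int ord(int doc) { return docToOrd[doc]; }
  public int numOrds() { return sortedTerms.length; }
  public String lookup(int ord) { return sortedTerms[ord]; }
}

/** Stand-in for an impl uninverted from indexed terms (term -> postings). */
final class UninvertedTermIndex implements DocTermIndex {
  private final String[] sortedTerms;
  private final int[] docToOrd;

  UninvertedTermIndex(TreeMap<String, int[]> postings, int maxDoc) {
    sortedTerms = postings.keySet().toArray(new String[0]); // TreeMap iterates in sorted order
    docToOrd = new int[maxDoc];
    Arrays.fill(docToOrd, -1);
    int ord = 0;
    for (int[] docs : postings.values()) {
      for (int doc : docs) docToOrd[doc] = ord; // single-valued field assumed
      ord++;
    }
  }

  public int ord(int doc) { return docToOrd[doc]; }
  public int numOrds() { return sortedTerms.length; }
  public String lookup(int ord) { return sortedTerms[ord]; }
}

final class Facets {
  /** Counts matching docs per ord; works unchanged for either backing. */
  static int[] count(DocTermIndex index, int[] matchingDocs) {
    int[] counts = new int[index.numOrds()];
    for (int doc : matchingDocs) {
      int ord = index.ord(doc);
      if (ord >= 0) counts[ord]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    TreeMap<String, int[]> postings = new TreeMap<>();
    postings.put("red", new int[] {1});
    postings.put("blue", new int[] {0, 2});
    DocTermIndex index = new UninvertedTermIndex(postings, 4);
    int[] counts = count(index, new int[] {0, 1, 2, 3});
    for (int ord = 0; ord < counts.length; ord++) {
      System.out.println(index.lookup(ord) + ": " + counts[ord]); // blue: 2, then red: 1
    }
  }
}
```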
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121842#comment-13121842 ] Dawid Weiss commented on LUCENE-3492: - I've implemented a runner that follows the basic algorithm given in LUCENE-3489. Basically, the seeds for each test run are fixed derivations of a single master seed (used for the runner and all class-level fixtures) and don't rely on the order of invocations or other factors. There are plenty of ways to tweak and tune this by overriding the class-level @Seed or the method-level @Seed. @Repeat gives you control over how many times a given test is executed and whether a seed is reused (constant for each iteration) or randomized (predictably, from the start seed). Most of all, everything fits quite nicely in Eclipse (and, I hope, in other IDEs; I didn't check IDEA or NetBeans though) because each executed test run is nicely described in the runner (full seed), so you can either click on it and re-run a single test, or write down the seed and fix it at runtime. Lots of TODOs in the code; I will continue in the evening.
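One way to derive per-test seeds that depend only on the master seed and the test's identity, and not on execution order, is to hash them together with a strong 64-bit mixing function. This is an illustrative sketch that uses the MurmurHash3 fmix64 finalizer and String.hashCode(); it is not necessarily the algorithm from LUCENE-3489, and SeedDerivation and its methods are hypothetical names.

```java
import java.util.Random;

/** Hypothetical seed derivation: every seed is a pure function of the master seed and a name. */
final class SeedDerivation {
  /** MurmurHash3 64-bit finalizer (fmix64); a bijection on 64-bit values. */
  static long mix64(long z) {
    z ^= z >>> 33;
    z *= 0xff51afd7ed558ccdL;
    z ^= z >>> 33;
    z *= 0xc4ceb9fe1a85ec53L;
    z ^= z >>> 33;
    return z;
  }

  /** Seed for one test method; independent of which tests ran before it. */
  static long methodSeed(long masterSeed, String methodName) {
    return mix64(masterSeed ^ mix64(methodName.hashCode()));
  }

  /** Seed for one repetition, for repeated tests that should not reuse their seed. */
  static long iterationSeed(long methodSeed, int iteration) {
    return mix64(methodSeed + iteration);
  }

  /** Fixed-width hex, convenient for printing in a runner's test description. */
  static String format(long seed) {
    return String.format("%016X", seed);
  }

  public static void main(String[] args) {
    long master = 0xDEADBEEFL; // would normally be random, or fixed to reproduce a failure
    for (String name : new String[] {"testFoo", "testBar"}) {
      long seed = methodSeed(master, name);
      Random random = new Random(seed); // each test gets its own reproducible Random
      System.out.println(name + " [" + format(seed) + "] first int: " + random.nextInt(100));
    }
  }
}
```

Because both mixing steps are bijections, method names with distinct hash codes always get distinct seeds under a given master seed, and re-running a single test with the same master seed reproduces its seed exactly, regardless of which other tests run.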
[jira] [Commented] (LUCENE-3262) Facet benchmarking
[ https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121852#comment-13121852 ] Shai Erera commented on LUCENE-3262: bq. ItemSource.resetInputs I don't have that warning turned on in Eclipse. I disabled it for exactly this reason :). bq. ItemSource rename The new name is ok, and the properties fit it better. BTW, if you want the .algs out there not to fail silently, you could add some code to setConfig that checks for these outdated properties and throws a proper exception. But I'm ok with the solution you chose. bq. PFD.readers.incRef() The javadocs are good. I'd also add: *NOTE:* if you no longer need that IndexReader/TaxoReader, you should decRef()/close() after calling this method. Otherwise, the IR/TR will just stay open ... Facet benchmarking -- Key: LUCENE-3262 URL: https://issues.apache.org/jira/browse/LUCENE-3262 Project: Lucene - Java Issue Type: New Feature Components: modules/benchmark, modules/facet Reporter: Shai Erera Assignee: Doron Cohen Attachments: CorpusGenerator.java, LUCENE-3262.patch, TestPerformanceHack.java A spin-off from LUCENE-3079. We should define a few benchmarks for faceting scenarios, so we can evaluate the new faceting module as well as any improvement we'd like to consider in the future (such as cutting over to docvalues, implementing FST-based caches, etc.). Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as a starting point. We've also done some preliminary work on extending Benchmark for faceting, so I'll attach it here as well. We should perhaps create a Wiki page where we clearly describe the benchmark scenarios, then include results for 'default settings' and 'optimized settings', or something like that.
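The NOTE suggested above describes a reference-counting contract: a method that incRef()s a reader before returning it obliges the caller to decRef() it when done, or the reader stays open. A minimal sketch with hypothetical stand-in classes (RefCountedReader and ReaderHolder are illustrative, not Lucene's IndexReader or TaxonomyReader):

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a reference-counted reader. */
final class RefCountedReader {
  private final AtomicInteger refCount = new AtomicInteger(1); // the creator holds one reference
  private volatile boolean closed = false;

  void incRef() {
    while (true) {
      int rc = refCount.get();
      if (rc <= 0) throw new IllegalStateException("reader is already closed");
      if (refCount.compareAndSet(rc, rc + 1)) return;
    }
  }

  void decRef() {
    int rc = refCount.decrementAndGet();
    if (rc == 0) {
      closed = true; // last reference released: real code would free resources here
    } else if (rc < 0) {
      throw new IllegalStateException("decRef() called too many times");
    }
  }

  boolean isClosed() { return closed; }

  int getRefCount() { return refCount.get(); }
}

/** Hypothetical owner that hands out its reader with an extra reference. */
final class ReaderHolder {
  private final RefCountedReader reader = new RefCountedReader();

  /** NOTE: increments the ref count; the caller must decRef() when it no longer needs the reader. */
  RefCountedReader getReader() {
    reader.incRef();
    return reader;
  }

  void close() { reader.decRef(); }

  public static void main(String[] args) {
    ReaderHolder holder = new ReaderHolder();
    RefCountedReader r = holder.getReader();
    try {
      System.out.println("refCount while in use: " + r.getRefCount()); // 2
    } finally {
      r.decRef(); // without this, the reader would stay open after holder.close()
    }
    holder.close();
    System.out.println("closed: " + r.isClosed()); // true
  }
}
```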
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121859#comment-13121859 ] Robert Muir commented on LUCENE-3433: - {quote} I think sorting and grouping by string fields are first class functions for Lucene. {quote} I disagree: if you aren't sorting by score, then go use a database.
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121860#comment-13121860 ] Simon Willnauer commented on LUCENE-3433: - bq. Simon, can you invert this patch, and open a new issue for removing SortedSource? Actually, my plan was to have one interface for now and then open an issue to add back the SortedSource with an impl that we all agree on. Currently, the sorted variants are somewhat flaky and heavy. I think we should simply remove it here and then work out a plan for how to implement this. The technical reason here is simply to rethink the interface: we now have one which is simple, so let's see what we can do to make this work with sorted variants. simon
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121862#comment-13121862 ] Robert Muir commented on LUCENE-3433: - {quote} +1: I agree removal of SortedSource is unrelated to this issue. We should discuss it under a new issue (it's obviously contentious), and for this issue do the nice cleanups we all agree on. It shouldn't be removed under this one. {quote} That's not what I meant by spinning off a different issue; I think we should spin off a different issue to add back SortedSource. Docvalues really needs to be simplified, Simon has done just that, and I think it's great that, as part of that, it focuses on what it should be: per-document values, not some precomputed FieldCache.
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1536: -- Attachment: LUCENE-1536-rewrite.patch A first rewrite of Lucene core to pass acceptDocs down to Filter.getDocIdSet: - optimized and simpliefied CachingWrapper* - no deletesmode anymore - FieldCacheTermsFilter has optimized DocIdSet - Added bits() to all DocIdSet - IndexSearcher.searchWithFilter was rewritten to pass liveDocs down. - AndBits is no longer needed The tests are not yet rewritten, still 55 compile errors This patch is just for review if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to iterator was a very sizable performance hit. Some notes on the test: * Index is first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 
1-X means an OR query, eg 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, ie 1 AND 2 AND 3 AND 4. u s means united states (phrase search). * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use random-access filter API in IndexSearcher's main loop. Method low means I use random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where filter is applied as iterator up high (ie in IndexSearcher's search loop). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
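The rewrite patch above passes acceptDocs down to Filter.getDocIdSet and adds bits() to every DocIdSet. A minimal, self-contained sketch of that shape follows; SketchBits, SketchDocIdSet, BitSetDocIdSet and RangeFilter are hypothetical stand-ins, not the classes in the patch. Here bits() returns null when a set has no cheap random access, and the filter folds acceptDocs (for example the segment's live docs) into the set it returns.

```java
import java.util.BitSet;

/** Random-access view of a set of docs (hypothetical stand-in for Lucene's Bits). */
interface SketchBits {
  boolean get(int doc);
  int length();
}

/** Hypothetical stand-in for a DocIdSet offering both access styles. */
abstract class SketchDocIdSet {
  /** Sequential access: first doc >= target, or -1 if there is none. */
  abstract int advance(int target);

  /** Random access if cheaply available; null means callers must iterate. */
  SketchBits bits() {
    return null;
  }
}

/** A bitset-backed set can serve both styles, so bits() exposes a view of itself. */
final class BitSetDocIdSet extends SketchDocIdSet {
  private final BitSet set;
  private final int maxDoc;

  BitSetDocIdSet(BitSet set, int maxDoc) {
    this.set = set;
    this.maxDoc = maxDoc;
  }

  @Override
  int advance(int target) {
    int next = set.nextSetBit(target); // -1 when no further bit is set
    return next < maxDoc ? next : -1;
  }

  @Override
  SketchBits bits() {
    return new SketchBits() {
      @Override public boolean get(int doc) { return set.get(doc); }
      @Override public int length() { return maxDoc; }
    };
  }
}

/** Hypothetical filter matching docs in [lo, hi) that acceptDocs also accepts. */
final class RangeFilter {
  private final int lo;
  private final int hi;

  RangeFilter(int lo, int hi) {
    this.lo = lo;
    this.hi = hi;
  }

  /** acceptDocs may be null, meaning every document is accepted. */
  SketchDocIdSet getDocIdSet(int maxDoc, SketchBits acceptDocs) {
    BitSet result = new BitSet(maxDoc);
    for (int doc = lo; doc < Math.min(hi, maxDoc); doc++) {
      if (acceptDocs == null || acceptDocs.get(doc)) {
        result.set(doc);
      }
    }
    return new BitSetDocIdSet(result, maxDoc);
  }
}
```

Returning null from bits(), rather than throwing, is one possible design choice: a wrapper can check for null and fall back to the sequential path without exception handling.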
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121877#comment-13121877 ] Robert Muir commented on LUCENE-1536: - {noformat} I therefore would like a similar API like for scorers, so IndexSearcher can ask the Filter for a DocIdSet based on the given liveDocs (like the scorer method in Weights). {noformat} If this is the case, then in the !randomAccess path of IndexSearcher.java, please pass null as liveDocs.
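Robert's point can be illustrated with a small sketch. Assuming, as in the rewrite, that the filter's DocIdSet is built with the live docs already applied, the iterator (!randomAccess) path can leapfrog query and filter without also checking liveDocs, while the random-access path can hand the filter's bits to the scorer in place of liveDocs. Plain int arrays and an IntPredicate stand in for Scorer, DocIdSetIterator and Bits; FilteredSearchSketch is a hypothetical name, not the actual IndexSearcher code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class FilteredSearchSketch {
  /**
   * queryDocs:  docs the query matches, ascending (stand-in for a Scorer).
   * filterDocs: docs the filter matches, ascending; deleted docs were already
   *             removed because liveDocs was passed to the filter.
   * filterBits: random-access view of the same filter, or null if it has none.
   */
  static List<Integer> search(int[] queryDocs, int[] filterDocs, IntPredicate filterBits) {
    List<Integer> hits = new ArrayList<>();
    if (filterBits != null) {
      // Random-access path: the filter bits take the place of liveDocs as
      // the scorer's acceptDocs, so each candidate costs one lookup.
      for (int doc : queryDocs) {
        if (filterBits.test(doc)) {
          hits.add(doc);
        }
      }
    } else {
      // Iterator path: leapfrog query and filter. No separate liveDocs check
      // (acceptDocs == null) because filterDocs already excludes deletions.
      int i = 0;
      int j = 0;
      while (i < queryDocs.length && j < filterDocs.length) {
        if (queryDocs[i] == filterDocs[j]) {
          hits.add(queryDocs[i]);
          i++;
          j++;
        } else if (queryDocs[i] < filterDocs[j]) {
          i++;
        } else {
          j++;
        }
      }
    }
    return hits;
  }
}
```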
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-1536: Attachment: LUCENE-1536.patch Adding back this optimization, again. Before committing, please give me time to write tests to ensure we aren't losing these optimizations.
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121885#comment-13121885 ] Uwe Schindler commented on LUCENE-1536: --- Robert, thanks! I missed this line: {code} Bits acceptDocs = filterContainsLiveDocs ? null : context.reader.getLiveDocs(); {code} As the filter now always applies the live docs, acceptDocs would always be null!
[jira] [Resolved] (SOLR-2403) Problem with facet.sort=lex, shards, and facet.mincount
[ https://issues.apache.org/jira/browse/SOLR-2403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2403. Resolution: Fixed Fix Version/s: 3.2 Problem with facet.sort=lex, shards, and facet.mincount --- Key: SOLR-2403 URL: https://issues.apache.org/jira/browse/SOLR-2403 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Environment: RHEL5, Ubuntu 10.04 Reporter: Peter Cline Fix For: 3.2 I tested this on a recent trunk snapshot (2/25) but haven't verified it with 3.1 or 1.4.1; I can do so if necessary and update the issue. Solr does not return the proper number of facet values when sorting alphabetically, using distributed search, and using a facet.mincount that excludes some of the values among the first facet.limit values. This is easiest to explain by example. Sorting alphabetically, the first 20 values for my subject_facet field have few documents: 19 facet values have only 1 document each, and 1 has 2 documents. There are plenty after that with more than 2. {code} http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2 {code} comes back with the expected 20 facet values with >= 2 documents each. If I add a shards parameter that points back to the same server, the result is different: {code} http://localhost:8082/solr/select?q=*:*&facet=true&facet.field=subject_facet&facet.limit=20&facet.sort=lex&facet.mincount=2&shards=localhost:8082/solr {code} comes back with only 1 facet value: the single value in the first 20 that had more than 1 document. It appears to me that mincount is ignored when doing the original query to the shards, and applied afterwards. Let me know if you need any more info. Thanks, Peter
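The symptom Peter describes can be reproduced with a toy model in which the first facet.limit values in lexical order are selected before facet.mincount is applied, which matches his guess about the distributed case. This is an illustration with hypothetical names (LexFacetSketch and its methods), not Solr's faceting code, and it does not claim to show what the actual fix changed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class LexFacetSketch {
  /** Expected behavior: skip values below mincount, then take the first `limit` in lexical order. */
  static List<String> mincountThenLimit(TreeMap<String, Integer> counts, int mincount, int limit) {
    List<String> out = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) { // TreeMap iterates in lexical key order
      if (out.size() == limit) {
        break;
      }
      if (e.getValue() >= mincount) {
        out.add(e.getKey());
      }
    }
    return out;
  }

  /** Reported symptom: the first `limit` values are chosen first, and mincount is applied afterwards. */
  static List<String> limitThenMincount(TreeMap<String, Integer> counts, int mincount, int limit) {
    List<String> out = new ArrayList<>();
    int taken = 0;
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      if (taken == limit) {
        break;
      }
      taken++;
      if (e.getValue() >= mincount) {
        out.add(e.getKey());
      }
    }
    return out;
  }
}
```

Simply applying mincount on each shard would not be correct either when there are several shards, because a value's counts are summed across shards before the total is compared with mincount.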
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121911#comment-13121911 ] Simon Willnauer commented on LUCENE-3433: - bq. +1 - something contentious should not be removed in an unrelated issue like this. If it's already in, but some want it out, let's make an a new issue to discuss. Once something is in, there should be a clear and dedicated issue discussing it's removal if there is dispute. I don't agree with simply pulling it and putting the onus on those who want it to make an issue to get it back in. There is no dispute here. If you had looked at the API and the code, you'd know what I'm talking about. We cut over to a new API where SortedSource doesn't fit in nicely. We used ValuesEnum for merging, as the lowest common denominator between SortedSource and Source. Now we only have Source, as a random-access API. To keep this issue at a reasonable size, we should add the missing part in a separate issue. We should also rethink how to merge sorted sources, since they are quite memory-heavy (I think this is a potential issue). Adding SortedSource back is going to be tough without a new issue, since lots of stuff has changed. To make the dev process easier, backing it out and re-adding it seems best to me. Don't worry, we are going to add it back. Random access non RAM resident IndexDocValues (CSF) --- Key: LUCENE-3433 URL: https://issues.apache.org/jira/browse/LUCENE-3433 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Yonik Seeley Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, sorted_source.patch There should be a way to get specific IndexDocValues by going through the Directory rather than loading all of the values into memory.
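The goal in the LUCENE-3433 description, reading specific values through the Directory instead of loading them all into memory, can be sketched for fixed-width values: if each document's value occupies eight bytes in doc-id order, the value for doc N sits at byte offset N * 8 and can be read with a single seek. The class below is a hypothetical illustration that uses java.io.RandomAccessFile as a stand-in for a Lucene Directory; it is not the IndexDocValues implementation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

/**
 * Hypothetical on-disk source of fixed-width long values, one per document,
 * stored in doc-id order. RandomAccessFile stands in for a Lucene Directory.
 */
final class DirectLongValues implements AutoCloseable {
  private final RandomAccessFile file;

  DirectLongValues(String path) throws IOException {
    this.file = new RandomAccessFile(path, "r");
  }

  /** Reads one value by seeking to its slot; nothing else is loaded into memory. */
  long get(int doc) {
    try {
      file.seek((long) doc * Long.BYTES);
      return file.readLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  @Override
  public void close() throws IOException {
    file.close();
  }

  /** Writes values so that the value for doc N starts at byte offset N * 8. */
  static void write(String path, long[] values) throws IOException {
    try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
      out.setLength(0); // discard any previous contents
      for (long v : values) {
        out.writeLong(v);
      }
    }
  }
}
```

Because seek and readLong share one file pointer, a single instance should not be used from several threads at once; variable-length values would additionally need an offsets table.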
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121918#comment-13121918 ] Mark Miller commented on LUCENE-3433: - bq. there is no dispute here If there is no dispute, what exactly is Mike talking about?
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121935#comment-13121935 ] Robert Muir commented on LUCENE-3433: - Here is the way I see the problem: * Currently DocValues is a bit confusing; I think a lot of this is due to the current API. * No offense to Simon, but I think in a way this forces him to feel responsible for doing all the work on it. The complexity makes it hard for others to get involved. * With this patch, the API becomes a lot simpler: I'm sure it's not perfect, but the API seems to correspond to what DV does; at least it makes sense to me. Can we temporarily drop SortedSource and open a new issue to add it back (maybe even marked as a blocker for 4.0)? This way, we can rethink how to implement this functionality (maybe it doesn't even belong in DocValues but on top of it, or somewhere else entirely).
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121954#comment-13121954 ] Shai Erera commented on LUCENE-3492: This is only for debugging from an IDE, right? It does not replace tests.iter and tests.seed? It looks very cool. It also adds a risk that someone will accidentally commit tests with these annotations. So perhaps we should add pre-commit hooks, or a test that scans all test files and ensures those annotations do not exist? Extract a generic framework for running randomized tests. - Key: LUCENE-3492 URL: https://issues.apache.org/jira/browse/LUCENE-3492 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes.
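Shai's suggestion of a test that scans the test sources for debugging-only annotations could look roughly like the sketch below. The annotation names Seed and Repeat are assumptions for illustration only; substitute whatever annotations the framework actually provides. ForbiddenAnnotationCheck is a hypothetical name, and the regex check is deliberately simple, so it would also flag matches inside comments or string literals.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ForbiddenAnnotationCheck {
  // Hypothetical debug-only annotation names; replace with the framework's real ones.
  // \b stops e.g. "@SeedDecorators" from matching "@Seed".
  private static final Pattern FORBIDDEN = Pattern.compile("@(Seed|Repeat)\\b");

  /** Returns the .java files under root whose source mentions a forbidden annotation. */
  static List<Path> findViolations(Path root) {
    try (Stream<Path> walk = Files.walk(root)) {
      List<Path> javaFiles = walk
          .filter(p -> p.toString().endsWith(".java"))
          .sorted()
          .collect(Collectors.toList());
      List<Path> violations = new ArrayList<>();
      for (Path file : javaFiles) {
        String source = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (FORBIDDEN.matcher(source).find()) {
          violations.add(file);
        }
      }
      return violations;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A test would call findViolations on the test source root and fail if the list is non-empty; a pre-commit hook could run the same check.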
[jira] [Resolved] (SOLR-2218) Performance of start= and rows= parameters are exponentially slow with large data sets
[ https://issues.apache.org/jira/browse/SOLR-2218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved SOLR-2218. --- Resolution: Duplicate Dup of SOLR-1726 Performance of start= and rows= parameters are exponentially slow with large data sets -- Key: SOLR-2218 URL: https://issues.apache.org/jira/browse/SOLR-2218 Project: Solr Issue Type: Improvement Components: Build Affects Versions: 1.4.1 Reporter: Bill Bell With large data sets (10M rows), setting start and rows to large numbers is slow, and it gets slower the farther you get from start=0 with a complex query. Random also makes this slower. I would like to make this faster when looping through large data sets. It would be nice if we could pass a pointer to the result set to loop over, or support a very large rows=number. Something like: rows=1000 start=0 spointer=string_my_query_1 Then, within some interval (like 5 minutes), I could continue this loop with something like: rows=1000 start=1000 spointer=string_my_query_1 What do you think? Since the data set is so large, the cache is not helping.
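Bill's "pointer" idea resembles cursor (keyset) paging: instead of collecting and skipping start + rows hits for every page, the client sends back the sort key of the last hit it saw, and each request keeps only rows candidates. The sketch below illustrates that idea over an in-memory list; CursorPagingSketch and Hit are hypothetical names, and this is not Solr's implementation or the approach taken in SOLR-1726.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class CursorPagingSketch {
  /** A scored hit (hypothetical stand-in for a search result). */
  static final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
      this.doc = doc;
      this.score = score;
    }
  }

  /** Result order: score descending, doc id ascending to break ties deterministically. */
  static final Comparator<Hit> ORDER =
      Comparator.comparingDouble((Hit h) -> -h.score).thenComparingInt(h -> h.doc);

  /**
   * Returns up to `rows` hits that sort strictly after `after` (null for the first page).
   * Only `rows` candidates are kept in memory, instead of start + rows with offset paging.
   */
  static List<Hit> nextPage(List<Hit> allHits, Hit after, int rows) {
    // Reversed order puts the worst retained hit at the head, ready for eviction.
    PriorityQueue<Hit> queue = new PriorityQueue<>(ORDER.reversed());
    for (Hit h : allHits) {
      if (after != null && ORDER.compare(h, after) <= 0) {
        continue; // returned on an earlier page
      }
      queue.add(h);
      if (queue.size() > rows) {
        queue.poll(); // drop the worst candidate
      }
    }
    List<Hit> page = new ArrayList<>(queue);
    page.sort(ORDER);
    return page;
  }
}
```

The tie-breaking doc id is what makes "strictly after" well defined; since Lucene doc ids can change when a new index snapshot is opened, a cursor like this is only reliable while the same searcher is used, which fits the time-limited pointer Bill describes.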
[jira] [Commented] (SOLR-2372) Upgrade Solr to Tika 0.10
[ https://issues.apache.org/jira/browse/SOLR-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121962#comment-13121962 ] Jan Høydahl commented on SOLR-2372: --- Also fixed the dot.classpath for Eclipse so that the new Tika jars are found. Upgrade Solr to Tika 0.10 - Key: SOLR-2372 URL: https://issues.apache.org/jira/browse/SOLR-2372 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction) Reporter: Grant Ingersoll Assignee: Jan Høydahl Fix For: 3.5, 4.0 as the title says
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1536: -- Attachment: LUCENE-1536-rewrite.patch Here is a patch with almost all core tests rewritten (I left out the CachingWrapper tests, as I nuked DeletesMode). It's just for demonstration. Some tests have really stupid filters that work only with optimized indexes. I added asserts to those filters (except one) that acceptDocs==null. The remaining one uses QueryUtils, and I have no idea what's going on there that makes acceptDocs!=null. Looking at the code in IndexSearcher, I would propose to remove all special Filter handling from IndexSearcher and move all of that code over to FilteredQuery (with all our optimizations). If you call IS.search(query, filter, ...), IndexSearcher would simply wrap the query in a FilteredQuery, and we would have no code duplication and much easier maintenance of IS.
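Uwe's proposal reduces IndexSearcher's filter-taking overloads to a thin wrapper, so that all filtering logic (and its optimizations) lives in one class. A minimal sketch with hypothetical stand-in types (SketchQuery, SketchFilter, SketchFilteredQuery, SketchSearcher), not Lucene's Query, Filter, FilteredQuery or IndexSearcher:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a query: yields matching doc ids in ascending order. */
interface SketchQuery {
  List<Integer> matches(int maxDoc);
}

/** Hypothetical stand-in for a filter. */
interface SketchFilter {
  boolean accepts(int doc);
}

/** Owns all filtering logic, so optimizations only need to live here. */
final class SketchFilteredQuery implements SketchQuery {
  private final SketchQuery query;
  private final SketchFilter filter;

  SketchFilteredQuery(SketchQuery query, SketchFilter filter) {
    this.query = query;
    this.filter = filter;
  }

  @Override
  public List<Integer> matches(int maxDoc) {
    List<Integer> out = new ArrayList<>();
    for (int doc : query.matches(maxDoc)) {
      if (filter.accepts(doc)) {
        out.add(doc);
      }
    }
    return out;
  }
}

/** The searcher has no filter-specific code path of its own. */
final class SketchSearcher {
  private final int maxDoc;

  SketchSearcher(int maxDoc) {
    this.maxDoc = maxDoc;
  }

  List<Integer> search(SketchQuery query) {
    return query.matches(maxDoc);
  }

  /** Filtered search simply wraps the query and delegates. */
  List<Integer> search(SketchQuery query, SketchFilter filter) {
    return filter == null ? search(query) : search(new SketchFilteredQuery(query, filter));
  }
}
```

With this layout, an optimization such as choosing between random-access and iterator-based filtering would be implemented once in the wrapper query rather than duplicated in the searcher.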
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121976#comment-13121976 ] Robert Muir commented on LUCENE-1536: - {quote} When looking at the code in IndexSearcher, I would propose to remove all Filter special handling in IndexSaercher and move all code over to FilteredQuery (with all our optimizations). If you call IS.search(query, filter,...), IndexSearcher would simply wrap with FilteredQuery and we would have no code duplication and much easier maintainability in IS. {quote} +1 Also, we can nuke AndBits.java now?
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121978#comment-13121978 ] Uwe Schindler commented on LUCENE-1536: --- bq. Also, we can nuke AndBits.java now? It was nuked here, but still made it into the patch :(
[jira] [Created] (SOLR-2813) TrieTokenizerFactory should catch NumberFormatException, return 400 (not 500)
TrieTokenizerFactory should catch NumberFormatException, return 400 (not 500) - Key: SOLR-2813 URL: https://issues.apache.org/jira/browse/SOLR-2813 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: 4.0 Environment: 4.0 trunk, snapshot taken 09/08/2011. Reporter: Jeff Crump Priority: Minor TrieTokenizerFactory lets bad user input result in a 500 error rather than a 400. For a long-valued field, for example, this code in TrieTokenizerFactory.reset() will throw a NumberFormatException:
{code}
case LONG:
  ts.setLongValue(Long.parseLong(v));
  break;
{code}
The NFE gets all the way out to RequestHandlerBase.handleRequest():
{code}
catch (Exception e) {
  SolrException.log(SolrCore.log,e);
  if (e instanceof ParseException) {
    e = new SolrException(SolrException.ErrorCode.BAD_REQUEST, e);
  }
{code}
There it is caught, but because it is not a ParseException it is not converted into a BAD_REQUEST, and it ends up coming out of SolrDispatchFilter.sendError as a 500. Simply catching the NFE and turning it into a SolrException does the trick:
{code}
solr/core/src/java/org/apache/solr/analysis/TrieTokenizerFactory.java#1 - /4.0-trunk-09082011/solr/core/src/java/org/apache/solr/analysis/TrieTokenizerFactory.java
110a111,112
>     } catch (NumberFormatException e) {
>       throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, "Unable to parse input", e);
{code}
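The proposed fix maps a NumberFormatException to a client error at the point of parsing. The sketch below shows the same idea in isolation; BadRequestException is a hypothetical stand-in for SolrException with ErrorCode.BAD_REQUEST, and statusFor only imitates how a request handler might turn exceptions into HTTP status codes. Neither is Solr's actual code.

```java
/** Hypothetical stand-in for SolrException carrying a 400 status. */
final class BadRequestException extends RuntimeException {
  final int httpStatus;

  BadRequestException(String message, Throwable cause) {
    super(message, cause);
    this.httpStatus = 400;
  }
}

final class TrieLongParsing {
  /** Parses a long-valued field, mapping malformed user input to a 400 instead of a 500. */
  static long parseLongField(String v) {
    try {
      return Long.parseLong(v);
    } catch (NumberFormatException e) {
      throw new BadRequestException("Unable to parse input: " + v, e);
    }
  }

  /** Mimics a request handler: client errors keep their status, anything else becomes 500. */
  static int statusFor(String v) {
    try {
      parseLongField(v);
      return 200;
    } catch (BadRequestException e) {
      return e.httpStatus;
    } catch (RuntimeException e) {
      return 500;
    }
  }
}
```

Without the catch in parseLongField, the NumberFormatException would fall through to the generic RuntimeException branch and produce a 500, which is the behavior reported in the issue.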
[jira] [Created] (LUCENE-3493) Solr reopen on a custom reader doesn't work
Solr reopen on a custom reader doesn't work --- Key: LUCENE-3493 URL: https://issues.apache.org/jira/browse/LUCENE-3493 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.4 Reporter: Jason Rutherglen Priority: Blocker When a custom index reader is used with Solr and the reader is reopened, the custom reader vanishes after the reopen. It's a bug.
[jira] [Updated] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-3493: - Attachment: LUCENE-3493.patch Patch with a unit test demonstrating the bug. The fix required in Lucene is randomly in the patch as well. I'll post another patch showing the Lucene fix, which allows fixing the bug on the Solr side.
[jira] [Created] (LUCENE-3494) Remove per-document multiply in FilteredQuery
Remove per-document multiply in FilteredQuery - Key: LUCENE-3494 URL: https://issues.apache.org/jira/browse/LUCENE-3494 Project: Lucene - Java Issue Type: Improvement Reporter: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3494.patch Spinoff of LUCENE-1536. In LUCENE-1536, Uwe suggested using FilteredQuery under the hood to implement filtered search. But this query is inefficient: it does a per-document multiplication (wrapped.score() * boost()). Instead, it should just pass the boost down in its weight, as BooleanQuery does, to avoid this per-document multiply.
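A simplified arithmetic model of the change, assuming a score of the form `termFreq * weight`. The class and method names are hypothetical. The real patch passes the boost through Weight normalization the way BooleanQuery does, and this sketch does not reproduce that machinery. It only shows where the multiply moves: from once per matching document to once per query.

```java
/** Hypothetical sketch: per-document boost multiply vs. boost folded into the weight. */
final class BoostFoldingSketch {

  /** Before: the wrapper multiplies every document's score by the boost. */
  static float[] perDocMultiply(float[] termFreqs, float weight, float boost) {
    float[] scores = new float[termFreqs.length];
    for (int i = 0; i < termFreqs.length; i++) {
      float wrappedScore = termFreqs[i] * weight; // what the wrapped scorer returns
      scores[i] = wrappedScore * boost;           // extra multiply for every hit
    }
    return scores;
  }

  /** After: the boost is folded into the weight once, before any document is scored. */
  static float[] boostInWeight(float[] termFreqs, float weight, float boost) {
    float boostedWeight = weight * boost; // computed once per query
    float[] scores = new float[termFreqs.length];
    for (int i = 0; i < termFreqs.length; i++) {
      scores[i] = termFreqs[i] * boostedWeight; // one multiply per hit
    }
    return scores;
  }
}
```

Because float multiplication is not associative, the two forms can differ in the last bit for arbitrary inputs. They agree exactly when the factors are powers of two, as in the test values.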
[jira] [Updated] (LUCENE-3494) Remove per-document multiply in FilteredQuery
[ https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3494: Attachment: LUCENE-3494.patch
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13121996#comment-13121996 ] David Smiley commented on SOLR-2155: Frederick, a rough inspection of your problem suggests that the GeoHashField is declared multiValued=true but the corresponding field in your POJO is not a List<String> as it should be. If you only need a single value, then I suggest you use LatLonType instead, since it's what comes with Solr. Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e., via a gazetteer) occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash-prefix-based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose.
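To make the 4x8 / 8x4 subdivision concrete, here is a sketch of standard geohash encoding: base32 characters, with longitude and latitude bits interleaved and longitude first. It is written from the public geohash algorithm and is not Solr's GeoHashUtils. The `subdivision` helper is a hypothetical illustration of the grid shape added by the next character. Each character carries 5 bits. After an even number of characters the next 5 bits are lon/lat/lon/lat/lon (8 columns x 4 rows). After an odd number they are lat/lon/lat/lon/lat (4 columns x 8 rows).

```java
/** Sketch of standard geohash encoding; not Solr's GeoHashUtils. */
final class GeoHashSketch {
  private static final char[] BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz".toCharArray();

  static String encode(double lat, double lon, int precision) {
    double[] latRange = {-90.0, 90.0};
    double[] lonRange = {-180.0, 180.0};
    StringBuilder sb = new StringBuilder(precision);
    boolean evenBit = true; // even bits refine longitude, odd bits refine latitude
    int bit = 0;
    int ch = 0;
    while (sb.length() < precision) {
      double[] range = evenBit ? lonRange : latRange;
      double value = evenBit ? lon : lat;
      double mid = (range[0] + range[1]) / 2;
      if (value >= mid) {
        ch = (ch << 1) | 1; // upper half
        range[0] = mid;
      } else {
        ch = ch << 1;       // lower half
        range[1] = mid;
      }
      evenBit = !evenBit;
      if (++bit == 5) { // 5 bits per base32 character
        sb.append(BASE32[ch]);
        bit = 0;
        ch = 0;
      }
    }
    return sb.toString();
  }

  /** {longitude columns, latitude rows} that one more character splits a cell into. */
  static int[] subdivision(int currentLength) {
    return currentLength % 2 == 0 ? new int[] {8, 4} : new int[] {4, 8};
  }
}
```

For example, `encode(57.64911, 10.40744, 11)` yields the commonly cited hash `u4pruydqqvj`. Every shorter hash of the same point is a prefix of it, and that prefix property is what the filter builds on.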
The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid square is found, the points therein are compared against the user's query to see if they match. I created an abstraction, GeoShape, extended by subclasses named PointDistance... and CartesianBox, to support different query shapes so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th.
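A toy model of the seek-then-scan loop described above. A sorted `NavigableSet<String>` of geohash terms stands in for the term dictionary. `tailSet(prefix, true)` plays the role of seeking to the first term at or after a covering prefix. A `Predicate<String>` stands in for the GeoShape test. All names are hypothetical. This is not GeoHashPrefixFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.function.Predicate;

/** Hypothetical sketch of a geohash prefix filter over a sorted term set. */
final class PrefixSeekSketch {

  static List<String> matches(NavigableSet<String> terms, List<String> coveringPrefixes,
                              Predicate<String> shapeContains) {
    List<String> hits = new ArrayList<>();
    for (String prefix : coveringPrefixes) {
      // Start at the first term >= prefix: the analogue of a TermsEnum seek.
      for (String term : terms.tailSet(prefix, true)) {
        if (!term.startsWith(prefix)) {
          break; // sorted order: we've left this grid square
        }
        if (shapeContains.test(term)) {
          hits.add(term); // precise per-point check against the query shape
        }
      }
    }
    return hits;
  }
}
```

Because terms are sorted, each covering square costs one seek plus a scan of only the terms inside it. Terms between squares are never visited.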
PagedBytes additional method
PagedBytes is great! Even better would be a couple of additional methods: one to write it out to an IndexOutput and another to return the total bytes used.
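A hypothetical sketch of what the two requested methods could look like on a PagedBytes-like structure. It is not Lucene's PagedBytes, and `java.io.OutputStream` stands in for IndexOutput. Bytes are appended into fixed-size pages. `totalBytesUsed()` reports bytes actually written rather than allocated capacity, and `writeTo()` streams the used bytes out page by page.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical paged byte store with totalBytesUsed() and writeTo(). */
final class PagedBytesSketch {
  private final int pageSize;
  private final List<byte[]> pages = new ArrayList<>();
  private byte[] current;
  private int upto; // write position inside the current page

  PagedBytesSketch(int pageSize) {
    this.pageSize = pageSize;
    this.upto = pageSize; // forces allocation of the first page on the first append
  }

  void append(byte[] src, int offset, int length) {
    while (length > 0) {
      if (upto == pageSize) { // current page full (or none yet): start a new one
        current = new byte[pageSize];
        pages.add(current);
        upto = 0;
      }
      int chunk = Math.min(length, pageSize - upto);
      System.arraycopy(src, offset, current, upto, chunk);
      upto += chunk;
      offset += chunk;
      length -= chunk;
    }
  }

  /** Bytes actually written, not the allocated capacity. */
  long totalBytesUsed() {
    return pages.isEmpty() ? 0 : (long) (pages.size() - 1) * pageSize + upto;
  }

  /** Writes all used bytes, in order, to the given stream. */
  void writeTo(OutputStream out) throws IOException {
    for (int i = 0; i < pages.size(); i++) {
      int len = (i == pages.size() - 1) ? upto : pageSize;
      out.write(pages.get(i), 0, len);
    }
  }
}
```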
Re: PagedBytes additional method
why don't you open an issue for this? thanks, simon On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: PagedBytes is great! Even better would be a couple of additional methods, one to write it out to an IndexOutput and the other for the total bytes used.
Re: PagedBytes additional method
I try not to without having a patch somewhat prepared! On Thu, Oct 6, 2011 at 11:38 AM, Simon Willnauer simon.willna...@googlemail.com wrote: why don't you open an issue for this? thanks, simon On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: PagedBytes is great! Even better would be a couple of additional methods, one to write it out to an IndexOutput and the other for the total bytes used.
[jira] [Commented] (LUCENE-3494) Remove per-document multiply in FilteredQuery
[ https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122010#comment-13122010 ] Uwe Schindler commented on LUCENE-3494: --- +1, commit this so I can move forward with 1536! Thanks for the help!!!
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122011#comment-13122011 ] Mark Miller commented on LUCENE-3493: - I have a couple of questions: If the bug is in Lucene, shouldn't we write a test at the Lucene level? What exactly is the bug? That when you subclass DirectoryReader, it doesn't return that subclass from reopen? If this is the desired behavior, isn't it up to the subclass to override reopen? Also, you say the required Lucene fix is randomly in the patch, but also that you will post another patch showing the Lucene fix. I don't see it in the patch, so I assume it's coming, but the only change I see is making some Lucene constructors public; I think we likely shouldn't do that just for this Solr test.
RE: [Lucene.Net] Asking about Lucene.net License
The answer is no. None of the code within Lucene.Net is GPL. All Apache products are under the Apache License. -Original Message- From: Ron Grabowski [mailto:rongrabow...@yahoo.com] Sent: Thursday, October 06, 2011 12:11 AM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Asking about Lucene.net License I think he was asking if any of the code within Lucene.Net is GPL. From: Scott Lombard lombardena...@gmail.com To: lucene-net-...@lucene.apache.org Sent: Wednesday, October 5, 2011 5:08 PM Subject: RE: [Lucene.Net] Asking about Lucene.net License Asha, Lucene.net is an Apache Incubator project and is only distributed under the Apache License, version 2. If you are using a GPL 3 license, then there is documented compatibility between the licenses, as described on the page http://www.apache.org/licenses/GPL-compatibility.html. Given this compatibility, you can include Lucene.net in a GPL 3 project. I am not sure of all the mechanics of this inclusion, but it can be done. Scott -Original Message- From: Asha Kang [mailto:stereo...@gmail.com] Sent: Tuesday, October 04, 2011 8:57 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Asking about Lucene.net License Hi, this is Asha Kang from Korea. The reason I'm writing this email is that I'd like to make sure which license Lucene.net follows. I'm currently developing a search engine using Lucene.net. As far as I know, Lucene.net follows the Apache License 2.0, but my co-worker told me that some classes included in Lucene.net's DLL could follow the GPL license. So now I'm confused. Are there any classes following the GPL license? Do I need to follow two licenses, the Apache License 2.0 and the GPL? I'm looking forward to your reply. Best regards, Asha Kang.
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122019#comment-13122019 ] Jason Rutherglen commented on LUCENE-3493: -- The patch only shows the bug, which needs a test in Solr. The next patch will show the fix. A Lucene test makes sense as well.
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122021#comment-13122021 ] Uwe Schindler commented on LUCENE-3493: --- This is not a bug at all: Your custom IndexReader has to override reopen() and return your own implementation.
[jira] [Updated] (LUCENE-3262) Facet benchmarking
[ https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-3262: Attachment: LUCENE-3262.patch Updated patch with a test, more javadocs, and a comment as Shai suggested. I think this is ready to commit. More tests are needed, and search with facets is also missing, but that can go in a separate issue. Facet benchmarking -- Key: LUCENE-3262 URL: https://issues.apache.org/jira/browse/LUCENE-3262 Project: Lucene - Java Issue Type: New Feature Components: modules/benchmark, modules/facet Reporter: Shai Erera Assignee: Doron Cohen Attachments: CorpusGenerator.java, LUCENE-3262.patch, LUCENE-3262.patch, TestPerformanceHack.java A spinoff from LUCENE-3079. We should define a few benchmarks for faceting scenarios, so we can evaluate the new faceting module as well as any improvement we'd like to consider in the future (such as cutting over to docvalues, implementing FST-based caches, etc.). Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as a starting point. We've also done some preliminary work on extending Benchmark for faceting, so I'll attach it here as well. We should perhaps create a Wiki page where we clearly describe the benchmark scenarios, then include results of 'default settings' and 'optimized settings', or something like that.
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122033#comment-13122033 ] Jason Rutherglen commented on LUCENE-3493: -- Uwe, I'd like to agree with you, however I cannot (because then I wouldn't have had to create an issue!). Look at the DR.doOpen* methods. They're private. There's no reason for them to be. They need to be protected; that's in the next patch. Fairly simple. The follow-on to this is overriding IW to return custom readers. I had an issue and patch for that a while back. It's best to implement both here, as Solr's NRT on Lucene 4.x will show the same problem! I think you're right, it looks like this *could* be done by overriding doOpenIfChanged*; however, it doesn't make sense to duplicate code!
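A plain-Java model of the point under discussion. All class names are hypothetical, and these are not Lucene's DirectoryReader classes. A public `openIfChanged` entry point delegates to a protected `doOpenIfChanged` hook. A subclass that overrides the hook keeps its type across reopens. A subclass that does not override it gets a plain base reader back, which is the "custom reader vanishes" symptom. The hook is only overridable if it is protected rather than private.

```java
/** Hypothetical model: a protected reopen hook lets a custom reader survive reopen. */
final class ReopenSketch {

  static class BaseReader {
    final int generation;

    BaseReader(int generation) {
      this.generation = generation;
    }

    /** Public entry point; returns null when nothing changed. */
    final BaseReader openIfChanged(int latestGeneration) {
      return latestGeneration == generation ? null : doOpenIfChanged(latestGeneration);
    }

    /** Protected hook; the default builds a plain BaseReader. */
    protected BaseReader doOpenIfChanged(int latestGeneration) {
      return new BaseReader(latestGeneration);
    }
  }

  /** Overrides the hook, so reopened readers keep the custom type. */
  static class CustomReader extends BaseReader {
    CustomReader(int generation) {
      super(generation);
    }

    @Override
    protected BaseReader doOpenIfChanged(int latestGeneration) {
      return new CustomReader(latestGeneration);
    }
  }

  /** No override: a reopen silently returns a plain BaseReader. */
  static class NaiveCustomReader extends BaseReader {
    NaiveCustomReader(int generation) {
      super(generation);
    }
  }
}
```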
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122036#comment-13122036 ] Yonik Seeley commented on LUCENE-3493: -- Implementing your own IndexReader has always been a very tricky endeavor, especially with respect to maintainability... super-expert only. It's one of the reasons I was glad to get rid of SolrIndexReader (the fragile base class problem).
[jira] [Updated] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3433: Attachment: LUCENE-3433.patch Here we go: the make-everybody-happy patch! I added SortedSource back and integrated it into the Source pattern for random access. We now have an entirely disk-resident SortedSource impl for both variants, and a single interface in the first place. A SortedSource instance can be obtained via Source#asSortedSource(), which returns null if the source is not sorted. With this random-access DirectSortedSource we can also improve the merging for sorted sources, which was one of my major issues here. Random access non RAM resident IndexDocValues (CSF) --- Key: LUCENE-3433 URL: https://issues.apache.org/jira/browse/LUCENE-3433 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Yonik Seeley Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, sorted_source.patch There should be a way to get specific IndexDocValues by going through the Directory rather than loading all of the values into memory.
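A hypothetical sketch of the `asSortedSource()` pattern described above. A generic per-document source returns null from `asSortedSource()`, and a sorted variant returns itself and maps each document to an ordinal into the sorted unique values. In-memory arrays stand in for the disk-resident implementation. The class and method names mirror the comment's wording but are not the actual IndexDocValues API.

```java
import java.util.Arrays;

/** Hypothetical model of Source#asSortedSource(); not the real DocValues classes. */
final class SortedSourceSketch {

  static class Source {
    private final String[] perDoc;

    Source(String[] perDoc) {
      this.perDoc = perDoc;
    }

    String getValue(int docID) {
      return perDoc[docID];
    }

    /** Returns null when this source is not sorted. */
    SortedSource asSortedSource() {
      return null;
    }
  }

  static final class SortedSource extends Source {
    private final String[] sortedValues; // unique values in sorted order
    private final int[] docToOrd;        // per-document ordinal into sortedValues

    SortedSource(String[] perDoc) {
      super(perDoc);
      sortedValues = Arrays.stream(perDoc).distinct().sorted().toArray(String[]::new);
      docToOrd = new int[perDoc.length];
      for (int doc = 0; doc < perDoc.length; doc++) {
        docToOrd[doc] = Arrays.binarySearch(sortedValues, perDoc[doc]);
      }
    }

    @Override
    SortedSource asSortedSource() {
      return this;
    }

    int ord(int docID) {
      return docToOrd[docID];
    }

    String getByOrd(int ord) {
      return sortedValues[ord];
    }

    int getValueCount() {
      return sortedValues.length;
    }
  }
}
```

Callers can test the result of `asSortedSource()` for null instead of using `instanceof`. That keeps a single `Source` type in signatures while still exposing the ordinal view when it exists.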
[jira] [Resolved] (LUCENE-3494) Remove per-document multiply in FilteredQuery
[ https://issues.apache.org/jira/browse/LUCENE-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3494. - Resolution: Fixed Assignee: Robert Muir
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122040#comment-13122040 ] Mark Miller commented on LUCENE-3493: - That clears things up a bit, Jason. The title and patch don't really explain the issue. bq. as Lucene 4.x Solr's NRT will show the same problem! How is that? Solr's NRT does not rely on a custom IndexReader (and if it did, I imagine we would make that properly override doOpenIfChanged, else it would be a bug)?
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122050#comment-13122050 ] Jason Rutherglen commented on LUCENE-3493: -- bq. Solr's NRT does not rely on a custom IndexReader Yikes, logically the custom reader functionality should! {quote}properly override doOpenIfChanged, else it would be a bug{quote} It's a bug because there's no way to implement that today. The DirectoryReader is created deep inside of IW.getReader (and there's no way to re-implement its functionality either, because of private variable access). I think we need a protected method for creating the reader in IW. Though I think this becomes almost endless, because I don't think there's a way to implement a custom IW in Solr.
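A hypothetical sketch of the "protected method for creating the reader in IW" idea, as a plain-Java factory-method pattern. `Writer`, `Reader`, `newReader` and the other names are illustrative only and are not IndexWriter API. The NRT entry point always routes through an overridable factory, so a custom writer can hand out its own reader type. As the thread notes, Solr would then also need a way to choose which writer implementation is used.

```java
/** Hypothetical sketch: a writer whose NRT readers come from an overridable factory. */
final class NrtFactorySketch {

  static class Reader {
    final int docCount;

    Reader(int docCount) {
      this.docCount = docCount;
    }
  }

  static class Writer {
    private int docCount;

    void addDocument() {
      docCount++;
    }

    /** NRT entry point: always goes through the factory method. */
    final Reader getReader() {
      return newReader(docCount);
    }

    /** Overridable factory; the default builds a plain Reader. */
    protected Reader newReader(int docCount) {
      return new Reader(docCount);
    }
  }

  /** A custom reader type that should survive NRT reopens. */
  static class TaggedReader extends Reader {
    final String tag;

    TaggedReader(int docCount, String tag) {
      super(docCount);
      this.tag = tag;
    }
  }

  /** Custom writer whose NRT readers are TaggedReaders. */
  static class CustomWriter extends Writer {
    @Override
    protected Reader newReader(int docCount) {
      return new TaggedReader(docCount, "custom");
    }
  }
}
```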
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122056#comment-13122056 ] Mark Miller commented on LUCENE-3493: - bq. Yikes, logically the custom reader functionality should! Okay, I see - you also want your Reader impl to be pulled from IW when using NRT. But as you allude to below, you would need a custom IndexWriter to do that - that is where we get the IndexReader from for NRT. That's a scary road to start down currently (or as you say, endless). bq. I don't think there's a way to implement a custom IW in Solr. Would be pretty advanced stuff, but what we would likely have to do is allow users to provide alternate SolrCoreState impls (currently the DefaultSolrCoreState impl is simply used). This would let you manage what IndexWriter impl was used.
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122068#comment-13122068 ] Dawid Weiss commented on LUCENE-3492: - Hi Shai. This is definitely not only for debugging. For example, we use randomized testing inside CarrotSearch to test algorithmic/combinatorial code. Once you hit a bug, you simply copy the test case (or a call to a common test case method) and fix the seed to have a regression test for the future (so that you know you're not failing examples that previously failed). So, for example:
{code}
@Test @Seed(23095324)
public void runFixedRegression_1() { doSomethingWithRandoms(); }

@Test @Seed(239735923)
public void runFixedRegression_2() { doSomethingWithRandoms(); }

@Test
public void runRandomized() { doSomethingWithRandoms(); }
{code}
This is a scenario I really came to like. It's a bit like your tests write themselves for you :) I left system properties for fixing seeds and enforcing the repetition count because they are currently in Lucene, although I personally don't like them that much (because they affect everything globally). I do understand they're useful for quick hacking without recompiling stuff or for remote executions, but I'd much rather have something like -Dseed.testClass[.method]= which would affect only a single class or method rather than everything. The same can be done for filtering which method/test case to execute. This is debatable of course and a matter of personal taste. I should publish what I have tonight on GitHub (I'm moving certain things out of our proprietary codebase and there are JUnit corner cases that slow things down). Extract a generic framework for running randomized tests.
- Key: LUCENE-3492 URL: https://issues.apache.org/jira/browse/LUCENE-3492 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png I love the idea of randomized testing. Everyone (we at CarrotSearch, the Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes.
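A sketch of the per-class/per-method seed override proposed in the comment (`-Dseed.testClass[.method]=`). It checks the most specific system property first, falls back to the class-level one, and otherwise draws a fresh seed. The exact property naming and the decimal seed format are assumptions made for illustration, not an existing framework option. `doSomethingWithRandoms` here is a hypothetical stand-in whose randomness all flows from one seed. That is why pinning the seed turns a failing random run into a repeatable regression test.

```java
import java.util.Random;

/** Hypothetical sketch of per-class / per-method seed resolution for randomized tests. */
final class SeedSketch {

  /** Most specific wins: seed.<class>.<method>, then seed.<class>, else a fresh seed. */
  static long resolveSeed(String testClass, String method, Random fallback) {
    String value = System.getProperty("seed." + testClass + "." + method);
    if (value == null) {
      value = System.getProperty("seed." + testClass);
    }
    return value != null ? Long.parseLong(value) : fallback.nextLong();
  }

  /** Stand-in for doSomethingWithRandoms(): every random choice derives from the seed. */
  static int[] doSomethingWithRandoms(long seed) {
    Random random = new Random(seed);
    int[] values = new int[5];
    for (int i = 0; i < values.length; i++) {
      values[i] = random.nextInt(1000);
    }
    return values;
  }
}
```

Scoping the property to a class or method means that pinning one failing test does not change the seeds every other test sees. A single global property would change them all.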
[jira] [Issue Comment Edited] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122068#comment-13122068 ] Dawid Weiss edited comment on LUCENE-3492 at 10/6/11 5:04 PM: -- Hi Shai. This is definitely not only for debugging. For example, we use randomized testing inside CarrotSearch to test algorithmic/combinatorial code. Once you hit a bug, you simply copy the test case (or a call to a common test case method) and fix the seed to get a regression test for the future (so that you know you're not failing on examples that previously failed). For example:
{code}
@Test @Seed(23095324)
public void runFixedRegression_1() { doSomethingWithRandoms(); }

@Test @Seed(239735923)
public void runFixedRegression_2() { doSomethingWithRandoms(); }

@Test
public void runRandomized() { doSomethingWithRandoms(); }
{code}
This is a scenario I really came to like. It's a bit like your tests write themselves for you :) I left system properties for fixing seeds and enforcing the repetition count because they are currently in Lucene, although I personally don't like them that much (because they affect everything globally). I do understand they're useful for quick hacking without recompiling or for remote executions, but I'd much rather have something like -Dseed.testClass[.method]= which would affect only a single class or method rather than everything. The same could be done for filtering which method/test case to execute. This is debatable, of course, and a matter of personal taste. I should publish what I have tonight on GitHub (I'm moving certain things out of our proprietary codebase, and there are JUnit corner cases that slow things down).
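The per-class or per-method seed override suggested above (-Dseed.testClass[.method]=) could look roughly like the following minimal Java sketch. Several details are assumptions for illustration only. The property names (seed.<class> and seed.<class>.<method>), the class name ScopedSeeds and the lookup order (method first, then class, then a fresh random seed) are not part of any existing Lucene or CarrotSearch API.

```java
import java.util.Properties;
import java.util.Random;

/** Hypothetical helper: scope a fixed seed to one test class or one test method. */
public final class ScopedSeeds {
    private ScopedSeeds() {}

    /** Method-level property wins, then class-level, otherwise a fresh random seed. */
    public static long resolveSeed(Properties props, String testClass, String method, Random fallback) {
        String value = props.getProperty("seed." + testClass + "." + method);
        if (value == null) {
            value = props.getProperty("seed." + testClass);
        }
        if (value == null) {
            return fallback.nextLong(); // nothing pinned for this test
        }
        return Long.parseLong(value.trim());
    }

    /** Convenience overload that reads the JVM's -D system properties. */
    public static long resolveSeed(String testClass, String method) {
        return resolveSeed(System.getProperties(), testClass, method, new Random());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("seed.TestFoo", "42");
        props.setProperty("seed.TestFoo.testBar", "7");
        System.out.println(resolveSeed(props, "TestFoo", "testBar", new Random())); // 7
        System.out.println(resolveSeed(props, "TestFoo", "testBaz", new Random())); // 42
        System.out.println(resolveSeed(props, "TestOther", "testQux", new Random())); // random
    }
}
```

With this scheme, running with -Dseed.TestFoo.testBar=7 would pin only that one method. Every other test would keep drawing fresh seeds, which avoids the global effect described above.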
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122089#comment-13122089 ] Jason Rutherglen commented on LUCENE-3493: -- Uwe, I tried your idea. It doesn't work! Here's why: DR.writeLock and DR.segmentInfos are private. That means the re-duplicated code (duplicated because the useful methods aren't protected) cannot access these private variables. Of course one could use reflection, but that's just 'atrocious'. :) Solr reopen on a custom reader doesn't work --- Key: LUCENE-3493 URL: https://issues.apache.org/jira/browse/LUCENE-3493 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.4 Reporter: Jason Rutherglen Priority: Blocker Attachments: LUCENE-3493.patch When a custom index reader is used with Solr and the reader is reopened, the custom reader vanishes after the reopen. It's a bug.
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122095#comment-13122095 ] Jason Rutherglen commented on LUCENE-3493: -- One way to solve all of this without subclassing is to move the IndexReaderFactory to Lucene and integrate it into IW and DR. That will be much cleaner than forcing users to subclass, which is a monstrous pain and will generate excessive, unnecessary code in the end.
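The following is a heavily simplified model of the factory idea. It assumes that reopen passes every newly opened reader through a user-supplied factory. Reader, CustomReader, ReaderFactory and reopen are hypothetical stand-ins, not Lucene or Solr classes. The sketch only shows that the wrapping step survives a reopen without any subclassing.

```java
/** Hypothetical, heavily simplified model; none of these types are Lucene or Solr classes. */
public final class FactoryReopenSketch {
    private FactoryReopenSketch() {}

    /** Stand-in for a reader over one generation of the index. */
    public static class Reader {
        final long generation;
        public Reader(long generation) { this.generation = generation; }
        public String describe() { return "plain@" + generation; }
    }

    /** Stand-in for a user's custom reader wrapping a base reader. */
    public static class CustomReader extends Reader {
        final Reader delegate;
        public CustomReader(Reader delegate) {
            super(delegate.generation);
            this.delegate = delegate;
        }
        @Override public String describe() { return "custom(" + delegate.describe() + ")"; }
    }

    /** Hypothetical factory that every newly opened reader is passed through. */
    public interface ReaderFactory {
        Reader wrap(Reader base);
    }

    /** Models reopen: open the next generation, then let the factory wrap it. */
    public static Reader reopen(Reader current, ReaderFactory factory) {
        Reader base = new Reader(current.generation + 1);
        return factory.wrap(base);
    }

    public static void main(String[] args) {
        ReaderFactory none = base -> base;          // a reopen that ignores customization
        ReaderFactory custom = CustomReader::new;   // the user's wrapping step
        Reader r = custom.wrap(new Reader(1));
        System.out.println(reopen(r, none).describe());   // plain@2 (custom reader vanished)
        System.out.println(reopen(r, custom).describe()); // custom(plain@2)
    }
}
```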
[jira] [Created] (LUCENE-3495) BlockJoinQuery doesn't implement boost
BlockJoinQuery doesn't implement boost -- Key: LUCENE-3495 URL: https://issues.apache.org/jira/browse/LUCENE-3495 Project: Lucene - Java Issue Type: Bug Affects Versions: 3.4 Reporter: Robert Muir Fix For: 3.5, 4.0 After reviewing LUCENE-3494, I checked other queries and noticed that BlockJoinQuery currently throws UOE (UnsupportedOperationException) for getBoost and setBoost: {noformat} throw new UnsupportedOperationException("this query cannot support boosting; please use childQuery.setBoost instead"); {noformat} I don't think we can safely do that in queries, because other parts of Lucene rely upon this working... for example, BooleanQuery's rewrite when it has a single clause and erases itself. So I think we should just pass the boost down to the inner weight.
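The toy model below shows why refusing setBoost breaks callers. It follows the description above: a single-clause BooleanQuery rewrites itself away and folds its boost into its only clause. Every class here is a hypothetical stand-in, not Lucene's Query, BooleanQuery or BlockJoinQuery. "Passing the boost down" is modeled simply as multiplying the boost into the score.

```java
/** Hypothetical mini-model of query boosting; these are not Lucene's classes. */
public final class BoostSketch {
    private BoostSketch() {}

    public abstract static class Query implements Cloneable {
        private float boost = 1.0f;
        public float getBoost() { return boost; }
        public void setBoost(float boost) { this.boost = boost; }
        /** Score for a document given the raw score of the wrapped child match. */
        public abstract float score(float rawChildScore);
        public Query rewrite() { return this; }
        @Override public Query clone() {
            try {
                return (Query) super.clone();
            } catch (CloneNotSupportedException e) {
                throw new AssertionError(e);
            }
        }
    }

    /** Old behaviour described in the issue: the join query refuses any boost. */
    public static final class RefusingJoinQuery extends Query {
        @Override public void setBoost(float boost) {
            throw new UnsupportedOperationException(
                "this query cannot support boosting; please use childQuery.setBoost instead");
        }
        @Override public float score(float rawChildScore) { return rawChildScore; }
    }

    /** Proposed behaviour: accept the boost and apply it when scoring. */
    public static final class BoostableJoinQuery extends Query {
        @Override public float score(float rawChildScore) { return rawChildScore * getBoost(); }
    }

    /** A one-clause boolean query that rewrites itself away, folding its boost into the clause. */
    public static final class SingleClauseBooleanQuery extends Query {
        private final Query clause;
        public SingleClauseBooleanQuery(Query clause) { this.clause = clause; }
        @Override public float score(float rawChildScore) { return clause.score(rawChildScore) * getBoost(); }
        @Override public Query rewrite() {
            Query q = clause.rewrite();
            if (getBoost() != 1.0f) {
                if (q == clause) {
                    q = q.clone(); // don't mutate the caller's query
                }
                q.setBoost(getBoost() * q.getBoost()); // throws for RefusingJoinQuery
            }
            return q;
        }
    }

    public static void main(String[] args) {
        SingleClauseBooleanQuery ok = new SingleClauseBooleanQuery(new BoostableJoinQuery());
        ok.setBoost(2.0f);
        System.out.println(ok.rewrite().score(1.5f)); // 3.0
        SingleClauseBooleanQuery broken = new SingleClauseBooleanQuery(new RefusingJoinQuery());
        broken.setBoost(2.0f);
        try {
            broken.rewrite();
        } catch (UnsupportedOperationException e) {
            System.out.println("rewrite failed: " + e.getMessage());
        }
    }
}
```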
[jira] [Updated] (LUCENE-3495) BlockJoinQuery doesn't implement boost
[ https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3495: Attachment: LUCENE-3495.patch
[jira] [Commented] (LUCENE-3495) BlockJoinQuery doesn't implement boost
[ https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122128#comment-13122128 ] Michael McCandless commented on LUCENE-3495: +1 looks good!
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122130#comment-13122130 ] Michael McCandless commented on LUCENE-3433: Thanks Simon! I'll look through the patch... it's a great cleanup. Random access non RAM resident IndexDocValues (CSF) --- Key: LUCENE-3433 URL: https://issues.apache.org/jira/browse/LUCENE-3433 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Yonik Seeley Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, sorted_source.patch There should be a way to get specific IndexDocValues by going through the Directory rather than loading all of the values into memory.
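Here is a rough illustration of random access that avoids loading every value into memory. It assumes values are stored as fixed-width 8-byte longs in docID order. That layout is an assumption and is not the IndexDocValues file format. Under it, a lookup is one seek plus one read. DiskDocValuesSketch is hypothetical and uses plain java.io instead of Lucene's Directory/IndexInput.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical layout: one big-endian long per document, densely packed by docID. */
public final class DiskDocValuesSketch implements AutoCloseable {
    private static final int BYTES_PER_VALUE = Long.BYTES;
    private final RandomAccessFile file;
    private final int maxDoc;

    public DiskDocValuesSketch(Path path) throws IOException {
        this.file = new RandomAccessFile(path.toFile(), "r");
        this.maxDoc = (int) (file.length() / BYTES_PER_VALUE);
    }

    /** Writes values so that the value for docID i lives at byte offset i * 8. */
    public static void write(Path path, long[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(path)))) {
            for (long v : values) {
                out.writeLong(v);
            }
        }
    }

    /** Random access: seek to the document's slot and read only that value. */
    public long get(int docId) throws IOException {
        if (docId < 0 || docId >= maxDoc) {
            throw new IndexOutOfBoundsException("docId " + docId);
        }
        file.seek((long) docId * BYTES_PER_VALUE);
        return file.readLong();
    }

    @Override public void close() throws IOException { file.close(); }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("dv", ".bin");
        write(tmp, new long[] {10L, 20L, 30L});
        try (DiskDocValuesSketch dv = new DiskDocValuesSketch(tmp)) {
            System.out.println(dv.get(1)); // 20
        }
        Files.delete(tmp);
    }
}
```

Seek-then-read on a shared RandomAccessFile is not thread-safe. A real implementation would need per-thread clones or positional reads.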
[jira] [Commented] (LUCENE-3262) Facet benchmarking
[ https://issues.apache.org/jira/browse/LUCENE-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122134#comment-13122134 ] Shai Erera commented on LUCENE-3262:
bq. I think this is ready to commit.
+1. Perhaps just add a CHANGES entry?
bq. but that can go in a separate issue
I think it's better if we resolve it in this issue, and maybe rename the issue to "Facet benchmarking framework". You can still commit the current progress because it is 'whole', covering the indexing side. I've worked on issues before that had several commits, so this will not be the first one. We should also run some benchmark tests, describing the data sets clearly, but this can be done under a separate issue. Facet benchmarking -- Key: LUCENE-3262 URL: https://issues.apache.org/jira/browse/LUCENE-3262 Project: Lucene - Java Issue Type: New Feature Components: modules/benchmark, modules/facet Reporter: Shai Erera Assignee: Doron Cohen Attachments: CorpusGenerator.java, LUCENE-3262.patch, LUCENE-3262.patch, TestPerformanceHack.java A spin-off from LUCENE-3079. We should define a few benchmarks for faceting scenarios, so we can evaluate the new faceting module as well as any improvement we'd like to consider in the future (such as cutting over to docvalues, implementing FST-based caches etc.). Toke attached a preliminary test case to LUCENE-3079, so I'll attach it here as a starting point. We've also done some preliminary work on extending Benchmark for faceting, so I'll attach it here as well. We should perhaps create a Wiki page where we clearly describe the benchmark scenarios, then include results of 'default settings' and 'optimized settings', or something like that.
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122140#comment-13122140 ] Shai Erera commented on LUCENE-3492: OK, I get the point now. But I still think we should have specific unit tests that reproduce specific scenarios, rather than relying on some monstrous test that happened to stumble on a seed that revealed a bug. If, however, the scenario cannot be reproduced deterministically, then I agree that this framework is powerful and useful.
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122148#comment-13122148 ] Dawid Weiss commented on LUCENE-3492: - Sure, absolutely. In our (mostly algorithmic, mind you) experience even small test cases can be randomized and then it is really duplicated effort to re-write them for a particular bug scenario (the tests are often simple, the data changes). But sure: the simpler the test, the better.
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122153#comment-13122153 ] Robert Muir commented on LUCENE-3492: - I agree too. One difficulty with using @Seed or something similar is that our seeds quickly become out of date, because we are often adding more randomization to our testing framework (e.g. additional craziness in RandomIndexWriter, searchers, analyzers, whatever).
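The small demonstration below shows the drift Robert describes. It assumes the framework and the test body draw from a single shared java.util.Random. Under that assumption, one extra framework draw shifts everything the test sees by one position, so a pinned seed no longer reproduces the original inputs. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrates why pinned seeds drift when the framework starts consuming more randomness. */
public final class SeedDriftSketch {
    private SeedDriftSketch() {}

    /** Inputs the test body sees after the framework has made `frameworkDraws` random choices. */
    public static int[] testInputs(long seed, int frameworkDraws, int count) {
        Random random = new Random(seed);
        for (int i = 0; i < frameworkDraws; i++) {
            random.nextInt(1000); // e.g. a newly added randomized framework setting
        }
        int[] inputs = new int[count];
        for (int i = 0; i < count; i++) {
            inputs[i] = random.nextInt(1000);
        }
        return inputs;
    }

    public static void main(String[] args) {
        long pinned = 23095324L; // seed copied into a regression test
        System.out.println(Arrays.toString(testInputs(pinned, 0, 5))); // inputs the bug was found with
        System.out.println(Arrays.toString(testInputs(pinned, 1, 5))); // same seed, one extra framework draw
    }
}
```

The second sequence is the first one shifted left by one element. The regression test keeps passing, but it no longer exercises the inputs that triggered the bug.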
[jira] [Resolved] (LUCENE-3495) BlockJoinQuery doesn't implement boost
[ https://issues.apache.org/jira/browse/LUCENE-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-3495. - Resolution: Fixed Assignee: Robert Muir
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122165#comment-13122165 ] Dawid Weiss commented on LUCENE-3492: - That's why I mentioned I would like this to become _generally_ useful, not only restricted to Lucene/Solr :) If we make it work for two projects (Carrot2 and Lucene) chances are the outcome will be flexible enough to use elsewhere. I'm not saying you must fix the seeds using annotations -- it's an option.
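Fixing seeds with annotations, as in the @Seed example earlier in this thread, can be sketched with a runtime-retained annotation that is read via reflection. The Seed annotation and the seedFor method here are hypothetical. They are not the API of any released framework.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Random;

public final class SeedAnnotationSketch {
    private SeedAnnotationSketch() {}

    /** Hypothetical annotation pinning the seed for one test method. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface Seed {
        long value();
    }

    /** Picks the seed for a method: the annotation if present, otherwise a fresh random one. */
    public static long seedFor(Method method, Random fallback) {
        Seed pinned = method.getAnnotation(Seed.class);
        return pinned != null ? pinned.value() : fallback.nextLong();
    }

    /** Example test class: one pinned regression and one freely randomized test. */
    public static class ExampleTests {
        @Seed(23095324L)
        public void runFixedRegression_1() {}

        public void runRandomized() {}
    }

    public static void main(String[] args) throws NoSuchMethodException {
        Method fixed = ExampleTests.class.getMethod("runFixedRegression_1");
        Method free = ExampleTests.class.getMethod("runRandomized");
        System.out.println(seedFor(fixed, new Random())); // 23095324
        System.out.println(seedFor(free, new Random()));  // differs from run to run
    }
}
```

A runner would then construct new Random(seedFor(method, ...)) for each test and print the seed on failure. That seed can then be copied into a new @Seed annotation.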
[jira] [Updated] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jamie Johnson updated SOLR-2765: Attachment: shard-roles.patch Shard/Node states - Key: SOLR-2765 URL: https://issues.apache.org/jira/browse/SOLR-2765 Project: Solr Issue Type: Sub-task Components: SolrCloud, update Reporter: Yonik Seeley Fix For: 4.0 Attachments: shard-roles.patch Need state for shards that indicates whether they are recovering, active/enabled, or disabled.
[jira] [Commented] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122185#comment-13122185 ] Jamie Johnson commented on SOLR-2765: - Yonik, I needed this role capability so I could dynamically add/remove/discover Solr instances and their responsibilities as the state of the cloud changed. To do this, I added a snippet to ZKController.java, CoreContainer.java and CloudDescriptor.java to incorporate this information. Now in solr.xml you define the following: <core name="coreName" instanceDir="." shard="shard1" collection="collection" roles="searcher,indexer"/> I've attached the patch for comment (it wasn't done against trunk, but I can try to pull that down and do it there if necessary).
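The roles attribute in the solr.xml line above is a comma-separated list. A minimal parser for it might look like the sketch below. CoreRoles is a hypothetical helper, and the attached patch may well handle the attribute differently (for example, it may not trim whitespace).

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper for a comma-separated roles attribute such as "searcher,indexer". */
public final class CoreRoles {
    private CoreRoles() {}

    /** Splits on commas, trims each entry, drops empties, and keeps declaration order. */
    public static Set<String> parse(String attribute) {
        Set<String> roles = new LinkedHashSet<>();
        if (attribute != null) {
            for (String part : attribute.split(",")) {
                String role = part.trim();
                if (!role.isEmpty()) {
                    roles.add(role);
                }
            }
        }
        return Collections.unmodifiableSet(roles);
    }

    public static void main(String[] args) {
        Set<String> roles = parse("searcher,indexer");
        System.out.println(roles);                     // [searcher, indexer]
        System.out.println(roles.contains("indexer")); // true
    }
}
```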
[jira] [Commented] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122196#comment-13122196 ] Mark Miller commented on SOLR-2765: --- This is where incremental updates of the cloud state get tricky... If you have something like these roles at the shard level, all of a sudden you cannot change them on the fly, because the new incremental update will not pick them up. It's a tricky situation - without incremental updates, things start to get nasty with a huge number of shards. One possibility is that everyone also watches another node that, when pinged, causes a full read - so that most cloud state updates are incremental, but when per-shard info like this is changed on the fly, you can then trigger a full read by everyone...
[jira] [Issue Comment Edited] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122196#comment-13122196 ] Mark Miller edited comment on SOLR-2765 at 10/6/11 7:54 PM: This is where incremental updates of the cloud state get tricky... If you have something like these roles at the shard level, all of a sudden you cannot change them on the fly, because the new incremental update will not pick them up. It's a tricky situation - without incremental updates, things start to get nasty with a huge number of shards. One possibility is that everyone also watches another node that, when pinged, causes a full read - so that most cloud state updates are incremental, but when per-shard info like this is changed on the fly, you can then trigger a full read by everyone...
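The in-memory model below illustrates the trade-off Mark describes, with no ZooKeeper involved. Incremental updates only notice shards appearing or disappearing. A role change on an existing shard therefore stays stale until a separately watched trigger forces a full read. Store, Cache and the trigger counter are hypothetical simplifications of the cloud state and its watches.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of cloud state caching; no ZooKeeper involved. */
public final class CloudStateSketch {
    private CloudStateSketch() {}

    /** Shared state: shard name -> per-shard data (e.g. roles), plus a "full read" trigger node. */
    public static final class Store {
        final Map<String, String> shards = new HashMap<>();
        long fullReadTrigger = 0;
        public void put(String shard, String data) { shards.put(shard, data); }
        /** Models pinging the extra node that everyone watches. */
        public void pingFullRead() { fullReadTrigger++; }
    }

    /** One node's cached view of the cloud state. */
    public static final class Cache {
        private final Map<String, String> view = new HashMap<>();
        private long seenTrigger = -1;

        /** Incremental update: picks up added/removed shards but never re-reads existing ones. */
        public void incrementalUpdate(Store store) {
            view.keySet().retainAll(store.shards.keySet());
            for (Map.Entry<String, String> e : store.shards.entrySet()) {
                view.putIfAbsent(e.getKey(), e.getValue());
            }
        }

        /** Fired by the watch on the trigger node: full re-read if the trigger changed since last time. */
        public void onTrigger(Store store) {
            if (store.fullReadTrigger != seenTrigger) {
                view.clear();
                view.putAll(store.shards);
                seenTrigger = store.fullReadTrigger;
            }
        }

        public String get(String shard) { return view.get(shard); }
    }

    public static void main(String[] args) {
        Store store = new Store();
        Cache cache = new Cache();
        store.put("shard1", "searcher");
        cache.incrementalUpdate(store);
        store.put("shard1", "searcher,indexer"); // role changed on the fly
        cache.incrementalUpdate(store);
        System.out.println(cache.get("shard1")); // searcher (stale)
        store.pingFullRead();
        cache.onTrigger(store);
        System.out.println(cache.get("shard1")); // searcher,indexer
    }
}
```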
[jira] [Created] (SOLR-2814) Core names that contain a - fail in new Admin Gui
Core names that contain a - fail in new Admin Gui --- Key: SOLR-2814 URL: https://issues.apache.org/jira/browse/SOLR-2814 Project: Solr Issue Type: Bug Components: web gui Affects Versions: 4.0 Environment: Working with Solr 4 trunk Reporter: Eric Pugh Priority: Minor If you have a core with a "-" in its name, any clicks on it in the new web GUI seem to be ignored. A core named uspatentgrant works fine, but a core named us-patent-grant can't be opened in the GUI. Nothing is logged in the Solr output either. I will attach a screenshot.
[jira] [Updated] (SOLR-2814) Core names that contain a - fail in new Admin Gui
[ https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Pugh updated SOLR-2814: Attachment: solr-admin.png screenshot showing admin gui
[jira] [Resolved] (LUCENE-3483) Move Function grouping collectors from Solr to grouping module
[ https://issues.apache.org/jira/browse/LUCENE-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen resolved LUCENE-3483. --- Resolution: Fixed Move Function grouping collectors from Solr to grouping module -- Key: LUCENE-3483 URL: https://issues.apache.org/jira/browse/LUCENE-3483 Project: Lucene - Java Issue Type: Improvement Components: modules/grouping Affects Versions: 4.0 Reporter: Martijn van Groningen Assignee: Martijn van Groningen Priority: Minor Fix For: 4.0 Attachments: LUCENE-3483.patch, LUCENE-3483.patch, LUCENE-3483.patch Move the Function*Collectors from Solr (inside Grouping source file) to grouping module.
[jira] [Commented] (LUCENE-3483) Move Function grouping collectors from Solr to grouping module
[ https://issues.apache.org/jira/browse/LUCENE-3483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1310#comment-1310 ] Martijn van Groningen commented on LUCENE-3483: --- Committed in r1179808
[jira] [Commented] (SOLR-2814) Core names that contain a - fail in new Admin Gui
[ https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122233#comment-13122233 ] Martijn van Groningen commented on SOLR-2814: - I also noticed this today, but I didn't know that this was the cause. Now that I have renamed the core, I know it is. The core with a dash does exist in Solr, but it isn't possible to interact with the core via the new GUI.
[jira] [Commented] (SOLR-2814) Core names that contain a - fail in new Admin Gui
[ https://issues.apache.org/jira/browse/SOLR-2814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122237#comment-13122237 ] Eric Pugh commented on SOLR-2814: - Much better description of the behavior of the bug!
[jira] [Commented] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122277#comment-13122277 ] Jamie Johnson commented on SOLR-2765: - Yeah, 100% agree. The current implementation of update doesn't check whether the data in the node changed; you'd need a watcher on each node to do that. The other project that I'm working on does just that. We create a watcher on /live_nodes to track the list of available servers, a watcher on the collection to see if a slice was added/removed, a watcher on each slice (not sure if that is the correct terminology) to check if a shard was added/removed, and subsequently a watcher on each shard to track data changes. So lots of watchers all around. Would it be easier to store this information on the ephemeral nodes (under live_nodes)? Then we would only need a watcher for live_nodes (add/remove) and a watcher for each shard under live_nodes to see if its data changed. I'm not sure what else is using the collection hierarchy (just query?), but perhaps it would be a bit simpler. Shard/Node states - Key: SOLR-2765 URL: https://issues.apache.org/jira/browse/SOLR-2765 Project: Solr Issue Type: Sub-task Components: SolrCloud, update Reporter: Yonik Seeley Fix For: 4.0 Attachments: shard-roles.patch Need state for shards that indicates whether they are recovering, active/enabled, or disabled.
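The trade-off between the two layouts can be made concrete by counting watches. A rough sketch, assuming a hypothetical cluster of collections, slices per collection and shards per slice; `WatchCount` and its method names are illustrative only and do not model SolrCloud's actual ZooKeeper node structure:

```java
/** Rough watch-count model for the two layouts discussed above (hypothetical, not SolrCloud code). */
final class WatchCount {
    private WatchCount() {}

    /** Layout described first: live_nodes + each collection + each slice + each shard. */
    static long hierarchical(int collections, int slicesPerCollection, int shardsPerSlice) {
        long slices = (long) collections * slicesPerCollection;
        long shards = slices * shardsPerSlice;
        return 1 + collections + slices + shards;
    }

    /** Proposed layout: state kept on the ephemeral nodes, so live_nodes + one watch per shard node. */
    static long underLiveNodes(int collections, int slicesPerCollection, int shardsPerSlice) {
        long shards = (long) collections * slicesPerCollection * shardsPerSlice;
        return 1 + shards;
    }
}
```

With 2 collections, 3 slices each and 4 shards per slice this gives 33 versus 25 watches: the saving is exactly the collection and slice watches, while the per-shard watches, which dominate as the cluster grows, remain in both layouts.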
[jira] [Created] (SOLR-2815) Fields with a - in the name are interpreted as functions in the fl= parameter.
Fields with a - in the name are interpreted as functions in the fl= parameter. Key: SOLR-2815 URL: https://issues.apache.org/jira/browse/SOLR-2815 Project: Solr Issue Type: Bug Components: search Affects Versions: 4.0 Environment: Using latest from trunk Reporter: Eric Pugh If you query for a field that has a - character in the name, you get odd results. I took the example schema and added a field called in-stock to go along with the existing inStock field. A query for http://localhost:8983/solr/select?q=*:*&fl=id,in-stock throws back an error saying the field "in" can't be found. I can sort of work around it by quoting the field name as "in-stock": http://localhost:8983/solr/select?q=*:*&fl=id,%22in-stock%22&rows=1 However, the output is still off: <doc><str name="id">GB18030TEST</str><str name="in-stock">in-stock</str></doc> Looking at it, I think the dash character causes the field name to be interpreted as an actual function!
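The error about a field named `in` is consistent with a parser that stops reading a field name at the first character it does not accept and treats the rest as something else. A minimal sketch of that failure mode next to a rule that accepts `-` inside names; `FlParser` and both rules are hypothetical and are not Solr's actual `fl` parsing code:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of splitting an fl= value into field names. */
final class FlParser {
    private FlParser() {}

    /** Naive rule: a field name is letters, digits and '_'; anything else ends it. */
    static List<String> naive(String fl) {
        return split(fl, false);
    }

    /** Lenient rule: '-' is also accepted inside a field name (as in "in-stock"). */
    static List<String> lenient(String fl) {
        return split(fl, true);
    }

    private static List<String> split(String fl, boolean allowDash) {
        List<String> names = new ArrayList<>();
        for (String item : fl.split(",")) {
            StringBuilder name = new StringBuilder();
            for (char c : item.trim().toCharArray()) {
                boolean ok = Character.isLetterOrDigit(c) || c == '_' || (allowDash && c == '-');
                if (!ok) {
                    break; // the naive parser stops here; the remainder would be parsed as an expression
                }
                name.append(c);
            }
            names.add(name.toString());
        }
        return names;
    }
}
```

With the naive rule, `id,in-stock` yields the names `id` and `in`, matching the reported error. The catch with the lenient rule is that `-` can then no longer be read as part of an expression without further context, such as checking the candidate name against the schema.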
[jira] [Commented] (LUCENE-3433) Random access non RAM resident IndexDocValues (CSF)
[ https://issues.apache.org/jira/browse/LUCENE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122312#comment-13122312 ] Michael McCandless commented on LUCENE-3433: Patch is awesome, Simon; thank you. Only thing I noticed: can you rename SortedSource.numOrds back to .getValueCount? +1 to commit! Random access non RAM resident IndexDocValues (CSF) --- Key: LUCENE-3433 URL: https://issues.apache.org/jira/browse/LUCENE-3433 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Yonik Seeley Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, LUCENE-3433.patch, sorted_source.patch There should be a way to get specific IndexDocValues by going through the Directory rather than loading all of the values into memory.
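Reading a single document's value through the Directory, instead of loading every value into memory, is easiest to picture with fixed-width values: the value for `docID` sits at byte offset `docID * 8`, so a lookup is one positioned read. A minimal sketch using plain `java.nio` rather than Lucene's `IndexInput`; the file layout and the `DiskLongValues` class are assumptions for illustration, not the IndexDocValues format:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical fixed-width, on-disk per-document long values (8 bytes per doc). */
final class DiskLongValues implements AutoCloseable {
    private final FileChannel channel;

    private DiskLongValues(FileChannel channel) {
        this.channel = channel;
    }

    /** Writes values[docID] at byte offset docID * 8 (big-endian). */
    static void write(Path file, long[] values) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        buf.flip();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
        }
    }

    static DiskLongValues open(Path file) throws IOException {
        return new DiskLongValues(FileChannel.open(file, StandardOpenOption.READ));
    }

    /** Random access: one positioned read per lookup; no other values are loaded. */
    long get(int docID) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        long pos = (long) docID * Long.BYTES;
        while (buf.hasRemaining()) {
            if (channel.read(buf, pos + buf.position()) < 0) {
                throw new IOException("docID out of range: " + docID);
            }
        }
        return buf.getLong(0);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Variable-width or sorted values need an extra offset or ordinal lookup on top of this, but the principle of seeking instead of loading is the same.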
[jira] [Commented] (LUCENE-3488) Factor out SearcherManager from NRTManager
[ https://issues.apache.org/jira/browse/LUCENE-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122314#comment-13122314 ] Michael McCandless commented on LUCENE-3488: I still see a few javadoc warnings... but otherwise +1 to commit; what a great simplification. It's nice that you can again pass either a Directory or a Writer to SearcherManager as your source for new readers... Factor out SearcherManager from NRTManager -- Key: LUCENE-3488 URL: https://issues.apache.org/jira/browse/LUCENE-3488 Project: Lucene - Java Issue Type: Improvement Affects Versions: 3.5, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.5, 4.0 Attachments: LUCENE-3488.patch, LUCENE-3488.patch, LUCENE-3488.patch Currently we have NRTManager and SearcherManager, while NRTManager contains a big piece of the code that is already in SearcherManager. Users are kind of forced to use NRTManager if they want to have SearcherManager goodness with NRT. The integration into NRTManager also forces you to maintain two instances even if you know you always want deletes. To me, NRTManager tries to do more than necessary and mixes lots of responsibilities, i.e. handling searchers and handling indexing generations. NRTManager should use a SearcherManager by aggregation rather than duplicating a lot of logic. SearcherManager should have an NRT-based and a Directory-based implementation that users can simply choose from.
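The refactoring separates "where does a fresh reader come from" from "how are searchers handed out and released". A minimal sketch of that split with reference counting; `ReaderSource`, `RefCountedManager` and `Handle` are hypothetical names, not the actual SearcherManager API. A Directory-based and an NRT/Writer-based source would be two implementations of the same interface, and an NRT manager could hold a manager instance by aggregation:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical source of fresh searchers: a Directory-based or an NRT/writer-based implementation. */
interface ReaderSource<T> {
    T openNew();
}

/** Hypothetical ref-counting manager; an NRT manager would hold one of these by aggregation. */
final class RefCountedManager<T> {
    /** A searcher plus its reference count. */
    static final class Handle<T> {
        final T value;
        private final AtomicInteger refs = new AtomicInteger(1); // the manager's own reference

        Handle(T value) {
            this.value = value;
        }

        int refCount() {
            return refs.get();
        }
    }

    private final ReaderSource<T> source;
    private Handle<T> current;

    RefCountedManager(ReaderSource<T> source) {
        this.source = source;
        this.current = new Handle<>(source.openNew());
    }

    /** Every acquire() must be paired with a release(). */
    synchronized Handle<T> acquire() {
        current.refs.incrementAndGet();
        return current;
    }

    /** Drops one reference; when it reaches zero a real implementation would close the reader. */
    void release(Handle<T> handle) {
        handle.refs.decrementAndGet();
    }

    /** Swaps in a fresh searcher; the old one stays usable until its last user releases it. */
    synchronized void refresh() {
        Handle<T> old = current;
        current = new Handle<>(source.openNew());
        release(old);
    }
}
```

Because the manager only depends on `ReaderSource`, choosing between a Directory-backed and an NRT-backed manager becomes a constructor argument rather than a separate class hierarchy.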
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1536: -- Attachment: LUCENE-1536-rewrite.patch New patch (still only Lucene Core, no contrib/modules/solr modified): - Nuked Filter handling completely from IndexSearcher. Algorithms and Random access optimizations were added to FilteredQuery. IS.search(Query, Filter,...) now only wraps the query with the Filter, if filter!=null (small helper method). - The random access threshhold is still in IndexSearcher.setFilterRandomAccessThreshold(), FilteredQuery gets it in it's weight from IndexSearcher. This is maybe not the best solutions, we can also add a setter to FilteredQuery and IS passes it to FilteredQuery. What do you think? Mike: Can you do perf tests? if a filter can support random access API, we should use it --- Key: LUCENE-1536 URL: https://issues.apache.org/jira/browse/LUCENE-1536 Project: Lucene - Java Issue Type: Improvement Components: core/search Affects Versions: 2.4 Reporter: Michael McCandless Assignee: Michael McCandless Priority: Minor Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: CachedFilterIndexReader.java, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536-rewrite.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch, LUCENE-1536.patch I ran some performance tests, comparing applying a filter via random-access API instead of current trunk's iterator API. 
This was inspired by LUCENE-1476, where we realized deletions should really be implemented just like a filter, but then in testing found that switching deletions to an iterator was a very sizable performance hit. Some notes on the test: * Index is the first 2M docs of Wikipedia. Test machine is Mac OS X 10.5.6, quad core Intel CPU, 6 GB RAM, java 1.6.0_07-b06-153. * I test across multiple queries. 1-X means an OR query, e.g. 1-4 means 1 OR 2 OR 3 OR 4, whereas +1-4 is an AND query, i.e. 1 AND 2 AND 3 AND 4. "u s" means "united states" (phrase search). * I test with multiple filter densities (0, 1, 2, 5, 10, 25, 75, 90, 95, 98, 99, 99.9 (filter is non-null but all bits are set), 100 (filter=null, control)). * Method high means I use the random-access filter API in IndexSearcher's main loop. Method low means I use the random-access filter API down in SegmentTermDocs (just like deleted docs today). * Baseline (QPS) is current trunk, where the filter is applied as an iterator up high (i.e. in IndexSearcher's search loop).
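The two ways of applying a filter that this issue compares can be sketched over `java.util.BitSet`. The iterator style leap-frogs between the query's matches and the filter's set bits, advancing whichever is behind; the random-access style simply tests `bits.get(doc)` for each candidate the query produces, the same way deleted documents are checked. A minimal sketch, assuming the query's matches are given as a sorted `int[]` (a stand-in for a Scorer, not Lucene's API); `FilterStrategies` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical comparison of iterator-based and random-access filtering. */
final class FilterStrategies {
    private FilterStrategies() {}

    /** Iterator style: leap-frog between query matches and the filter's set bits. */
    static List<Integer> leapFrog(int[] queryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int f = filter.nextSetBit(0); // the filter "iterator", positioned on its first doc
        while (i < queryDocs.length && f >= 0) {
            int q = queryDocs[i];
            if (q == f) {
                hits.add(q);
                i++;
                f = filter.nextSetBit(q + 1);
            } else if (q < f) {
                i++; // a real scorer would call advance(f) here
            } else {
                f = filter.nextSetBit(q); // advance the filter to the query's current doc
            }
        }
        return hits;
    }

    /** Random-access style: check each query match against the bits, like deleted docs. */
    static List<Integer> randomAccess(int[] queryDocs, BitSet filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc : queryDocs) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

Both produce the same hits; they differ in cost. Roughly, random access avoids the per-match `advance` overhead when the filter is dense, while leap-frogging lets a sparse filter skip over many query matches, which is why the filter density is a variable in the tests above.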
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122355#comment-13122355 ] Michael McCandless commented on LUCENE-1536: I will do perf tests! Working on getting luceneutil to do random filters... but it could be a few days (I'm offline for the next 3 days) unless I can commit to luceneutil and someone else can run the tests...
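For benchmarking, a random filter of a given density can be built by setting each document's bit with probability `density`, seeded so runs are repeatable. A minimal sketch; `RandomFilters.randomFilter` and its seeding scheme are assumptions for illustration, not how luceneutil actually builds its filters:

```java
import java.util.BitSet;
import java.util.Random;

/** Hypothetical helper for building random benchmark filters. */
final class RandomFilters {
    private RandomFilters() {}

    /** Sets each of maxDoc bits independently with probability density (0.0 to 1.0). */
    static BitSet randomFilter(int maxDoc, double density, long seed) {
        if (density < 0.0 || density > 1.0) {
            throw new IllegalArgumentException("density must be in [0, 1]: " + density);
        }
        Random random = new Random(seed); // fixed seed: the same filter on every run
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (random.nextDouble() < density) { // nextDouble() is in [0, 1), so 1.0 sets every bit
                bits.set(doc);
            }
        }
        return bits;
    }
}
```

A density of 1.0 here corresponds to the "filter is non-null but all bits are set" case in the test notes, which is different from passing no filter at all.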
[jira] [Commented] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122356#comment-13122356 ] Uwe Schindler commented on LUCENE-1536: --- I will add further tests tomorrow to test all code paths in FilteredQuery. There is a short-circuit: it implements Scorer.score(Collector) for the fast top-scorer path, as it existed in IndexSearcher.searchWithFilter before. To test the standard scorer behavior (nextDoc/advance), a test should be added that adds FilteredQuery as a clause with others to a BQ, so ConjunctionScorer tries nextDoc/advance. Somebody else might look at the scorer and double-check. I had to rewrite FilteredQuery#Weight#Scorer, as the filterIter is already advanced to the first doc (to check the random access threshold).
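Because the filter iterator is advanced to its first document to check the random-access threshold, the scorer must start from that document instead of advancing the filter again. A minimal sketch of the decision, assuming the rule is "use random access if the filter supports it and its first doc is below the threshold" (a reading of the comments above; the patch's exact rule may differ); `FilterChoice` and `Strategy` are hypothetical names:

```java
import java.util.BitSet;

/** Hypothetical sketch of choosing a filtering strategy from the first filter doc. */
final class FilterChoice {
    enum Strategy { EMPTY, RANDOM_ACCESS, LEAP_FROG }

    /** Found once and kept, so the filter iterator is not advanced a second time. */
    final int firstFilterDoc;
    final Strategy strategy;

    FilterChoice(BitSet filter, boolean supportsRandomAccess, int threshold) {
        // Advancing the filter to its first doc is what makes the threshold check possible.
        this.firstFilterDoc = filter.nextSetBit(0);
        if (firstFilterDoc < 0) {
            strategy = Strategy.EMPTY; // nothing can match, so no scorer is needed
        } else if (supportsRandomAccess && firstFilterDoc < threshold) {
            strategy = Strategy.RANDOM_ACCESS; // an early first match hints at a dense filter
        } else {
            strategy = Strategy.LEAP_FROG; // leap-frogging must start from firstFilterDoc
        }
    }
}
```

The heuristic treats an early first match as a cheap hint that the filter is dense, the case where random access is expected to pay off; it avoids scanning the whole filter just to measure its density.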
[jira] [Updated] (LUCENE-1536) if a filter can support random access API, we should use it
[ https://issues.apache.org/jira/browse/LUCENE-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-1536: -- Attachment: LUCENE-1536-rewrite.patch New patch: - Fixed the FilteredQuery scorer's advance with a logic change. It's now much easier to understand. The corresponding tests are in TestFilteredQuery: all tests are executed twice, once with a random-access filter and once with an iterator filter. FilteredQuery is also added to a BQ, so the conventional scorer (nextDoc/advance) is tested. The tests for CachingWrapper* are still disabled; I have to rewrite them tomorrow. Then we can change contrib and Solr.
[jira] [Updated] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-3492: Description: I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes. The work on this issue is on my GitHub account (lots of experiments): https://github.com/dweiss/randomizedtesting Or directly: git clone git://github.com/dweiss/randomizedtesting.git was: I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes. Extract a generic framework for running randomized tests. - Key: LUCENE-3492 URL: https://issues.apache.org/jira/browse/LUCENE-3492 Project: Lucene - Java Issue Type: Improvement Components: general/test Reporter: Dawid Weiss Assignee: Dawid Weiss Priority: Minor Fix For: 4.0 Attachments: Screen Shot 2011-10-06 at 12.58.02 PM.png I love the idea of randomized testing. Everyone (we at CarrotSearch, Lucene and Solr folks) has their own glue to make it possible. The question is whether there's something to pull out that others could share without needing to import Lucene-specific classes. The work on this issue is on my GitHub account (lots of experiments): https://github.com/dweiss/randomizedtesting Or directly: git clone git://github.com/dweiss/randomizedtesting.git
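The core of randomized testing is that every random choice in a test derives from one master seed, and a failure reports the seed so the exact run can be replayed. A minimal, framework-free sketch; `RandomizedCheck` and its message format are hypothetical and are not the API of the randomizedtesting project linked above:

```java
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical seed-reporting runner for randomized checks. */
final class RandomizedCheck {
    private RandomizedCheck() {}

    /**
     * Runs body once per iteration, each time with a Random seeded from masterSeed.
     * On failure, rethrows with the per-iteration seed so the run can be reproduced.
     */
    static void run(long masterSeed, int iterations, Consumer<Random> body) {
        Random seeds = new Random(masterSeed);
        for (int i = 0; i < iterations; i++) {
            long seed = seeds.nextLong();
            try {
                body.accept(new Random(seed));
            } catch (AssertionError | RuntimeException e) {
                throw new AssertionError("failed with seed " + Long.toHexString(seed), e);
            }
        }
    }
}
```

Re-running with the same master seed replays the identical sequence of random inputs, which is what makes a randomized failure debuggable rather than a one-off.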
[jira] [Commented] (LUCENE-3492) Extract a generic framework for running randomized tests.
[ https://issues.apache.org/jira/browse/LUCENE-3492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122401#comment-13122401 ] Dawid Weiss commented on LUCENE-3492: - OK. I've published the project on GitHub here: https://github.com/dweiss/randomizedtesting The repo contains the runner, some tests and examples. Lots of TODOs (in TODO), so consider this a work in progress, but if anybody cares to take a look and shout if something is definitely not right, go ahead. mvn verify on the topmost project compiles everything and runs the tests/examples. I don't see any functional deviations or differences in execution between ant, maven and my Eclipse GUI (mentioned by Robert), which is good.
[jira] [Commented] (SOLR-2765) Shard/Node states
[ https://issues.apache.org/jira/browse/SOLR-2765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122409#comment-13122409 ] Mark Miller commented on SOLR-2765: --- Hmmm... you know, forever I have been trying to avoid juggling potentially hundreds to thousands of watchers per node like that. Perhaps it really is the lesser of all evils, though? This whole situation is why I initially avoided incremental update, and it only went in recently. How many zk nodes do you tend to see in that project with lots of watchers? The problem with live nodes is that it's per Solr instance, and some of this info will probably want to be per core? If we made a new structure that put cores/shards under a Solr instance node, I guess that could reduce the number of watches needed, but I wonder if it's worth the effort. I'd almost rather see the full watcher option on the current node structure fall down first...
[jira] [Created] (SOLR-2816) Versioning
Versioning -- Key: SOLR-2816 URL: https://issues.apache.org/jira/browse/SOLR-2816 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Adds and deletes need to be versioned by the leader so that this can be relayed to all replicas for consistency (so an equivalent index can be built).
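Leader-assigned versions let every replica converge on the same index even when updates arrive out of order: the leader stamps each add or delete with an increasing version, and a replica ignores an update for a document if it has already applied a newer one. A minimal sketch of that rule, assuming a single leader-side counter and per-document version tracking (including for deletes) on each replica; `VersionedUpdates`, `Leader`, `Replica` and `Update` are hypothetical and not Solr's implementation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch of leader-assigned versions applied on replicas. */
final class VersionedUpdates {
    private VersionedUpdates() {}

    /** An add (value != null) or a delete (value == null), stamped by the leader. */
    static final class Update {
        final String id;
        final String value;
        final long version;

        Update(String id, String value, long version) {
            this.id = id;
            this.value = value;
            this.version = version;
        }
    }

    /** The leader hands out strictly increasing versions. */
    static final class Leader {
        private final AtomicLong clock = new AtomicLong();

        Update add(String id, String value) {
            return new Update(id, value, clock.incrementAndGet());
        }

        Update delete(String id) {
            return new Update(id, null, clock.incrementAndGet());
        }
    }

    /** A replica remembers the last applied version per id, even after a delete. */
    static final class Replica {
        final Map<String, String> docs = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        /** Applies the update unless a newer (or the same) version for this id was already seen. */
        boolean apply(Update u) {
            Long seen = versions.get(u.id);
            if (seen != null && seen >= u.version) {
                return false; // stale or duplicate update
            }
            versions.put(u.id, u.version);
            if (u.value == null) {
                docs.remove(u.id);
            } else {
                docs.put(u.id, u.value);
            }
            return true;
        }
    }
}
```

Keeping the version for deleted ids (a tombstone) is what stops a delayed older add from resurrecting a document that the leader has since deleted.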
[jira] [Updated] (SOLR-2816) Versioning
[ https://issues.apache.org/jira/browse/SOLR-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley updated SOLR-2816: --- Attachment: SOLR-2816.patch First cut at adding version info - mostly scaffolding so far. But I'll commit to the branch shortly since it doesn't get in the way of anything. Versioning -- Key: SOLR-2816 URL: https://issues.apache.org/jira/browse/SOLR-2816 Project: Solr Issue Type: Sub-task Components: SolrCloud, update Reporter: Yonik Seeley Fix For: 4.0 Attachments: SOLR-2816.patch Adds and deletes need to be versioned by the leader so that this can be relayed to all replicas for consistency (so an equivalent index can be built).
[jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122469#comment-13122469 ] arin ghazarian commented on SOLR-2155: -- Hi David, I am interested in using this geohash prefix filtering/bbox feature in Solr 4.x with SolrCloud. Do you have any plans to convert this plugin to a Solr 4-compatible one? Thanks, arin Geospatial search using geohash prefixes Key: SOLR-2155 URL: https://issues.apache.org/jira/browse/SOLR-2155 Project: Solr Issue Type: Improvement Reporter: David Smiley Assignee: Grant Ingersoll Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch, Solr2155-1.0.2-project.zip, Solr2155-1.0.3-project.zip, Solr2155-for-1.0.2-3.x-port.patch There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (e.g. via a gazetteer) on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area. I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash-prefix-based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist with this. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index.
Once a matching geohash grid square is found, the points therein are compared against the user's query to see if they match. I created an abstraction, GeoShape, extended by subclasses named PointDistance... and CartesianBox to support different queried shapes, so that the filter need not care about these details. This work was presented at LuceneRevolution in Boston on October 8th.
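A geohash interleaves longitude and latitude bisections, five bits per base-32 character, so each added character splits the parent cell into 32 sub-cells (the 4x8 or 8x4 grid described above), and every point inside a cell shares that cell's prefix. A minimal encoder sketch showing why prefix matching works; it follows the standard geohash scheme and is not the `GeoHashUtils` code from the patch:

```java
/** Standard geohash encoding (a sketch, not the patch's GeoHashUtils). */
final class GeoHash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHash() {}

    /** Encodes lat/lon into a geohash with the given number of characters (5 bits each). */
    static String encode(double lat, double lon, int precision) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonBit = true; // bits alternate, starting with longitude
        int bits = 0;
        int ch = 0;
        while (hash.length() < precision) {
            if (lonBit) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; } else { ch <<= 1; lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; } else { ch <<= 1; latMax = mid; }
            }
            lonBit = !lonBit;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

For example, `encode(42.6, -5.6, 5)` gives `ezs42`, and shorter precisions give its prefixes (`ezs` at 3). Because indexed geohashes sort by prefix, all points in a covering cell form a contiguous term range, which is what seeking to a cell's prefix in the terms dictionary exploits.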
[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122474#comment-13122474 ] Koji Sekiguchi commented on LUCENE-3440: Very nice progress, thanks! I think this is almost ready to commit. I think the following must still be done: # update the description and figures of the package javadoc ( https://builds.apache.org//job/Lucene-trunk/javadoc/contrib-highlighter/org/apache/lucene/search/vectorhighlight/package-summary.html#package_description ) # update the test cases; currently they cannot be compiled. FastVectorHighlighter: IDF-weighted terms for ordered fragments Key: LUCENE-3440 URL: https://issues.apache.org/jira/browse/LUCENE-3440 Project: Lucene - Java Issue Type: Improvement Components: modules/highlighter Affects Versions: 3.5, 4.0 Reporter: sebastian L. Priority: Minor Labels: FastVectorHighlighter Fix For: 3.5, 4.0 Attachments: LUCENE-3.5-SNAPSHOT-3440-7.patch, LUCENE-4.0-SNAPSHOT-3440-7.patch, weight-vs-boost_table01.html, weight-vs-boost_table02.html The FastVectorHighlighter gives every term found in a fragment an equal weight, which ranks fragments with a high number of words or, in the worst case, a high number of very common words higher than fragments that contain *all* of the terms used in the original query. This patch provides ordered fragments with IDF-weighted terms: total weight = total weight + IDF for unique term per fragment * boost of query; The ranking formula should be the same as, or at least similar to, the one used in org.apache.lucene.search.highlight.QueryTermScorer. The patch is simple, but it works for us. Some ideas: - A better approach would be moving the whole fragment scoring into a separate class.
- Switch scoring via parameter - Exact phrases should be given a even better score, regardless if a phrase-query was executed or not - edismax/dismax-parameters pf, ps and pf^boost should be observed and corresponding fragments should be ranked higher -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
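The proposed scoring rule can be sketched in Java as below. This is a minimal illustration, not the patch: it assumes the IDF formula of Lucene's classic `DefaultSimilarity`, `1 + ln(numDocs / (docFreq + 1))`, and `IdfFragmentScorer`, `fragmentWeight` and the document-frequency map are hypothetical names and inputs. Each query term found in a fragment contributes `idf * boost` exactly once, however often it occurs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IdfFragmentScorer {
    /** Assumed IDF, in the style of Lucene's classic DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)). */
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /**
     * total weight += IDF(term) * query boost, counting each query term at most once per fragment.
     * fragmentTerms: query terms found in the fragment, in order (hypothetical input).
     * docFreqs: document frequency of each query term (hypothetical input).
     */
    static double fragmentWeight(List<String> fragmentTerms, Map<String, Integer> docFreqs,
                                 int numDocs, double queryBoost) {
        Set<String> seen = new HashSet<>();
        double total = 0.0;
        for (String term : fragmentTerms) {
            Integer df = docFreqs.get(term);
            // Skip terms without a known document frequency and repeated occurrences.
            if (df == null || !seen.add(term)) {
                continue;
            }
            total += idf(df, numDocs) * queryBoost;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("the", 999);  // very common term
        docFreqs.put("lucene", 9); // rare term
        int numDocs = 1000;
        // Fragment A repeats a common term; fragment B contains both query terms once.
        double a = fragmentWeight(Arrays.asList("the", "the", "the", "the"), docFreqs, numDocs, 1.0);
        double b = fragmentWeight(Arrays.asList("the", "lucene"), docFreqs, numDocs, 1.0);
        System.out.printf("A=%.3f B=%.3f%n", a, b);
    }
}
```

With 1000 documents, fragment A scores 1.0 while fragment B scores about 6.6, so B ranks first; under equal per-occurrence weighting A would have won 4 to 2.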
[jira] [Commented] (LUCENE-3493) Solr reopen on a custom reader doesn't work
[ https://issues.apache.org/jira/browse/LUCENE-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122506#comment-13122506 ] Jason Rutherglen commented on LUCENE-3493: -- Uwe, check this out on the 3.3 version. What do you think? :)
{code}
@Override
public final IndexReader reopen() throws CorruptIndexException, IOException {
  // Preserve current readOnly
  return doReopen(readOnly, null);
}

@Override
public final IndexReader reopen(boolean openReadOnly) throws CorruptIndexException, IOException {
  return doReopen(openReadOnly, null);
}

@Override
public final IndexReader reopen(final IndexCommit commit) throws CorruptIndexException, IOException {
  return doReopen(true, commit);
}
{code}
Solr reopen on a custom reader doesn't work --- Key: LUCENE-3493 URL: https://issues.apache.org/jira/browse/LUCENE-3493 Project: Lucene - Java Issue Type: Bug Components: core/index Affects Versions: 3.4 Reporter: Jason Rutherglen Priority: Blocker Attachments: LUCENE-3493.patch When a custom index reader is used with Solr and the index is reopened, the custom reader vanishes after the reopen. It's a bug.
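A plausible reading of the bug is that the `final` `reopen()` overloads leave a custom subclass no say in what gets returned. The Java sketch below illustrates that mechanism with hypothetical classes (`BaseReader`, `NaiveCustomReader`, `CustomReader`) and a hypothetical single-argument `doReopen` hook. It is not Lucene's `DirectoryReader` and does not show whether Lucene 3.3 exposes an overridable hook; it only shows that when final entry points delegate to a factory that always builds the base type, the subclass is lost unless that factory can be overridden.

```java
class ReopenSketch {
    /** Hypothetical reader: the public reopen entry points are final and delegate to one hook. */
    static class BaseReader {
        final boolean readOnly;

        BaseReader(boolean readOnly) {
            this.readOnly = readOnly;
        }

        public final BaseReader reopen() {
            return doReopen(readOnly); // preserve the current readOnly flag
        }

        public final BaseReader reopen(boolean openReadOnly) {
            return doReopen(openReadOnly);
        }

        /** Default hook always builds a BaseReader, so a subclass that does not override it is lost. */
        protected BaseReader doReopen(boolean openReadOnly) {
            return new BaseReader(openReadOnly);
        }
    }

    /** Custom reader relying on the default hook: it "vanishes" after reopen. */
    static class NaiveCustomReader extends BaseReader {
        NaiveCustomReader(boolean readOnly) {
            super(readOnly);
        }
    }

    /** Custom reader that overrides the hook and therefore survives reopen. */
    static class CustomReader extends BaseReader {
        CustomReader(boolean readOnly) {
            super(readOnly);
        }

        @Override
        protected BaseReader doReopen(boolean openReadOnly) {
            return new CustomReader(openReadOnly);
        }
    }

    public static void main(String[] args) {
        System.out.println(new NaiveCustomReader(true).reopen().getClass().getSimpleName()); // prints BaseReader
        System.out.println(new CustomReader(true).reopen().getClass().getSimpleName());      // prints CustomReader
    }
}
```

The design point is that `final` public methods are only safe for subclassing if some non-final step decides the concrete type of the reopened reader.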
[jira] [Created] (SOLR-2817) Search result with grouping activated delivers wrong numFound values.
Search result with grouping activated delivers wrong numFound values. - Key: SOLR-2817 URL: https://issues.apache.org/jira/browse/SOLR-2817 Project: Solr Issue Type: Bug Affects Versions: 3.4 Reporter: bronco Priority: Blocker I use the 3.4 branch of Solr to group my results, but something seems to be wrong. If I activate grouping, I get the expected results, in my case for example 4 rows, but the numFound value says there are 26 results. This value is used by the Apache Solr Library in Drupal to calculate the pager on my website, so I get a search result page with 4 results and a broken pager with 3 extra steps leading to non-existent result pages. I have tried many things, including using a facet and the other suggestions I found via Google, but I couldn't change the numFound value to the correct one in my case. If I do not use the group.main=true setting and include a facet, the facet summary shows the correct number of found results, but it also says matches 6 (numFound=6). The Apache Solr Library makes the group.main=true setting necessary, because without it the library is not able to parse the result set, so I always get an empty page without any results, even though Solr answers correctly. Is it somehow possible to change the numFound value to the correct one, i.e. the group count? I doubt anyone groups on a field, shows maybe 10 entries that belong to the request, and then tells the user there are 200 other results that do not interest them. But I have to tell you... Any help is appreciated in understanding why Solr delivers this numFound result and how to fix this issue.
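The pager problem is plain arithmetic: a pager that derives its page count from `numFound` will show extra pages whenever `numFound` reports more hits than the rows actually displayed. A minimal Java sketch, assuming a ceiling-division pager and a hypothetical page size of 10 (the report does not give one, so these page counts need not match the reporter's site). `PagerSketch` and `pageCount` are hypothetical names, not the Drupal library's code.

```java
class PagerSketch {
    /** Pages needed to show totalHits results at rowsPerPage per page (ceiling division). */
    static long pageCount(long totalHits, int rowsPerPage) {
        if (rowsPerPage <= 0) {
            throw new IllegalArgumentException("rowsPerPage must be positive");
        }
        return (totalHits + rowsPerPage - 1) / rowsPerPage;
    }

    public static void main(String[] args) {
        int rowsPerPage = 10; // hypothetical page size; the report does not state one
        long fromNumFound = pageCount(26, rowsPerPage); // based on the reported numFound
        long fromDisplayed = pageCount(4, rowsPerPage); // based on the rows actually shown
        System.out.println(fromNumFound + " vs " + fromDisplayed); // prints "3 vs 1"
    }
}
```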
[jira] [Commented] (SOLR-2817) Search result with grouping activated delivers wrong numFound values.
[ https://issues.apache.org/jira/browse/SOLR-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13122520#comment-13122520 ] bronco commented on SOLR-2817: -- A little edit: This is the result without group.main=true. As you can see, the numFound value is correct at 2, while matches says 6:
<lst name="grouped"> <lst name="site"> <int name="matches">6</int> <arr name="groups"> <lst> <str name="groupValue">http://www.domain-1.com</str> <result name="doclist" numFound="2" start="0" maxScore="0.64790744"> <doc> <float name="score">0.64790744</float> <str name="body"> </str>
Without the group.main setting:
<result name="response" numFound="6" start="0" maxScore="0.64790744"> <doc> <float name="score">0.64790744</float> <str name="body"> </str>
I hope this makes it clearer. Search result with grouping activated delivers wrong numFound values. - Key: SOLR-2817 URL: https://issues.apache.org/jira/browse/SOLR-2817 Project: Solr Issue Type: Bug Affects Versions: 3.4 Reporter: bronco Priority: Blocker Labels: grouping, solr
[jira] [Updated] (SOLR-2817) Search result with grouping activated delivers wrong numFound value and breaks pager.
[ https://issues.apache.org/jira/browse/SOLR-2817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bronco updated SOLR-2817: - Summary: Search result with grouping activated delivers wrong numFound value and breaks pager. (was: Search result with grouping activated delivers wrong numFound values.) Search result with grouping activated delivers wrong numFound value and breaks pager. - Key: SOLR-2817 URL: https://issues.apache.org/jira/browse/SOLR-2817 Project: Solr Issue Type: Bug Affects Versions: 3.4 Reporter: bronco Priority: Blocker Labels: grouping, solr