[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
---
Attachment: LUCENE-3041.patch

Updated patch. This simplifies the hierarchy a lot. DispatchingQueryProcessor is merged into QueryProcessor, which then becomes an abstract class. QueryProcessor now has #dispatchProcessing(Query), which is the entry point to the dispatching process. DefaultQueryProcessor is changed to RewriteCachingQueryProcessor, which caches the rewriting of queries. This could be extended further to provide special support for BooleanQuery. Remaining to do is to provide a test which illustrates walking through a complex Query.

Support Query Visting / Walking
---
Key: LUCENE-3041
URL: https://issues.apache.org/jira/browse/LUCENE-3041
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Chris Male
Priority: Minor
Attachments: LUCENE-3041.patch, LUCENE-3041.patch

Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations, or anything that requires state to be stored as each Query is visited. We could keep the interface very simple:

{code}
public interface QueryVisitor {
  Query visit(Query query);
}
{code}

and then use a reflection-based visitor like Earwin suggested, which would allow implementors to provide visit methods for just the Query types they are interested in.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
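The reflection-based dispatch Earwin suggested could look roughly like the following. This is a standalone sketch with hypothetical class names, not the attached patch: the dispatcher climbs the query's class hierarchy looking for the most specific visit(...) overload the visitor provides.

```java
import java.lang.reflect.Method;

// Stand-ins for Lucene's Query hierarchy, just for this sketch.
abstract class Query {}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

class FuzzyQuery extends Query {
    final String term;
    FuzzyQuery(String term) { this.term = term; }
}

// Dispatcher that looks for the most specific visit(...) overload on the
// visitor, walking up the query's class hierarchy until one is found.
class ReflectiveDispatcher {
    static Query dispatch(Object visitor, Query query) {
        Class<?> c = query.getClass();
        while (c != null && Query.class.isAssignableFrom(c)) {
            try {
                Method m = visitor.getClass().getMethod("visit", c);
                m.setAccessible(true);
                return (Query) m.invoke(visitor, query);
            } catch (NoSuchMethodException e) {
                c = c.getSuperclass(); // try the next-more-general overload
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException(e);
            }
        }
        return query; // no matching visit method: leave the query untouched
    }
}

// A visitor that only cares about FuzzyQuery; everything else falls through.
class FuzzyOnlyVisitor {
    public Query visit(FuzzyQuery query) {
        return new TermQuery(query.term); // pretend "rewrite"
    }
}
```

With this shape, an implementor writes only the overloads they care about; any Query type without a matching overload is returned unchanged.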
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025669#comment-13025669 ]

Simon Willnauer commented on LUCENE-3041:
---

Chris, nice simplification. I have one question: let's say we have a boolean query OR(AND(Fuzzy:A, Fuzzy:B), AND(Fuzzy:A, Fuzzy:C)). How would it be possible with the current patch to reuse the rewrite for Fuzzy:A? As far as I can see, if I don't rewrite the boolean query myself, the current patch rewrites the top-level query and returns, right? So somehow it must be possible to walk down the query AST. Or am I missing something?
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025677#comment-13025677 ]

Chris Male commented on LUCENE-3041:
---

No, you didn't miss anything. The RewriteCachingQueryProcessor currently only rewrites the top-level query. It needs to be extended to handle BooleanQuerys and any other composite query (BoostingQuery, for example). I might actually add a DefaultQueryProcessor again which walks the full Query AST by default, and then have RewriteCachingQueryProcessor extend it and cache. I'll iterate a new patch.
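The behavior being discussed — walk the whole AST and reuse the rewrite of an identical leaf such as Fuzzy:A — can be sketched outside Lucene. This is an illustrative model (toy Query classes, not the patch's code): composite queries are recursed into, and leaf rewrites are cached by query equality.

```java
import java.util.*;

// Minimal stand-ins for Lucene's Query classes, just for illustration.
abstract class Query {}

final class FuzzyQuery extends Query {
    final String term;
    FuzzyQuery(String term) { this.term = term; }
    public boolean equals(Object o) {
        return o instanceof FuzzyQuery && ((FuzzyQuery) o).term.equals(term);
    }
    public int hashCode() { return term.hashCode(); }
}

final class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
}

final class BooleanQuery extends Query {
    final List<Query> clauses;
    BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
}

// A processor that walks the full query AST and caches leaf rewrites, so
// Fuzzy:A nested in several clauses is rewritten only once.
class RewriteCachingQueryProcessor {
    private final Map<Query, Query> cache = new HashMap<>();
    int rewriteCalls = 0; // counts actual (non-cached) rewrites

    Query process(Query query) {
        if (query instanceof BooleanQuery) {
            BooleanQuery bq = (BooleanQuery) query;
            Query[] rewritten = new Query[bq.clauses.size()];
            for (int i = 0; i < rewritten.length; i++) {
                rewritten[i] = process(bq.clauses.get(i));
            }
            return new BooleanQuery(rewritten);
        }
        return cache.computeIfAbsent(query, this::rewrite);
    }

    // Pretend rewrite: a fuzzy query "expands" to a plain term query.
    private Query rewrite(Query query) {
        rewriteCalls++;
        if (query instanceof FuzzyQuery) {
            return new TermQuery(((FuzzyQuery) query).term);
        }
        return query;
    }
}
```

On Simon's example OR(AND(Fuzzy:A, Fuzzy:B), AND(Fuzzy:A, Fuzzy:C)), such a processor performs three rewrites (A, B, C) rather than four, since the second Fuzzy:A hits the cache.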
[jira] [Created] (LUCENE-3047) HyphenationCompoundWordTokenFilter does not work correctly with the German word Brustamputation
HyphenationCompoundWordTokenFilter does not work correctly with the German word Brustamputation
---
Key: LUCENE-3047
URL: https://issues.apache.org/jira/browse/LUCENE-3047
Project: Lucene - Java
Issue Type: Bug
Components: contrib/analyzers
Affects Versions: 3.1
Environment: Linux 2.6.32-31-generic, java version 1.6.0_21, Java(TM) SE Runtime Environment (build 1.6.0_21-b06), Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
Reporter: Lars Feistner
Priority: Minor

The following test fails:

{code}
@Test
public void testBrustamputation() throws IOException {
  Analyzer compoundAnalyzer = new Analyzer() {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
      InputStream in = this.getClass().getResourceAsStream("/de_DR.xml");
      final InputSource inputSource = new InputSource(in);
      inputSource.setEncoding("iso-8859-1");
      HyphenationTree hyphenator = null;
      try {
        hyphenator = HyphenationCompoundWordTokenFilter.getHyphenationTree(inputSource);
      } catch (Exception ex) {
        Assert.fail("", ex);
      }
      HashSet<String> dict = new HashSet<String>(Arrays.asList(new String[]{"brust", "amputation"}));
      return new HyphenationCompoundWordTokenFilter(Version.LUCENE_31,
          new WhitespaceTokenizer(Version.LUCENE_31, reader), hyphenator, dict,
          CompoundWordTokenFilterBase.DEFAULT_MIN_WORD_SIZE, 4,
          CompoundWordTokenFilterBase.DEFAULT_MAX_SUBWORD_SIZE, false);
    }
  };
  TokenStream tokenStream = compoundAnalyzer.tokenStream("Kurztext", new StringReader("brustamputation"));
  CharTermAttribute t = tokenStream.addAttribute(CharTermAttribute.class);
  Set<String> tokenSet = new HashSet<String>();
  while (tokenStream.incrementToken()) {
    tokenSet.add(t.toString());
    System.out.println(t);
  }
  Assert.assertTrue(tokenSet.contains("brust"), "brust");
  Assert.assertTrue(tokenSet.contains("brustamputation"), "brustamputation");
  Assert.assertTrue(tokenSet.contains("amputation"), "amputation");
}
{code}
Re: modularization discussion
On Tue, Apr 26, 2011 at 11:34 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Tue, Apr 26, 2011 at 11:07 PM, Robert Muir rcm...@gmail.com wrote: It appears there are some problems with modularization of the code, especially between lucene and solr, so I would like for us to have a discussion on this thread. The specifics of each case matter of course.

I agree. Some of the refactored code has been changed to use the lucene namespace, and it seems only fair that other code that has traditionally been the domain of Solr keep the solr namespace. This helps keep the proper mindset that code is not being "moved from solr to lucene", as too many people keep putting it, but is being exposed to lucene users and is now shared.

Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its origins.

For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace.

I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. Eg, the patch on LUCENE-2995 (consolidating our various spell/suggest impls) also consolidates everything under the lucene namespace, which I think makes sense?

Mike

http://blog.mikemccandless.com
Re: modularization discussion
On Tue, Apr 26, 2011 at 11:41 PM, Grant Ingersoll gsing...@apache.org wrote:

I think this needs a bit more explanation. AIUI, the primary cause for concern is that by making something a module, you are taking a private, internal API of Solr's and now making it a public API that must be maintained (and backwards maintained), which could slow down development, as one now needs to be concerned with more factors than you would if it were merely an implementation detail in Solr.

This concern doesn't make sense to me: if we mark a module experimental, we are fully free to change it, even drastically. Pre-merge, I agree, it was a nightmare factoring code across projects... but now that we are merged, and now that we have @experimental, I don't understand this argument.

Maybe we can take a concrete example, eg LUCENE-2995 (the factored-out suggest module): how does this being its own module hurt Solr?

Mike

http://blog.mikemccandless.com
Re: Filters with 2.9.4
Hi Uwe,

Thanks for the reply. Things are a bit tangled, because I've used early Solr stuff with DocSet and have extensively used my own caching Filters, because I couldn't get what I wanted with the standard versions a few years ago. It will take a while to undo that, but I'm working towards it.

However, it still seems to me that the Filter.getDocIdSet() method should also be given the docBase for the given reader. It seems odd that the Collector has that knowledge but the Filter does not, even though they are pretty closely related classes. What do you think?

Antony

On 19/04/2011 5:01 PM, Uwe Schindler wrote:

Hi Antony, Why not use CachingWrapperFilter together with a TermsFilter or QueryWrapperFilter(TermQuery)? This Filter keeps track of all used segment readers. So you build an instance: Filter f = new CachingWrapperFilter(new QueryWrapperFilter(new TermQuery(new Term(...)))); And reuse that filter instance with all queries the user starts. No need to hack the cache yourself. The above variant is much more effective as it works better with reopen()'ed index readers (after the index changed), because it reuses the unchanged segment readers.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Antony Bowesman [mailto:a...@thorntothehorn.org]
Sent: Tuesday, April 19, 2011 7:30 AM
To: Lucene Dev
Subject: Filters with 2.9.4

Hi, Another migrate-to-2.9.4 issue for me... When a search is done by a user, I collect a 'DocSet' of Documents for that 'owner' (Term(id, XX)). This is a single set for all Documents in the index and NOT per reader. Then when searches are made I use caching Filters, but I use my master DocSet as a Filter for those chained Filters. However, with 2.9, Filters are now called per segment reader and there's a DocIdSet for each Reader. There is no way for the filter implementation to know the docBase for the passed reader, like the collector does. As the Javadocs for Filter.getDocIdSet imply, a Filter must only return doc ids for the given reader. I am now stuck with a filter implementation that can no longer intersect the master bitset for my 'owners'. Was this envisaged during the changes, and is there a way I can get hold of the docBase for an IndexReader?

Thanks
Antony
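The mismatch Antony describes — a top-level bitset versus per-segment doc ids — comes down to the docBase offset. This plain-Java sketch (not Lucene's API; names are illustrative) shows the relationship: a segment's docBase is the sum of maxDoc() of all preceding segments, and a top-level bitset can be sliced down to segment-relative doc ids once the docBase is known.

```java
import java.util.*;

// Standalone sketch: with per-segment readers, a doc id is segment-relative;
// the top-level doc id is docBase + segmentDocId, where docBase is the sum of
// maxDoc() of all preceding segments. A filter holding a top-level bitset
// could slice it down per segment like this.
class SegmentSlicer {
    // maxDocs[i] = number of docs in segment i
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Translate the top-level bitset into segment-relative doc ids.
    static BitSet slice(BitSet topLevel, int docBase, int maxDoc) {
        BitSet segment = new BitSet(maxDoc);
        for (int doc = topLevel.nextSetBit(docBase);
             doc >= 0 && doc < docBase + maxDoc;
             doc = topLevel.nextSetBit(doc + 1)) {
            segment.set(doc - docBase);
        }
        return segment;
    }
}
```

The catch, as the thread goes on to explain, is that in Lucene 2.9/3.x the Filter is not handed the docBase, which is why the recommended fix is to cache per segment instead of per top-level index.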
RE: Filters with 2.9.4
Hi,

In Lucene trunk the Filter gets a ReaderContext, which contains a doc base if available. For Lucene 2 and 3 this is not available.

The Lucene 2.9 code did not change documented behavior. The fact that Filters always got the top-level reader was never documented (it was just like that in early Lucene versions), so this is no break. The same applies not only to Filters; it also applies to Scorers created by Queries. Those also don't know anything about the top-level searcher (and they don't need to). For a Filter to work, this is also not a requirement: the IndexReader passed as parameter is self-contained and provides all information for processing the current segment.

You should simply fix your caching (which is much more effective after this change, as the cache items don't become invalid after a reopen of an index where only a few segments changed). I would suggest correcting your code and using CachingWrapperFilter.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Created] (SOLR-2479) Phrase (arbitrary delimiter) based autocomplete
Phrase (arbitrary delimiter) based autocomplete
---
Key: SOLR-2479
URL: https://issues.apache.org/jira/browse/SOLR-2479
Project: Solr
Issue Type: New Feature
Components: spellchecker
Reporter: Dawid Weiss
Assignee: Dawid Weiss
Priority: Minor
Fix For: 4.0

Much like the one described here by Google: http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+blogspot%2FMKuf+%28Official+Google+Blog%29

My idea was to allow arbitrary delimiters -- then infix suggestions would also be possible (although these are _not_ of much practical importance and relatively few geeks would find them useful :).
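One common way to get infix suggestions out of a prefix-only lookup — sketched here as a standalone illustration, not SOLR-2479's implementation — is to split each suggestion on the chosen delimiter and index every suffix that starts at a delimiter boundary, so an ordinary prefix match on any suffix surfaces the full phrase.

```java
import java.util.*;

// Sketch of the suffix-expansion idea for delimiter-based infix suggestions.
class SuffixIndexer {
    // For "new york city" with delimiter " ", produce:
    // ["new york city", "york city", "city"] - index all of them, and a
    // prefix lookup of "yor" then finds the phrase via its "york city" entry.
    static List<String> suffixesAtBoundaries(String phrase, String delimiter) {
        List<String> out = new ArrayList<>();
        String[] parts = phrase.split(java.util.regex.Pattern.quote(delimiter));
        for (int i = 0; i < parts.length; i++) {
            out.add(String.join(delimiter,
                    Arrays.copyOfRange(parts, i, parts.length)));
        }
        return out;
    }
}
```

The cost is index size (one entry per delimiter boundary per suggestion), which is one reason infix support tends to be optional.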
Re: modularization discussion
On Apr 27, 2011, at 12:14 AM, Robert Muir wrote:

Can we solve this? It seems like lucene users currently only have this choice: A. no access to feature X at all. But couldn't they at least have this choice: A. no access to feature X at all; B. access to the feature, but with relaxed backwards compatibility to address the concern. In other words, we could mark the API @experimental or whatever, and the user can choose not to use it at the lucene level if they don't want to deal with upgrade hassles.

Honestly, too much fighting to see the trees through the forest. Yonik has compromised on pretty much every module brought up: if it's not stated as "this feature is going to Lucene", if it goes to a module, and if the module can have similar requirements as the code had in Solr, then he's okay with it. To him it's very important that some of this stuff comes off as shared between Lucene/Solr and not just Lucene's. That's what I have gathered, anyway. Fine by me.

My memory is that Yonik has never been steadfast against modules. He has tried to negotiate what he thinks is best in terms of this stuff. The breakdown comes from the personalities involved. No one has been willing to swim to the end, because it's hard work. Well, some things are hard work. I say get used to it. I am.

The problem is that Simon says things like "everything should be a module and solr should just be sugar on Lucene". That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners. Fantastic. Yes, there is a middle ground - I've seen it swirl around and disappear back into the blood a few times. These volatile personalities are just not finding it.

- Mark

-
Mark Miller
lucidimagination.com
Lucene/Solr User Conference May 25-26, San Francisco
www.lucenerevolution.org
Re: Filters with 2.9.4
Thanks Uwe. I'll work towards the CachingWrapperFilter.

Antony
Re: modularization discussion
On Wed, Apr 27, 2011 at 8:13 AM, Mark Miller markrmil...@gmail.com wrote: The problem is that Simon says things like "everything should be a module and solr should just be sugar on Lucene". That scares Yonik. Then Yonik makes comments questioning individual modules. That scares the other guys. Both sides retreat to their corners.

Why? In the best interest of the project, what are the reasons why this is a bad thing? Then users could access Solr's features from the API.
Re: bug in LuceneTestCase#TEST_MIN_ITER
Fixed the behavior in Revision: 1097097

simon

On Tue, Apr 26, 2011 at 6:14 PM, Shai Erera ser...@gmail.com wrote: I think you're right, Simon! Obviously I didn't test it with that scenario in mind :).

Shai

On Tue, Apr 26, 2011 at 6:15 PM, Simon Willnauer simon.willna...@googlemail.com wrote:

Hey, I wonder how this TEST_MIN_ITER feature works. I expect that if I set -Dtests.iter.min=1 -Dtests.iter=10 and I fail in any of those iterations, the runner stops immediately and prints a failure. Is that correct? If so, I don't understand this code:

if (testsFailed) {
  lastIterFailed = i;
  if (i == TEST_ITER_MIN - 1) {
    if (verbose) {
      System.out.println("\nNOTE: iteration " + lastIterFailed + " failed!");
    }
    break;
  }
}

This only stops if it fails at tests.iter.min, but not if it fails at tests.iter.min+1. This should rather be something like if (i >= TEST_ITER_MIN - 1), right?

simon
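The fix Simon describes can be demonstrated in isolation. This is a standalone model of the iteration loop (illustrative names, not the actual LuceneTestCase code): with the original `i == TEST_ITER_MIN - 1` check, a failure at any later iteration never breaks the loop, whereas `i >= TEST_ITER_MIN - 1` stops at the first failure once the minimum number of iterations has run.

```java
// Toy model of the test-iteration loop under discussion.
class IterRunner {
    // Runs up to testIter iterations; a test failure occurs at index failAt.
    // Returns how many iterations actually ran with the fixed condition.
    static int runUntilFailure(int testIter, int testIterMin, int failAt) {
        int iterations = 0;
        boolean testsFailed = false;
        for (int i = 0; i < testIter; i++) {
            iterations++;
            if (i == failAt) {
                testsFailed = true;
            }
            // Fixed condition: stop at the first failure at or past the
            // minimum iteration count (the buggy version used == here).
            if (testsFailed && i >= testIterMin - 1) {
                break;
            }
        }
        return iterations;
    }
}
```

With tests.iter=10 and tests.iter.min=1, a failure at iteration 3 now stops the run after 4 iterations; the `==` version would have run all 10.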
RE: modularization discussion
if its not stated as this feature is going to Lucene It seems as though some people assume that since Lucene is a library, and Solr is an application, that exposing Solr API *means* making it part of Lucene. It ain't necessarily so, and it need not be a point of contention. I want to reiterate my opinion (voiced pre-merge) that there be a third entity here besides Solr and Lucene. E.g., if modules/ became thirdentity/, with its own org.apache.thirdentity namespace, wouldn't questions of ownership/control mostly go away? Steve
Re: modularization discussion
On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its origins?

And if it's a very core part of Solr that we've tended to hang a lot of new features on, etc., then the nature of that code should still hopefully be solrish.

For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace.

That's because it was factored directly into Lucene core, not into a module.

I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene.

But that won't be true... it's likely that many modules will depend on other modules. But as I said, it seems only fair to meet half way and use the solr namespace for some modules and the lucene namespace for others.

-Yonik
[jira] [Updated] (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias Otto updated SOLR-236:
---
Comment: was deleted (was: Am I right that trunk is 4.0? What is the newest patch that works on that code? All patches I tried so far failed for me. Also, would someone be able to share a solr.WAR file that is already patched and fairly up-to-date? Thanks)

Field collapsing
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: Next
Attachments: DocSetScoreCollector.java, NonAdjacentDocumentCollapser.java, NonAdjacentDocumentCollapserTest.java, SOLR-236-1_4_1-NPEfix.patch, SOLR-236-1_4_1-paging-totals-working.patch, SOLR-236-1_4_1.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-branch_3x.patch, SOLR-236-distinctFacet.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236-trunk.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch, collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, solr-236.patch

This patch includes a new feature called "field collapsing", used to collapse a group of results with a similar value for a given field into a single entry in the result set. Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also "duplicate detection": http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field to choose the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max to select how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr-1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
RE: modularization discussion
On 4/27/2011 at 9:25 AM, Yonik wrote: it seems only fair to meet half way and use the solr namespace for some modules and the lucene namespace for others. Let's eliminate a source of conflict, and make modules another product that is neither Lucene nor Solr. Steve
[jira] [Resolved] (SOLR-2272) Join
[ https://issues.apache.org/jira/browse/SOLR-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-2272.
---
Resolution: Fixed

Join
Key: SOLR-2272
URL: https://issues.apache.org/jira/browse/SOLR-2272
Project: Solr
Issue Type: New Feature
Components: search
Reporter: Yonik Seeley
Fix For: 4.0
Attachments: SOLR-2272.patch, SOLR-2272.patch, SOLR-2272.patch

Limited join functionality for Solr, mapping one set of IDs matching a query to another set of IDs, based on the indexed tokens of the fields. Example: fq={!join from=parent_ptr to=parent_id}child_doc:query
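The "mapping one set of IDs to another" idea behind the join can be sketched with a toy in-memory model (illustrative only, not Solr's implementation): collect the values of the `from` field for documents matching the inner query, then select the documents whose `to` field holds one of those values.

```java
import java.util.*;

// Toy illustration of the join concept: parent_ptr -> parent_id mapping.
class ToyJoin {
    // docs: one field-name -> value map per document; doc id = list index.
    // matching: ids of documents matched by the inner query.
    static Set<Integer> join(List<Map<String, String>> docs,
                             Set<Integer> matching, String from, String to) {
        // 1. Gather the "from" field values of the matching documents.
        Set<String> keys = new HashSet<>();
        for (int doc : matching) {
            String v = docs.get(doc).get(from);
            if (v != null) keys.add(v);
        }
        // 2. Return every document whose "to" field carries one of them.
        Set<Integer> result = new HashSet<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            if (keys.contains(docs.get(doc).get(to))) {
                result.add(doc);
            }
        }
        return result;
    }
}
```

So `fq={!join from=parent_ptr to=parent_id}child_doc:query` first runs `child_doc:query`, takes the matched children's `parent_ptr` tokens, and keeps the documents whose `parent_id` matches one of them.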
[jira] [Created] (LUCENE-3048) Improve BooleanQuery rewrite documentation
Improve BooleanQuery rewrite documentation
--
Key: LUCENE-3048
URL: https://issues.apache.org/jira/browse/LUCENE-3048
Project: Lucene - Java
Issue Type: Improvement
Components: Query/Scoring
Reporter: Chris Male
Priority: Minor

While looking over BooleanQuery#rewrite, I found a couple of things confusing: why, in the case of a single clause, the boost is set as it is, and what's going on with the lazy initialisation of the cloned BooleanQuery. I'm just adding a few lines of documentation to both situations to clarify this.
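For context on the single-clause case the issue mentions, here is a standalone sketch (illustrative class names, not the actual BooleanQuery code) of why the boost is set the way it is: when a BooleanQuery wraps exactly one clause, rewrite can return the inner query directly, but only after folding the wrapper's boost into it.

```java
import java.util.*;

// Minimal stand-ins; note that real Lucene clones the inner query before
// changing its boost, so the original query object stays untouched.
class Query {
    float boost = 1.0f;
}

class SingleClauseBooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();

    Query rewrite() {
        if (clauses.size() == 1) {
            Query inner = clauses.get(0);
            // Fold the wrapper's boost into the returned clause; otherwise
            // the boost set on the BooleanQuery itself would be silently lost.
            inner.boost *= this.boost;
            return inner;
        }
        return this; // the multi-clause path is not modeled in this sketch
    }
}
```

A BooleanQuery with boost 2.0 around a clause with boost 3.0 thus rewrites to that clause carrying boost 6.0, which is the non-obvious step the extra documentation explains.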
[jira] [Updated] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3048:
---
Attachment: LUCENE-3048.patch

Patch adding comments as mentioned.
[jira] [Assigned] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer reassigned LUCENE-3048:
---
Assignee: Simon Willnauer
[jira] [Commented] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025819#comment-13025819 ] Simon Willnauer commented on LUCENE-3048: - looks useful chris! I will commit it, thanks!
[jira] [Resolved] (LUCENE-3048) Improve BooleanQuery rewrite documentation
[ https://issues.apache.org/jira/browse/LUCENE-3048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-3048. - Resolution: Fixed
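The single-clause case the issue documents can be illustrated with a small stand-alone sketch. Note this is an analogy, not Lucene's actual BooleanQuery code: the class Q below is a hypothetical stand-in for a query carrying a boost. The point is that when a BooleanQuery holds exactly one clause, rewrite can return that clause's query directly, provided the parent query's boost is multiplied into the surviving query so scoring stays unchanged.

```java
// Hedged sketch of the single-clause rewrite discussed in LUCENE-3048.
// "Q" is a hypothetical stand-in for a Lucene Query with a boost; it is
// not Lucene's actual API.
public class SingleClauseRewriteSketch {
    static class Q {
        float boost;
        Q(float boost) { this.boost = boost; }
    }

    // A BooleanQuery with a single clause can rewrite to the clause itself,
    // but the parent's boost must be folded in, otherwise the rewritten
    // query would score differently from the original.
    static Q rewriteSingleClause(Q booleanQuery, Q onlyClause) {
        onlyClause.boost *= booleanQuery.boost;
        return onlyClause;
    }

    public static void main(String[] args) {
        Q parent = new Q(2.0f);
        Q clause = new Q(3.0f);
        Q rewritten = rewriteSingleClause(parent, clause);
        System.out.println(rewritten.boost); // 6.0
    }
}
```

Dropping the wrapper without combining the boosts would silently change ranking, which is presumably why the real code sets the boost the way it does.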
Re: modularization discussion
On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its original origins? And if it's a very core part of solr that we've tended to hang a lot of new features on, etc, then the nature of that code should still hopefully be solrish. I'm confused... aren't they all solrish? Like, of the refactorings on the table, which ones are not solrish? Is the real issue here that you want Solr's name to live on no matter how this code is refactored in the future? For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace. That's because it was factored directly into Lucene-core, not into a module. OK. I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. But that won't be true... it's likely that many modules will depend on other modules. Sure but that's fine? Each layer can depend on other stuff in its layer, or on stuff in the lower (more core) layers. Solr depends on Solr stuff and modules and Lucene core. Modules depend on other modules and Lucene core. But as I said... it seems only fair to meet halfway and use the solr namespace for some modules and the lucene namespace for others. Actually I think a whole new namespace (Steven's suggestion) is a great idea? Would that work? (Else we'll be arguing on every module refactoring what namespace it should take...). Or, I would also be fine with naming all modules factored out of solr under the solr namespace, as long as we make it clear that you can use them w/o the rest of Solr. 
Are there other (technical) objections to ongoing refactoring besides this namespace problem? Mike http://blog.mikemccandless.com
RE: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
Sorry, for now, only 4.0. DIGY -Original Message- From: Granroth, Neal V. [mailto:neal.granr...@thermofisher.com] Sent: Wednesday, April 27, 2011 6:06 PM To: lucene-net-...@lucene.apache.org Subject: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ? Digy, Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0? - Neal
Re: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
On 27.04.2011 17:40, Amanuel Workneh wrote: Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0? As I understand it, 2.9.4g only replaces non-generic collections with generic ones. Generics were introduced in .NET Framework 2.0. Oh, sorry, I took a look at the code just to make sure. It does use SortedSet, a .NET 4 feature. It also uses HashSet, introduced in .NET 3.5. We could get a copy of these classes from the Mono project: 4.0 collection classes: https://github.com/mono/mono/tree/master/mcs/class/System/System.Collections.Generic 3.5 collection classes: https://github.com/mono/mono/tree/master/mcs/class/System.Core/System.Collections.Generic They are licensed under the MIT/X11 license, which should be compatible with ASF's policy. Robert
[jira] [Created] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer) - Key: LUCENE-3049 URL: https://issues.apache.org/jira/browse/LUCENE-3049 Project: Lucene - Java Issue Type: Bug Components: contrib/analyzers Affects Versions: 3.1 Reporter: Jonathan Young Calling HHMMSegmenter.process() on a string which is longer than 32767 characters will usually result in a NullPointerException being thrown with the following backtrace: java.lang.NullPointerException at org.apache.lucene.analysis.cn.smart.hhmm.BiSegGraph.getShortPath(BiSegGraph.java:190) at org.apache.lucene.analysis.cn.smart.hhmm.HHMMSegmenter.process(HHMMSegmenter.java:208) The root cause is the declaration of index as a _short_ at line 77 of modules/analysis/smartcn/src/java/org/apache/lucene/analysis/cn/smart/hhmm/SegGraph.java .
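The root cause described above — a position counter declared as a short — can be reproduced in isolation. The sketch below is not the smartcn code itself, just a minimal illustration of why positions past 32767 turn negative (and a negative position later breaks lookups like the one in BiSegGraph.getShortPath):

```java
// Minimal illustration of the overflow behind LUCENE-3049 (not the actual
// SegGraph code): a short counter wraps to a negative value once the input
// passes 32767 characters; declaring the counter as int avoids the wrap.
public class ShortIndexOverflow {
    public static void main(String[] args) {
        short index = Short.MAX_VALUE;    // 32767, the last safe position
        index++;                          // wraps around to -32768
        System.out.println(index);        // -32768

        int fixedIndex = Short.MAX_VALUE; // the fix: use an int counter...
        fixedIndex++;                     // ...which advances to 32768
        System.out.println(fixedIndex);   // 32768
    }
}
```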
[jira] [Updated] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young updated LUCENE-3049: --- Lucene Fields: [New, Patch Available] (was: [New]) Patch attached.
[HUDSON] Lucene-Solr-tests-only-3.x - Build # 7488 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/7488/ 1 tests failed. REGRESSION: org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at java.util.Arrays.copyOf(Arrays.java:2894) at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117) at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:589) at java.lang.StringBuffer.append(StringBuffer.java:337) at java.text.RuleBasedCollator.getCollationKey(RuleBasedCollator.java:617) at org.apache.lucene.collation.CollationKeyFilter.incrementToken(CollationKeyFilter.java:93) at org.apache.lucene.collation.CollationTestBase.assertThreadSafe(CollationTestBase.java:304) at org.apache.lucene.collation.TestCollationKeyAnalyzer.testThreadSafe(TestCollationKeyAnalyzer.java:89) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1097) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1025) Build Log (for compile errors): [...truncated 5263 lines...]
[jira] [Commented] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025927#comment-13025927 ] Steven Rowe commented on LUCENE-3049: - Jonathan, FYI, you didn't attach a patch?
[jira] [Resolved] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young resolved LUCENE-3049. Resolution: Duplicate Lucene Fields: [New] (was: [Patch Available, New]) Recently fixed at revision 1092328.
[jira] [Issue Comment Edited] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025929#comment-13025929 ] Jonathan Young edited comment on LUCENE-3049 at 4/27/11 6:17 PM: - In preparing the patch, I updated, and then discovered it had already been recently fixed at revision 1092328. was (Author: jyoung): Recently fixed at revision 1092328.
[jira] [Updated] (LUCENE-3049) NullPointerException in BiSegGraph.getShortPath (in smartcn chinese analyzer)
[ https://issues.apache.org/jira/browse/LUCENE-3049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Young updated LUCENE-3049: --- Comment: was deleted (was: Patch attached.)
[jira] [Commented] (SOLR-2400) FieldAnalysisRequestHandler; add information about token-relation
[ https://issues.apache.org/jira/browse/SOLR-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025958#comment-13025958 ] Stefan Matheis (steffkes) commented on SOLR-2400: - Yes =) Ty Uwe, applied the Patch: works perfectly! I've tried splitting on Words, also removing of Stopwords - both are looking good. Will see how we could integrate this -- actually for the normal languages and their analysis .. afterwards for the Japanese one :) FieldAnalysisRequestHandler; add information about token-relation - Key: SOLR-2400 URL: https://issues.apache.org/jira/browse/SOLR-2400 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Stefan Matheis (steffkes) Priority: Minor Attachments: 110303_FieldAnalysisRequestHandler_output.xml, 110303_FieldAnalysisRequestHandler_view.png, SOLR-2400.patch, SOLR-2400.patch, field.xml The XML output (simplified example attached) is missing one small piece of information which could be very useful for building a nice analysis output: the token-relation (if there is a special/correct word for this, please correct me). Meaning that it is actually not possible to follow the analysis process (completely) when the Tokenizers/Filters drop tokens (f.e. StopWord) or split one into multiple tokens (f.e. WordDelimiter). Would it be possible to include this information? If so, it would be possible to create an improved analysis page for the new Solr admin (SOLR-2399) - short scribble attached
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025961#comment-13025961 ] Robert Muir commented on LUCENE-3023: - I was helping Simon look at reintegrating this branch (producing a patch for easy review, etc), but I found some problems. 1. It looks like some commits were marked as merged from trunk, but not actually merged, so if we reintegrate into trunk we will lose some changes. 2. Some files have lost their svn:eol-style, which makes the comparison difficult. I'm looking at these issues now. Land DWPT on trunk -- Key: LUCENE-3023 URL: https://issues.apache.org/jira/browse/LUCENE-3023 Project: Lucene - Java Issue Type: Task Affects Versions: CSF branch, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3023.patch, realtime-TestAddIndexes-3.txt, realtime-TestAddIndexes-5.txt, realtime-TestIndexWriterExceptions-assert-6.txt, realtime-TestIndexWriterExceptions-npe-1.txt, realtime-TestIndexWriterExceptions-npe-2.txt, realtime-TestIndexWriterExceptions-npe-4.txt, realtime-TestOmitTf-corrupt-0.txt With LUCENE-2956 we have resolved the last remaining issue for LUCENE-2324, so we can proceed with landing the DWPT development on trunk soon. I think one of the bigger issues here is to make sure that all JavaDocs for IW etc. are still correct, though. I will start going through that first.
[HUDSON] Lucene-Solr-tests-only-trunk - Build # 7501 - Failure
Build: https://builds.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/7501/ 1 tests failed. REGRESSION: org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple Error Message: expected:3 but was:2 Stack Trace: junit.framework.AssertionFailedError: expected:3 but was:2 at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1246) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1175) at org.apache.solr.client.solrj.TestLBHttpSolrServer.testSimple(TestLBHttpSolrServer.java:127) Build Log (for compile errors): [...truncated 9069 lines...]
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: diffMccand.py OK, I think these issues are resolved. I'm attaching the script Mike wrote that I used for checking that we don't lose any changes (I think it's the same script we used for the flex branch). The way I did it is to check out a/ and b/, reintegrate the branch into b/, and run the script to produce a huge patch. If some things look suspicious, like they are lost changes, then I reverse-apply the huge patch to the branch with Eclipse, selectively apply only those lost changes, and then commit.
Re: modularization discussion
On Wed, Apr 27, 2011 at 11:49 AM, Michael McCandless luc...@mikemccandless.com wrote: On Wed, Apr 27, 2011 at 9:25 AM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Apr 27, 2011 at 6:28 AM, Michael McCandless luc...@mikemccandless.com wrote: Why impose namespace restrictions based on where code was originally committed? I think the namespace of refactored code should reflect the nature of the code, not its original origins? And if it's a very core part of solr that we've tended to hang a lot of new features on, etc, then the nature of that code should still hopefully be solrish. I'm confused... aren't they all solrish? Like, of the refactorings on the table, which ones are not solrish? The benchmarking stuff definitely originated in lucene-land, there was much more lucene analysis than solr analysis in that module consolidation, and non-sandboxish stuff in lucene-contrib that may be refactored/moved to modules. Is the real issue here that you want Solr's name to live on no matter how this code is refactored in the future? For example, when I refactored UnInvertedField, it split nicely into a Solr piece and a core Lucene piece, and so I gave the core Lucene piece the org.apache.lucene.index namespace. That's because it was factored directly into Lucene-core, not into a module. OK. I think leaving refactored code in the solr namespace sends the wrong message (ie, that this module depends on Solr somehow). The lucene namespace makes it clear that it only depends on Lucene. But that won't be true... it's likely that many modules will depend on other modules. Sure but that's fine? Each layer can depend on other stuff in its layer, or on stuff in the lower (more core) layers. Solr depends on Solr stuff and modules and Lucene core. Modules depend on other modules and Lucene core. But my point was the namespace doesn't tell you what the dependencies of the modules are. lucene wouldn't mean that it depends on lucene-core only... 
(and depending on what it is, may not depend on lucene-core at all) and solr wouldn't mean that it depends on solr-core. But as I said... it seems only fair to meet halfway and use the solr namespace for some modules and the lucene namespace for others. Actually I think a whole new namespace (Steven's suggestion) is a great idea? Would that work? (Else we'll be arguing on every module refactoring what namespace it should take...). Or, I would also be fine with naming all modules factored out of solr under the solr namespace, as long as we make it clear that you can use them w/o the rest of Solr. Of course! That's the whole point of refactoring a module out of some solr functionality. Actual dependencies (i.e. which modules depend on which modules) would be TBD of course. Are there other (technical) objections to ongoing refactoring besides this namespace problem? I don't think so in general - as I stated before, w.r.t. LUCENE-2883, later discussions led me to believe there was very little disagreement left (and I actually thought some of us had come to an agreement). -Yonik
[jira] [Updated] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3023: Attachment: LUCENE-3023.patch Attached is the DWPT branch in patch format against trunk (for easier reviewing).
[jira] [Commented] (LUCENE-3023) Land DWPT on trunk
[ https://issues.apache.org/jira/browse/LUCENE-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026069#comment-13026069 ] Robert Muir commented on LUCENE-3023: - What about TestIndexWriter.testIndexingThenDeleting? I noticed in the branch the test method is changed to _testIndexingThenDeleting (disabled). However, if I re-enable it (rename it back) it never seems to finish...
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024966#comment-13024966 ] Lance Norskog edited comment on SOLR-2242 at 4/28/11 2:01 AM: -- From the patch: bq. {{public static final String FACET_NAMEDISTINCT = FACET + ".numFacetTerms";}} So - in this issue, a _name_ is what everything else calls a _term_, and a _value_ is what everyone else calls a _count of documents with *this term* in *this field*_. Please change this in the patch. was (Author: lancenorskog): From the patch: bq. {{public static final String FACET_NAMEDISTINCT = FACET + ".numFacetTerms";}} So - in this issue, a _name_ is what everything else calls a _term_. Please change this in the patch. Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0 Reporter: Bill Bell Priority: Minor Fix For: 4.0 Attachments: SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. 
Here is an example: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=manu&facet.mincount=1&facet.limit=-1&f.manu.facet.namedistinct=0&facet.field=price&f.price.facet.namedistinct=1 Here is an example on field hgid (without namedistinct): {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="HGPY045FD36D4000A">1</int>
    <int name="HGPY0FBC6690453A9">1</int>
    <int name="HGPY1E44ED6C4FB3B">1</int>
    <int name="HGPY1FA631034A1B8">1</int>
    <int name="HGPY3317ABAC43B48">1</int>
    <int name="HGPY3A17B2294CB5A">5</int>
    <int name="HGPY3ADD2B3D48C39">1</int>
  </lst>
</lst>
{code} With namedistinct (HGPY045FD36D4000A, HGPY0FBC6690453A9, HGPY1E44ED6C4FB3B, HGPY1FA631034A1B8, HGPY3317ABAC43B48, HGPY3A17B2294CB5A, HGPY3ADD2B3D48C39), this returns the number of rows (7), not the number of values (11). {code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="_count_">7</int>
  </lst>
</lst>
{code} This actually works really well to get the total number of fields for a group.field=hgid. Enjoy!
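The distinction the example draws — 7 rows versus 11 total counts — can be sketched with plain collections. This is not Solr's faceting code, just an illustration using the hgid term counts from the example above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of what namedistinct would report versus ordinary facet counts,
// using the hgid values from the example above (not Solr's implementation).
public class NamedDistinctSketch {
    public static void main(String[] args) {
        Map<String, Integer> facetCounts = new LinkedHashMap<>();
        facetCounts.put("HGPY045FD36D4000A", 1);
        facetCounts.put("HGPY0FBC6690453A9", 1);
        facetCounts.put("HGPY1E44ED6C4FB3B", 1);
        facetCounts.put("HGPY1FA631034A1B8", 1);
        facetCounts.put("HGPY3317ABAC43B48", 1);
        facetCounts.put("HGPY3A17B2294CB5A", 5);
        facetCounts.put("HGPY3ADD2B3D48C39", 1);

        // namedistinct: how many distinct terms (rows) the field has
        System.out.println(facetCounts.size());                 // 7

        // ordinary facet counts sum to the number of term occurrences
        int total = facetCounts.values().stream()
                               .mapToInt(Integer::intValue).sum();
        System.out.println(total);                              // 11
    }
}
```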
Re: Congratulations!
Hi Phillipe, Congrats, I am looking forward to start working with you too ;) On Tue, Apr 26, 2011 at 8:40 PM, Mark Miller markrmil...@gmail.com wrote: Congrats Phillipe! We are very excited to have you! Your proposal sounds great. - Mark On Apr 26, 2011, at 8:31 PM, Phillipe Ramalho wrote: Hi everyone, It seems my project was accepted, I am looking forward to start coding for Lucene. Thanks! - Phillipe Ramalho -- Forwarded message -- From: no-re...@socghop.appspotmail.com Date: Mon, Apr 25, 2011 at 2:48 PM Subject: Congratulations! To: phillipe.rama...@gmail.com Dear Phillipe, Congratulations! Your proposal Lucene-2979: Simplify configuration API of contrib Query Parser as submitted to Apache Software Foundation has been accepted for Google Summer of Code 2011. Over the next few days, we will add you to the private Google Summer of Code Student Discussion List. Over the next few weeks, we will send instructions to this list regarding turning in proof of enrollment, tax forms, etc. Now that you've been accepted, please take the opportunity to speak with your mentors about plans for the Community Bonding Period: what documentation should you be reading, what version control system will you need to set up, etc., before coding begins on May 23rd. Welcome to Google Summer of Code 2011! We look forward to having you with us. With best regards, The Google Summer of Code Program Administration Team -- Phillipe Ramalho - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2242:
-----
Attachment: SOLR-2242.patch

I noticed that with the original patch applied, SimpleFacetsTest would fail. The cause is a small backwards-compatibility bug: the patch wrapped the facet counts in a "counts" element in the response. That is valid when the namedistinct param is used, but when a user doesn't specify it, the old response format should be unchanged. This updated patch corrects the issue and SimpleFacetsTest now passes.

Get distinct count of names for a facet field
-----
Key: SOLR-2242
URL: https://issues.apache.org/jira/browse/SOLR-2242
Project: Solr
Issue Type: New Feature
Components: Response Writers
Affects Versions: 4.0
Reporter: Bill Bell
Priority: Minor
Fix For: 4.0
Attachments: SOLR-2242.patch, SOLR.2242.solr3.1.patch, SOLR.2242.v2.patch

When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct.
[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
-----
Attachment: LUCENE-3041.patch

A much larger patch that implements full query AST walking. The problem with having the QueryProcessor fully external to Query#rewrite is that composite Querys would need to expose their children. This is a little messy and could be hard with more exotic user-made Querys. So this patch expands Query#rewrite to include the QueryProcessor; composite queries can then pass their children to the processor during their rewrite. For backwards compatibility, and simplicity, I've created a SimpleQueryProcessor which directly calls rewrite. This means casual users do not need to concern themselves with processing. Over time we can expose the QueryProcessor API through IndexSearcher and other situations.

Support Query Visting / Walking
-----
Key: LUCENE-3041
URL: https://issues.apache.org/jira/browse/LUCENE-3041
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Chris Male
Priority: Minor
Attachments: LUCENE-3041.patch, LUCENE-3041.patch, LUCENE-3041.patch

Out of the discussion in LUCENE-2868, it could be useful to add a generic Query Visitor / Walker that could be used for more advanced rewriting, optimizations, or anything that requires state to be stored as each Query is visited. We could keep the interface very simple:
{code}
public interface QueryVisitor {
  Query visit(Query query);
}
{code}
and then use a reflection-based visitor like Earwin suggested, which would allow implementors to provide visit methods for just the Querys they are interested in.
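The design Chris describes, where composite queries hand their children back to the processor during rewrite, can be sketched roughly as below. The toy classes are illustrative stand-ins for the patch's API, not its actual code; the caching mirrors the spirit of the RewriteCachingQueryProcessor mentioned earlier in the thread, and it answers Simon's question about reusing the rewrite of a repeated clause. Note the cache here is keyed by object identity, since these toy queries don't override equals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-ins for Lucene's Query classes; names are illustrative only.
abstract class Query {
    // Mirrors the patch's idea: rewrite receives the processor so composite
    // queries can hand their children back to it.
    abstract Query rewrite(QueryProcessor processor);
}

class TermQuery extends Query {
    final String term;
    TermQuery(String term) { this.term = term; }
    Query rewrite(QueryProcessor processor) { return this; } // leaf: nothing to walk
}

class BooleanQuery extends Query {
    final List<Query> clauses = new ArrayList<>();
    Query rewrite(QueryProcessor processor) {
        BooleanQuery rewritten = new BooleanQuery();
        for (Query clause : clauses) {
            // children go through the processor, not directly through rewrite
            rewritten.clauses.add(processor.process(clause));
        }
        return rewritten;
    }
}

// A processor that caches rewrites: a clause object that appears twice
// (Simon's Fuzzy:A case) is only rewritten once.
class QueryProcessor {
    final Map<Query, Query> cache = new HashMap<>();
    int rewrites = 0; // how many actual rewrite calls happened

    Query process(Query query) {
        Query cached = cache.get(query);
        if (cached != null) return cached;
        rewrites++;
        Query rewritten = query.rewrite(this);
        cache.put(query, rewritten);
        return rewritten;
    }
}
```

On a tree like OR(AND(A, B), AND(A, C)) built with a shared A instance, the walk performs six rewrites instead of seven: the top query, both AND clauses, and A, B, C, with the second occurrence of A served from the cache.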
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025669#comment-13025669 ]

Bill Bell commented on SOLR-2242:
-----
Lance Norskog, What do you want it to be called? I would use a committer to take this issue on. It has several votes, and lots of downloads. People are using it successfully already. Do you want me to switch numFacetTerms to numFacetNames? Anything else? I feel like we are going in circles on this issue.

This will output the numFacetTerms AND hgid: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetTerms=2
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetTerms">7</int> <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026103#comment-13026103 ]

Bill Bell edited comment on SOLR-2242 at 4/28/11 3:51 AM:
-----
Lance Norskog, What do you want it to be called? I would use a committer to take this issue on. It has several votes, and lots of downloads. People are using it successfully already. Do you want me to switch numFacetTerms to numFacetNames? Anything else? I feel like we are going in circles on this issue.

This will output the numFacetNames AND hgid: http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=hgid&facet.mincount=1&f.hgid.facet.numFacetNames=2
{code}
<lst name="facet_fields">
  <lst name="hgid">
    <int name="numFacetNames">7</int> <!-- this is not 11 -->
    <lst name="counts">
      <int name="HGPY045FD36D4000A">1</int>
      <int name="HGPY0FBC6690453A9">1</int>
      <int name="HGPY1E44ED6C4FB3B">1</int>
      <int name="HGPY1FA631034A1B8">1</int>
      <int name="HGPY3317ABAC43B48">1</int>
      <int name="HGPY3A17B2294CB5A">5</int>
      <int name="HGPY3ADD2B3D48C39">1</int>
    </lst>
  </lst>
</lst>
{code}
[jira] [Updated] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Male updated LUCENE-3041:
-----
Attachment: LUCENE-3041.patch

Updated patch which removes the stupid test I'd included.
[jira] [Updated] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-2242:
-----
Attachment: SOLR-2242.solr3.1.patch

Putting up or shutting up :)
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026124#comment-13026124 ]

Lance Norskog edited comment on SOLR-2242 at 4/28/11 5:33 AM:
-----
Putting up or shutting up :) This splits apart whether to count terms vs. whether to count docs per term. They are independent concepts. Instead of 'numFacetTerms=0/1/2' it is 'numTerms=true/false'. If you set 'numTerms=true', it counts terms. If you set facet.limit=0, it does not do the facet search and does not count docs per term. If you set 'numTerms=false' and 'facet.limit=0', it does nothing. And, everything is called 'facet' and 'term' :)
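The two independent switches Lance describes can be sketched as a small decision table; the parameter names follow his comment, while the class and method below are hypothetical, not actual Solr code:

```java
// Sketch of the two independent switches: 'numTerms' decides whether the
// number of distinct terms is reported, 'facet.limit' decides whether the
// normal per-term doc counts are computed at all.
class FacetPlanSketch {
    static String plan(boolean numTerms, int facetLimit) {
        boolean countTerms = numTerms;              // report the number of distinct terms
        boolean countDocsPerTerm = facetLimit != 0; // run the normal per-term facet counts
        if (countTerms && countDocsPerTerm) return "terms + per-term counts";
        if (countTerms) return "terms only";
        if (countDocsPerTerm) return "per-term counts only";
        return "nothing";
    }
}
```

For example, numTerms=true with facet.limit=0 yields the term count alone without running the facet search, and numTerms=false with facet.limit=0 does nothing, exactly as described.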
[jira] [Issue Comment Edited] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026124#comment-13026124 ]

Lance Norskog edited comment on SOLR-2242 at 4/28/11 5:33 AM:
-----
Putting up or shutting up :) This splits apart whether to count terms vs. whether to count docs per term. They are independent concepts. Instead of 'numFacetTerms=0/1/2' it is 'numTerms=true/false'. If you set 'numTerms=true', it counts terms. If you set facet.limit=0, it does not do the facet search and does not count docs per term. If you set 'numTerms=false' and 'facet.limit=0', it does nothing. 'numFacetTerms' is redundant; we know it's all about facets. Thus, 'numTerms'.
[jira] [Commented] (LUCENE-3041) Support Query Visting / Walking
[ https://issues.apache.org/jira/browse/LUCENE-3041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026129#comment-13026129 ]

Lance Norskog commented on LUCENE-3041:
-----
This is an excellent opportunity to redefine Queries as immutable, which would make query rewriting an order of magnitude safer.
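Lance's point is that if rewriting always returns a new Query instead of mutating the receiver, a walker can never corrupt a query shared with a running search. A hypothetical sketch (this class is illustrative, not a Lucene API):

```java
// Hypothetical immutable query: all fields final, so a rewrite step must
// return a new instance instead of mutating state shared with other code.
final class ImmutableBoostQuery {
    final String term;
    final float boost;

    ImmutableBoostQuery(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }

    // "Rewriting" produces a fresh object; the original is untouched, so two
    // searches holding the same query instance cannot interfere.
    ImmutableBoostQuery withBoost(float newBoost) {
        return new ImmutableBoostQuery(term, newBoost);
    }
}
```

A rewrite that changes the boost leaves the original query's boost intact, which is precisely the safety property mutable rewriting lacks.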
Re: [Lucene.Net] Lucene.NET 2.9.4g -- only usable with .NET 4.0 ?
Am I correct that your trial code changes make this version of Lucene.NET incompatible and un-buildable with any version of .NET prior to 4.0?

As I understand it, 2.9.4g only replaces non-generic collections with generic ones, and generics were introduced in .NET Framework 2.0. Oh, sorry, I took a look at the code just to make sure: it does use SortedSet, a .NET 4 feature, and it also uses HashSet, introduced in .NET 3.5.

Kind regards, Amanuel