Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml
thanks for the heads-up, this is good to know. I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories which I recently created as a guide to help in choosing between the different LockFactories. I believe the native LockFactory is very useful; I wouldn't consider this a bug, nor consider discouraging its use. People just need to be informed of the behavior and know that no LockFactory impl is good for all cases. Adding some lines to its javadoc seems appropriate.

Regards,
Sanne

2010/1/20 Chris Hostetter hossman_luc...@fucit.org:

: At a minimum, shouldn't NativeFSLock.obtain() be checking for
: OverlappingFileLockException and treating that as a failure to acquire
: the lock?
...
: Perhaps - that should make it work in more cases - but in my simple
: testing it's not 100% reliable.
...
: File locks are held on behalf of the entire Java virtual machine.
: * They are not suitable for controlling access to a file by multiple
: * threads within the same virtual machine.
...
Grrr... so where does that leave us? Yonik's added comment was that native isn't recommended when running multiple webapps in the same container. In truth, native *can* work when running multiple webapps in the same container, just as long as those containers don't reference the same data dirs. I'm worried that we should recommend people avoid native altogether, because even if you are only running one webapp, it seems like a reload of that app could trigger some similar bad behavior. So what/how should we document all of this?

-Hoss
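A minimal sketch (not the actual NativeFSLock implementation, and the class/method names here are made up) of what the thread is suggesting: catch OverlappingFileLockException inside the obtain step and report it as a failed acquisition, since java.nio file locks are held on behalf of the whole JVM:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.OverlappingFileLockException;

public class NativeLockSketch {
    // Try to take an exclusive native lock on the file backing 'raf'.
    // Returns false instead of propagating OverlappingFileLockException,
    // which the JVM throws when another channel in the SAME process
    // already holds an overlapping lock on that file.
    public static boolean tryObtain(RandomAccessFile raf) {
        try {
            return raf.getChannel().tryLock() != null;
        } catch (OverlappingFileLockException e) {
            return false; // same-JVM overlap: treat as "lock not acquired"
        } catch (IOException e) {
            return false;
        }
    }
}
```

Note that, as the quoted javadoc warns, this only converts the same-JVM exception into a clean failure; it does not make native locks suitable for coordinating multiple threads or webapps within one VM.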
Case-insensitive searches and facet case
Hi,

Regarding case-insensitive searching: in order to support 'case-insensitivity' (lower-casing, really), I've set my index-time and query-time fieldType analyzer to use a LowerCaseFilterFactory filter. This works, but then all my facets get returned in lower-case (e.g. 'object:MyObject (3)' becomes 'object:myobject (3)'). Is there a way to maintain case-impartiality whilst allowing facets to be returned 'case-preserved'?

Thanks,
Peter
Re: Case-insensitive searches and facet case
On Jan 20, 2010, at 12:26 PM, Peter S wrote:

 Hi, Regarding case-insensitive searching: in order to support 'case-insensitivity' (lower-casing, really), I've set my index-time and query-time fieldType analyzer to use a LowerCaseFilterFactory filter. This works, but then all my facets get returned in lower-case (e.g. 'object:MyObject (3)' becomes 'object:myobject (3)'). Is there a way to maintain case-impartiality whilst allowing facets to be returned 'case-preserved'?

Yes, use different fields. Generally facet fields are string fields, which will maintain exact case. You can leverage the copyField capabilities in schema.xml to clone a field and analyze it differently.

Erik
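A sketch of what Erik describes, assuming hypothetical field and type names (`object` for the lowercased search field, `object_facet` for the case-preserved string field):

```xml
<!-- searchable field type, lowercased at index and query time -->
<fieldType name="text_lower" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="object"       type="text_lower" indexed="true" stored="true"/>
<!-- "string" is unanalyzed, so the original case is preserved for faceting -->
<field name="object_facet" type="string"     indexed="true" stored="false"/>

<!-- copy the raw value into the facet field at index time -->
<copyField source="object" dest="object_facet"/>
```

Queries then search against `object` while faceting with facet.field=object_facet.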
Re: Case-insensitive searches and facet case
To amplify this correct answer: use one field for searching (querying); this would be lowercased. Then use a second field for faceting (case preserved). The only gotcha here is that your original data may have inconsistent casing. My usual answer for that is to either impose a conventional case pattern (which takes you back to one field, if you like) or to do a spelling-corrector-style analysis to find the most common case pattern for each unique lowercased string. Then during indexing, I impose that pattern on the facet field.

On Wed, Jan 20, 2010 at 9:46 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

 Is there a way to maintain case-impartiality whilst allowing facets to be returned 'case-preserved'?

 Yes, use different fields. Generally facet fields are string fields, which will maintain exact case. You can leverage the copyField capabilities in schema.xml to clone a field and analyze it differently.

--
Ted Dunning, CTO
DeepDyve
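Ted's "most common case pattern" idea could be sketched like this (illustrative only; the class and method names are made up): count the original spellings seen for each lowercased key and pick the most frequent one as the canonical facet value.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class CanonicalCase {
    // For each lowercased key, return the most frequently seen original casing.
    public static Map<String, String> canonicalCase(List<String> values) {
        // lowercased key -> (original spelling -> occurrence count)
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String v : values) {
            String key = v.toLowerCase(Locale.ROOT);
            counts.computeIfAbsent(key, k -> new HashMap<>())
                  .merge(v, 1, Integer::sum);
        }
        // pick the winner per key
        Map<String, String> canonical = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String winner = Collections.max(
                    e.getValue().entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            canonical.put(e.getKey(), winner);
        }
        return canonical;
    }
}
```

At index time, the facet field's value for each document would then be canonical.get(rawValue.toLowerCase()).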
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802950#action_12802950 ]

Hoss Man commented on SOLR-1553:

Committed revision 901342. ... this was the same as my SOLR-1553.pf-refactor.patch, with the one addition of restoring the use of DisjunctionMaxQuery for the pf* params (per Yonik's comment that he couldn't remember why he changed it). If we figure out his reason (i'm sure he had one) we can re-evaluate.

extended dismax query parser
Key: SOLR-1553
URL: https://issues.apache.org/jira/browse/SOLR-1553
Project: Solr
Issue Type: New Feature
Reporter: Yonik Seeley
Fix For: 1.5
Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch

An improved user-facing query parser based on dismax

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802973#action_12802973 ]

Hoss Man commented on SOLR-1677:

bq. I think I am slightly offended with some of your statements about 'subjective opinion of the Lucene Community' and 'they should do relevancy testing which use some language-specific stemmer whose behavior changed in a small but significant way'.

That was not at all my intention, i'm sorry about that. I was in fact trying to speak entirely in generalities and theoretical examples. The point I was trying to make is that the types of bug fixes we make in Lucene are not mathematical absolutes -- we're not fixing bugs where 1+1=3. Even if everyone on java-dev and java-user agrees that behavior A is broken and behavior B is correct, that is still (to me) a subjective opinion -- 1000 men's trash may be one man's treasure, and there could be users out there who have come to expect/rely on that behavior A. I tried to use a stemmer as an example because it's the type of class where making behavior more correct (ie: making the stemming match the semantics of the language more accurately) doesn't necessarily improve the perceived behavior for all users -- someone could be very happy with the sloppy stemming in the 3.1 version of a (hypothetical) EsperantoStemmer because it gives him really loose matches. And if you (or anyone else) put in a lot of hard work making that stemmer better by all conceivable metrics in 3.4, then i've got no problem telling that person: Sorry dude, if you don't want those fixes don't upgrade, or here are some other suggestions for getting 'loose' matching on that field. My concern is that there may be people who don't even realize they are depending on behavior like this.
Without an easy way for users to understand what objects have improved/fixed behavior between luceneMatchVersion=X and luceneMatchVersion=Y, they won't know the full list of things they should be considering/testing when they do change luceneMatchVersion.

bq. I'm also not that worried that users won't know what changed - they will just know that they are in the same boat as those downloading Lucene latest greatest for the first time.

But that's not true: a person downloading for the first time won't have any preconceived expectations of how something will behave; that's a very different boat from a person upgrading, who is going to expect things that were working to keep working -- those things may have actually been bugs in earlier versions, but if they _seemed_ to be working for their use cases, it's going to feel like it's broken when the behavior changes. For a user who is consciously upgrading i'm ok with that, but when there is no easy way of knowing what behavior will change as a result of setting luceneMatchVersion=X, that doesn't feel fair to the user. Robert mentioned in an earlier comment that StopFilter's position increment behavior changes depending on the luceneMatchVersion -- what if an existing Solr 1.3 user notices a bug in some Tokenizer, and adds {{<luceneMatchVersion>3.0</luceneMatchVersion>}} to his schema.xml to fix it? Without clear documentation on _everything_ that is affected when doing that, he may not realize that StopFilter changed at all -- and even though the position increment behavior may now be more correct, it might drastically change the results he gets when using dismax with a particular qs or ps value. Hence my point that this becomes a serious documentation concern: finding a way to make it clear to users what they need to consider when modifying luceneMatchVersion.

bq. I'm still all for allowing Version per component for experts use.
But man, I wouldn't want to be in that boat, managing all my components as they mimic various bugs/bad behavior. But if the example configs only show a global setting that isn't directly linked to any of the individual object configurations, then normal users won't have any idea what could have/use individual luceneMatchVersion settings anyway (even if they wanted to manage it piecemeal). Like i said: i've come around to the idea of having/advocating a global value. Once i got past my mistaken thinking of Version as controlling alternate versions (as Miller very clearly put it) I started to understand what you are all saying and i agree with you: a single global value is a good idea. My concern is just how to document things so that people don't get confused when they do need to change it.

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
---
Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
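The two options under debate could look something like this in config (illustrative only: the global element matches what the thread quotes, but the per-factory override attribute is a hypothetical sketch of the "expert" option, not settled syntax):

```xml
<!-- global default, as the thread's example configs would show it -->
<luceneMatchVersion>3.0</luceneMatchVersion>

<!-- hypothetical expert-level per-component override, managed piecemeal -->
<filter class="solr.StopFilterFactory" words="stopwords.txt"
        luceneMatchVersion="2.4"/>
```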
[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
[ https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802979#action_12802979 ]

Robert Muir commented on SOLR-1677:

bq. The point I was trying to make is that the types of bug fixes we make in Lucene are not mathematical absolutes - we're not fixing bugs where 1+1=3.

You are wrong, they are absolutes. And here are the JIRA issues for stemming bugs, since you didn't take my hint to go and actually read them.

LUCENE-2055: I used the snowball tests against these stemmers, which claim to implement the 'snowball algorithm', and they fail. This is an absolute, and the fix is to instead use snowball.

LUCENE-2203: I used the snowball tests against these stemmers and they failed. Here is Martin Porter's confirmation that these are bugs: http://article.gmane.org/gmane.comp.search.snowball/1139

Perhaps you should come up with a better example than stemming, as you don't know what you are talking about.

Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory
---
Key: SOLR-1677
URL: https://issues.apache.org/jira/browse/SOLR-1677
Project: Solr
Issue Type: Sub-task
Components: Schema and Analysis
Reporter: Uwe Schindler
Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch

Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards compatibility with old indexes created using older versions of Lucene. The most important example is StandardTokenizer, which changed its behaviour with posIncr and incorrect host token types in 2.4 and also in 2.9. In Lucene 3.0 this matchVersion ctor parameter is mandatory, and in 3.1, with much more Unicode support, almost every Tokenizer/TokenFilter needs this Version parameter. In 2.9, the deprecated old ctors without Version take LUCENE_24 as the default to mimic the old behaviour, e.g. in StandardTokenizer. This patch adds basic support for the Lucene Version property to the base factories.
Subclasses can then use the decoded luceneMatchVersion enum (in 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently contains a helper map to decode the version strings, but in 3.0 it can be replaced by Version.valueOf(String), as Version is a Java 5 enum. The default value is Version.LUCENE_24 (as this is the default for the no-version ctors in Lucene). This patch also removes unneeded conversions to CharArraySet from StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed to match Lucene 3.0.
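The pattern the patch describes -- decode a configured version string into an enum, then gate behavior on it -- can be sketched in plain, standalone Java (this is not the actual Lucene Version class or Solr factory code; the names and the LUCENE_29 cutoff are illustrative):

```java
// Stand-in for o.a.lucene.util.Version. Because it is a Java 5 enum,
// Version.valueOf(String) decodes a config string with no helper map.
enum Version {
    LUCENE_24, LUCENE_29, LUCENE_30;

    boolean onOrAfter(Version other) {
        return compareTo(other) >= 0;
    }
}

// Hypothetical factory showing how a luceneMatchVersion init arg
// could control version-dependent analyzer behavior.
class StopFilterFactorySketch {
    private final Version matchVersion;

    StopFilterFactorySketch(String luceneMatchVersion) {
        // Default to LUCENE_24 when unset, mirroring the patch's default
        // for the old no-version ctors.
        this.matchVersion = (luceneMatchVersion == null)
                ? Version.LUCENE_24
                : Version.valueOf(luceneMatchVersion);
    }

    // Example of gated behavior: newer position-increment handling
    // only when the configured version allows it.
    boolean enablePositionIncrements() {
        return matchVersion.onOrAfter(Version.LUCENE_29);
    }
}
```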
Re: failing tests
I'd look at the properties set up in the Ant build.xml test target to see what's missing. I don't use Eclipse, so that is the only advice I have. For the tests, I believe the solr.solr.home dir needs to be the one under the src test dir.

On Jan 20, 2010, at 12:27 AM, Siv Anette Fjellkårstad wrote:

 Hi! I'm trying to run Solr 1.4.0's unit tests from Eclipse (under Windows). About half the tests are failing, and I don't know what I'm doing wrong. This is what I've done:

 1. Checked out the code outside Eclipse's workspace
 2. File > New > Project > Java Project
 3. Create project from existing source
 4. Five compiler errors. Fixed in this way: Properties > Java Build Path > Order and Export > moved "JRE System Library" to the top
 5. I've tried to set Run As > Run Configurations > Arguments > VM Arguments: -Dsolr.solr.home=<my solr dir>, but perhaps I set the wrong directory? I can see that we have a lot of solrconfig.xml files, but I don't know how to choose the right one for each test. When I add one conf directory to the build path, another one is still missing.

 What have I done wrong?

 Kind regards,
 Siv
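Erik's suggestion amounts to: find the system properties the Ant test target passes to its forked JUnit VM, and replicate them as VM arguments in the Eclipse run configuration. A hypothetical build.xml fragment of the general shape to look for (the path value here is an assumption, not copied from Solr's actual build.xml):

```xml
<target name="test">
  <junit fork="true">
    <!-- this is the value to replicate in Eclipse as
         -Dsolr.solr.home=src/test/test-files/solr -->
    <sysproperty key="solr.solr.home" value="src/test/test-files/solr"/>
    <!-- classpath, formatter, batchtest elements omitted -->
  </junit>
</target>
```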
[jira] Updated: (SOLR-236) Field collapsing
[ https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated SOLR-236:

Attachment: SOLR-236.patch

Attached updated patch that works with the latest trunk. This patch is not compatible with the 1.4 branch.

Field collapsing
Key: SOLR-236
URL: https://issues.apache.org/jira/browse/SOLR-236
Project: Solr
Issue Type: New Feature
Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
Fix For: 1.5
Attachments: collapsing-patch-to-1.3.0-dieter.patch, collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, field-collapse-4-with-solrj.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, SOLR-236_collapsing.patch

This patch includes a new feature called Field collapsing. Used in order to collapse a group of results with similar value for a given field to a single entry in the result set.
Site collapsing is a special case of this, where all results for a given web site are collapsed into one or two entries in the result set, typically with an associated "more documents from this site" link. See also Duplicate detection. http://www.fastsearch.com/glossary.aspx?m=48&amid=299

The implementation adds 3 new query parameters (SolrParams):
- collapse.field: the field used to group results
- collapse.type: normal (default value) or adjacent
- collapse.max: how many continuous results are allowed before collapsing

TODO (in progress):
- More documentation (on source code)
- Test cases

Two patches:
- field_collapsing.patch for the current development version
- field_collapsing_1.1.0.patch for Solr 1.1.0

P.S.: Feedback and misspelling corrections are welcome ;-)
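Using the parameters described in the issue, a collapsing request might look like this (host, port, query, and field name are hypothetical):

```
http://localhost:8983/solr/select?q=camera&collapse.field=site&collapse.type=adjacent&collapse.max=1
```

Here adjacent duplicates on the site field beyond the first result would be collapsed into a single entry.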
RE: Case-insensitive searches and facet case
Hi Erik,

Thanks for your reply. As soon as you mentioned it, I realized I have used this technique in the past with a couple of fields... so thanks for jogging my failing memory!

Thanks,
Peter

From: erik.hatc...@gmail.com
To: solr-dev@lucene.apache.org
Subject: Re: Case-insensitive searches and facet case
Date: Wed, 20 Jan 2010 12:46:44 -0500

 On Jan 20, 2010, at 12:26 PM, Peter S wrote:

 Hi, Regarding case-insensitive searching: in order to support 'case-insensitivity' (lower-casing, really), I've set my index-time and query-time fieldType analyzer to use a LowerCaseFilterFactory filter. This works, but then all my facets get returned in lower-case (e.g. 'object:MyObject (3)' becomes 'object:myobject (3)'). Is there a way to maintain case-impartiality whilst allowing facets to be returned 'case-preserved'?

 Yes, use different fields. Generally facet fields are string fields, which will maintain exact case. You can leverage the copyField capabilities in schema.xml to clone a field and analyze it differently.

 Erik