Re: svn commit: r899979 - /lucene/solr/trunk/example/solr/conf/solrconfig.xml

2010-01-20 Thread Sanne Grinovero
thanks for the heads-up, this is good to know.
I've updated http://wiki.apache.org/lucene-java/AvailableLockFactories
which I recently created as a guide to help in choosing between
different LockFactories.

I believe the Native LockFactory is very useful; I wouldn't consider
this a bug, nor would I consider discouraging its use. People just need to be
informed of the behavior and know that no LockFactory impl is good for
all cases.

Adding some lines to its javadoc seems appropriate.

Regards,
Sanne

2010/1/20 Chris Hostetter hossman_luc...@fucit.org:

 :  At a minimum, shouldn't NativeFSLock.obtain() be checking for
 :  OverlappingFileLockException and treating that as a failure to acquire the
 :  lock?
        ...
 : Perhaps - that should make it work in more cases - but in my simple
 : testing it's not 100% reliable.
        ...
 : File locks are held on behalf of the entire Java virtual machine.
 :      * They are not suitable for controlling access to a file by multiple
 :      * threads within the same virtual machine.

 ...Grrr  so where does that leave us?

 Yonik's added comment was that native isn't recommended when running
 multiple webapps in the same container.  In truth, native *can*
 work when running multiple webapps in the same container, just as long as
 those containers don't reference the same data dirs.

 I'm worried that we should recommend people avoid native altogether
 because even if you are only running one webapp, it seems like a reload
 of that app could trigger some similar bad behavior.

 So what/how should we document all of this?

 -Hoss
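
A minimal sketch of the change being discussed here -- treating OverlappingFileLockException from FileChannel.tryLock() as an ordinary "lock not acquired" result. This is an illustration only, not Lucene's actual NativeFSLock code:

    import java.io.File;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.channels.FileLock;
    import java.nio.channels.OverlappingFileLockException;

    public class LockSketch {
      // Returns the lock on success, or null if it could not be obtained --
      // whether it is held by another process (tryLock() returns null) or by
      // another thread/webapp in the same JVM (OverlappingFileLockException).
      public static FileLock tryAcquire(File lockFile) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
        try {
          FileLock lock = raf.getChannel().tryLock();
          if (lock != null) {
            return lock;  // caller must release the lock and close the file
          }
        } catch (OverlappingFileLockException e) {
          // held elsewhere in this JVM: treat as a failed acquire, not an error
        }
        raf.close();
        return null;
      }
    }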




Case-insensitive searches and facet case

2010-01-20 Thread Peter S

Hi,
 
Regarding case-insensitive searching:
 
In order to support 'case-insensitivity' (lower-casing, really), I've set my 
index-time and query-time fieldType analyzer to use a LowerCaseFilterFactory 
filter. This works, but then all my facets get returned in lower-case (e.g. 
'object:MyObject (3)' becomes 'object:myobject (3)').
 
Is there a way to maintain case-impartiality whilst allowing facets to be 
returned 'case-preserved'?
 
Thanks,
Peter
 
  

Re: Case-insensitive searches and facet case

2010-01-20 Thread Erik Hatcher


On Jan 20, 2010, at 12:26 PM, Peter S wrote:



Hi,

Regarding case-insensitive searching:

In order to support 'case-insensitivity' (lower-casing, really), I've  
set my index-time and query-time fieldType analyzer to use a  
LowerCaseFilterFactory filter. This works, but then all my facets  
get returned in lower-case (e.g. 'object:MyObject (3)' becomes  
'object:myobject (3)').


Is there a way to maintain case-impartiality whilst allowing facets  
to be returned 'case-preserved'?


Yes, use different fields.  Generally facet fields are string fields, which  
will maintain exact case.  You can leverage the copyField capabilities  
in schema.xml to clone a field and analyze it differently.


Erik
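
A minimal schema.xml sketch of this approach (the field names follow Peter's example; the type names are hypothetical, and the stock "string" type, solr.StrField, from the example schema is assumed). Queries go against "object", faceting against "object_facet":

    <!-- lowercased text type used for case-insensitive searching -->
    <fieldType name="text_lc" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- lowercased field for queries, case-preserving string copy for facets -->
    <field name="object"       type="text_lc" indexed="true" stored="true"/>
    <field name="object_facet" type="string"  indexed="true" stored="false"/>

    <copyField source="object" dest="object_facet"/>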



Re: Case-insensitive searches and facet case

2010-01-20 Thread Ted Dunning
To amplify this correct answer, use one field for searching (querying).
This would be lower cased.  Then use a second field for faceting (case
preserved).  The only gotcha here is that your original data may have
inconsistent casing.  My usual answer for that is to either impose a
conventional case pattern (which takes you back to one field if you like) or
to do a spelling corrector analysis to find the most common case pattern for
each unique lower cased string.  Then during indexing, I impose that pattern
on the facet field.
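
A rough sketch of that last step (an illustration; the class and method names are made up): count, for each lowercased key, how often each original spelling occurs, and write the most frequent spelling to the case-preserved facet field at index time.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class CasePatternPicker {
      // e.g. ["MyObject", "myobject", "MyObject"] -> {"myobject" -> "MyObject"}
      public static Map<String, String> pickCanonicalCasings(List<String> rawValues) {
        // per lowercased key, count each original spelling
        Map<String, Map<String, Integer>> counts =
            new HashMap<String, Map<String, Integer>>();
        for (String raw : rawValues) {
          String key = raw.toLowerCase();
          Map<String, Integer> perKey = counts.get(key);
          if (perKey == null) {
            perKey = new HashMap<String, Integer>();
            counts.put(key, perKey);
          }
          Integer c = perKey.get(raw);
          perKey.put(raw, c == null ? 1 : c + 1);
        }
        // pick the most frequent spelling for each key
        Map<String, String> canonical = new HashMap<String, String>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
          String best = null;
          int bestCount = -1;
          for (Map.Entry<String, Integer> variant : e.getValue().entrySet()) {
            if (variant.getValue() > bestCount) {
              best = variant.getKey();
              bestCount = variant.getValue();
            }
          }
          canonical.put(e.getKey(), best);
        }
        return canonical;
      }
    }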

On Wed, Jan 20, 2010 at 9:46 AM, Erik Hatcher erik.hatc...@gmail.com wrote:

 Is there a way to maintain case-impartiality whilst allowing facets to be
 returned 'case-preserved'?


 Yes, use different fields.  Generally facet fields are string which will
 maintain exact case.  You can leverage the copyField capabilities in
 schema.xml to clone a field and analyze it differently.




-- 
Ted Dunning, CTO
DeepDyve


[jira] Commented: (SOLR-1553) extended dismax query parser

2010-01-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802950#action_12802950
 ] 

Hoss Man commented on SOLR-1553:


Committed revision 901342.
...
This was the same as my SOLR-1553.pf-refactor.patch, with the one addition of 
restoring the use of DisjunctionMaxQuery for the pf* params (per Yonik's 
comment that he couldn't remember why he changed it).

if we figure out his reason (i'm sure he had one) we can re-evaluate.
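
For context, a small sketch of what using DisjunctionMaxQuery for the pf fields buys (an illustration, not the actual parser code): a document that matches the phrase in several fields is scored by its best field rather than by the sum across fields.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.DisjunctionMaxQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.Query;

    public class PfSketch {
      public static Query phraseFieldsQuery() {
        DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.0f); // tie-breaker 0
        for (String field : new String[] {"title", "body"}) {    // hypothetical pf fields
          PhraseQuery pq = new PhraseQuery();
          pq.add(new Term(field, "quick"));
          pq.add(new Term(field, "fox"));
          dmq.add(pq);  // score = max over the per-field phrase scores
        }
        return dmq;
      }
    }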

 extended dismax query parser
 

 Key: SOLR-1553
 URL: https://issues.apache.org/jira/browse/SOLR-1553
 Project: Solr
  Issue Type: New Feature
Reporter: Yonik Seeley
 Fix For: 1.5

 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch


 An improved user-facing query parser based on dismax




[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802973#action_12802973
 ] 

Hoss Man commented on SOLR-1677:



bq. I think I am slightly offended with some of your statements about 
'subjective opinion of the Lucene Community' and 'they should do relevancy 
testing which use some language-specific stemmer whose behavior changed in a 
small but significant way'.

That was not at all my intention, i'm sorry about that.  I was in fact trying 
to speak entirely in generalities and theoretical examples.

The point I was trying to make is that the types of bug fixes we make in Lucene 
are not mathematical absolutes -- we're not fixing bugs where 1+1=3.  Even if 
everyone on java-dev and java-user agrees that behavior A is broken and 
behavior B is correct, that is still (to me) a subjective opinion -- 1000 men's 
trash may be one man's treasure, and there could be users out there who have 
come to expect/rely on behavior A.

I tried to use a stemmer as an example because it's the type of class where 
making behavior more correct (ie: making the stemming match the semantics of 
the language more accurately) doesn't necessarily improve the perceived 
behavior for all users -- someone could be very happy with the sloppy 
stemming in the 3.1 version of a (hypothetical) EsperantoStemmer because it 
gives him really loose matches.  And if you (or anyone else) put in a lot of 
hard work making that stemmer better by all conceivable metrics in 3.4, then 
i've got no problem telling that person "Sorry dude, if you don't want those 
fixes, don't upgrade" or "here are some other suggestions for getting 'loose' 
matching on that field."

My concern is that there may be people who don't even realize they are 
depending on behavior like this.  Without an easy way for users to understand 
which objects have improved/fixed behavior between luceneMatchVersion=X and 
luceneMatchVersion=Y, they won't know the full list of things they should be 
considering/testing when they do change luceneMatchVersion.

bq. I'm also not that worried that users won't know what changed - they will 
just know that they are in the same boat as those downloading Lucene latest 
greatest for the first time.

But that's not true: a person downloading for the first time won't have any 
preconceived expectations of how something will behave; that's a very 
different boat from a person upgrading, who is going to expect things that were 
working to keep working -- those things may have actually been bugs in earlier 
versions, but if they _seemed_ to be working for their use cases, it's going to 
feel like something is broken when the behavior changes.  For a user who is 
consciously upgrading i'm ok with that, but when there is no easy way of knowing 
what behavior will change as a result of setting luceneMatchVersion=X, that 
doesn't feel fair to the user.

Robert mentioned in an earlier comment that StopFilter's position increment 
behavior changes depending on the luceneMatchVersion -- what if an existing 
Solr 1.3 user notices a bug in some Tokenizer, and adds 
{{<luceneMatchVersion>3.0</luceneMatchVersion>}} to his schema.xml to fix it?  
Without clear documentation on _everything_ that is affected when doing that, he 
may not realize that StopFilter changed at all -- and even though the position 
increment behavior may now be more correct, it might drastically change the 
results he gets when using dismax with a particular qs or ps value.  Hence my 
point that this becomes a serious documentation concern: finding a way to make 
it clear to users what they need to consider when modifying luceneMatchVersion.
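
To make that concrete, a small sketch (not from the issue) of what a luceneMatchVersion setting ultimately selects on the Lucene side -- the same analyzer class, constructed with a different Version, behaves differently:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.util.Version;

    public class MatchVersionSketch {
      public static void main(String[] args) {
        // mimics the analysis behavior shipped with Lucene 2.4
        StandardAnalyzer legacy = new StandardAnalyzer(Version.LUCENE_24);

        // opts in to the current behavior (e.g. StopFilter's newer
        // position-increment handling) -- the kind of silent change a user
        // may not know to retest for
        StandardAnalyzer current = new StandardAnalyzer(Version.LUCENE_30);
      }
    }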

bq. I'm still all for allowing Version per component for experts use. But man, 
I wouldn't want to be in the boat, managing all my components as they mimic 
various bugs/bad behavior for various components.

But if the example configs only show a global setting that isn't directly 
linked to any of the individual object configurations, then normal users 
won't have any idea what could have/use individual luceneMatchVersion settings 
anyway (even if they wanted to manage it piecemeal).

Like i said: i've come around to the idea of having/advocating a global value.  
Once i got past my mistaken thinking of Version as controlling alternate 
versions (as Miller very clearly put it), I started to understand what you are 
all saying and i agree with you: a single global value is a good idea.

My concern is just how to document things so that people don't get confused 
when they do need to change it.


 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  

[jira] Commented: (SOLR-1677) Add support for o.a.lucene.util.Version for BaseTokenizerFactory and BaseTokenFilterFactory

2010-01-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802979#action_12802979
 ] 

Robert Muir commented on SOLR-1677:
---

bq. The point I was trying to make is that the types of bug fixes we make in 
Lucene are not mathematical absolutes - we're not fixing bugs where 1+1=3.

You are wrong, they are absolutes.
And here are the JIRA issues for stemming bugs, since you didn't take my hint to 
go and actually read them.

LUCENE-2055: I used the snowball tests against these stemmers which claim to 
implement 'snowball algorithm', and they fail. This is an absolute, and the fix 
is to instead use snowball.
LUCENE-2203: I used the snowball tests against these stemmers and they failed. 
Here is Martin Porter's confirmation that these are bugs: 
http://article.gmane.org/gmane.comp.search.snowball/1139

Perhaps you should come up with a better example than stemming, as you don't 
know what you are talking about.  

 Add support for o.a.lucene.util.Version for BaseTokenizerFactory and 
 BaseTokenFilterFactory
 ---

 Key: SOLR-1677
 URL: https://issues.apache.org/jira/browse/SOLR-1677
 Project: Solr
  Issue Type: Sub-task
  Components: Schema and Analysis
Reporter: Uwe Schindler
 Attachments: SOLR-1677.patch, SOLR-1677.patch, SOLR-1677.patch, 
 SOLR-1677.patch


 Since Lucene 2.9, a lot of analyzers use a Version constant to keep backwards 
 compatibility with old indexes created using older versions of Lucene. The 
 most important example is StandardTokenizer, which changed its behaviour with 
 posIncr and incorrect host token types in 2.4 and also in 2.9.
 In Lucene 3.0 this matchVersion ctor parameter is mandatory and in 3.1, with 
 much more Unicode support, almost every Tokenizer/TokenFilter needs this 
 Version parameter. In 2.9, the deprecated old ctors without Version take 
 LUCENE_24 as default to mimic the old behaviour, e.g. in StandardTokenizer.
 This patch adds basic support for the Lucene Version property to the base 
 factories. Subclasses then can use the luceneMatchVersion decoded enum (in 
 3.0) / parameter (in 2.9) for constructing TokenStreams. The code currently 
 contains a helper map to decode the version strings, but in 3.0 it can be 
 replaced by Version.valueOf(String), as Version is a Java5 enum. The default 
 value is Version.LUCENE_24 (as this is the default for the no-version ctors 
 in Lucene).
 This patch also removes unneeded conversions to CharArraySet from 
 StopFilterFactory (now done by Lucene since 2.9). The generics are also fixed 
 to match Lucene 3.0.




Re: failing tests

2010-01-20 Thread Grant Ingersoll
I'd look at the properties set up in the Ant build.xml test target to see what's 
missing.  I don't use Eclipse, so that is the only advice I have.  For the 
tests, I believe the solr.solr.home dir needs to be the one under the src/test 
dir.
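
By way of illustration, the VM argument would then look something like the following (the exact path is an assumption about the 1.4.0 source layout -- check your checkout):

    -Dsolr.solr.home=/path/to/solr-1.4.0/src/test/test-files/solr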

On Jan 20, 2010, at 12:27 AM, Siv Anette Fjellkårstad wrote:

 Hi!
 
 I'm trying to run Solr 1.4.0's unit tests from Eclipse (under Windows). About 
 half the tests are failing, and I don't know what I'm doing wrong. This is 
 what I've done:
 
 1. Checked out the code outside Eclipse's workspace
 2. File > New > Project > Java Project
 3. Create project from existing source
 
 4. Five compiler errors. Fixed in this way:
 Properties > Java Build Path > Order and Export
 Moved “JRE System Library” to the top
 
 5. I've tried to set Run As > Run Configurations > Arguments > VM Arguments: 
 -Dsolr.solr.home=my solr dir, but perhaps I set the wrong directory?
 
 I can see that we have a lot of solrconfig.xml files, but I don't know how to 
 choose the right one for each test.
 
 When I add one conf-directory to the build path, another one is still 
 missing. What have I done wrong?
 
 Kind regards,
 Siv
 



[jira] Updated: (SOLR-236) Field collapsing

2010-01-20 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-236:
---

Attachment: SOLR-236.patch

Attached an updated patch that works with the latest trunk. This patch is not 
compatible with the 1.4 branch.

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called "field collapsing".
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated "more documents from 
 this site" link. See also "Duplicate detection".
 http://www.fastsearch.com/glossary.aspx?m=48&amid=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling correction are welcome ;-)
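
A hypothetical request illustrating those parameters (the field name and host are made up):

    http://localhost:8983/solr/select?q=video&collapse.field=site&collapse.type=normal&collapse.max=1

This would leave at most one entry per distinct value of the site field in the result set.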




RE: Case-insensitive searches and facet case

2010-01-20 Thread Peter S

Hi Erik,

 

Thanks for your reply,

 

As soon as you mentioned it, I realized I have used this technique in the past 
with a couple of fields...so thanks for jogging my failing memory!

 

Thanks,

Peter

 

 


 
  