SOLR-236 Patch

2010-06-24 Thread Amdebirhan, Samson, VF-Group
Hi

 

Trying to apply the SOLR-236 patch to trunk, I get what follows. Can
anyone help me understand what I am missing?

 


.

svn checkout http://svn.apache.org/repos/asf/lucene/dev/trunk

 

patch -p0 -i SOLR-236-trunk.patch --dry-run

 

patching file solr/src/test/org/apache/solr/search/fieldcollapse/MyDocTermsIndex.java

patching file solr/src/java/org/apache/solr/handler/component/CollapseComponent.java

patching file solr/src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml

patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java

patching file solr/src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java

can't find file to patch at input line 1068

Perhaps you used the wrong -p or --strip option?

The text leading up to this was:

--

|Index: solr/src/java/org/apache/solr/search/DocSetHitCollector.java

|===

|--- solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision 922957)

|+++ solr/src/java/org/apache/solr/search/DocSetHitCollector.java (revision )

 


.
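(For what it's worth, that "wrong -p or --strip" hint usually means the strip level doesn't match the paths in the patch headers. A self-contained sketch of how -p0 vs. -p1 behave, assuming GNU patch; the file names here are invented for the demo:)

```shell
# Build a tiny file and a patch whose paths carry one leading component.
workdir=$(mktemp -d); cd "$workdir"
printf 'old\n' > a.txt
cat > fix.patch <<'EOF'
--- trunk/a.txt
+++ trunk/a.txt
@@ -1 +1 @@
-old
+new
EOF
# -p0 keeps the full path "trunk/a.txt", which does not exist here:
patch --batch --dry-run -p0 -i fix.patch || echo "wrong strip level"
# -p1 strips the leading "trunk/" component, so a.txt is found and patched:
patch --batch -p1 -i fix.patch
```

With a patch generated from the checkout root (as SOLR-236-trunk.patch appears to be), -p0 only works if you run patch from that same root.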

 

 

 

Regards

Sam

 

 



Re: Field missing when use distributed search + dismax

2010-06-24 Thread Scott Zhang
I believe I explicitly set it to fl=id,type. No luck.

I believe there is something wrong when Solr merges the results.

On Thu, Jun 24, 2010 at 12:41 PM, Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:

 Make sure you list it in ...fl=ID,type or set it in the defaults section
 of your handler.
  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
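(For reference, a sketch of the defaults section Otis mentions; the handler name and class are illustrative for a Solr 1.4-era solrconfig.xml:)

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- make sure responses, including distributed ones, keep both fields -->
    <str name="fl">id,type</str>
  </lst>
</requestHandler>
```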



 - Original Message 
  From: Scott Zhang macromars...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Tue, June 22, 2010 11:04:07 AM
  Subject: Field missing when use distributed search + dismax
 
  Hi. All.
   I was using distributed search over 30 Solr instances. Previously I
  was using the standard query handler, and the results were returned
  correctly: each result has 2 fields, ID and type.

  Today I wanted to search with dismax. I tried searching each instance
  with dismax and it works correctly, returning ID and type for each
  result. The strange thing is that when I use distributed search, the
  results only have ID. The field "type" disappeared. I need that type to
  know what the ID refers to. Why does Solr eat my "type"?


 Thanks.
 Regards.
 Scott



Re: Alphabetic range

2010-06-24 Thread Sophie M.

Hello Otis,

this morning, instead of

http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl=

I tried:

http://localhost:8983/solr/music/select?indent=on&version=2.2&q=ArtistSort:Mi*&fq=&start=0&rows=10&fl=ArtistSort&qt=standard&wt=standard&explainOther=&hl.fl=

and I get all the artists that were missing :) So all is well. Thank you for your advice,
because I still have problems with accents and the Analysis page will surely help me.

Sophie
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Alphabetic-range-tp916716p919091.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fuzzy query performance

2010-06-24 Thread Peter Karich
Thanks, Robert and Otis!
will try it out now.

Peter.

 Btw. here you can see Robert's presentation on what he did to speed up fuzzy 
 queries:  http://www.slideshare.net/otisg
  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

  So, you mean I should try it out here:
  http://svn.apache.org/viewvc/lucene/dev/trunk/solr/

 yes, the speedups are only in trunk.


Re: anyone use hadoop+solr?

2010-06-24 Thread Marc Sturlese

Hi Otis, just for curiosity, wich strategy do you use? Index in the map or
reduce side?
Do you use it to build shards or a single monolitic index?
Thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p919335.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4 - Image-Highlighting and Payloads

2010-06-24 Thread MitchK

Sebastian,

sounds like an exciting project.



 We've found the argument TokenGroup in method highlightTerm
 implemented in SimpleHtmlFormatter. TokenGroup provides the method
 getPayload(), but the returned value is always NULL. 
 
No, Token provides this method, not TokenGroup. But this might not be the
mistake.

Hm, since this approach is very special, I would suggest doing something
easier.
You already have tools to retrieve a word and the word's position from the
image, right?

What if you added a field to the schema.xml with a preprocessed
input string?

I.e. you have two fields:
the page's text, and the page's text's word positions.

The word-positions field needs preprocessing outside of Solr, where you
add the coordinates of the words.

This preprocessing will be a little bit tricky.
If the 10th word is "Solr" and the 30th word is too, you do not want to have
"solr" twice with different coordinates.
In fact, you want to store both coordinates for the term "solr".
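(A sketch of that preprocessing step in plain Python; the "word|x1,y1;x2,y2" field format is just an illustrative choice, not anything Solr prescribes:)

```python
def positions_field(words_with_coords):
    """Collapse repeated words so each unique term carries all its coordinates."""
    merged = {}
    for word, coord in words_with_coords:
        # lowercase so "Solr" and "solr" end up as one term
        merged.setdefault(word.lower(), []).append(coord)
    # one entry per unique term, e.g. "solr|10,40;120,40"
    return " ".join(
        term + "|" + ";".join("%d,%d" % c for c in coords)
        for term, coords in merged.items()
    )

# e.g. OCR output for a page containing "Solr" twice at different positions
ocr = [("Solr", (10, 40)), ("rocks", (60, 40)), ("solr", (120, 40))]
print(positions_field(ocr))  # solr|10,40;120,40 rocks|60,40
```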

However, on the Solr side you can add this preprocessed string to a field
with termVectors.
If your query hits the page, you will get all the coordinates you want.
Unfortunately, highlighting must be done on the client side.

Hope this helps
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-1-4-Image-Highlighting-and-Payloads-tp919266p919342.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fuzzy query performance

2010-06-24 Thread Peter Karich
wow! indeed a lot faster (~an order of magnitude). Hopefully we do not
encounter a bug with the trunk :-)

So, Thanks and congrats for that awesome piece of software!

 On Wed, Jun 23, 2010 at 3:34 PM, Peter Karich peat...@yahoo.de wrote:

   
 So, you mean I should try it out her:
 http://svn.apache.org/viewvc/lucene/dev/trunk/solr/


 
 yes, the speedups are only in trunk.

   


underscore, comma in terms.prefix

2010-06-24 Thread stockii

Hello.

this is my filterchain for suggestion with termsComponent:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([,_])" replacement=" " replace="all"/>

    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="0" catenateWords="0" splitOnCaseChange="1"
            splitOnNumerics="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
            outputUnigrams="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <!-- singular and plural, ü => ue and ue => ü -->
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>

    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt"
            ignoreCase="true"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateAll="1"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="false"/> -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


so my question/problem is:

- when I index with these settings I get an underscore (_) in my index. Is
the comma replaced with an underscore?
- Solr imports this string: "Eiseimer COOL mit Greifer" as "cool mit
mit" when I search with terms.prefix=cool.
Why is "mit" there twice? Sometimes "cool" is twice in my suggestions.

any idea ?? ! =) thx



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919565.html
Sent from the Solr - User mailing list archive at Nabble.com.


MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread Chantal Ackermann
Hi there,

consider the following response extract for a MoreLikeThis request:

<result name="match" numFound="1" start="0" maxScore="13.4579935">
<result name="response" numFound="103708" start="0"
maxScore="4.1711807">

The first result element is the document that was input and for which to
return more like this results.
The second result element contains the results returned by the handler.

As they both come with a different maxScore I was wondering whether I
could safely use the match's maxScore to normalize the scores of the
more like this documents.

Would that allow to reflect to the user the quality/relevancy of the
hits for different MoreLikeThis requests (and only those)?
(What does the match's maxScore mean?)

Thanks!
Chantal



Re: dataimport.properties is not updated on delta-import

2010-06-24 Thread warb

Hello again!

Upon further investigation it seems that something is amiss with
delta-import after all: the delta-import does not actually import anything.
(I thought it did when I ran it previously, but I am no longer sure that was
the case.) It does complete successfully as seen from the front-end
(dataimport?command=delta-import). The logs also state that the
import was successful (INFO: Delta Import completed successfully), but there
are exceptions pertaining to some documents.

The exception message is that the id field is missing
(org.apache.solr.common.SolrException: Document [null] missing required
field: id). Now, I have checked the column names in the table, the
data-config.xml file and the schema.xml file; they all have the
column/field names written in lowercase and are even named exactly the same.

Does Solr roll back delta-imports if one or more of the documents failed?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-properties-is-not-updated-on-delta-import-tp916753p919609.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: underscore, comma in terms.prefix

2010-06-24 Thread Otis Gospodnetic
stocki,

Solr's Analysis page will tell you what's happening.  I can't tell just by 
looking, though I would first try removing the CommonGramsFilterFactory and see 
if the repetition still happens.

 

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/




Re: underscore, comma in terms.prefix

2010-06-24 Thread stockii

okay, thx. 

WordDelimiterFilterFactory with the option generateNumberParts="0" caused the
trouble ;-)
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/underscore-comma-in-terms-prefix-tp919565p919655.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread Chantal Ackermann
Hi Otis,

thank you for this super quick answer. I understand that normalizing and
comparing scores is fishy, and I wouldn't want to do it for regular
search results.

I just thought that in this special case, the maxScore which is returned
for the input document of the MoreLikeThis handler -- and which is only
present in MoreLikeThis responses (with include=true) -- might be the
missing additional value to normalize on. (In this
special case there are two maxScores.)

But I don't know what the match's maxScore is derived from. As the input
element should surely be the best match for the request, a maxScore of
13.4579935 looks suspicious?

Thanks,
Chantal




On Thu, 2010-06-24 at 16:25 +0200, Otis Gospodnetic wrote:
 Chantal,
 
 The short answer is that you can't compare relevancy scores across requests.  
 I think this may be in a FAQ.
 Check this:
 http://search-lucene.com/?q=score+compare+absolute+relative&fc_project=Lucene&fc_project=Solr
 
 Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Lucene ecosystem search :: http://search-lucene.com/
 
 
 




Re: MoreLikeThis (mlt) : use the match's maxScore for result score normalization

2010-06-24 Thread MitchK

Chantal,

have a look at
http://lucene.apache.org/java/3_0_1/api/all/org/apache/lucene/search/similar/MoreLikeThis.html
to get an idea of what the MLT score concerns.

The problem is that you can't compare scores.
The query for the normal result response was maybe something like
"Bill Gates featuring Linus Torvald - The perfect OS song".
The user now picks one of the returned documents and says he wants "More
like this" - maybe because the topic was okay, but the content
was not enough, or whatever...
But the query that is sent is totally different (as you can see in the link) - so
that would be like comparing apples and oranges, since they do not use the
same base.

What would be the use case? Why is score normalization needed?

Kind regards from Germany,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/MoreLikeThis-mlt-use-the-match-s-maxScore-for-result-score-normalization-tp919598p919716.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: dataimport.properties is not updated on delta-import

2010-06-24 Thread Erick Erickson
Is there any chance that the id field is, indeed, missing for those
documents?
Does your schema require id? I've also seen constraints added to a DB
that are not retroactive, so even if there is a constraint requiring id,
it's still possible that some rows in your DB don't have it.

A shot in the dark.
Erick



Re: performance sorting multivalued field

2010-06-24 Thread Chris Hostetter

: I just like play with things. First checked the behavior of sorting on
: multiValued field and what I noticed was, let's say you have docs with field

sorting on a multiValued field is defined to have unspecified behavior.  it 
might fail with an error, or it might fail silently.

fundamentally solr can't sort on a multiValued field, no matter how much 
you might want it to, because if a doc contains the values "a" and "z" 
then there is no deterministic way to decide where that document 
should appear in an alphabetical list.


-Hoss



Re: Multiple Solr Webapps in Glassfish with JNDI

2010-06-24 Thread Kelly Taylor

Yes, but I don't see that Glassfish has the concept of context fragments
like Tomcat does... even though under the covers Glassfish contains a bit of
Tomcat (Catalina).
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiple-Solr-Webapps-in-Glassfish-with-JNDI-tp918383p920008.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: performance sorting multivalued field

2010-06-24 Thread wojtekpia


Chris Hostetter-3 wrote:
 
 sorting on a multiValued field is defined to have unspecified behavior.  it 
 might fail with an error, or it might fail silently.
 

I learned this the hard way, it failed silently for a long time until it
failed with an error: 
http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920012.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Some minor Solritas layout tweaks

2010-06-24 Thread Erik Hatcher

Ken - thanks for these improvements!   Comments below...

On Jun 23, 2010, at 8:24 PM, Ken Krugler wrote:
I grabbed the latest & greatest from trunk, and then had to make a  
few minor layout tweaks.


1. In main.css, the .query-box input { height} isn't tall enough  
(at least on my Mac 10.5/FF 3.6 config), so character descenders get  
clipped.


I bumped it from 40px to 50px, and that fixed the issue for me.


Yeah, wasn't tall enough for my view either and I figured someone with  
some better CSS know-how would fix 'er up.  Thanks!  :)


2. The constraint text (for removing facet constraints) overlaps  
with the Solr logo.


It looks like the div that contains this anchor text is missing a  
class="constraints", as I see a .constraints in the CSS.


I added this class name, and also (to main.css):

.constraints {
 margin-top: 10px;
}

But IANAWD, so this is probably not the best way to fix the issue.

3. And then I see a .constraints-title in the CSS, but it's not used.


I've just committed your changes for 1, 2, and 3.


Was the intent of this to set the '' character to gray?


No, no intention there... just left-over CSS cruft.

4. It seems silly to open JIRA issues for these types of things, but  
I also don't want to add to noise on the list.


Which approach is preferred?


JIRA is ultimately the right place, for tracking and IP sign-off  
purposes.  My inbox can't keep track of these things for long.


Erik



Re: Can query boosting be used with a custom request handlers?

2010-06-24 Thread Chris Hostetter

:  Maybe this helps:
:  http://wiki.apache.org/solr/SolrPlugins#QParserPlugin

Right ... from the point of view of a custom RequestHandler (or 
SearchComponent), the key is to follow the model used by QueryComponent 
and use QParser.getParser(...) to deal with parsing query strings.

Then all of the various registered QParserPlugins can be used w/o any 
custom code.




-Hoss



Similarity

2010-06-24 Thread Blargy

Can someone explain how I can override the default behavior of the tf
contributing a higher score for documents with repeated words?

For example:

Query: foo
Doc1: "foo bar" - score 1.0
Doc2: "foo foo bar" - score 1.1

Doc2 contains "foo" twice, so it is scored higher. How can I override this
behavior?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-tp920366p920366.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 3:17 PM, Blargy zman...@hotmail.com wrote:

 Can someone explain how I can override the default behavior of the tf
 contributing a higher score for documents with repeated words?

 For example:

 Query: foo
 Doc1: foo bar score 1.0
 Doc2: foo foo bar score 1.1

 Doc2 contains foo twice so it is scored higher. How can I override this
 behavior?

Depends on the larger context of what you are trying to do.
Do you still want the idf and length norm relevancy factors?  If not,
use a filter, or boost the particular clause with 0.

-Yonik
http://www.lucidimagination.com


Re: performance sorting multivalued field

2010-06-24 Thread Marc Sturlese

Thanks, that's very useful info. However, I can't reproduce the error. I've
created an index where all documents have a multiValued date field and each
document has a minimum of one value in that field (most of the docs have 2
or 3). So, the number of un-inverted term instances is greater than
the number of documents.
*There are lots of docs with the same value; I mention that because I
suppose that same values have nothing to do with the number of un-inverted term
instances.

I never get the error explained here:
http://lucene.472066.n3.nabble.com/Different-sort-behavior-on-same-code-td503761.html
Could it be that Solr 1.4 or Lucene 2.9.1 handles this, avoiding the error?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/performance-sorting-multivalued-field-tp905943p920464.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Similarity

2010-06-24 Thread Blargy


Yonik Seeley-2-2 wrote:
 
 Depends on the larger context of what you are trying to do.
 Do you still want the idf and length norm relevancy factors?  If not,
 use a filter, or boost the particular clause with 0.
 

I do want the other relevancy factors, i.e. boost, phrase boosting, etc., but I
just want to make it so that only unique terms in the query contribute to
the overall score.

For example:

Query: foo
Doc1: "foo bar baz"
Doc2: "foo foo bar"

The above documents should have the same score.

Query: foo baz
Doc1: "foo bar baz"
Doc2: "foo foo bar"

In this example Doc1 should be scored higher because it has 2 unique terms
that match.
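(A toy sketch of the behavior being asked for, in plain Python rather than the Lucene Similarity API: capping tf at 1 makes repeated terms count once, while multiple unique matches still win:)

```python
from collections import Counter

def score(query_terms, doc, flatten_tf=False):
    """Toy tf-only score: sum term frequencies of query terms in doc."""
    tf = Counter(doc.split())
    total = 0.0
    for t in query_terms:
        f = tf[t]
        # flatten_tf caps each term's contribution at 1, like a custom
        # Similarity whose tf() returns 1 for any frequency > 0
        total += (1.0 if f > 0 else 0.0) if flatten_tf else float(f)
    return total

doc1, doc2 = "foo bar baz", "foo foo bar"
assert score(["foo"], doc2) > score(["foo"], doc1)               # raw tf prefers doc2
assert score(["foo"], doc1, True) == score(["foo"], doc2, True)  # flattened: equal
assert score(["foo", "baz"], doc1, True) > score(["foo", "baz"], doc2, True)
```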


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-tp920366p920530.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Similarity

2010-06-24 Thread Yonik Seeley
On Thu, Jun 24, 2010 at 4:20 PM, Blargy zman...@hotmail.com wrote:
 Yonik Seeley-2-2 wrote:

 Depends on the larger context of what you are trying to do.
 Do you still want the idf and length norm relevancy factors?  If not,
 use a filter, or boost the particular clause with 0.


 I do want the other relevancy factors.. ie boost, phrase-boosting etc but I
 just want to make it so that only unique terms in the query contribute to
 the overall score.

You can use a custom similarity, but the current downside is that it
will be applied to all fields and all queries.

There could possibly be other workarounds for you, but you would need
to give realistic examples with all of the context (the whole URL
being sent to Solr).

-Yonik
http://www.lucidimagination.com


RE: solr indexing takes a long time and is not reponsive to abort command

2010-06-24 Thread Ya-Wen Hsu
This situation doesn't happen consistently. When we only ran the problematic 
core, the indexing took significantly longer than usual (4 hrs -> 11 hrs), but it 
ran successfully in the end. When we ran indexing for all cores at the same time, 
the problematic core never finished indexing, so we had to kill the process. 
This has happened twice already. I'm running it in parallel again to see if the 
problem still persists.

I also noticed one thing: in the dataimport UI, the Total Documents Processed 
count is missing for the problematic core but appears for the other cores. Does 
anyone know why? Thanks!

Wen

-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Friday, June 18, 2010 5:38 PM
To: solr-user@lucene.apache.org
Subject: Re: solr indexing takes a long time and is not reponsive to abort 
command

Does this happen over and over? Does it happen every time?

On Fri, Jun 18, 2010 at 1:19 PM, Ya-Wen Hsu y...@eline.com wrote:
 I don’t see my last email showing up in the mailing list, so I’m sending it 
 again. Below is the original email.

 Hi,

 I have multi-core solr setup. All cores finished indexing in reasonable time 
 but one. I look at the dataimport info for the one that’s hanging. The 
 process is still in busy state but no requests made or rows fetched. The 
 database side just showed the process is waiting for future command and is 
 doing nothing. The attempt to abort the process doesn’t really work. Does 
 anyone know what’s happening here? Thanks!

 Wen




-- 
Lance Norskog
goks...@gmail.com


questions about Solr shards

2010-06-24 Thread Babak Farhang
Hi everyone,

There are a couple of notes on the limitations of this approach at
http://wiki.apache.org/solr/DistributedSearch which I'm having trouble
understanding.

1. When duplicate doc IDs are received, Solr chooses the first doc
   and discards subsequent ones

"Received" here is from the perspective of the base Solr instance at
query time, right?  I.e. if you inadvertently indexed 2 versions of
the document with the same unique ID but different contents to 2
shards, then at query time, the "first" document (putting aside for
the moment what exactly "first" means) would win.  Am I reading this
right?


2. The index could change between stages, e.g. a document that matched a
   query and was subsequently changed may no longer match but will still be
   retrieved.

I have no idea what this second statement means.


And one other question about shards:

3. The examples I've seen documented do not illustrate sharded,
multicore setups; only sharded monolithic cores.  I assume sharding
works with multicore as well (i.e. the two issues are orthogonal).  Is
this right?


Any help on interpreting the above would be much appreciated.

Thank you,
-Babak


Re: Similarity

2010-06-24 Thread Dave Searle
You could write some client code to translate your query into the following

(Foo and baz) or (foo or baz)

This seems to work well for me



Synonym configuration

2010-06-24 Thread xdzgor

Hi, can someone please confirm the following statements about configuration
for the synonym filter, or correct me where I'm wrong?

a => b
a search for a is changed into a search for b


a, b => c
a search for a or a search for b is changed into a search for c
(the same as a=>c and b=>c)


a => b, c
a search for a is changed into a search for b and a search for c
(the same as a=>b and a=>c)


a, b => c, d
a search for a or a search for b is changed into a search for c and a
search for d
(the same as a=>c,d and b=>c,d)


a, b, c
depends on the value of the expand parameter in the configuration (in the
synonym filter config in schema.xml)

if expand==true
a, b, c => a, b, c

if expand==false
a, b, c => a


Thanks,
Peter
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Synonym-configuration-tp921082p921082.html
Sent from the Solr - User mailing list archive at Nabble.com.
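(For reference, written out as a synonyms.txt for the SynonymFilterFactory, the cases above would look roughly like this; the terms are placeholders:)

```text
# explicit mappings: tokens on the left are replaced by those on the right
aaa => bbb
aaa, bbb => ccc
aaa => bbb, ccc
aaa, bbb => ccc, ddd

# equivalent synonyms: behavior depends on the filter's expand attribute
# expand="true":  each of aaa, bbb, ccc maps to all three
# expand="false": each of aaa, bbb, ccc maps to aaa (the first listed)
aaa, bbb, ccc
```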


Re: Synonym configuration

2010-06-24 Thread Koji Sekiguchi

(10/06/25 11:33), xdzgor wrote:


Peter,

I think you are correct!

Koji

--
http://www.rondhuit.com/en/



Re: solr indexing takes a long time and is not reponsive to abort command

2010-06-24 Thread Don Werve
2010/6/25 Ya-Wen Hsu y...@eline.com

 This situation doesn't happen consistently. When we only ran the
 problematic core, the indexing took significant longer than usual(4hrs - 11
 hrs). It ran successful in the end. When we ran indexing for all cores at
 the same time, the problematic core never finished indexing such that we
 have to kill the process. This happened twice already. I'm running it
 parallel again to see if the problem still persists.


Off the top of my head:

Have you accidentally opened this core multiple times within the same JVM?
 I had the same thing happen to me when I was testing out a Solr interface I
had written under JRuby; that was loads of fun to track down...

How physically large is the core ('du -sh' if you're on Unix), and how many
files does the index contain?  I've run into issues where frequent updates
created a lot of index files, and which slowed down all core access.

If you've got a lot of index files, has the problem core been optimized?


Re: questions about Solr shards

2010-06-24 Thread Otis Gospodnetic
Hi Babak,

1. Yes, you are reading that correctly.

2. This describes the situation where, for instance, a document with ID=10 is 
updated between the 2 calls to the Solr instance/shard where that doc ID=10 
lives.

3. Yup, orthogonal.  You can have a master with multiple cores for sharded and 
non-sharded indices and you can have a slave with cores that hold complete 
indices or just their shards.
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/





Re: anyone use hadoop+solr?

2010-06-24 Thread Otis Gospodnetic
Marc,

We index in the Map phase, purposely ending up with lots of smaller 
indices/shards at the end of the whole MapReduce job.
 Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


