Re: Replacing FAST functionality at sesam.no

2008-09-08 Thread Mck
So then I change type="string" to type="shingleString" along with
  [snip]
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ShingleFilterFactory" outputUnigrams="true"
          outputUnigramIfNoNgram="true" maxShingleSize="99"/>
</analyzer>

Debugging ShingleFilter, I see that without quotes the shingles
StringBuffer array consists of just the current token.

When the query does have quotes, the shingles array fills up with the
expected shingles, and the Query (in fact a MultiPhraseQuery)
returned from SolrQueryParser.getFieldQuery() looks like

list_entry_shingle:(abcd abcd efgh abcd efgh ijkl) (efgh efgh ijkl) ijkl

I'm struggling to make sense of this.
How can the shingles be matched if they aren't quoted?
Why add the parentheses () when the query has default operator OR?

I would be expecting a Query instead like:
abcd abcd efgh abcd efgh ijkl efgh efgh ijkl ijkl

(This with the ShingleFilter disabled does indeed work perfectly).

Am I barking up the wrong tree?
Is there a way to get the shingles phrased?

Otis, you mentioned this briefly in your reply on the dev list:
 "Make sure you turn them into phrase queries"

Did you mean something more than just quoting the original query?

~mck

-- 
Claiming Java is easier than C++ is like saying that K2 is shorter than
Everest. Larry O'Brien 
| semb.wever.org | sesat.no | sesam.no |




Re: Replacing FAST functionality at sesam.no

2008-09-08 Thread Shalin Shekhar Mangar
I'm not very familiar with shingles, but it seems to me that you should have
ShingleFilter at index time and issue the query as a phrase query?


-- 
Regards,
Shalin Shekhar Mangar.


Re: Replacing FAST functionality at sesam.no

2008-09-08 Thread Mck
 I'm not very familiar with shingles, but it seems to me that you should
 have ShingleFilter at index time and issue the query as a phrase query?

Then the entry abcd efgh ijkl would be indexed as 
(abcd abcd efgh abcd efgh ijkl efgh efgh ijkl ijkl)

and a subsequent query abcd would return this entry.
If this is so then this is not exact matching and not what we are
looking for.

The filter behaviour we are looking for is like:
   (I've included ^$ to denote the exact matching)

Original Query  --> Filtered Query
abcd            --> ^abcd$
abcd efgh       --> (^abcd$ ^abcd efgh$ ^efgh$)
abcd efgh ijkl  --> (^abcd$ ^abcd efgh$ ^abcd efgh ijkl$ ^efgh$ ^efgh ijkl$ ^ijkl$)
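
The expansion in the table above can be sketched in plain Java. This is only an illustration of the desired output, not a Lucene TokenFilter and not how ShingleFilter is implemented; the class name ExactShingles and the method shingles() are hypothetical, and the ^ and $ markers are simply the notation used in this mail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists every contiguous word sequence of a query,
// wrapped in ^...$ to stand for "must match the whole field exactly".
class ExactShingles {

    public static List<String> shingles(String query) {
        List<String> out = new ArrayList<>();
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        String[] words = trimmed.split("\\s+");
        // For each start word, grow the shingle one word at a time.
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int end = start; end < words.length; end++) {
                if (end > start) {
                    shingle.append(' ');
                }
                shingle.append(words[end]);
                out.add("^" + shingle + "$");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(shingles("abcd efgh ijkl"));
    }
}
```

For a query of n words this produces n(n+1)/2 entries, in the same order as the rows above.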


~mck

-- 
All stable processes we shall predict. All unstable processes we shall
control. John von Neumann 
| semb.wever.org | sesat.no | sesam.no |




RE: matser /slave issue on solr

2008-09-08 Thread dudes dudes

Thanks, Bill, for your suggestions. They helped a lot, and the problems are resolved :)

cheers
ak 

 Date: Fri, 5 Sep 2008 15:24:06 -0400
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Re: matser /slave issue on solr
 
 Try running snappuller with the -V option to show debug output.
 
 Here's the closest thing to a step by step doc:
 http://wiki.apache.org/solr/SolrCollectionDistributionOperationsOutline
 
 Please note the first bullet under slave Solr server that the user id
 under which the scripts run must be able to ssh/scp from the slave to the
 master without being prompted for a password.
 
 Bill
 
 On Fri, Sep 5, 2008 at 9:17 AM, dudes dudes  wrote:
 

 Hi Bill,

 Thanks very much for your kind reply. I have tried your suggestion, but
 unfortunately it didn't work. I also tried other tweaks from the link you
 sent, but no luck :( I also don't find any errors in the log files.

 Is there any online step-by-step documentation on this topic, by any chance?

 thanks
 ak

 
 Date: Thu, 4 Sep 2008 09:40:01 -0400
 From: [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Subject: Re: matser /slave issue on solr

 On your slave,

 solr_hostname should be localhost
 and
 master_host should be the hostname of your master server

 Check out the following Wiki for a full description of the variables in
 scripts.conf:

 http://wiki.apache.org/solr/SolrCollectionDistributionScripts

 Bill

 On Thu, Sep 4, 2008 at 4:46 AM, dudes dudes  wrote:


 Hello All,

 I have taken the following steps to configure the master and slave
 servers. However, the slave doesn't seem to sync with the master.
 Please let me know what I have done wrong.

 Both are nightly version 2008-7-7 on an Ubuntu machine with Java 1.6.

 On the master machine:

 1) the scripts.conf file .

 user=
 solr_hostname= localhost
 solr_port= 8983
 rsyncd_port= 18983
 data_dir=
 webapp_name= solr
 master_host=
 master_data_dir=
 master_status_dir=

 2) Indexed some docs

 3) Then I issued the following commands..

 ./rsyncd-enable; ./rsyncd-start
 ./snapshooter

 On the slave machine:

 1) the scripts.conf file

 user=
 solr_hostname=mastereserver.companyname.com
 solr_port=8080
 rsyncd_port=18983
 data_dir=
 webapp_name=solr
 master_host=localhost
 master_data_dir=/root/masterSolr/apache-solr-nightly/example/solr/data/
 master_status_dir=/root/masterSolr/apache-solr-nightly/example/solr/logs/clients/

 2) Then the following commands are issued:

 ./snappuller -P 18983
 ./snapinstaller
 ./commit



 3) However, stats.jsp on the slave machine reports numDocs=0.

 thanks for your time and suggestions
 ak






Re: AW: Cross-context-forward to solr-instance

2008-09-08 Thread David Smiley @MITRE.org

FWIW, I'm also using the SolrRequestFilter for forwards, despite the warning.
Solr 1.3 doesn't have the concept of a default core anymore, yet I want this
feature.  I made an uber-simple JSP like this:
<jsp:forward page='<%= "mycorename/select?" + request.getQueryString() %>' />
And so now my clients don't need to update their URL just because I've
migrated to Solr 1.3.  Oh, I needed to set up the dispatcher FORWARD as you
mentioned, and I also remapped the /select/* servlet mapping to my JSP:
  <servlet>
    <servlet-name>selectDefaultCore</servlet-name>
    <jsp-file>/selectDefaultCore.jsp</jsp-file>
  </servlet>

  <servlet-mapping>
    <servlet-name>selectDefaultCore</servlet-name>
    <url-pattern>/select/*</url-pattern>
  </servlet-mapping>

The only problem I've seen so far is that if I echo the params
(echoParams=all), I see the output doubled.  Weird but inconsequential.

~ David Smiley


Hachmann wrote:
 
 Hi,
 
 I made a mistake. At least with Tomcat 5.5.x, if you configure the
 SolrRequestFilter with <dispatcher>FORWARD</dispatcher>, it does indeed get
 called even when you forward from another web context!
 
 Note that the documentation says this might be problematic!
 
 Sorry for the previous overhasty post.
 Björn
 
 -----Original Message-----
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
 On behalf of Hachmann, Bjoern
 Sent: Saturday, September 6, 2008 08:01
 To: solr-user@lucene.apache.org
 Subject: Cross-context-forward to solr-instance
 
 Hi, 
  
 yesterday I tried the Solr-1.3-RC2 and everything seems to 
 work fine using the traditional single-core setup. But while 
 troubleshooting the new multi-core feature, I realized for 
 the first time, that I have been using the deprecated (even 
 in 1.2) class SolrServlet. This is a huge problem for us, as 
 we run the solr-web-app parallel to our main web-app in the 
 same servlet-container. Using this approach we can internally 
 forward update- and select-requests to the Solr-instance 
 currently in use. 
  
 ServletContext ctx = getServletContext().getContext("solr1");
 RequestDispatcher rd = ctx.getNamedDispatcher("SolrServer");
 rd.forward(request, response);
 
 As you can see, this approach only works for the servlet 
 named 'SolrServer' which references the deprecated class. 
 
 The attempt to use a path-based dispatcher 
 (ctx.getRequestDispatcher) was not successful, even though I 
 configured the SolrRequestFilter in the solr-web.xml to work 
 on forwards (<dispatcher>FORWARD</dispatcher>), which the 
 documentation discourages. Maybe this is because of the 
 cross-context dispatch?
 
 At the moment I have completely run out of ideas, apart from 
 redesigning our whole setup. Any ideas are highly 
 appreciated. 
 
 Thanks in advance,
 Björn
 
 

-- 
View this message in context: 
http://www.nabble.com/Cross-context-forward-to-solr-instance-tp19343349p19373757.html
Sent from the Solr - User mailing list archive at Nabble.com.



Problem retrieving results

2008-09-08 Thread Alex Gadea
I am suddenly experiencing a problem retrieving results from a SOLR 
installation.  The install shows that there are documents indexed and I have 
issued multiple commits.  When I execute a query I receive 0 results back, but 
when I close the query handler, it indicates that the queryResultCache had a 
hit ratio of 66%.  I have deleted the index directory and recreated it.  I'm 
relatively new to SOLR and have no idea what to look at next.  Any suggestions? 
 Is there any way to issue a query against SOLR that will return all records in 
the index?

Thanks.
Alex


Re: Replacing FAST functionality at sesam.no

2008-09-08 Thread Otis Gospodnetic
Just glancing over this.  I believe one of the recent shingle contributions 
over in Lucene contrib/ indeed has the option to add those begin/end marker 
characters, so if this will solve your exact matching needs, that's the thing 
to look at.  You'll have to write (and contribute?) a bit of glue to use it in 
Solr.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch






Re: Problem retrieving results

2008-09-08 Thread Yonik Seeley
On Mon, Sep 8, 2008 at 12:16 PM, Alex Gadea [EMAIL PROTECTED] wrote:
 Is there any way to issue a query against SOLR that will return all records 
 in the index?

http://localhost:8983/solr/select?q=*:*

Checking the admin stats page should also tell you the number of
documents in the index.
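
If the URL is built from code rather than typed into a browser, the colon in *:* should be URL-encoded. The following is a rough sketch using only the JDK; the class MatchAllUrl and the method selectUrl() are made-up names, and the base URL is the example one from this thread. It also passes the standard rows parameter, since Solr returns only a limited page of results by default.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a /select URL with an encoded query string.
class MatchAllUrl {

    public static String selectUrl(String base, String query, int rows) {
        // URLEncoder leaves '*' as-is and turns ':' into %3A.
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return base + "/select?q=" + q + "&rows=" + rows;
    }

    public static void main(String[] args) {
        // Match-all query against the example server.
        System.out.println(selectUrl("http://localhost:8983/solr", "*:*", 170));
    }
}
```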

-Yonik


Re: Faceting MoreLikeThisComponent results

2008-09-08 Thread wojtekpia

Thanks Hoss. I created SOLR 760:
https://issues.apache.org/jira/browse/SOLR-760



hossman wrote:
 
 
 : When using the MoreLikeThisHandler with facets turned on, the facets show
 : counts of things that are more like my original document. When I use the
 : MoreLikeThisComponent, the facets show counts of things that match my
 : original document (I'm querying by document ID), so there is only one
   ...
 : How can I facet the results of the MoreLikeThisComponent?
 
 I don't think you can at this point.  The good news is MoreLikeThisHandler 
 isn't getting removed anytime soon.
 
 
 What we need to do is provide more options on the components to dictate 
 their behavior when deciding what to process and how to return it ... your 
 example could be solved by either adding an option to MLTComponent telling 
 it to overwrite the main result set, or by adding an option to 
 FacetComponent specifying the name of a DocSet in the response to use in 
 its intersections.
 
 I think it would be good to do both.
 
 (HighlightComponent should probably also have an option just like the one 
 I described for FacetComponent)
 
 Would you mind filing a feature request?
 
 
 -Hoss
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Faceting-MoreLikeThisComponent-results-tp19206833p19376403.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: 1.3.0 candidate

2008-09-08 Thread Grant Ingersoll

This is temporarily removed, as I need to create another.

On Sep 7, 2008, at 8:45 PM, Grant Ingersoll wrote:


I've posted what I hope is the final 1.3.0 candidate at 
http://people.apache.org/~gsingers/solr/1.3.0/

Please try it out and provide feedback. Note, this is not an  
official release.


Cheers,
Grant





Re: handling multiple multiple resources with single requestHandler

2008-09-08 Thread Aleksandar Bradic


Sure - overriding the SolrDispatchFilter seems like the right way to go
(especially maintenance-wise :) ).

Thanks :)

PS - as far as the ":" situation is concerned - that was useful,
but I guess it didn't look nice ;)
(anyway, I guess the ":"-trim code must have persisted there
in order to support legacy apps using the ":" notation)


.Alek

On Sep 7, 2008, at 7:44 AM, Chris Hostetter wrote:



: Any ideas on how we could register a single request handler for handling
: multiple (wildcarded) contexts/resource URIs?
:
: (something like) :
:
: <requestHandler name="/app/*" class="solr.StandardRequestHandler">
: <requestHandler name="/app/*/query" class="solr.StandardRequestHandler">


One of the reasons wildcards aren't supported is because it creates
ambiguity when dealing with dynamically created RequestHandlers.

Once upon a time we had the notion that a ":" (colon) could be used in the
query path to denote that SolrDispatchFilter should stop there and treat
everything up to the colon as the handler name, while everything after the
colon should be put in the SolrQueryRequest for use by the RequestHandler,
ie...
  /app/query?q=solr
  /app/query:yakko/foo/yak?q=solr
  /app/query:dot/bar/hoss?q=solr
...would all get processed by the /app/query handler which would have
access to the "", "yakko/foo/yak", and "dot/bar/hoss" parts for each
request.

That seems to have been removed from SolrDispatchFilter at some point, I'm
not clear why but there are clearly remnants of it so maybe it was a
mistake...

   // unused feature ?
   int idx = path.indexOf( ':' );
   if( idx > 0 ) {
     // save the portion after the ':' for a 'handler' path parameter
     path = path.substring( 0, idx );
   }

...I'm kind of tired right now, but if I'm reading that correctly it's
flat out ignoring anything after the colon. (which seems like the worst of
both worlds ... you can't have a ":" in your request handler name, but you
can't have access to what comes after it if you put it in the URL)

I'm not sure what's going on there.  Maybe someone else understands.
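
For illustration, here is roughly what the colon notation would need to do if the part after the ":" were kept instead of dropped. This is a standalone sketch, not the SolrDispatchFilter code; the class HandlerPath and its fields are hypothetical names.

```java
// Hypothetical value class: splits a request path at the first ':' into
// the handler name and the remainder that a handler could inspect.
class HandlerPath {
    public final String handler;
    public final String extra;

    private HandlerPath(String handler, String extra) {
        this.handler = handler;
        this.extra = extra;
    }

    public static HandlerPath parse(String path) {
        int idx = path.indexOf(':');
        if (idx > 0) {
            // Keep the portion after the ':' rather than discarding it.
            return new HandlerPath(path.substring(0, idx), path.substring(idx + 1));
        }
        // No colon: the whole path is the handler name, nothing extra.
        return new HandlerPath(path, "");
    }

    public static void main(String[] args) {
        HandlerPath p = parse("/app/query:yakko/foo/yak");
        System.out.println(p.handler + " | " + p.extra);
    }
}
```

With the three example paths above, the handler is /app/query each time and the extra part is "", "yakko/foo/yak" and "dot/bar/hoss" respectively.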

: The only way I can do it right now is by modifying SolrDispatchFilter, and
: manually adding request context trimming there (reducing the requested context
: to /app/), and registering handler for that context (which would later
: resolve other parts of it) - but if there is another way to do this -
: without changing the code, I would be more than happy to learn about it :)


If you're comfortable with ServletFilters enough to muck with
SolrDispatchFilter, then wouldn't writing a new filter that you configure
to sit in front of SolrDispatchFilter and take pieces out of the URL and
add them as request params be just as easy to write (and a lot easier to
maintain)?


-Hoss




RE: AW: Cross-context-forward to solr-instance

2008-09-08 Thread Lance Norskog
You can give a default core set by adding a default parameter to the query
in solrconfig.xml. This is hacky, but it gives you a set of cores instead of
just one core.





Re: Problem retrieving results

2008-09-08 Thread Alex Gadea
On the stats page it shows:

caching : true
numDocs : 170
maxDoc : 340
readerImpl : MultiReader
readerDir : 
org.apache.lucene.store.FSDirectory@/usr/local/apache-solr/example/solr/data/index
indexVersion : 1220876093260
openedAt : Mon Sep 08 13:58:17 EDT 2008
registeredAt : Mon Sep 08 13:58:18 EDT 2008 

If I use the query to retrieve all, I get all the results so at least I know 
they are there.  Phew!  If I do a query of:

http://localhost:8983/solr/select?q=suit

I get nothing even though one of the records that was returned includes that 
word in it.

Thanks,

Alex




Re: Problem retrieving results

2008-09-08 Thread Alex Gadea
Never mind.  I figured out the problem: the copyField that serves as the
default field was not set up properly.

Thanks for the help!
Alex



RE: 1.3.0 candidate

2008-09-08 Thread Teruhiko Kurosaka
Grant,
Is this coming back soon? Rough estimate?

-kuro  



MoreLikeThis mlt.qf boosting

2008-09-08 Thread Clas Rydergren
Hi! I have been testing the MoreLikeThis feature in Solr. I have
indexed a subset of Wikipedia with the fields title (the title of the
Wikipedia page) and content (the Wikipedia page content). When
performing a MoreLikeThis request on this index as:

http://server:8983/solr/mlt?stream.body=google+yahoo&mlt.fl=title,content&mlt.interestingTerms=details&mlt.boost=true&mlt.mintf=0&fl=title&mlt.qf=title^1000.0+content^0.1

I get the following (manually compressed) output:

<str name="title">Sequoia Capital</str>
<str name="title">Google</str>
<str name="title">Google Translate</str>
<lst name="interestingTerms">
  <float name="content:yahoo">0.1</float>
  <float name="content:google">0.08868032</float>
</lst>

Note that the document with Google in the title is ranked lower than
the Sequoia Capital document. I have two questions:

Firstly: Why are the interestingTerms prefixed with the tag
"content"? Does that mean that the MLT query is made against the
content field only? If so, how can I adjust the search to include both
title and content (also copied to a field called text)?

Secondly, and possibly related to the first question: Regardless of
the qf boosts (now 1000.0 and 0.1), the second search result (with the
Google title) is not ranked higher in the MLT search. Why is that?
This suggests to me that the MLT field boosting does not work as I expect.
I would have liked to see the document with the Google title ranked
first. How should I boost to achieve that?

Cheers
Clas


Re: Index partioning

2008-09-08 Thread Chris Hostetter
: I found this thread in the archive...
: 
: I'm responsible for a number of ruby on rails websites, all of which need
: search.  Solr seems to have everything I need, but I am wondering what's the
: best way to maintain multiple indexes?
: 
: Multiple Solr instances on different ports?

having multiple indexes is a different beast from the Index Partitioning 
topic this thread was discussing ... there's some good info on the wiki 
about the various options (they each have their trade offs to consider)

http://wiki.apache.org/solr/MultipleIndexes

-Hoss



Re: SolrCore, reload, synonyms not reloaded

2008-09-08 Thread Chris Hostetter

: I'm using Solr 1.3 and I've never been able to get the SolrCore (formerly
: MultiCore) reload feature to pick up changes I made to my synonyms file.  At
: index time I expand synonyms.  If I change my synonyms.txt file then do a
: MultiCore RELOAD and then reindex my data and then do a query that should
: work now that I added a synonym, it doesn't work.  If I go to the analysis
: page and try putting in the text I see that it did pick up the changes.  I'm
: forced to bring down the webapp for the changes to truly be reloaded. 
: Has anyone else seen this?  

David: I don't really use the Multi Core support, but your problem 
description intrigued me so I tried it out, and I can *not* reproduce the 
problem you are having.

Steps I took:

1) applied the patch listed at the end of this email to the Solr trunk.  
Note that it adds a "text" field to the multicore core1 example configs.  
This field uses SynonymFilter at index time.  I also added a synonyms file 
with "chris, hostetter" as the only entry.

2) cd example; java -Dsolr.solr.home=multicore -jar start.jar

3) java -Ddata=args -Durl=http://localhost:8983/solr/core1/update -jar post.jar 
'<add><doc><field name="id">1</field><field name="text">chris and david</field></doc></add>'

4) checked luke handler, confirmed that chris, hostetter, and,  david 
were indexed terms.

5) added david, smiley to my synonyms file

6) http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1

7) repeated step #3

8) confirmed with luke that smiley was now an indexed term.  also 
confirmed that query for text:smiley found my doc
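
To make the expected terms in steps 4 and 8 concrete, here is a much-simplified model of index-time synonym expansion (expand=true) in plain Java. It is not Solr's SynonymFilter: it handles single-word synonyms only and ignores token positions and lowercasing. The class SynonymExpansion and the method expand() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model: each token found in a synonym group is replaced
// by every member of that group; other tokens pass through unchanged.
class SynonymExpansion {

    public static List<String> expand(List<String> tokens, List<List<String>> groups) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            List<String> match = null;
            for (List<String> group : groups) {
                if (group.contains(token)) {
                    match = group;
                    break;
                }
            }
            if (match == null) {
                out.add(token);
            } else {
                out.addAll(match);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> doc = List.of("chris", "and", "david");
        // Before step 5: only "chris, hostetter" is in the synonyms file.
        System.out.println(expand(doc, List.of(List.of("chris", "hostetter"))));
        // After step 5: "david, smiley" has been added.
        System.out.println(expand(doc, List.of(
                List.of("chris", "hostetter"), List.of("david", "smiley"))));
    }
}
```

The point of the experiment is that "smiley" only shows up as an indexed term if the reloaded core actually re-reads the synonyms file before the document is reindexed.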


Here's the patch...



Index: example/multicore/core1/conf/schema.xml
===
--- example/multicore/core1/conf/schema.xml (revision 693303)
+++ example/multicore/core1/conf/schema.xml (working copy)
@@ -19,6 +19,18 @@
 <schema name="example core one" version="1.1">
   <types>
    <fieldtype name="string"  class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
+
+    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
+      <analyzer type="index">
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.LowerCaseFilterFactory"/>
+        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="true"/>
+      </analyzer>
+      <analyzer type="query">
+        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
+        <filter class="solr.LowerCaseFilterFactory"/>
+      </analyzer>
+    </fieldType>
   </types>
 
  <fields>   
@@ -27,6 +39,7 @@
   <field name="type" type="string" indexed="true" stored="true" multiValued="false" /> 
   <field name="name" type="string" indexed="true" stored="true" multiValued="false" /> 
   <field name="core1" type="string" indexed="true" stored="true" multiValued="false" /> 
+  <field name="text" type="text" indexed="true" stored="true" multiValued="false" /> 
  </fields>
 
  <!-- field to use to determine and enforce document uniqueness. -->
Index: example/multicore/core1/conf/index_synonyms.txt
===
--- example/multicore/core1/conf/index_synonyms.txt (revision 0)
+++ example/multicore/core1/conf/index_synonyms.txt (revision 0)
@@ -0,0 +1,2 @@
+chris, hostetter
+

Property changes on: example/multicore/core1/conf/index_synonyms.txt
___
Name: svn:keywords
   + Date Author Id Revision HeadURL
Name: svn:eol-style
   + native