Re[2]: Question about search suggestion

2008-08-27 Thread Aleksey Gogolev

I searched and read about the auto-complete feature. Thanks. It looks
nice; I think I should try it first.

NM On Tue, 26 Aug 2008 15:15:21 +0300
NM Aleksey Gogolev [EMAIL PROTECTED] wrote:

 
 Hello.
 
 I'm new to solr and I need to make a search suggest (like google
 suggestions).
 

NM Hi Aleksey,
NM please search the archives of this list for subjects containing
NM 'autocomplete' or 'auto-suggest'. That should give you a few ideas and
NM starting points.

NM best,
NM B

NM _
NM {Beto|Norberto|Numard} Meijome

NM The more I see the less I know for sure. 
NM   John Lennon

NM I speak for myself, not my employer. Contents may be hot. Slippery when wet.
NM Reading disclaimers makes you go blind. Writing them is worse. You have been
NM Warned.





-- 
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey mailto:[EMAIL PROTECTED]



Wrong sort by score

2008-08-27 Thread Yuri Jan
Hi,

I have encountered a weird problem in Solr.
In one of my queries (dismax, default sorting) I noticed that the results
are not sorted by score (according to debugQuery).
The first 150 results are tied (with score 12.806474), and after those,
there is a bunch of results with a higher score (12.962835).

What could be the cause?
I'm overriding the tf function in my Similarity class. Could it be related?

Thanks,
Yuri


Re: Wrong sort by score

2008-08-27 Thread Yonik Seeley
On Wed, Aug 27, 2008 at 9:10 AM, Yuri Jan [EMAIL PROTECTED] wrote:
 I have encountered a weird problem in solr.
 In one of my queries (dismax, default sorting) I noticed that the results
 are not sorted by score (according to debugQuery).

 The first 150 results are tied (with score 12.806474), and after those,
 there is a bunch of results with higher score (12.962835).

 What can be the cause?
 I'm overriding the tf function in my similarity class. Can it be related?

Do the explain scores in the debug section match the normal scores
paired with the documents?  (Add "score" to the fl parameter to get a
score with each document.)
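
For example, a request of this shape returns both the per-document score
and the debug explain output (host, core path, and query term here are
illustrative):

```
http://localhost:8983/solr/select?q=ipod&fl=*,score&debugQuery=true
```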

-Yonik


Re: Wrong sort by score

2008-08-27 Thread Yuri Jan
Actually, no...
The scores in the fl output are 12.806475 and 10.386531 respectively, so
according to those the results are sorted correctly.
Is it just a problem with debugQuery?

On Wed, Aug 27, 2008 at 9:21 AM, Yonik Seeley [EMAIL PROTECTED] wrote:

  On Wed, Aug 27, 2008 at 9:10 AM, Yuri Jan [EMAIL PROTECTED] wrote:
  I have encountered a weird problem in solr.
  In one of my queries (dismax, default sorting) I noticed that the results
  are not sorted by score (according to debugQuery).
 
  The first 150 results are tied (with score 12.806474), and after those,
  there is a bunch of results with higher score (12.962835).
 
  What can be the cause?
  I'm overriding the tf function in my similarity class. Can it be related?

 Do the explain scores in the debug section match the normal scores
 paired with the documents?  (add score to the fl parameter to get a
 score with each document).

 -Yonik



Re: SpellCheckComponent bug?

2008-08-27 Thread Grant Ingersoll
Hmm, sounds like a bug.  A test case would be great, but at a minimum  
file a JIRA.


Do those other terms that collate properly have multiple suggestions?

On Aug 25, 2008, at 6:24 PM, Matthew Runo wrote:


Hello folks!

I seem to be seeing a bug in the SpellCheckComponent..

Search term: Quicksilver... I get two suggestions...

<lst name="suggestion">
  <int name="frequency">2</int>
  <str name="word">Quicksilver</str>
</lst>

<lst name="suggestion">
  <int name="frequency">220</int>
  <str name="word">Quiksilver</str>
</lst>

...and it's not correctly spelled...

<bool name="correctlySpelled">false</bool>

...but the collation is of the first term - not the one with the
highest frequency?

<str name="collation">Quicksilver</str>

This seems to be the opposite of what the docs say collation should do.
Other, more popular terms (shoez, runnning, etc.) all seem to collate
properly. I'm hitting Solr via SolrJ and not really doing anything too
fancy - using SVN head at the moment. Just wondered if anyone had any
ideas. There are no synonyms in this system, so I don't think that could
be it. I've rebuilt the search index.
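
The behavior the docs describe - collating with the most frequent
suggestion - amounts to something like the following. This is a standalone
sketch, not Solr's actual code; the class and method names are made up:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class Collation {
    // Frequency-based pick: given candidate corrections and their
    // frequencies, return the most frequent one. With the data from the
    // report above, that is "Quiksilver" (220), not "Quicksilver" (2).
    static String pickCollation(Map<String, Integer> suggestions) {
        return Collections.max(suggestions.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Integer> s = new LinkedHashMap<>();
        s.put("Quicksilver", 2);
        s.put("Quiksilver", 220);
        System.out.println(pickCollation(s)); // prints Quiksilver
    }
}
```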


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833






Re: SpellCheckComponent bug?

2008-08-27 Thread Matthew Runo
"runnning" does have multiple suggestions, "Cunning" and "Running" - but it
properly picks "Running". I have not noticed this for any other term,
but I have not exhaustively tested others yet.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 27, 2008, at 7:52 AM, Grant Ingersoll wrote:

Hmm, sounds like a bug.  A test case would be great, but at a  
minimum file a JIRA.


Do those other terms that collate properly have multiple suggestions?

On Aug 25, 2008, at 6:24 PM, Matthew Runo wrote:


Hello folks!

I seem to be seeing a bug in the SpellCheckComponent..

Search term: Quicksilver... I get two suggestions...

<lst name="suggestion">
  <int name="frequency">2</int>
  <str name="word">Quicksilver</str>
</lst>

<lst name="suggestion">
  <int name="frequency">220</int>
  <str name="word">Quiksilver</str>
</lst>

...and it's not correctly spelled...

<bool name="correctlySpelled">false</bool>

...but the collation is of the first term - not the one with the
highest frequency?

<str name="collation">Quicksilver</str>

This seems to be the opposite of what the docs say collation should do.
Other, more popular terms (shoez, runnning, etc.) all seem to collate
properly. I'm hitting Solr via SolrJ and not really doing anything too
fancy - using SVN head at the moment. Just wondered if anyone had any
ideas. There are no synonyms in this system, so I don't think that could
be it. I've rebuilt the search index.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833








Re: Wrong sort by score

2008-08-27 Thread Yonik Seeley
On Wed, Aug 27, 2008 at 9:38 AM, Yuri Jan [EMAIL PROTECTED] wrote:
 Actually, no...
 The score in the fl are 12.806475 and 10.386531 respectively, so the results
 according to that are sorted correctly.
 Is it just a problem with the debugQuery?

Looks like it... I guess the custom similarity isn't being used when
explain() is called.
Did you register this custom similarity in the schema?
If so, can you file a JIRA bug for this?

-Yonik
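
For anyone searching the archives: a custom Similarity is registered
globally in schema.xml along these lines (the class name below is a
placeholder for your own implementation):

```xml
<!-- schema.xml: the class name is a placeholder -->
<similarity class="com.example.MyCustomSimilarity"/>
```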


 On Wed, Aug 27, 2008 at 9:21 AM, Yonik Seeley [EMAIL PROTECTED] wrote:

  On Wed, Aug 27, 2008 at 9:10 AM, Yuri Jan [EMAIL PROTECTED] wrote:
  I have encountered a weird problem in solr.
  In one of my queries (dismax, default sorting) I noticed that the results
  are not sorted by score (according to debugQuery).
 
  The first 150 results are tied (with score 12.806474), and after those,
  there is a bunch of results with higher score (12.962835).
 
  What can be the cause?
  I'm overriding the tf function in my similarity class. Can it be related?

 Do the explain scores in the debug section match the normal scores
 paired with the documents?  (add score to the fl parameter to get a
 score with each document).

 -Yonik




Re: How does Solr search when a field is not specified?

2008-08-27 Thread Jake Conk
Thanks Otis! :)

On Tue, Aug 26, 2008 at 10:47 PM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
 Jake,

 Yes, that field would have to be some kind of analyzed field (e.g. "text"),
 not "string", if you wanted that query to match the "Jake is Testing"
 input.  There are no built-in Lucene or Solr-specific limits on field
 lengths.  There is one parameter called maxFieldLength in Solr's
 solrconfig.xml, I think, which tells Lucene how many tokens to consider
 for indexing.  If you don't want that limit, increase that parameter's
 value to the max.
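
For reference, a sketch of the setting Otis mentions, as it appears in the
example solrconfig.xml (10000 is the stock default; raise it if you need
more tokens indexed per field):

```xml
<indexDefaults>
  <!-- maximum number of tokens Lucene will index per field -->
  <maxFieldLength>10000</maxFieldLength>
</indexDefaults>
```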


 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Jake Conk [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 26, 2008 4:38:09 PM
 Subject: How does Solr search when a field is not specified?

 Hello,

 I was wondering: how does Solr search when a field is not specified,
 just a query? Say for example I have the following:

 ?q=Jake AND Test

 I have a mixture of integer, string, and text columns. Some indexed,
 some stored, and some string fields copied to text fields.

 Say I have a string field with the value "Jake is Testing" which is
 also copied to a text field. If I did not copyField that string field
 to a text field, would the above query return no results if the
 words Jake and Test are not found anywhere else, since we cannot
 do full-text searches on string fields?

 Lastly, is there a limit on how many characters can be in a string or
 text field?

 Thanks,
 - Jake




Distributed Search Test

2008-08-27 Thread Ronald Aubin
Hello,
I have been performing some simple distributed search tests and don't
understand why distributed search seems to work in some circumstances but
not others.

In my setup I have compiled the example server using the Solr trunk
downloaded on Aug 22nd.  I am running a sample server instance on 2 separate
hosts (localhost and fred).  I've added a portion of the sample docs,
[a-n]*.xml, to the localhost Solr server, and the other portion,
[m-z]*.xml, to host fred.

Assuming that I have set things up correctly, I would expect to receive a
non-zero-length SolrDocumentList for any distributed search that matches
terms in the example docs.

Specifically, when I test the contents of each server separately (using the
included TestCase) the tests pass. This confirms that each server has
different documents.  However, when I do the distributed tests, the tests
seem to pass or fail based on the initial URL passed to
createNewSolrServer(String url).  I realize a real JUnit test should be
self-contained, unlike this one.

The JUnit test testDistrbutedSearch() passes, while testDistrbutedSearch2()
fails. Why?

My understanding is that each host should send a query to all shards and
collate the responses, and return them to the client. Is this true?

Ron


Here is my TestCase;

package org.apache.solr.client.solrj.ron;

import junit.framework.TestCase;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SolrPingResponse;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.params.ShardParams;

public class SolrExampleDistributedTest extends TestCase {

    int port = 8983;
    static final String context = "/solr";

    static String SOLR_SHARD1 = "localhost:8983/solr";
    static String SOLR_SHARD2 = "fred:8983/solr";
    static String SOLR_SHARDS = SOLR_SHARD1 + "," + SOLR_SHARD2;
    static String HTTP_PREFIX = "http://";
    static String SOLR_URL1 = HTTP_PREFIX + SOLR_SHARD1;
    static String SOLR_URL2 = HTTP_PREFIX + SOLR_SHARD2;
    static String QUERY1 = "Samsung";
    static String QUERY2 = "solr";

    @Override
    public void setUp() throws Exception {
        super.setUp();
    }

    public SolrExampleDistributedTest(String name) {
        super(name);
    }

    @Override
    public void tearDown() throws Exception {
        super.tearDown();
    }

    protected SolrServer createNewSolrServer(String url) {
        try {
            CommonsHttpSolrServer s = new CommonsHttpSolrServer(url);
            s.setConnectionTimeout(100); // 1/10th sec
            s.setDefaultMaxConnectionsPerHost(100);
            s.setMaxTotalConnections(100);
            return s;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public void testLocalhost() {
        try {
            SolrServer server = createNewSolrServer(SOLR_URL1);

            SolrQuery query = new SolrQuery();
            query.setQuery(QUERY1);
            QueryResponse qr = server.query(query);
            SolrDocumentList sdl = qr.getResults();
            assertTrue(sdl.getNumFound() > 0);

            query = new SolrQuery();
            query.setQuery(QUERY2);
            qr = server.query(query);
            sdl = qr.getResults();
            assertTrue(sdl.getNumFound() == 0);

        } catch (Exception ex) {
            ex.printStackTrace();
            fail();
        }
    }

    public void testRemoteHost() {
        try {
            SolrServer server = createNewSolrServer(SOLR_URL2);

            SolrQuery query = new SolrQuery();
            query.setQuery(QUERY1);
            QueryResponse qr = server.query(query);
            SolrDocumentList sdl = qr.getResults();
            assertTrue(sdl.getNumFound() == 0);

            query = new SolrQuery();
            query.setQuery(QUERY2);
            qr = server.query(query);
            sdl = qr.getResults();
            assertTrue(sdl.getNumFound() > 0);
        } catch (Exception ex) {
            ex.printStackTrace();
            fail();
        }
    }

    public void testDistrbutedSearch() {
        try {
            SolrServer server = createNewSolrServer(SOLR_URL1);

            SolrQuery query = new SolrQuery();
            query.setQuery(QUERY1);
            query.setParam(ShardParams.SHARDS, SOLR_SHARDS);
            QueryResponse qr = server.query(query);
            SolrDocumentList sdl = qr.getResults();
            assertTrue(sdl.getNumFound() > 0);

            SolrQuery query2 = new SolrQuery();
            query2.setQuery(QUERY2);
            query2.setParam(ShardParams.SHARDS, SOLR_SHARDS);
            QueryResponse qr2 = server.query(query2);
            SolrDocumentList sdl2 = qr2.getResults();
            assertTrue(sdl2.getNumFound() > 0);

        }

Re: Sorting and also looking at stored fields

2008-08-27 Thread jennyv
Aha! Yep, that's the problem (not set to store in schema.xml)! Thanks!


Re: Distributed Search Test

2008-08-27 Thread Yonik Seeley
It fails because you are using "localhost" as part of a shard name.
When you send the request to fred, it uses the fred shard and the
localhost shard (which is the same as fred!).

-Yonik

On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin [EMAIL PROTECTED] wrote:
 Hello,
I have been performing some simple distributed search tests and don't
 understand why distributed search seems to work in some circumstances but
 not others.

 In my setup I have compiled the example server using the solr trunk
 downloaded on Aug 22nd.  I am running a sample server instance on 2 separate
 hosts (localhost and fred).  I've added a portion of  the sample docs
 [a-n]*.xml to the local host solr server, and added the other portion,
 [m-z]*.xml sample docs to host fred.

 Assuming that I have set things up correctly, I would expect to receive a
 non-zero-length SolrDocumentList for any distributed search that matches
 terms in the example docs.

 Specifically when I test the contents of each server separately ( using the
 included TestCase ) the tests pass. This confirms that each server has
 different documents.  However when I do the distributed tests, it seems the
 tests pass or fail based on the initial URL passed in the
 createNewSolrServer(String URL).  I realize a real junit should be self
 contained, unlike this one.

 junit test  testDistrbutedSearch() passes, while testDistrbutedSearch2()
 fails. Why?

 My understanding is that each host should send a query to all shards and
 collate the responses, and return them to the client. Is this true?

 Ron


 Here is my TestCase;

 package org.apache.solr.client.solrj.ron;

 import junit.framework.TestCase;

 import org.apache.solr.client.solrj.SolrQuery;
 import org.apache.solr.client.solrj.SolrServer;
 import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
 import org.apache.solr.client.solrj.response.QueryResponse;
 import org.apache.solr.client.solrj.response.SolrPingResponse;
 import org.apache.solr.common.SolrDocumentList;
 import org.apache.solr.common.params.ShardParams;

 public class SolrExampleDistributedTest extends TestCase {

int port = 8983;
static final String context = /solr;

static String SOLR_SHARD1 = localhost:8983/solr;
static String SOLR_SHARD2 = fred:8983/solr;
static String SOLR_SHARDS = SOLR_SHARD1 + , + SOLR_SHARD2;
static String HTTP_PREFIX = http://;;
static String SOLR_URL1 = HTTP_PREFIX + SOLR_SHARD1;
static String SOLR_URL2 = HTTP_PREFIX + SOLR_SHARD2;
static String QUERY1 = Samsung;
static String QUERY2 = solr;

@Override
public void setUp() throws Exception {
super.setUp();

}

public SolrExampleDistributedTest(String name) {
super(name);
}

@Override
public void tearDown() throws Exception {
super.tearDown();
}

protected SolrServer createNewSolrServer(String url) {
try {

CommonsHttpSolrServer s = new CommonsHttpSolrServer(url);
s.setConnectionTimeout(100); // 1/10th sec
s.setDefaultMaxConnectionsPerHost(100);
s.setMaxTotalConnections(100);
return s;
} catch (Exception ex) {
throw new RuntimeException(ex);
}
}

public void testLocalhost() {
try {
SolrServer server = createNewSolrServer(SOLR_URL1);

SolrQuery query = new SolrQuery();
query.setQuery(QUERY1);
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();
assertTrue(sdl.getNumFound()  0);

query = new SolrQuery();
query.setQuery(QUERY2);
qr = server.query(query);
sdl = qr.getResults();
assertTrue(sdl.getNumFound() == 0);

} catch (Exception ex) {
ex.printStackTrace();
fail();
}
}

public void testRemoteHost() {
try {
SolrServer server = createNewSolrServer(SOLR_URL2);

SolrQuery query = new SolrQuery();
query.setQuery(QUERY1);
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();
assertTrue(sdl.getNumFound() == 0);

query = new SolrQuery();
query.setQuery(QUERY2);
qr = server.query(query);
sdl = qr.getResults();
assertTrue(sdl.getNumFound()  0);
} catch (Exception ex) {
// expected
ex.printStackTrace();
fail();
}
}

public void testDistrbutedSearch() {
try {
SolrServer server = createNewSolrServer(SOLR_URL1);

SolrQuery query = new SolrQuery();
query.setQuery(QUERY1);

query.setParam(ShardParams.SHARDS, SOLR_SHARDS);
QueryResponse qr = server.query(query);
SolrDocumentList sdl = qr.getResults();
assertTrue(sdl.getNumFound()  0);

SolrQuery 

Re: Distributed Search Test

2008-08-27 Thread Ronald Aubin
Yonik,
  Thanks for your reply.  I'm not sure I understand completely.  Do you
mean that each Solr server should be given a different shard list, not a
list containing all shards?

So in my case:
1) host fred should be given a shard list containing only localhost,
2) localhost should be given a shard list of fred

I'll give it a try.

Thanks again

Ron

On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 It fails because you are using localhost as part of a shard name.
 When you send the request to fred it uses the fred shard and the
 localhost shard (which is the same as fred!)

 -Yonik

 On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin [EMAIL PROTECTED]
 wrote:
  Hello,
 I have been performing some simple distributed search tests and don't
  understand why distributed search seems to work in some circumstances but
  not others.
 
  In my setup I have compiled the example server using the solr trunk
  downloaded on Aug 22nd.  I am running a sample server instance on 2
 separate
  hosts (localhost and fred).  I've added a portion of  the sample docs
  [a-n]*.xml to the local host solr server, and added the other portion,
  [m-z]*.xml sample docs to host fred.
 
  Assuming that I have set things up correctly, I would expect to receive a
  non-zero-length SolrDocumentList for any distributed search that matches
  terms in the example docs.
 
  Specifically when I test the contents of each server separately ( using
 the
  included TestCase ) the tests pass. This confirms that each server has
  different documents.  However when I do the distributed tests, it seems
 the
  tests pass or fail based on the initial URL passed in the
  createNewSolrServer(String URL).  I realize a real junit should be self
  contained, unlike this one.
 
  junit test  testDistrbutedSearch() passes, while testDistrbutedSearch2()
  fails. Why?
 
  My understanding is that each host should send a query to all shards and
  collate the responses, and return them to the client. Is this true?
 
  Ron
 
 

java.io.FileNotFoundException: no segments* file found

2008-08-27 Thread Jeremy Hinegardner
Hi all,

I've had a multicore system running for a while now. I just cycled the
Jetty server and all of a sudden I got this error:

SEVERE: java.lang.RuntimeException: java.io.FileNotFoundException: no 
segments* file found in 
org.apache.lucene.store.FSDirectory@/opt/cisearch/ci-content-search/solr/cores/0601_0/data/index:
 files:
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:899)
at org.apache.solr.core.SolrCore.init(SolrCore.java:450)
at org.apache.solr.core.MultiCore.create(MultiCore.java:255)
at org.apache.solr.core.MultiCore.load(MultiCore.java:139)
at 
org.apache.solr.servlet.SolrDispatchFilter.initMultiCore(SolrDispatchFilter.java:147)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:72)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
at org.mortbay.jetty.Server.doStart(Server.java:210)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.mortbay.start.Main.invokeMain(Main.java:183)
at org.mortbay.start.Main.start(Main.java:497)
at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.io.FileNotFoundException: no segments* file found in 
org.apache.lucene.store.FSDirectory@/opt/cisearch/ci-content-search/solr/cores/0601_0/data/index:
 files:
at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:600)
at 
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:81)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
at 
org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:94)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:890)
... 29 more  

Of course, the odd thing is that the segments* file does exist:

  % ls -1 /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments*
  /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments_32i
  /opt/cisearch/ci-content-search/solr/cores/0601_0/data/index/segments.gen

Any ideas on what could cause this?  The only thing I can think of off the top
of my head is that the core was coming up at the moment between the
snapinstaller steps of:

  1) /bin/rm -rf ${data_dir}/${index} 
  2) mv -f ${data_dir}/${index}.tmp$$ ${data_dir}/${index}
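
One way to shrink that window is to replace the rm + mv pair with two
renames, so the index path is only ever missing between two cheap rename
calls rather than for the whole duration of the recursive delete. The
sketch below assumes old and new index live on the same filesystem; names
and structure are illustrative, not the snapinstaller's actual code:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class IndexSwap {
    // Swap newIndex into place at 'index' using renames; delete the old
    // copy only after the new one is live.
    static void swap(Path index, Path newIndex) throws IOException {
        Path old = index.resolveSibling(index.getFileName() + ".old");
        Files.move(index, old, StandardCopyOption.ATOMIC_MOVE);
        try {
            Files.move(newIndex, index, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Put the old index back so the live path is never left empty.
            Files.move(old, index, StandardCopyOption.ATOMIC_MOVE);
            throw e;
        }
        deleteRecursively(old); // the slow part happens after the swap
    }

    static void deleteRecursively(Path p) throws IOException {
        if (Files.isDirectory(p)) {
            try (DirectoryStream<Path> ds = Files.newDirectoryStream(p)) {
                for (Path c : ds) deleteRecursively(c);
            }
        }
        Files.delete(p);
    }
}
```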

Any other thoughts / conjectures ?

enjoy,

-jeremy

-- 

 Jeremy Hinegardner  [EMAIL PROTECTED] 



Re: Wrong sort by score

2008-08-27 Thread Yuri Jan
It seems like the debug information is using the custom similarity as it
should - the bug isn't there.
I see the right tf value in the explain information (I modified it to
always be 1 in my custom similarity).
The numbers in the explain output seem to add up and make sense.
Is it possible that the score itself is wrong (the one that I get from fl)?

Thanks,
Yuri

On Wed, Aug 27, 2008 at 11:44 AM, Yonik Seeley [EMAIL PROTECTED] wrote:

 On Wed, Aug 27, 2008 at 9:38 AM, Yuri Jan [EMAIL PROTECTED] wrote:
  Actually, no...
  The score in the fl are 12.806475 and 10.386531 respectively, so the
 results
  according to that are sorted correctly.
  Is it just a problem with the debugQuery?

 Looks like it... I guess the custom similarity isn't being used when
 explain() is called.
 Did you register this custom similarity in the schema?
 If so, can you file a JIRA bug for this?

 -Yonik


  On Wed, Aug 27, 2008 at 9:21 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
 
   On Wed, Aug 27, 2008 at 9:10 AM, Yuri Jan [EMAIL PROTECTED] wrote:
   I have encountered a weird problem in solr.
   In one of my queries (dismax, default sorting) I noticed that the
 results
   are not sorted by score (according to debugQuery).
  
   The first 150 results are tied (with score 12.806474), and after
 those,
   there is a bunch of results with higher score (12.962835).
  
   What can be the cause?
   I'm overriding the tf function in my similarity class. Can it be
 related?
 
  Do the explain scores in the debug section match the normal scores
  paired with the documents?  (add score to the fl parameter to get a
  score with each document).
 
  -Yonik
 
 



Re: Distributed Search Test

2008-08-27 Thread Ronald Aubin
Yonik,
I now understand perfectly. Thanks for your help. All my tests now
work.

Ron

On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 It fails because you are using localhost as part of a shard name.
 When you send the request to fred it uses the fred shard and the
 localhost shard (which is the same as fred!)

 -Yonik

 On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin [EMAIL PROTECTED]
 wrote:
  Hello,
 I have been performing some simple distributed search tests and don't
  understand why distributed search seems to work in some circumstances but
  not others.
 
  In my setup I have compiled the example server using the solr trunk
  downloaded on Aug 22nd.  I am running a sample server instance on 2
 separate
  hosts (localhost and fred).  I've added a portion of  the sample docs
  [a-n]*.xml to the local host solr server, and added the other portion,
  [m-z]*.xml sample docs to host fred.
 
  Assuming that I have set things up correctly, I would expect to receive a
  non-zero-length SolrDocumentList for any distributed search that matches
  terms in the example docs.
 
  Specifically when I test the contents of each server separately ( using
 the
  included TestCase ) the tests pass. This confirms that each server has
  different documents.  However when I do the distributed tests, it seems
 the
  tests pass or fail based on the initial URL passed in the
  createNewSolrServer(String URL).  I realize a real junit should be self
  contained, unlike this one.
 
  junit test  testDistrbutedSearch() passes, while testDistrbutedSearch2()
  fails. Why?
 
  My understanding is that each host should send a query to all shards and
  collate the responses, and return them to the client. Is this true?
 
  Ron
 
 

Re: Distributed Search Test

2008-08-27 Thread Yonik Seeley
On Wed, Aug 27, 2008 at 12:33 PM, Ronald Aubin [EMAIL PROTECTED] wrote:
  Thanks for your reply.  I'm not sure if I understand completely.   Do you
 mean that each solr server should be given a different shard list and not a
 list containing all shards?

You could use the same shard list (as long as it doesn't contain
"localhost"), or you could use different ones (as long as "localhost" was
correctly substituted for the host you are talking to).  I'd recommend
avoiding "localhost" in the shard list unless all of your shards
happen to be on the local host.

Example: If you have hosta, hostb, then
querying hosta with shards=hosta,hostb or shards=localhost,hostb will
work (they are equivalent)
querying hostb with shards=hosta,hostb or shards=hosta,localhost will
work (they are equivalent)
BUT
querying hostb with shards=localhost,hostb is equivalent to shards=hostb,hostb

-Yonik
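
The substitution Yonik describes can be sketched in a few lines (illustrative only — the real resolution happens inside Solr's distributed search code; the class and method names here are made up):

```java
// Sketch: mimics how a "localhost" entry in a shards list is really the
// server that receives the request, so it resolves to that host's name.
public class ShardListDemo {
    static String resolve(String queriedHost, String shards) {
        // Every "localhost" entry stands for the host being queried.
        return shards.replace("localhost", queriedHost);
    }

    public static void main(String[] args) {
        // Querying hostb with shards=localhost,hostb collapses to hostb,hostb:
        System.out.println(
            resolve("hostb", "localhost:8983/solr,hostb:8983/solr"));
    }
}
```

Running this prints `hostb:8983/solr,hostb:8983/solr`, which is exactly the duplicate-shard situation that makes the test above fail.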


 So in my case:
 1) host fred should be given a shard list containing only locahost,
 2)  localhost should be given a shard list of fred

 I'll give it a try.

 Thanks again

 Ron

 On Wed, Aug 27, 2008 at 12:21 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

 It fails because you are using localhost as part of a shard name.
 When you send the request to fred it uses the fred shard and the
 localhost shard (which is the same as fred!)

 -Yonik

 On Wed, Aug 27, 2008 at 12:07 PM, Ronald Aubin [EMAIL PROTECTED]
 wrote:
  Hello,
 I have been performing some simple distributed search tests and don't
  understand why distributed search seems to work in some circumstances but
  not others.
 
  In my setup I have compiled the example server using the solr trunk
  downloaded on Aug 22nd.  I am running a sample server instance on 2
 separate
  hosts (localhost and fred).  I've added a portion of  the sample docs
  [a-n]*.xml to the local host solr server, and added the other portion,
  [m-z]*.xml sample docs to host fred.
 
  Assuming that I have set things up correctly, I would expect to receive a
  non-zero-length SolrDocumentList for any distributed search that matches
  text in the example docs.
 
  Specifically when I test the contents of each server separately ( using
 the
  included TestCase ) the tests pass. This confirms that each server has
  different documents.  However when I do the distributed tests, it seems
 the
  tests pass or fail based on the initial URL passed in the
  createNewSolrServer(String URL).  I realize a real junit should be self
  contained, unlike this one.
 
  junit test  testDistrbutedSearch() passes, while testDistrbutedSearch2()
  fails. Why?
 
  My understanding is that each host should send a query to all shards and
  collate the responses, and return them to the client. Is this true?
 
  Ron
 
 
  Here is my TestCase;
 
  package org.apache.solr.client.solrj.ron;
 
  import junit.framework.TestCase;
 
  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.client.solrj.response.SolrPingResponse;
  import org.apache.solr.common.SolrDocumentList;
  import org.apache.solr.common.params.ShardParams;
 
  public class SolrExampleDistributedTest extends TestCase {
 
 int port = 8983;
 static final String context = "/solr";
 
 static String SOLR_SHARD1 = "localhost:8983/solr";
 static String SOLR_SHARD2 = "fred:8983/solr";
 static String SOLR_SHARDS = SOLR_SHARD1 + "," + SOLR_SHARD2;
 static String HTTP_PREFIX = "http://";
 static String SOLR_URL1 = HTTP_PREFIX + SOLR_SHARD1;
 static String SOLR_URL2 = HTTP_PREFIX + SOLR_SHARD2;
 static String QUERY1 = "Samsung";
 static String QUERY2 = "solr";
 
 @Override
 public void setUp() throws Exception {
 super.setUp();
 
 }
 
 public SolrExampleDistributedTest(String name) {
 super(name);
 }
 
 @Override
 public void tearDown() throws Exception {
 super.tearDown();
 }
 
 protected SolrServer createNewSolrServer(String url) {
 try {
 
 CommonsHttpSolrServer s = new CommonsHttpSolrServer(url);
 s.setConnectionTimeout(100); // 1/10th sec
 s.setDefaultMaxConnectionsPerHost(100);
 s.setMaxTotalConnections(100);
 return s;
 } catch (Exception ex) {
 throw new RuntimeException(ex);
 }
 }
 
 public void testLocalhost() {
 try {
 SolrServer server = createNewSolrServer(SOLR_URL1);
 
 SolrQuery query = new SolrQuery();
 query.setQuery(QUERY1);
 QueryResponse qr = server.query(query);
 SolrDocumentList sdl = qr.getResults();
 assertTrue(sdl.getNumFound() > 0);
 
 query = new SolrQuery();
 query.setQuery(QUERY2);
 qr = server.query(query);
 sdl = qr.getResults();
 

Replacing FAST functionality at sesam.no

2008-08-27 Thread Glenn-Erik Sandbakken
At sesam.no we want to replace a FAST (fast.no) Query Matching Server
with a Solr index.

The index we are trying to replace is not a regular index, but specially
configured to perform phrases (and sub-phrases) matches against several
large lists (like an index with only a 'title' field).

I'm not sure of a correct, or logical, name for the behavior we are
after, but it is like a combination between Shingles and exact matching.

Some examples should explain it well.

Let's say we have the following list:
 one two three
 one two
 two three
 one
 two
 three
 three two
 two one
 one three
 three one

For the query one two three, we need hits against, and only against:
 one two three
 one two
 two three
 one
 two
 three

For the query one two, we need hits against, and only against:
 one two
 one
 two

For the query one three four (or four one three), we need hits
against, and only against:
 one three
 one
 three

For the query one two sesam three, we need hits against, and only
against:
 one two
 one
 two
 three

We have been testing out solr with the ShingleFilter for this, but
without luck.
I am unsure whether the reason is a misconfiguration in schema.xml or that
the ShingleFilter actually doesn't support this type of behavior.
Attached is our current schema.xml
(it is different from when I made this post to the solr-dev mailing list;
the shingle fieldType is of class solr.StrField).
Attached are screenshots of solr/admin/analysis.jsp run against this
configuration.

I'd like to know if the ShingleFilter is at all able to do what we
want.
 If it is: How can I configure schema.xml?
 If not: does there exist any other solutions that we can incorporate
into solr which will give us this behavior?

If there is no existing solution to this, we will probably end up
writing our own methods for it, extending the ShingleFilter, gladly
contributing to the solr project =)

Thanks for a great product,
Glenn-Erik



schema.xml
Description: XML document


odd 500 error

2008-08-27 Thread Andrew Nagy
Hello - I stumbled across an odd error which my intuition is telling me is a 
bug.

Here is my installation:
Solr Specification Version: 1.2.2008.08.13.13.05.16
Lucene Implementation Version: 2.4-dev 685576 - 2008-08-13 10:55:25

I did the following query today:
author:(r*a* AND fisher)

And get the following 500 error:

maxClauseCount is set to 1024

org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 
1024
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:165)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:156)
at 
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:63)
at org.apache.lucene.search.WildcardQuery.rewrite(WildcardQuery.java:54)
at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
at 
org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:163)
at org.apache.lucene.search.Query.weight(Query.java:94)
at org.apache.lucene.search.Searcher.createWeight(Searcher.java:175)
at org.apache.lucene.search.Searcher.search(Searcher.java:126)
at org.apache.lucene.search.Searcher.search(Searcher.java:105)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:966)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:167)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1156)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1088)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:360)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:729)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:206)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:505)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:829)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:211)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:380)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:395)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:488)


Thanks
Andrew


Re: odd 500 error

2008-08-27 Thread Yonik Seeley
On Wed, Aug 27, 2008 at 2:21 PM, Andrew Nagy [EMAIL PROTECTED] wrote:
 Hello - I stumbled across an odd error which my intuition is telling me is a 
 bug.

Unfortunately, wildcard queries can expand to an undefined number of terms.
This was the reason ConstantScorePrefixQuery and
ConstantScoreRangeQuery were introduced, but I never got around to
ConstantScoreWildcardQuery.  So this is a known limitation.

-Yonik
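
As a workaround until then, the clause limit itself is configurable in solrconfig.xml — raising it trades memory and query time for fewer TooManyClauses errors (the value below is only an example):

```xml
<!-- solrconfig.xml: default is 1024; raise with care, since every term
     the wildcard expands to adds a clause to the rewritten query. -->
<maxBooleanClauses>4096</maxBooleanClauses>
```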





Re: Replacing FAST functionality at sesam.no

2008-08-27 Thread Otis Gospodnetic
The screenshot didn't make it (some attachments get stripped)


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
 From: Glenn-Erik Sandbakken [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, August 27, 2008 1:44:53 PM
 Subject: Replacing FAST functionality at sesam.no
 



Beginners question: adding a plugin

2008-08-27 Thread Jaco
Hello,

I'm pretty new to Solr, and not a Java expert, and trying to create my own
plug in according to the instructions given in
http://wiki.apache.org/solr/SolrPlugins. I want to integrate an external
stemmer for the Dutch language by creating a new FilterFactory that will
invoke the external stemmer for a TokenStream.

First thing I want to do is just make sure I can get the plug in running.
Here's what I did:
- Take a copy of DutchStemFilterFactory.java, rename it to
TestStemFilterFactory, renamed the class to TestStemFilterFactory
- Successfully compiled the java using javac, and add the .class file to a
jar file
- Put the jar file in SOLR_HOME/lib
- Put a line <filter class="solr.TestStemFilterFactory" /> in my analyzer
definition in schema.xml
- Restart tomcat

In the Tomcat log, there is an indication that the file is found:

27-Aug-2008 20:58:25 org.apache.solr.core.SolrResourceLoader
createClassLoader
INFO: Adding 'file:/D:/Programs/Solr/lib/Test.jar' to Solr classloader

But then I get errors being reported by Tomcat further down the log file:

SEVERE: org.apache.solr.common.SolrException: Error loading class
'solr.TestStemFilterFactory'
at
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:256)
at
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:261)
at
org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:83)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:140)

Caused by: java.lang.ClassNotFoundException: solr.TestStemFilterFactory
at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
.

Probably some configuration issue somewhere, but I am in the dark here (as
said: not a Java expert...). I've tried to find information in mailing list
archives on this, but no luck so far. I'm Running Solr nightly build of
20.08.2008, tomcat 5.5.26 on Windows.

Any help would be much appreciated!

Cheers,

Jaco.


Re: Replacing FAST functionality at sesam.no

2008-08-27 Thread Svein Parnas


On 27. aug.. 2008, at 19.44, Glenn-Erik Sandbakken wrote:


At sesam.no we want to replace a FAST (fast.no) Query Matching Server
with a Solr index.

The index we are trying to replace is not a regular index, but
specially configured to perform phrases (and sub-phrases) matches
against several large lists (like an index with only a 'title' field).

I'm not sure of a correct, or logical, name for the behavior we are
after, but it is like a combination between Shingles and exact
matching.


Some examples should explain it well.


In order to do this, you can't use the ShingleFilter during indexing,
since a document like "one two three" and a query like "one two four"
will match because they have the shingle "one two" in common.

You will get what you want, I think, if you don't tokenize during
indexing (some normalization will be required if your lists aren't
normalized to begin with) and apply the ShingleFilter only to the
queries.


Svein
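
A minimal schema sketch of that suggestion (untested — the fieldType name is invented, and the ShingleFilterFactory attributes may need adjusting for your Solr version):

```xml
<!-- Index side: keep each list entry as one lowercased token, so only
     whole-phrase matches are possible. Query side: shingle the query
     so each sub-phrase is tried as an exact match. -->
<fieldType name="exactPhrase" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With this, the query "one two three" produces the shingles "one", "two", "three", "one two", "two three", and "one two three", each of which can only match a list entry that is exactly that phrase.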



Question about autocomplete feature

2008-08-27 Thread Aleksey Gogolev

Hello.

I'm trying to implement autocomplete feature using the snippet posted
by Dan.
(http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200807.mbox/[EMAIL 
PROTECTED])

Here is the snippet:

<fieldType name="autocomplete" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.EdgeNGramFilterFactory"
        maxGramSize="100" minGramSize="1" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z0-9])" replacement="" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory"
        pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
  </analyzer>
</fieldType>
...
<field name="ac" type="autocomplete" indexed="true" stored="true"
    required="false" />

First I decided to make it working for solr example. So I pasted the
snippet to schema.xml. Then I edited exampledocs/hd.xml, I added the
ac field to each doc. Value of ac field is a copy of name filed:

<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">Samsung SpinPoint P12 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="ac">Samsung SpinPoint P12 SP2514N - hard drive - 250 GB - ATA-133</field>
  <field name="manu">Samsung Electronics Co. Ltd.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">7200RPM, 8MB cache, IDE Ultra ATA-133</field>
  <field name="features">NoiseGuard, SilentSeek technology, Fluid Dynamic Bearing (FDB) motor</field>
  <field name="price">92</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc>

<doc>
  <field name="id">6H500F0</field>
  <field name="name">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>
  <field name="ac">Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300</field>
  <field name="manu">Maxtor Corp.</field>
  <field name="cat">electronics</field>
  <field name="cat">hard drive</field>
  <field name="features">SATA 3.0Gb/s, NCQ</field>
  <field name="features">8.5ms seek</field>
  <field name="features">16MB cache</field>
  <field name="price">350</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc>
</add>

Then I cleaned the solr index, posted hd.xml and restarted the solr server. But
when I try to search for "samsu" (the beginning of the word "samsung") I
still get no results. It seems like solr treats the "ac" field like a
regular field.

What did I do wrong?

Thanks in advance.

--
Aleksey Gogolev
developer, 
dev.co.ua
Aleksey



copyField: String vs Text Field

2008-08-27 Thread Jake Conk
Hello,

I was wondering if there was an added advantage in using <copyField />
to copy a string field to a text field?

If the field is copied to a text field then why not just make the
field a text field and eliminate copying its data?

If you are going to use full text searching on that field, which you
can't do with string fields, wouldn't it just make sense to keep it a
text field since it has the same abilities as a string field and more?

... Or is the reason because string fields have better performance on
matching exact strings than text fields?

Thanks,

- Jake


Re: Beginners question: adding a plugin

2008-08-27 Thread Grant Ingersoll
Instead of solr.TestStemFilterFactory, put the fully qualified  
classname for the TestStemFilterFactory, i.e.  
com.my.great.stemmer.TestStemFilterFactory.  The solr.FactoryName  
notation is just shorthand for org.apache.solr.BlahBlahBlah


-Grant
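
So in Jaco's case the schema line would look something like this (the package name is a placeholder — use whatever package the compiled class actually lives in):

```xml
<filter class="my.package.TestStemFilterFactory" />
```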

On Aug 27, 2008, at 3:27 PM, Jaco wrote:



--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: Wrong sort by score

2008-08-27 Thread Chris Hostetter

: It seems like the debug information is using the custom similarity as it
: should - the bug isn't there.
: I see in the explain information the right tf value (I modified it to be 1
: in my custom similarity).
: The numbers in the explain seem to add up and make sense.
: Is it possible that the score itself is wrong (the one that I get from fl)?

the score in the doclist is by definition the correct score - the 
debug info follows a different code path, and sometimes that code path 
isn't in sync with the actual searching/scoring code for different 
query types (although i was pretty confident that the test i added to 
Lucene-Java a while back tested this for anything you can see in Solr 
without getting into crazy contrib Query classes)

it would help if you could post:

1) the full debugQuery output from a query where you see this 
disconnect, showing the all query toString info, and the score 
explanations
2) the corresponding scores you see in the doclist
3) some more details about how your custom similarity works (can you post 
the code)
4) info on how you've configured dismax and what request params you are 
using (the output from using echoParams=all would be good)




-Hoss



Re: copyField: String vs Text Field

2008-08-27 Thread Yonik Seeley
On Wed, Aug 27, 2008 at 7:47 PM, Jake Conk [EMAIL PROTECTED] wrote:
 Thanks for the reply. Does that mean that if I were to edit the data
 then the field it was copied to will not be updated?

You can't really edit a document in Lucene or Solr; you just
overwrite an old document with an entirely new version.

 I assume it does
 get deleted if I delete the record right? I understand how it can make
 searching simpler by copying fields to one but would that really make
 it faster? How?

Searching a single field for a term is faster than searching multiple
fields for a term.
That's really only one use case though... the other being to have a
single stored field that is analyzed multiple different ways.

-Yonik
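
A sketch of that single-search-field pattern in schema.xml (field names invented for illustration):

```xml
<!-- Stored string fields keep the exact original values... -->
<field name="title" type="string" indexed="true" stored="true"/>
<field name="body"  type="string" indexed="true" stored="true"/>
<!-- ...while one tokenized catch-all field collects them for searching. -->
<field name="text" type="text" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title" dest="text"/>
<copyField source="body"  dest="text"/>
```

A query then only needs to search `text` instead of searching `title` and `body` separately.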

 Thanks,
 - Jake

 On Wed, Aug 27, 2008 at 2:22 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 Jake, copyField exists to decouple document values (on the update
 side) from how they are indexed.

 From the example schema:
  <!-- copyField commands copy one field to another at the time a document
       is added to the index.  It's used either to index the same
       field differently, or to add multiple fields to the same field
       for easier/faster searching.  -->

 -Yonik






Re: copyField: String vs Text Field

2008-08-27 Thread Walter Underwood
On 8/27/08 5:54 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
 
 That's really only one use case though... the other being to have a
 single stored field that is analyzed multiple different ways.

We are the other use case. We take a title and put it in three
fields: one merely lowercased, one stemmed and stopped, and one
phonetic. At query time, we search all three with decreasing
weights. An exact match is weighted more than a stemmed and
stopped match, and so on.

wunder
--
Search Guy, Netflix
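
With dismax, that decreasing-weight scheme might be expressed as a request parameter along these lines (field names and boosts invented for illustration):

```
qf=title_exact^3.0 title_stemmed^1.5 title_phonetic^0.5
```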




Re: dataimporthandler and mysql connector jar

2008-08-27 Thread Shalin Shekhar Mangar
On Thu, Aug 28, 2008 at 5:11 AM, Chris Hostetter
[EMAIL PROTECTED]wrote:


 code freeze may be overstating it ... the point of the freeze is to hold
 off on new features and other misc refactorings and focus on bug fixes and
 documentation improvements.


Ah ok. I was under the impression that only blocker bugs should make it
there.



 This sounds like a bug, and assuming the fix isn't insanely invasive
 there's no reason not to make bug fixes on the 1.3 branch (and merge with
 the trunk).



That's great. There are a couple of small bugs which could make it to 1.3
then.


-- 
Regards,
Shalin Shekhar Mangar.