a bug of solr distributed search

2010-07-21 Thread Li Li
The problem is in QueryComponent.mergeIds. It removes documents whose
uniqueKey duplicates that of another document, and in the current
implementation it keeps the first one encountered.
  String prevShard = uniqueDoc.put(id, srsp.getShard());
  if (prevShard != null) {
    // duplicate detected
    numFound--;
    collapseList.remove(id+"");
    docs.set(i, null);  // remove it
    // For now, just always use the first encountered since we can't currently
    // remove the previous one added to the priority queue.  If we switched
    // to the Java5 PriorityQueue, this would be easier.
    continue;
    // make which duplicate is used deterministic based on shard
    // if (prevShard.compareTo(srsp.shard) >= 0) {
    //   TODO: remove previous from priority queue
    //   continue;
    // }
  }

 It iterates over the shard responses with
for (ShardResponse srsp : sreq.responses)
But the order of sreq.responses is not fixed -- shard1's result and shard2's
result may swap positions between requests. So when a uniqueKey (such as a URL)
occurs in both shard1 and shard2, which copy gets used is unpredictable, and the
scores of the two copies differ because each shard has different idf.
So the same query can return different results.
One possible solution is to sort the ShardResponse list by shard name.
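A rough sketch of that idea (hypothetical and untested - it only pre-sorts the
responses so every query merges the shards in the same order; the names are
taken from the snippet above):

    // inside QueryComponent.mergeIds, before the merge loop
    // (uses java.util.ArrayList, Collections, Comparator, List)
    List<ShardResponse> responses = new ArrayList<ShardResponse>(sreq.responses);
    Collections.sort(responses, new Comparator<ShardResponse>() {
      public int compare(ShardResponse a, ShardResponse b) {
        return a.getShard().compareTo(b.getShard());   // deterministic order
      }
    });
    for (ShardResponse srsp : responses) {
      // ... existing mergeIds logic unchanged ...
    }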


Re: Any there any known issues may cause the index sync between the master/slave abnormal?

2010-07-21 Thread Peter Karich
Hi!

 Any there any known issues may cause the index sync between the
 master/slave abnormal?

What do you mean here? Corrupt indices? Please, describe your problems
in more detail.

 And is there any API to call to force sync the index between the
 master and slave, or force to delete the old index on the slave?

Syncing can be done via HTTP:
http://wiki.apache.org/solr/SolrReplication

Regards,
Peter.


Re:Re: Any there any known issues may cause the index sync between the master/slave abnormal?

2010-07-21 Thread Chengyang


Hi Peter,
Thanks for your response. I will check 
http://wiki.apache.org/solr/SolrReplication first.
I mean that the slave node did not delete the old index, which eventually caused 
the disk usage on the slave node to grow too large.
I am thinking of manually forcing the slave node to refresh the index.

Regards,
James.



Hi!

 Any there any known issues may cause the index sync between the
 master/slave abnormal?

What do you mean here? Corrupt indices? Please, describe your problems
in more detail.

 And is there any API to call to force sync the index between the
 master and slave, or force to delete the old index on the slave?

Syncing can be done via HTTP:

http://wiki.apache.org/solr/SolrReplication
Regards,
Peter.


Re: Any there any known issues may cause the index sync between the master/slave abnormal?

2010-07-21 Thread Peter Karich
Hi James,

triggering an optimize (on the slave) helped us to shrink the disc usage
of the slaves.
But I think the slaves will also clean up automatically on the next
replication (if you don't mind the index temporarily taking double the space).
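
If you want to trigger the optimize from code instead of an HTTP call, SolrJ
can do it too. A minimal sketch (the slave URL below is just a placeholder):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class OptimizeSlave {
    public static void main(String[] args) throws Exception {
      // point this at the slave core whose disc usage you want to shrink
      CommonsHttpSolrServer slave =
          new CommonsHttpSolrServer("http://slave-host:8983/solr");
      slave.optimize();   // merges segments so the old files can be dropped
    }
  }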

Regards,
Peter.



 Hi Peter,
 Thanks your reponse. I will check the
 http://wiki.apache.org/solr/SolrReplication first.
 I mean the slave node did not delete the old index and finally cause
 the disk usage to large for the slave node.
 I am thinking to manually force the slave node to refresh the index.

 Regards,
 James.



 Hi!

 Any there any known issues may cause the index sync between the
 master/slave abnormal?

 What do you mean here? Corrupt indices? Please, describe your problems
 in more detail.

 And is there any API to call to force sync the index between the
 master and slave, or force to delete the old index on the slave?

 Syncing can be done via HTTP:

 http://wiki.apache.org/solr/SolrReplication
 Regards,
 Peter.




Re: a bug of solr distributed search

2010-07-21 Thread MitchK

Li Li,

this is the intended behaviour, not a bug.
Otherwise you could get back the same record several times in one response,
which is probably not what the user wants.

Kind regards,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983675.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-21 Thread Li Li
But users will think there is something wrong with it when they run the
same query and get different results.

2010/7/21 MitchK mitc...@web.de:

 Li Li,

 this is the intended behaviour, not a bug.
 Otherwise you could get back the same record in a response for several
 times, which may not be intended by the user.

 Kind regards,
 - Mitch
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983675.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: set field with value 0 to the end

2010-07-21 Thread Grijesh.singh

An integer field can also be empty. I think you have set required="true"; if so,
remove required="true",
and you can leave the field without data at indexing time.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/set-field-with-value-0-to-the-end-tp980580p983728.html
Sent from the Solr - User mailing list archive at Nabble.com.


nested query and number of matched records

2010-07-21 Thread MitchK

Hello community,

I have a situation where I know that some types of documents contain very
extensive information and other types give more general information.
Since I don't know whether a user is searching for general or extensive
information (and I don't want to ask him when he uses the default search), I
want to give him a response back like this:

10 documents of type: short
1 document, if there is one, of type: extensive

An example query would look like this:
q={!dismax fq=type:short}my cool query OR {!dismax fq=type:extensive}my cool
query
The problem with this is that I cannot specify that I want up to 10
short documents and at most one extensive document.

I think this will not work, and if I want such a search I need to
do two different queries. But before I waste performance, I wanted to ask.

Thank you!
Kind regards,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p983756.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-21 Thread MitchK

Ah, okay. I understand your problem. Why should doc x be at position 1 when
searching for the first time, but at position 8 when I search a second time -
right?

I am not sure, but I think you can't prevent this without custom coding or
making a document's occurrence unique.

Kind regards,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983771.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: nested query and number of matched records

2010-07-21 Thread MitchK

Oh... I just noticed there is no direct question in my post ;-).

How can I specify the number of returned documents in the desired way
*within* one request?

- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p983773.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: a bug of solr distributed search

2010-07-21 Thread Li Li
Yes. This will make users think our search engine has a bug.
From the comments in the code, more work is needed:
  if (prevShard != null) {
    // For now, just always use the first encountered since we can't currently
    // remove the previous one added to the priority queue.  If we switched
    // to the Java5 PriorityQueue, this would be easier.
    continue;
    // make which duplicate is used deterministic based on shard
    // if (prevShard.compareTo(srsp.shard) >= 0) {
    //   TODO: remove previous from priority queue
    //   continue;
    // }
  }

2010/7/21 MitchK mitc...@web.de:

 Ah, okay. I understand your problem. Why should doc x be at position 1 when
 searching for the first time, and when I search for the 2nd time it occurs
 at position 8 - right?

 I am not sure, but I think you can't prevent this without custom coding or
 making a document's occurence unique.

 Kind regards,
 - Mitch
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983771.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: a bug of solr distributed search

2010-07-21 Thread MitchK

I don't know much about the code.
Maybe you can tell me which file you are referring to?

However, from the comments one can see that the problem is known, but it was
decided to let it happen because of requirements on the Java version.

- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p983880.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: LocalSolr distance in km?

2010-07-21 Thread Saïd Radhouani
Hi,

What resource are you using for LocalSolr?
Using the SpatialTierQParser, you can choose between km or mile: 
http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/   
Or, if you are using the LocalSolrQueryComponent 
(http://www.gissearch.com/localsolr) and you can't choose between the two 
units, you can use the radius parameter together with the km-to-mile 
conversion (1 kilometer = 0.621371192 mile), e.g., 
http://.../select?qt=geo&lat=xx.xx&long=yy.yy&q=*:*&radius=0.621371192

HTH
-S

On Jul 21, 2010, at 6:14 AM, Chamnap Chhorn wrote:

 Hi,
 
 I want to do a geo query with LocalSolr. However, it seems it supports only
 miles when calculating distances. Is there a quick way to use this search
 component with Solr using km instead?
 The other thing is that I want to calculate distances starting from 500 meters up.
 How could I do this?
 
 -- 
 Chhorn Chamnap
 http://chamnapchhorn.blogspot.com/



Re: nested query and number of matched records

2010-07-21 Thread Grijesh.singh

I think Solr does not provide anything like what you want.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p983938.html
Sent from the Solr - User mailing list archive at Nabble.com.


solrconfig.xml and xinclude

2010-07-21 Thread fiedzia

I am trying to export some config options common to all cores into a single
file, which would be included using XInclude. The only problem is how to
include the children of a given node.


common_solrconfig.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
 <lib dir="/solr/lib" />
</config>


solrconfig.xml looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<!-- xinclude here -->
</config>


now all of the following attempts have failed:

<xi:include href="/solr/common_solrconfig.xml"
  xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>
<xi:include href="/solr/common_solrconfig.xml" xpointer="config/*"
  xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>
<xi:include href="/solr/common_solrconfig.xml" xpointer="xpointer(config/*)"
  xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>

<xi:include href="/solr/common_solrconfig.xml" xpointer="element(config/*)"
  xmlns:xi="http://www.w3.org/2001/XInclude"></xi:include>


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/solrconfig-xml-and-xinclude-tp984058p984058.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Load cores without restarting/reloading Solr

2010-07-21 Thread Andrew McCombe
Hi Peter

We are using the packaged Ubuntu Server  (10.04 LTS) versions of Tomcat6 and
Solr1.4 and running a single instance of Solr with multiple cores.

Regards
Andrew

On 20 July 2010 19:47, Peter Karich peat...@yahoo.de wrote:

 Hi Andrew,

 the whole tomcat shouldn't fail on restart if only one core fails.
 We are using the setup described here:
 http://wiki.apache.org/solr/SolrTomcat

 With the help of several different Tomcat Context xml files (under
 conf/Catalina/localhost/) the cores should be independent webapps:
 A different data directory (+config) and even a different solr version
 is possible.

 Or are you using the same setup?

 Regards,
 Peter.

  Hi
 
  Sorry, it wasn't very clear was it?
 
  Yes, I use a 'template' core that isn't used and create a copy of this on
  the command line. I then edit the newcore/conf/solrconfig.xml and set the
  data path, add data-import sections etc and then I edit the
  solr.home/solr.xml and add the core name & directory to that.  I then go to
  the Tomcat manager/html and reload Solr.
 
  The problem I get is that if I have broken something in the new core Solr
  (correctly) doesn't reload and the other cores aren't then working.
 
  I don't need replication just yet but I will be looking into that
  eventually.
 
  Regards
  Andrew
 
 
  On 20 July 2010 10:32, Peter Karich peat...@yahoo.de wrote:
 
 
  Hi Andrew,
 
  I didn't correctly understand what you are trying to do with 'copying'?
  Just use one core as a template or use it to replicate data?
 
  You can reload only one application via:
  http://localhost/manager/html/reload?path=/yourapp
  (if you do this often you need to increase the PermGen space)
 
  You can replicate a core:
  http://wiki.apache.org/solr/SolrReplication
 
  Regards,
  Peter.
 
 
  Hi
 
  We have a few cores set up for separate sites and one of these is in use
  constantly.  When I add a new core I currently copy one of the other
  cores and rename it, changing the conf etc and then reloading Solr via
  the tomcat manager.  However, if something goes wrong then the other cores
  stop working until I have resolved the problem.
 
  My questions are:
 
  1) Is using a separate core for different sites the correct method?
 
  2) Is there a way of creating a core and starting it without having to
  reload Solr or restart tomcat?
 
  3) I've looked at the Solr Cores CREATE handler but from what I gather, I
  need to create the core folder and edit the solr.xml first before loading
  the core with action=CREATE. Is that correct?
 
  Regards
  Andrew
 
 
 




Re: nested query and number of matched records

2010-07-21 Thread kenf_nc

parallel calls. Simultaneously query for type:short with rows=10 and for
type:extensive with rows=1, then merge your results.  This would also let you
separate your short docs from your extensive docs into different Solr
instances if you wished... depending on your document architecture this could
speed up one or the other.
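
A minimal SolrJ sketch of those two calls (the field name "type" and the server
URL are assumptions; in practice you would fire the two queries on separate
threads):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocumentList;

public class TwoListSearch {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery shortQ = new SolrQuery("type:short AND (my cool query)");
    shortQ.setRows(10);                        // up to ten short docs

    SolrQuery extensiveQ = new SolrQuery("type:extensive AND (my cool query)");
    extensiveQ.setRows(1);                     // at most one extensive doc

    SolrDocumentList shortDocs = server.query(shortQ).getResults();
    SolrDocumentList extensiveDocs = server.query(extensiveQ).getResults();
    // merge the two lists however the UI needs them
  }
}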
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p984280.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: nested query and number of matched records

2010-07-21 Thread Chantal Ackermann
Sure SOLR supports this: use facets on the field type:

add to your regular query:

facet.query=true&facet.field=type

see http://wiki.apache.org/solr/SimpleFacetParameters
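
From SolrJ that would be roughly (a sketch only; "server" is an existing
SolrServer instance and the field name "type" is an assumption):

  SolrQuery q = new SolrQuery("my cool query");
  q.setFacet(true);
  q.addFacetField("type");
  QueryResponse rsp = server.query(q);
  // rsp.getFacetField("type").getValues() then holds the count per type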


On Wed, 2010-07-21 at 15:48 +0200, kenf_nc wrote:
 parallel calls. simultaneously query for type:short rows=10  and
 type:extensive rows=1  and merge your results.  This would also let you
 separate your short docs from your extensive docs into different solr
 instances if you wished...depending on your document architecture this could
 speed up one or the other.





Re: nested query and number of matched records

2010-07-21 Thread kenf_nc

That just gives a count of documents by type. The use-case, I believe, is to
return from a search, 10 documents of type 'short' and 1 document of type
'extensive'. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p984539.html
Sent from the Solr - User mailing list archive at Nabble.com.


faceted search with job title

2010-07-21 Thread Savannah Beckett
Hi,
  I am currently using Nutch to crawl some job pages from job boards.  They are 
in my Solr index now.  I want to do faceted search on the job titles.  How?  
The job titles can be in any location on the page, e.g. title, header, 
content...   If I use an IndexFilter in Nutch to search the content for job titles, 
there are hundreds of thousands of job titles; I can't hard-code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in the 
index to make it work with Solr faceted search, am I right?
Thanks.


  

RE: faceted search with job title

2010-07-21 Thread Dave Searle
You'd probably need to do some post-processing on the pages and set up rules 
for each website to grab that specific bit of data. You could load the HTML 
into an XML parser, then use XPath to grab content from a particular tag with a 
class or id, based on the particular website.
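
Something along these lines, for example (only a sketch - the class name and
the XPath rule are made up, and it assumes the page has already been tidied
into well-formed XHTML, e.g. with TagSoup):

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class JobTitleExtractor {
  // "rule" is the per-site XPath, e.g. "//h1[@class='job-title']"
  public static String extract(java.io.InputStream page, String rule)
      throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(page);
    XPath xpath = XPathFactory.newInstance().newXPath();
    return xpath.evaluate(rule, doc);
  }
}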



-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title

Hi,
  I am currently using nutch to crawl some job pages from job boards.  They are 
in my solr index now.  I want to do faceted search with the job titles.  How?  
The job titles can be in any locations of the page, e.g. title, header, 
content...   If I use indexfilter in Nutch to search the content for job title, 
there are hundred of thousands of job titles, I can't hard code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in 
the 
index to make it work with solr faceted search, am I right?
Thanks.


  


RE: Securing Solr 1.4 in a glassfish container AS NEW THREAD

2010-07-21 Thread Sharp, Jonathan

Some further information --

I tried indexing a batch of PDFs with the client and Solr CELL, setting
the credentials in the httpclient. For some reason after successfully
indexing several hundred files I start getting a SolrException:
Unauthorized and an info message (for every subsequent file):

INFO basic authentication scheme selected
org.apache.commons.httpclient.HttpMethodDirector processWWWAuthChallenge
INFO Failure authenticating with BASIC 'realm'@host:port

I increased session timeout in web.xml with no change. I'm looking
through the httpclient authentication now.

-Jon

-Original Message-
From: Sharp, Jonathan 
Sent: Friday, July 16, 2010 8:59 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Securing Solr 1.4 in a glassfish container AS NEW THREAD

Hi Bilgin,

Thanks for the snippet -- that helps a lot.

-Jon

-Original Message-
From: Bilgin Ibryam [mailto:bibr...@gmail.com] 
Sent: Friday, July 16, 2010 1:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Securing Solr 1.4 in a glassfish container AS NEW THREAD

Hi Jon,

SolrJ (CommonsHttpSolrServer) internally uses Apache HttpClient to connect
to Solr. You can check there for some documentation.
I also secured Solr with the BASIC auth-method and use the following snippet
to access it from SolrJ:

  // set username and password
  ((CommonsHttpSolrServer) server).getHttpClient().getParams()
      .setAuthenticationPreemptive(true);
  Credentials defaultcreds =
      new UsernamePasswordCredentials("username", "secret");
  ((CommonsHttpSolrServer) server).getHttpClient().getState()
      .setCredentials(new AuthScope("localhost", 80, AuthScope.ANY_REALM),
                      defaultcreds);
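
(setAuthenticationPreemptive(true) makes the client send the credentials with
every request instead of waiting for a 401 challenge first, which saves a round
trip when every call is protected.)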

HTH
Bilgin Ibryam



On Fri, Jul 16, 2010 at 2:35 AM, Sharp, Jonathan jsh...@coh.org wrote:

 Hi All,

 I am considering securing Solr with basic auth in glassfish using the
 container, by adding to web.xml and adding sun-web.xml file to the
 distributed WAR as below.

  If using SolrJ to index files, how can I provide the credentials for
  authentication to the http-client (or can someone point me in the direction
  of the right documentation to do that, or that will help me make the
  appropriate modifications)?

 Also any comment on the below is appreciated.

 Add this to web.xml
 ---
   <login-config>
     <auth-method>BASIC</auth-method>
     <realm-name>SomeRealm</realm-name>
   </login-config>
   <security-constraint>
     <web-resource-collection>
       <web-resource-name>Admin Pages</web-resource-name>
       <url-pattern>/admin</url-pattern>
       <url-pattern>/admin/*</url-pattern>
       <http-method>GET</http-method><http-method>POST</http-method>
       <http-method>PUT</http-method><http-method>TRACE</http-method>
       <http-method>HEAD</http-method><http-method>OPTIONS</http-method>
       <http-method>DELETE</http-method>
     </web-resource-collection>
     <auth-constraint>
       <role-name>SomeAdminRole</role-name>
     </auth-constraint>
   </security-constraint>
   <security-constraint>
     <web-resource-collection>
       <web-resource-name>Update Servlet</web-resource-name>
       <url-pattern>/update/*</url-pattern>
       <http-method>GET</http-method><http-method>POST</http-method>
       <http-method>PUT</http-method><http-method>TRACE</http-method>
       <http-method>HEAD</http-method><http-method>OPTIONS</http-method>
       <http-method>DELETE</http-method>
     </web-resource-collection>
     <auth-constraint>
       <role-name>SomeUpdateRole</role-name>
     </auth-constraint>
   </security-constraint>
   <security-constraint>
     <web-resource-collection>
       <web-resource-name>Select Servlet</web-resource-name>
       <url-pattern>/select/*</url-pattern>
       <http-method>GET</http-method><http-method>POST</http-method>
       <http-method>PUT</http-method><http-method>TRACE</http-method>
       <http-method>HEAD</http-method><http-method>OPTIONS</http-method>
       <http-method>DELETE</http-method>
     </web-resource-collection>
     <auth-constraint>
       <role-name>SomeSearchRole</role-name>
     </auth-constraint>
   </security-constraint>
 ---

 Also add this as sun-web.xml

 
 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE sun-web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Application
 Server 9.0 Servlet 2.5//EN"
 "http://www.sun.com/software/appserver/dtds/sun-web-app_2_5-0.dtd">
 <sun-web-app error-url="">
   <context-root>/Solr</context-root>
   <jsp-config>
     <property name="keepgenerated" value="true">
       <description>Keep a copy of the generated servlet class' java
       code.</description>
     </property>
   </jsp-config>
   <security-role-mapping>
     <role-name>SomeAdminRole</role-name>
     <group-name>SomeAdminGroup</group-name>
   </security-role-mapping>
   <security-role-mapping>
     <role-name>SomeUpdateRole</role-name>
     <group-name>SomeUpdateGroup</group-name>
   </security-role-mapping>
   <security-role-mapping>
     <role-name>SomeSearchRole</role-name>
     <group-name>SomeSearchGroup</group-name>
   </security-role-mapping>
 </sun-web-app>
 --

 -Jon


 

Re: a bug of solr distributed search

2010-07-21 Thread Siva Kommuri
How about sorting over the score? Would that be possible?

On Jul 21, 2010, at 12:13 AM, Li Li wrote:

 in QueryComponent.mergeIds. It will remove document which has
 duplicated uniqueKey with others. In current implementation, it use
 the first encountered.
   String prevShard = uniqueDoc.put(id, srsp.getShard());
   if (prevShard != null) {
     // duplicate detected
     numFound--;
     collapseList.remove(id+"");
     docs.set(i, null);  // remove it
     // For now, just always use the first encountered since we can't currently
     // remove the previous one added to the priority queue.  If we switched
     // to the Java5 PriorityQueue, this would be easier.
     continue;
     // make which duplicate is used deterministic based on shard
     // if (prevShard.compareTo(srsp.shard) >= 0) {
     //   TODO: remove previous from priority queue
     //   continue;
     // }
   }
 
 It iterate ove ShardResponse by
 for (ShardResponse srsp : sreq.responses)
 But the sreq.responses may be different. That is -- shard1's result
 and shard2's result may interchange position
 So when an uniqueKey(such as url) occurs in both shard1 and shard2.
 which one will be used is unpredicatable. But the socre of these 2
 docs are different because of different idf.
 So the same query will get different result.
 One possible solution is to sort ShardResponse srsp  by shard name.



Re: faceted search with job title

2010-07-21 Thread Savannah Beckett
Mmm... there must be a better way... each job board has a different format.  If 
new job boards are constantly being crawled, I don't think I can manually look 
for the specific sequence of tags that leads to the job title.  Most of them don't 
even have a class or id.  There is no guarantee that the job title will be in the 
title tag or a header tag; something else can be in the title.  Should I do this 
in a class that extends IndexFilter in Nutch?
Thanks. 





From: Dave Searle dave.sea...@magicalia.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wed, July 21, 2010 8:42:55 AM
Subject: RE: faceted search with job title

You'd probably need to do some post processing on the pages and set up rules 
for 
each website to grab that specific bit of data. You could load the html into an 
xml parser, then use xpath to grab content from a particular tag with a class 
or 
id, based on the particular website



-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title

Hi,
  I am currently using nutch to crawl some job pages from job boards.  They are 
in my solr index now.  I want to do faceted search with the job titles.  How?  
The job titles can be in any locations of the page, e.g. title, header, 
content...   If I use indexfilter in Nutch to search the content for job title, 
there are hundred of thousands of job titles, I can't hard code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in 
the 

index to make it work with solr faceted search, am I right?
Thanks.


  

Re: a bug of solr distributed search

2010-07-21 Thread MitchK

It already was sorted by score.

The problem here is the following:
Shard_A and shard_B both contain doc_X.
If you are querying for something, doc_X could have a score of 1.0 at
shard_A and a score of 12.0 at shard_B.

You can never be sure which copy Solr sees first. In the bad case, Solr sees
doc_X first at shard_A and ignores it at shard_B. That means the doc might
end up at page 10 of the pagination, although it *should* appear on page 1
or 2.

Kind regards,
- Mitch
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p984743.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: nested query and number of matched records

2010-07-21 Thread MitchK

Thank you three for your feedback!

Chantal, unfortunately kenf is right. Faceting won't work in this special
case. 


 parallel calls.
 
Yes, this will be the solution. However, this would lead to a second
HTTP-request and I hoped to be able to avoid it.


Chantal Ackermann wrote:
 
 Sure SOLR supports this: use facets on the field type:
 
 add to your regular query:
 
 facet.query=truefacet.field=type
 
 see http://wiki.apache.org/solr/SimpleFacetParameters
 
 
 On Wed, 2010-07-21 at 15:48 +0200, kenf_nc wrote:
 parallel calls. simultaneously query for type:short rows=10  and
 type:extensive rows=1  and merge your results.  This would also let you
 separate your short docs from your extensive docs into different solr
 instances if you wished...depending on your document architecture this
 could
 speed up one or the other.
 
 
 
 
 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/nested-query-and-number-of-matched-records-tp983756p984750.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: faceted search with job title

2010-07-21 Thread Nagelberg, Kallin
Yeah, you should definitely just set up a custom parser for each site... it should 
be easy to extract the title using Groovy's XML parsing along with TagSoup for 
sloppy HTML. If you can't find the pattern for each site leading to the job title, 
how can you expect Solr to? Humans have the advantage here :P

-Kallin Nagelberg

-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: Wednesday, July 21, 2010 12:20 PM
To: solr-user@lucene.apache.org
Cc: dave.sea...@magicalia.com
Subject: Re: faceted search with job title

mmm...there must be better way...each job board has different format.  If there 
are constantly new job boards being crawled, I don't think I can manually look 
for specific sequence of tags that leads to job title.  Most of them don't even 
have class or id.  There is no guarantee that the job title will be in the 
title 
tag, or header tag.  Something else can be in the title.  Should I do this in a 
class that extends IndexFilter in Nutch?
Thanks. 





From: Dave Searle dave.sea...@magicalia.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wed, July 21, 2010 8:42:55 AM
Subject: RE: faceted search with job title

You'd probably need to do some post processing on the pages and set up rules 
for 
each website to grab that specific bit of data. You could load the html into an 
xml parser, then use xpath to grab content from a particular tag with a class 
or 
id, based on the particular website



-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title

Hi,
  I am currently using nutch to crawl some job pages from job boards.  They are 
in my solr index now.  I want to do faceted search with the job titles.  How?  
The job titles can be in any locations of the page, e.g. title, header, 
content...   If I use indexfilter in Nutch to search the content for job title, 
there are hundred of thousands of job titles, I can't hard code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in 
the 

index to make it work with solr faceted search, am I right?
Thanks.


  


Re: help finding illegal chars in XML doc

2010-07-21 Thread robert mena
Hi Chris,

Thanks for your reply. I could not find any mention of that in the log files.
By the way, I only have _MM_DD.request.log files in my directory.

Do I have to enable any specific log or level to catch those errors?
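
In the meantime I may just scan the file for control characters myself,
something like this quick-and-dirty sketch:

import java.io.BufferedReader;
import java.io.FileReader;

public class FindCtrlChars {
  public static void main(String[] args) throws Exception {
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    String line;
    int row = 0;
    while ((line = in.readLine()) != null) {
      row++;
      for (int col = 0; col < line.length(); col++) {
        char c = line.charAt(col);
        // XML 1.0 only allows tab, CR and LF below 0x20, and readLine()
        // has already stripped the line breaks
        if (c < 0x20 && c != '\t') {
          System.out.println("control char 0x" + Integer.toHexString(c)
              + " at row " + row + ", col " + (col + 1));
        }
      }
    }
    in.close();
  }
}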

On Sun, Jul 18, 2010 at 3:45 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : SimplePostTool: FATAL: Solr returned an error:
 : Illegal_character_CTRLCHAR_code_27__at_rowcol_unknownsource_37022847
 :
 : I've tried to track where this problem is located without luck.

 check your solr logs, it will contain the unmunged version of the error
 message (the version of jetty used in the 1.4.1 example setup seems to
 think all punctuation should be removed from error messages) complete with
 the row/column of your XML message that had the problem (it's either
 3,7022847; or 370,22847; or 3702,2847; etc...



 -Hoss




boosting particular field values

2010-07-21 Thread Justin Lolofie
I'm using dismax request handler, solr 1.4.

I would like to boost the weight of certain fields according to their
values... this appears to work:

bq=category:electronics^5.5

However, I think this boosting only affects sorting the results that
have already matched? So if I only get 10 rows back, I might not get
any records back that are category electronics. If I get 100 rows, I
can see that bq is working. However, I only want to get 10 rows.

How does one affect the kinds of results that are matched to begin
with? bq is the wrong thing to use, right?

Thanks for any help,
Justin


RE: boosting particular field values

2010-07-21 Thread Markus Jelsma
function queries match all documents


http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery

 
-Original message-
From: Justin Lolofie jta...@gmail.com
Sent: Wed 21-07-2010 20:24
To: solr-user@lucene.apache.org; 
Subject: boosting particular field values

I'm using dismax request handler, solr 1.4.

I would like to boost the weight of certain fields according to their
values... this appears to work:

bq=category:electronics^5.5

However, I think this boosting only affects sorting the results that
have already matched? So if I only get 10 rows back, I might not get
any records back that are category electronics. If I get 100 rows, I
can see that bq is working. However, I only want to get 10 rows.

How does one affect the kinds of results that are matched to begin
with? bq is the wrong thing to use, right?

Thanks for any help,
Justin


Re: Solr searching performance issues, using large documents

2010-07-21 Thread Peter Spam
From the mailing list archive, Koji wrote:

 1. Provide another field for highlighting and use copyField to copy plainText 
 to the highlighting field.

and Lance wrote: 
http://www.mail-archive.com/solr-user@lucene.apache.org/msg35548.html

 If you want to highlight field X, doing the 
 termOffsets/termPositions/termVectors will make highlighting that field 
 faster. You should make a separate field and apply these options to that 
 field.
 
 Now: doing a copyfield adds a value to a multiValued field. For a text 
 field, you get a multi-valued text field. You should only copy one value to 
 the highlighted field, so just copyField the document to your special field. 
 To enforce this, I would add multiValued=false to that field, just to avoid 
 mistakes.
 
 So, all_text should be indexed without the term* attributes, and should not 
 be stored. Then your document stored in a separate field that you use for 
 highlighting and has the term* attributes.

I've been experimenting with this, and here's what I've tried:

   <field name="body" type="text_pl" indexed="true" stored="false"
          multiValued="true" termVectors="true" termPositions="true"
          termOffsets="true" />
   <field name="body_all" type="text_pl" indexed="false" stored="true"
          multiValued="true" />
   <copyField source="body" dest="body_all"/>

... but it's still very slow (10+ seconds).  Why is it better to have two 
fields (one indexed but not stored, and the other not indexed but stored) 
rather than just one field that's both indexed and stored?


From the Perf wiki page http://wiki.apache.org/solr/SolrPerformanceFactors

 If you aren't always using all the stored fields, then enabling lazy field 
 loading can be a huge boon, especially if compressed fields are used.

What does this mean?  How do you load a field lazily?

Thanks for your time, guys - this has started to become frustrating, since it 
works so well, but is very slow!


-Pete

On Jul 20, 2010, at 5:36 PM, Peter Spam wrote:

 Data set: About 4,000 log files (will eventually grow to millions).  Average 
 log file is 850k.  Largest log file (so far) is about 70MB.
 
 Problem: When I search for common terms, the query time goes from under 2-3 
 seconds to about 60 seconds.  TermVectors etc are enabled.  When I disable 
 highlighting, performance improves a lot, but is still slow for some queries 
 (7 seconds).  Thanks in advance for any ideas!
 
 
 -Peter
 
 
 -
 
 4GB RAM server
 % java -Xms2048M -Xmx3072M -jar start.jar
 
 -
 
 schema.xml changes:
 
 <fieldType name="text_pl" class="solr.TextField">
   <analyzer>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0"
             generateNumberParts="0" catenateWords="0" catenateNumbers="0"
             catenateAll="0" splitOnCaseChange="0"/>
   </analyzer>
 </fieldType>

 ...

   <field name="body" type="text_pl" indexed="true" stored="true"
          multiValued="false" termVectors="true" termPositions="true"
          termOffsets="true" />
   <field name="timestamp" type="date" indexed="true" stored="true"
          default="NOW" multiValued="false"/>
   <field name="version" type="string" indexed="true" stored="true"
          multiValued="false"/>
   <field name="device" type="string" indexed="true" stored="true"
          multiValued="false"/>
   <field name="filename" type="string" indexed="true" stored="true"
          multiValued="false"/>
   <field name="filesize" type="long" indexed="true" stored="true"
          multiValued="false"/>
   <field name="pversion" type="int" indexed="true" stored="true"
          multiValued="false"/>
   <field name="first2md5" type="string" indexed="false" stored="true"
          multiValued="false"/>
   <field name="ckey" type="string" indexed="true" stored="true"
          multiValued="false"/>

 ...

 <dynamicField name="*" type="ignored" multiValued="true" />
 <defaultSearchField>body</defaultSearchField>
 <solrQueryParser defaultOperator="AND"/>
 
 -
 
 solrconfig.xml changes:
 
 <maxFieldLength>2147483647</maxFieldLength>
 <ramBufferSizeMB>128</ramBufferSizeMB>
 
 -
 
 The query:
 
 rowStr = "rows=10"
 facet = "facet=true&facet.limit=10&facet.field=device&facet.field=ckey&facet.field=version"
 fields = "fl=id,score,filename,version,device,first2md5,filesize,ckey"
 termvectors = "tv=true&qt=tvrh&tv.all=true"
 hl = "hl=true&hl.fl=body&hl.snippets=1&hl.fragsize=400"
 regexv = "(?m)^.*\n.*\n.*$"
 hl_regex = "hl.regex.pattern=" + CGI::escape(regexv) +
   "&hl.regex.slop=1&hl.fragmenter=regex&hl.regex.maxAnalyzedChars=2147483647&hl.maxAnalyzedChars=2147483647"
 justq = 'q=' + CGI::escape('body:' + fuzzy + p['q'].to_s.gsub(/\\/, 
 

Re: faceted search with job title

2010-07-21 Thread Savannah Beckett
I don't see how it can be done without writing SAX or DOM code for each job 
board; that is not maintainable if a lot of new job boards are being crawled.  
Maybe I should use regex matching?  Then I just need to substitute the regex 
pattern for each job board without writing any new SAX or DOM code.  But are 
regex patterns flexible enough for all job boards?
Thanks.





From: Nagelberg, Kallin knagelb...@globeandmail.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wed, July 21, 2010 10:39:32 AM
Subject: RE: faceted search with job title

Yeah you should definitely just setup a custom parser for each site.. should be 
easy to extract title using groovy's xml parsing along with tagsoup for sloppy 
html. If you can't find the pattern for each site leading to the job title how 
can you expect solr to? Humans have the advantage here :P

-Kallin Nagelberg

-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: Wednesday, July 21, 2010 12:20 PM
To: solr-user@lucene.apache.org
Cc: dave.sea...@magicalia.com
Subject: Re: faceted search with job title

mmm...there must be better way...each job board has different format.  If there 
are constantly new job boards being crawled, I don't think I can manually look 
for specific sequence of tags that leads to job title.  Most of them don't even 
have class or id.  There is no guarantee that the job title will be in the 
title 

tag, or header tag.  Something else can be in the title.  Should I do this in a 
class that extends IndexFilter in Nutch?
Thanks. 





From: Dave Searle dave.sea...@magicalia.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Sent: Wed, July 21, 2010 8:42:55 AM
Subject: RE: faceted search with job title

You'd probably need to do some post processing on the pages and set up rules 
for 

each website to grab that specific bit of data. You could load the html into an 
xml parser, then use xpath to grab content from a particular tag with a class 
or 

id, based on the particular website



-Original Message-
From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
Sent: 21 July 2010 16:38
To: solr-user@lucene.apache.org
Subject: faceted search with job title

Hi,
  I am currently using nutch to crawl some job pages from job boards.  They are 
in my solr index now.  I want to do faceted search with the job titles.  How?  
The job titles can be in any locations of the page, e.g. title, header, 
content...   If I use indexfilter in Nutch to search the content for job title, 
there are hundred of thousands of job titles, I can't hard code them all.  Do 
you have a better idea?  I think I need the job title in a separate field in 
the 


index to make it work with solr faceted search, am I right?
Thanks.


  

Dismax query response field number

2010-07-21 Thread scrapy

 

 Hi,

It seems that not all fields are returned from the query response when I use dismax? 
Only the first 10??

Any idea? 

Here is my solrconfig:

 <requestHandler name="dismax" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <str name="fl">*</str>
     <float name="tie">0.01</float>
     <str name="qf">
        text^0.5 content^1.1 title^1.5
     </str>
     <str name="pf">
        text^0.2 content^1.1 title^1.5
     </str>
     <str name="bf">
        recip(price,1,1000,1000)^0.3
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
     <!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl.fl">text features name</str>
     <!-- for this field, we want no fragmenting, just highlighting -->
     <str name="f.name.hl.fragsize">0</str>
     <!-- instructs Solr to return the field itself if no query terms are
          found -->
     <str name="f.name.hl.alternateField">name</str>
     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
   </lst>
 </requestHandler>




Re: boosting particular field values

2010-07-21 Thread Justin Lolofie
I might have misunderstood, but I think I can't do string literals in
function queries, right?

myfield:something^3.0

I tried it anyway using Solr 1.4; it doesn't seem to work.

On Wed, Jul 21, 2010 at 1:48 PM, Markus Jelsma markus.jel...@buyways.nl wrote:
 function queries match all documents


 http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery


 -Original message-
 From: Justin Lolofie jta...@gmail.com
 Sent: Wed 21-07-2010 20:24
 To: solr-user@lucene.apache.org;
 Subject: boosting particular field values

 I'm using dismax request handler, solr 1.4.

 I would like to boost the weight of certain fields according to their
 values... this appears to work:

 bq=category:electronics^5.5

 However, I think this boosting only affects sorting the results that
 have already matched? So if I only get 10 rows back, I might not get
 any records back that are category electronics. If I get 100 rows, I
 can see that bq is working. However, I only want to get 10 rows.

 How does one affect the kinds of results that are matched to begin
 with? bq is the wrong thing to use, right?

 Thanks for any help,
 Justin



Count hits per document?

2010-07-21 Thread Peter Spam
If I search for foo, I get back a list of documents.  Any way to get a 
per-document hit count?  Thanks!


-Pete


Re: Using hl.regex.pattern to print complete lines

2010-07-21 Thread Peter Spam
Still not working ... any ideas?


-Pete

On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:

 Any other thoughts, Chris?  I've been messing with this a bit, and can't seem 
 to get (?m)^.*$ to do what I want.
 
 1) I don't care how many characters it returns, I'd like entire lines all the 
 time
 2) I just want it to always return 3 lines: the line before, the actual line, 
 and the line after.
 3) This should be like grep -C1
 
 Thanks for your time!
 
 
 -Pete
 
 On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:
 
 Ah, this makes sense.  I've changed my regex to (?m)^.*$, and it works 
 better, but I still get fragments before and after some returns.
 Thanks for the hint!
 
 
 -Pete
 
 On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:
 
 
 : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
 : is available that is for getting entire field contents with search terms
 : highlighted. To use it, set hl.useFastVectorHighlighter to true.
 
 He doesn't want the entire field -- his stored field values contain 
 multi-line strings (using newline characters) and he wants to make 
 fragments per line (ie: bounded by newline characters, or the start/end 
 of the entire field value)
 
 Peter: i haven't looked at the code, but i expect that the problem is that 
 the java regex engine isn't being used in a way that makes ^ and $ match 
 any line boundary -- they are probably only matching the start/end of the 
 field (and . is probably only matching non-newline characters)
 
 java regexes support embedded flags (ie: (?xyz)your regex) so you might 
 try that (i don't remember what the correct modifier flag is for the 
 multiline mode off the top of my head)
 
 -Hoss
 
 
 



Re: faceted search with job title

2010-07-21 Thread Dave Searle
You could grab your XPath rules from a DB too. This is what I did for a price 
scraping app a while ago; new sites were added with a set of rules using a web UI. 
You could certainly use regex of course, but IMO that's more complex than writing 
a simple XPath. Using JavaScript or some DOM traversal code, you could quite 
easily create a point-and-click tool to generate the rules very simply and 
quickly. 

On 21 Jul 2010, at 23:10, Savannah Beckett savannah_becket...@yahoo.com wrote:

 And I will have to recompile the dom or sax code each time I add a job board 
 for 
 crawling.  Regex pattern is only a string which can be stored in a text file 
 or 
 db, and retrieved based on the job board.  What do you think?
 
 
 
 
 
 From: Nagelberg, Kallin knagelb...@globeandmail.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wed, July 21, 2010 10:39:32 AM
 Subject: RE: faceted search with job title
 
 Yeah you should definitely just setup a custom parser for each site.. should 
 be 
 easy to extract title using groovy's xml parsing along with tagsoup for 
 sloppy 
 html. If you can't find the pattern for each site leading to the job title 
 how 
 can you expect solr to? Humans have the advantage here :P
 
 -Kallin Nagelberg
 
 -Original Message-
 From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
 Sent: Wednesday, July 21, 2010 12:20 PM
 To: solr-user@lucene.apache.org
 Cc: dave.sea...@magicalia.com
 Subject: Re: faceted search with job title
 
 mmm...there must be better way...each job board has different format.  If 
 there 
 are constantly new job boards being crawled, I don't think I can manually 
 look 
 for specific sequence of tags that leads to job title.  Most of them don't 
 even 
 have class or id.  There is no guarantee that the job title will be in the 
 title 
 
 tag, or header tag.  Something else can be in the title.  Should I do this in 
 a 
 class that extends IndexFilter in Nutch?
 Thanks. 
 
 
 
 
 
 From: Dave Searle dave.sea...@magicalia.com
 To: solr-user@lucene.apache.org solr-user@lucene.apache.org
 Sent: Wed, July 21, 2010 8:42:55 AM
 Subject: RE: faceted search with job title
 
 You'd probably need to do some post processing on the pages and set up rules 
 for 
 
 each website to grab that specific bit of data. You could load the html into 
 an 
 xml parser, then use xpath to grab content from a particular tag with a class 
 or 
 
 id, based on the particular website
 
 
 
 -Original Message-
 From: Savannah Beckett [mailto:savannah_becket...@yahoo.com] 
 Sent: 21 July 2010 16:38
 To: solr-user@lucene.apache.org
 Subject: faceted search with job title
 
 Hi,
   I am currently using nutch to crawl some job pages from job boards.  They 
 are 
 in my solr index now.  I want to do faceted search with the job titles.  How? 
  
 The job titles can be in any locations of the page, e.g. title, header, 
 content...   If I use indexfilter in Nutch to search the content for job 
 title, 
 there are hundred of thousands of job titles, I can't hard code them all.  Do 
 you have a better idea?  I think I need the job title in a separate field in 
 the 
 
 
 index to make it work with solr faceted search, am I right?
 Thanks.
 
 


Clustering results limit?

2010-07-21 Thread Darren Govoni
Hi,
 I am attempting to cluster a query. It kinda works, but where my
(regular) query returns 500 results the cluster only shows 1-10 hits for
each cluster (5 clusters). Never more than 10 docs, and I know it's not
right. What could be happening here? It should be showing dozens of
documents per cluster. 

thanks,
Darren


Re: Dismax query response field number

2010-07-21 Thread Lance Norskog
Fields or documents? It will return all of the fields that are 'stored'.

The default number of documents to return is 10. Returning all of the
documents is very slow, so you have to request that with the rows=
parameter.

On Wed, Jul 21, 2010 at 3:32 PM,  scr...@asia.com wrote:



  Hi,

 It seems that not all field are returned from query response when i use 
 DISMAX? Only first 10??

 Any idea?

 Here is my solrconfig:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="echoParams">explicit</str>
      <str name="fl">*</str>
      <float name="tie">0.01</float>
      <str name="qf">
         text^0.5 content^1.1 title^1.5
      </str>
      <str name="pf">
         text^0.2 content^1.1 title^1.5
      </str>
      <str name="bf">
         recip(price,1,1000,1000)^0.3
      </str>
      <str name="mm">
         2&lt;-1 5&lt;-2 6&lt;90%
      </str>
      <int name="ps">100</int>
      <str name="q.alt">*:*</str>
      <!-- example highlighter config, enable per-query with hl=true -->
      <str name="hl.fl">text features name</str>
      <!-- for this field, we want no fragmenting, just highlighting -->
      <str name="f.name.hl.fragsize">0</str>
      <!-- instructs Solr to return the field itself if no query terms are
           found -->
      <str name="f.name.hl.alternateField">name</str>
      <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>






-- 
Lance Norskog
goks...@gmail.com


Re: Count hits per document?

2010-07-21 Thread Lance Norskog
You have to store the termvectors when you index, and then retrieve
them when you do a query. Highlighting does exactly this; the easy way
to do this is to ask for highlighting and search for the highlighted
words, and count them.
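
Roughly, with SolrJ (just a sketch - it assumes the default <em> highlight
tags, a field called "body", an existing SolrServer named "server", and enough
hl.snippets to cover the whole field):

  SolrQuery q = new SolrQuery("foo");
  q.setHighlight(true);
  q.setParam("hl.fl", "body");
  q.setParam("hl.snippets", "100");
  QueryResponse rsp = server.query(q);

  // highlighting map: doc id -> (field -> snippets)
  for (Map.Entry<String, Map<String, List<String>>> doc
           : rsp.getHighlighting().entrySet()) {
    int hits = 0;
    for (List<String> snippets : doc.getValue().values()) {
      for (String snippet : snippets) {
        // each opening highlight tag marks one term hit
        hits += snippet.split("<em>", -1).length - 1;
      }
    }
    System.out.println(doc.getKey() + ": " + hits + " hit(s)");
  }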

On Wed, Jul 21, 2010 at 4:21 PM, Peter Spam ps...@mac.com wrote:
 If I search for foo, I get back a list of documents.  Any way to get a 
 per-document hit count?  Thanks!


 -Pete




-- 
Lance Norskog
goks...@gmail.com


Re: Using hl.regex.pattern to print complete lines

2010-07-21 Thread Lance Norskog
Java regex might be different from all other regex, so writing a test
program and experimenting is the only way. Once you decide that this
expression really is what you want, and that it does not achieve what
you expect, you might have found a bug in highlighting.

Lucene/Solr highlighting has always been a difficult area, and might
not do everything right.
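
For example, a tiny standalone test shows what (?m) does to ^ and $ in Java
(sketch):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineTest {
  public static void main(String[] args) {
    String text = "line one\nline two\nline three";
    // (?m) turns on MULTILINE, so ^ and $ match at every line break,
    // not just at the start and end of the whole string
    Matcher m = Pattern.compile("(?m)^.*$").matcher(text);
    while (m.find()) {
      System.out.println("[" + m.group() + "]");
    }
  }
}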

On Wed, Jul 21, 2010 at 4:20 PM, Peter Spam ps...@mac.com wrote:
 Still not working ... any ideas?


 -Pete

 On Jul 14, 2010, at 11:56 AM, Peter Spam wrote:

 Any other thoughts, Chris?  I've been messing with this a bit, and can't 
 seem to get (?m)^.*$ to do what I want.

 1) I don't care how many characters it returns, I'd like entire lines all 
 the time
 2) I just want it to always return 3 lines: the line before, the actual 
 line, and the line after.
 3) This should be like grep -C1

 Thanks for your time!


 -Pete

 On Jul 9, 2010, at 12:08 AM, Peter Spam wrote:

 Ah, this makes sense.  I've changed my regex to (?m)^.*$, and it works 
 better, but I still get fragments before and after some returns.
 Thanks for the hint!


 -Pete

 On Jul 8, 2010, at 6:27 PM, Chris Hostetter wrote:


 : If you can use the latest branch_3x or trunk, hl.fragListBuilder=single
 : is available that is for getting entire field contents with search terms
 : highlighted. To use it, set hl.useFastVectorHighlighter to true.

 He doesn't want the entire field -- his stored field values contain
 multi-line strings (using newline characters) and he wants to make
 fragments per line (ie: bounded by newline characters, or the start/end
 of the entire field value)

 Peter: i haven't looked at the code, but i expect that the problem is that
 the java regex engine isn't being used in a way that makes ^ and $ match
 any line boundary -- they are probably only matching the start/end of the
 field (and . is probably only matching non-newline characters)

 java regexes support embedded flags (ie: (?xyz)your regex) so you might
 try that (i don't remember what the correct modifier flag is for the
 multiline mode off the top of my head)

 -Hoss








-- 
Lance Norskog
goks...@gmail.com


how to change the default path of Solr Tomcat

2010-07-21 Thread Eben

Hi everyone,
I really need your help

this is the default address that I got from Solr:
http://172.16.17.126:8983/solr/

the question is how to change that path to be:
http://172.16.17.126:8983/search/

Please I really need your help
thanks a lot before



Re: a bug of solr distributed search

2010-07-21 Thread Li Li
I think what Siva means is that when there are docs with the same url,
keep the doc whose score is larger.
That is the right solution.
But it shows a problem of distributed search without a common idf: a doc
will get a different score in each shard.
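In mergeIds that would look something like this (purely a hypothetical sketch:
"seenScores" and "queueRemove" are made-up names, and the hard part the
existing comment mentions - removing the entry already added to the priority
queue - is only hinted at):

    // keep the duplicate with the higher score instead of the first one seen
    Float prevScore = seenScores.get(id);        // seenScores: Map<Object,Float>
    float thisScore = ((Number) doc.getFieldValue("score")).floatValue();
    if (prevScore != null) {
      numFound--;
      if (thisScore <= prevScore.floatValue()) {
        continue;                 // the copy already queued wins
      }
      queueRemove(id);            // needs a priority queue that supports removal
    }
    seenScores.put(id, thisScore);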
2010/7/22 MitchK mitc...@web.de:

 It already was sorted by score.

 The problem here is the following:
 Shard_A and shard_B contain doc_X and doc_X.
 If you are querying for something, doc_X could have a score of 1.0 at
 shard_A and a score of 12.0 at shard_B.

 You can never be sure which doc Solr sees first. In the bad case, Solr sees
 the doc_X firstly at shard_A and ignores it at shard_B. That means, that the
 doc maybe would occur at page 10 in pagination, although it *should* occur
 at page 1 or 2.

 Kind regards,
 - Mitch
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/a-bug-of-solr-distributed-search-tp983533p984743.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: how to change the default path of Solr Tomcat

2010-07-21 Thread kenf_nc

Your environment may be different, but this is how I did it. (Apache Tomcat
on Windows 2008)

go to \program files\apache...\Tomcat\conf\catalina\localhost
rename solr.xml to search.xml
recycle Tomcat service

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-change-the-default-path-of-Solr-Tomcat-tp985881p985937.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to change the default path of Solr Tomcat

2010-07-21 Thread Eben

Firstly, I really appreciate your response to my question, Ken.

I'm using Tomcat on Debian Linux.
I can't find the solr.xml in \program 
files\apache...\Tomcat\conf\catalina\localhost

there are only 2 files in localhost folder:
host-manager.xml and manager.xml

any solutions?

On 7/22/2010 10:41 AM, kenf_nc wrote:

Your environment may be different, but this is how I did it. (Apache Tomcat
on Windows 2008)

go to \program files\apache...\Tomcat\conf\catalina\localhost
rename solr.xml to search.xml
recycle Tomcat service

   





Re: how to change the default path of Solr Tomcat

2010-07-21 Thread Girish Pandit
It seems like you are using the default server (Jetty on port 8983), and it 
looks like you are running it with the command "java -jar start.jar". If so, 
under the same directory there is another directory called "webapps": go in 
there, rename solr.war to search.war, bounce the server, and you should be 
good to go!



Eben wrote:

firstly, I really appreciate your respond to my question Ken

I'm using Tomcat on Linux Debian
I can't find the solr.xml in  \program 
files\apache...\Tomcat\conf\catalina\localhost

there are only 2 files in localhost folder:
host-manager.xml and manager.xml

any solutions?

On 7/22/2010 10:41 AM, kenf_nc wrote:
Your environment may be different, but this is how I did it. (Apache 
Tomcat

on Windows 2008)

go to \program files\apache...\Tomcat\conf\catalina\localhost
rename solr.xml to search.xml
recycle Tomcat service

   








Re: how to change the default path of Solr Tomcat

2010-07-21 Thread K Wong
Check: /var/lib/tomcat5.5/conf/Catalina/localhost/

Are you using Tomcat on a custom port (the default tomcat port is
8080)? Check your ports ($ sudo netstat -nlp)

Maybe try searching the file system for the solr.xml file?

$ sudo find / -name solr.xml

Hope this helps.

K


On Wed, Jul 21, 2010 at 8:22 PM, Eben e...@tokobagus.com wrote:
 firstly, I really appreciate your respond to my question Ken

 I'm using Tomcat on Linux Debian
 I can't find the solr.xml in  \program
 files\apache...\Tomcat\conf\catalina\localhost
 there are only 2 files in localhost folder:
 host-manager.xml and manager.xml

 any solutions?

 On 7/22/2010 10:41 AM, kenf_nc wrote:

 Your environment may be different, but this is how I did it. (Apache
 Tomcat
 on Windows 2008)

 go to \program files\apache...\Tomcat\conf\catalina\localhost
 rename solr.xml to search.xml
 recycle Tomcat service







Re: set field with value 0 to the end

2010-07-21 Thread Grijesh.singh

Why are you using default="0"? It's optional; remove that from the field definition.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/set-field-with-value-0-to-the-end-tp980580p986115.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to change the default path of Solr Tomcat

2010-07-21 Thread Eben

Hi Wong,
I'm using the default server (Jetty on port 8983).
Girish's solution already solved my problem.
Thanks for your response, Wong :)

On 7/22/2010 11:57 AM, K Wong wrote:

Check: /var/lib/tomcat5.5/conf/Catalina/localhost/

Are you using Tomcat on a custom port (the default tomcat port is
8080)? Check your ports ($ sudo netstat -nlp)

Maybe try searching the file system for the solr.xml file?

$ sudo find / -name solr.xml

Hope this helps.

K


On Wed, Jul 21, 2010 at 8:22 PM, Ebene...@tokobagus.com  wrote:
   

firstly, I really appreciate your respond to my question Ken

I'm using Tomcat on Linux Debian
I can't find the solr.xml in  \program
files\apache...\Tomcat\conf\catalina\localhost
there are only 2 files in localhost folder:
host-manager.xml and manager.xml

any solutions?

On 7/22/2010 10:41 AM, kenf_nc wrote:
 

Your environment may be different, but this is how I did it. (Apache
Tomcat
on Windows 2008)

go to \program files\apache...\Tomcat\conf\catalina\localhost
rename solr.xml to search.xml
recycle Tomcat service


   



 





facet.query with facet.date

2010-07-21 Thread ruphus

Hello,
I need to create two date facets displaying counts of a particular field's
values.
With normal facets, this can be done with facet.query, but this parameter is
not available for facet.date.

Is this possible? I'd really prefer to avoid performing two queries.

Thanks
William
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/facet-query-with-facet-date-tp986206p986206.html
Sent from the Solr - User mailing list archive at Nabble.com.