Re: a bug in commit script?

2007-09-21 Thread Yu-Hui Jin
Got it. So what's the easiest way to get this patch? Sorry, I'm new to this.


regards,
-Hui

On 9/20/07, Bill Au [EMAIL PROTECTED] wrote:

 That would be my bad.  I noticed the problem while fixing SOLR-282,
 which is not related.  I fixed both problems instead of opening a
 separate bug for the response format issue.  I will update the change
 log.

 Bill

 On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
  :
  : It seems there's a small bug in the bin/commit script for solr 1.2.
 
  A fix was already committed to the trunk for this as part of SOLR-282
  (but there doesn't seem to be a note about it in the changelog)
 
 
  -Hoss
 
 




-- 
Regards,

-Hui


Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-21 Thread Yu-Hui Jin
Bill,

Thanks for the explanation. That helps my understanding of rsync and of
replication in general.


regards,

-Hui

On 9/20/07, Bill Au [EMAIL PROTECTED] wrote:

 The 'solr' that you are referring to in your third question is the
 name of the rsync area, which is mapped to the Solr data directory.  This
 is defined in the rsyncd configuration file, which is generated on the
 fly as Chris has pointed out.  Take a look at rsyncd-start.

 snappuller rsyncs the index from this 'solr' area (the command you have
 quoted) on the master.  The name of the rsync area has nothing to do
 with the name of the index.  We set up this area for rsyncd so that
 anyone accessing files on the master through rsyncd is restricted to
 this area.

 The name of the rsyncd area does not have to be 'solr'.  It can be
 anything as long as the value in rsyncd-start matches the value in
 snappuller.
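 To make Bill's point concrete, here is a minimal Python sketch of the idea: a generated rsyncd.conf exposes the data directory under a module name, and the slave addresses that module by name in its rsync URL. It is not the actual output of rsyncd-start; the function names, host, port and snapshot name are hypothetical, and only a few standard rsyncd.conf settings are shown.

```python
def make_rsyncd_conf(module: str, data_dir: str, port: int, user: str = "solr") -> str:
    """Render a minimal rsyncd.conf that exposes data_dir as one rsync module."""
    lines = [
        f"uid = {user}",
        "use chroot = no",
        "list = no",
        f"port = {port}",
        "",
        f"[{module}]",             # the module name clients must use
        f"    path = {data_dir}",  # the directory the module maps to
        "    read only = yes",
        "",
    ]
    return "\n".join(lines)


def snapshot_url(master_host: str, port: int, module: str, snapshot: str) -> str:
    """rsync daemon URL a slave would use to pull one snapshot directory."""
    return f"rsync://{master_host}:{port}/{module}/{snapshot}/"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(make_rsyncd_conf("solr", "/var/solr/data", 18983))
    # The module name in the URL must match the [section] name in the config.
    print(snapshot_url("master.example.com", 18983, "solr", "snapshot.20070919201617"))
```

 Renaming the module to, say, `index` would work equally well, as long as the URL used on the slave side says `index` too; the module name never has to match the name of the index itself.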

 Bill

 On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
 
  : So just to help my knowledge, where does this virtual setting of this
  : solr string happen? Should it be in some config file or something?
 
  rsyncd-start creates an rsync config file on the fly ... much of it is
  constants, but it fills in the rsync port using a variable from your
  config.
 
 
 
 
  -Hoss
 
 




-- 
Regards,

-Hui


RE: clarification needed for the Ranking score

2007-09-21 Thread Dilip.TS
Hi,
  Can we use the ranking as follows when searching for the term 'Java' in
different fields, as per the relevance scenarios mentioned in the previous
mail?

   q= courseTitle:Java^1 AND courseTag:Java^1000 AND
courseDescription:Java^100; courseTitle asc, courseDescription asc,
courseTag asc;
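A minimal Python sketch of how such a request could be assembled, assuming the standard query syntax (`field:term^boost`) and a separate `sort` request parameter; the helper names are hypothetical. Two points are worth checking against the relevance rules below: with AND, a course matches only if the term is in all three fields, whereas OR lets a course match on any field and typically score higher the more fields it matches; and sorting on fields replaces score order, so the boosts do not affect ordering unless `score` is part of the sort.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def boosted_query(term: str, boosts: List[Tuple[str, float]]) -> str:
    """OR together one clause per field, each carrying its own boost."""
    return " OR ".join(f"{field}:{term}^{boost}" for field, boost in boosts)


def select_params(term: str, boosts: List[Tuple[str, float]], sort: Optional[str] = None) -> str:
    """Build the query string for a /select request: q, plus an optional sort."""
    params = {"q": boosted_query(term, boosts)}
    if sort:
        # Field sorts override relevance order; "score desc" keeps it.
        params["sort"] = sort
    return urlencode(params)


if __name__ == "__main__":
    boosts = [("courseTitle", 1), ("courseTag", 1000), ("courseDescription", 100)]
    print(boosted_query("Java", boosts))
    print(select_params("Java", boosts, sort="score desc"))
```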


-Original Message-
From: Dilip.TS [mailto:[EMAIL PROTECTED]
Sent: Friday, September 21, 2007 10:40 AM
To: SOLR
Subject: clarification needed for the Ranking score


Hi,
I need a clarification regarding the SOLR Ranking.


Consider the following scenario of searching for courses based on
relevance:

a.  Courses with the term in the courseTitle, courseTag and in the
courseDescription would appear first
b.  Courses with the term in the courseTitle and in the courseDescription
would appear next
c.  Courses with the term only in the courseTitle appear next.
d.  Courses with the term only in the courseDescription appear next.
e.  Courses with the term only in the courseTag appear last.

 Let me know whether my understanding is correct with the following solution:

 + (basequery) courseTitle^1 courseTag^1000 courseDescription^100;
courseTitle asc,  courseDescription asc,courseTag asc;

How do we set the relevancy while performing a search? Is there any
configuration to set it in the solrconfig files?
Also, how do we set term proximity?
Could you clarify?

Thanks in advance


Regards,
Dilip TS



RE: Strange behavior when searching with accents

2007-09-21 Thread Thorsten Scherler
On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote:
 English and French are messy, so heuristic methods are the only possible
 approach. Spanish is rigorously clean, and stemming should be done from the
 declension rules and irregular conjugation tables. This involves large (fast)
 tables in RAM rather than small (slow) string-shuffling.
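A toy Python sketch of the table-driven approach Lance describes: inflected forms are precomputed into an in-memory dictionary (regular forms generated from ending tables, irregular forms listed explicitly), so stemming becomes a single lookup. Only present-tense endings and a handful of verbs are included; the verb lists and the `lemma` function are illustrative assumptions, not part of Snowball or Solr.

```python
# Present-indicative endings for the three regular conjugations.
ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}

# Tiny illustrative samples; a real system would load full lexicons.
REGULAR_VERBS = ["hablar", "comer", "vivir"]
IRREGULAR_FORMS = {"tengo": "tener", "tuvo": "tener", "hizo": "hacer", "dijo": "decir"}


def build_table() -> dict:
    """Precompute form -> infinitive once, trading RAM for lookup speed."""
    table = {}
    for infinitive in REGULAR_VERBS:
        stem, conjugation = infinitive[:-2], infinitive[-2:]
        for ending in ENDINGS[conjugation]:
            table[stem + ending] = infinitive
    table.update(IRREGULAR_FORMS)  # irregular forms come from explicit tables
    return table


TABLE = build_table()


def lemma(word: str) -> str:
    """Return the infinitive for a known verb form, else the word unchanged."""
    return TABLE.get(word.lower(), word)


if __name__ == "__main__":
    for w in ["hablamos", "viven", "tuvo", "casa"]:
        print(w, "->", lemma(w))
```

Ambiguous surface forms (one form belonging to several lemmas) are the hard part of this approach and are not handled here.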
 

Interesting. Do you have a link to some documentation on how to implement this?

salu2

 Lance Norskog
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
 Bertrand Delacretaz
 Sent: Thursday, September 20, 2007 8:11 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Strange behavior when searching with accents
 
 On 9/20/07, Thorsten Scherler [EMAIL PROTECTED]
 wrote:
  ...Bertrand, does the French Snowball work fine?...
 
 I've seen some weirdnesses, like tennis and tenir (means to hold) both
 stemmed to ten, but in all of our (simple) tests it was ok.
 
 The application where we're using it does not require high precision though,
 so it looked good enough and we didn't create very extensive tests for
 it.
 
 -Bertrand
 
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Re: Term extraction

2007-09-21 Thread Pieter Berkel
Thanks for the response guys:

Grant: I had a brief look at LingPipe, it looks quite interesting but I'm
concerned that the licensing may prevent me from using it in my project.
Michael: I have used the Yahoo API in the past but due to its generic
nature, I wasn't entirely happy with the results in my test cases.
Yonik: This is the approach I had in mind. Will it still work if I put the
SynonymFilter after the word-delimiter filter in the schema config? Ideally
I want to strip out the underscore character before it gets indexed; is that
possible by using a PatternReplaceFilterFactory after the SynonymFilter?

Cheers,
Piete



On 21/09/2007, Yonik Seeley [EMAIL PROTECTED] wrote:

 On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote:
  However, I'd like to be able to analyze documents more intelligently to
  recognize phrase keywords such as "open source", "Microsoft Office" and
  "Bill Gates" rather than splitting each word into separate tokens (the
  field is never used in search queries so matching is not an issue).  I've
  been looking at SynonymFilterFactory as a possible solution to this problem
  but haven't been able to work out the specifics of how to configure it for
  phrase mappings.

 SynonymFilter works out-of-the-box with multi-token synonyms...

 Microsoft Office => microsoft_office
 Bill Gates, William Gates => bill_gates

 Just don't use a word-delimiter filter if you use underscore to join
 words.
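 A rough Python simulation of what Yonik describes, assuming explicit `a, b => c` rules and whitespace tokens that are already lowercased: multi-word left-hand sides are matched greedily (longest first) and replaced by one joined token. It only models the idea and is not Solr's SynonymFilter code; `parse_rules` and `apply_synonyms` are hypothetical names. A word-delimiter filter placed after this step would split `microsoft_office` on the underscore again, which is why Yonik advises against it.

```python
def parse_rules(lines):
    """Parse 'a, b => c' lines into {tuple_of_tokens: replacement_tokens}."""
    rules = {}
    for line in lines:
        lhs, rhs = line.split("=>")
        replacement = rhs.split()
        for phrase in lhs.split(","):
            rules[tuple(phrase.lower().split())] = replacement
    return rules


def apply_synonyms(tokens, rules):
    """Greedy longest-match replacement over a token list."""
    longest = max(len(key) for key in rules)
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.extend(rules[key])
                i += n
                break
        else:  # no rule matched at position i
            out.append(tokens[i])
            i += 1
    return out


rules = parse_rules([
    "Microsoft Office => microsoft_office",
    "Bill Gates, William Gates => bill_gates",
])
print(apply_synonyms("william gates uses microsoft office".split(), rules))
# prints ['bill_gates', 'uses', 'microsoft_office']
```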

 -Yonik



Weird bug in query

2007-09-21 Thread Alexandru Badiu

Hello,

I have a problem I'm not sure how to debug. I am running Solr 1.2.1  
under Jetty. I have the following two queries:
- q:articol_tag:pilonul ii AND articol_tag:facultative which  
returns x rows
- q:articol_tag:facultative AND articol_tag:pilonul ii which  
doesn't return any rows


I'm really stumped by this issue. Is this a Solr bug? I can provide  
offlist the url of the Solr installation if someone wants to see this  
behaviour.


Thanks,
Alexandru Badiu


Scripts not working on cron - always asking for password

2007-09-21 Thread Daniel Alheiros
Hi

I'm having problems trying to set up my scheduled tasks. Sorry if it's
something Linux related, as I'm not a Linux expert...

I created a scripts.conf file (for my slave server) containing:
user=solr
solr_hostname=10.133.132.159
solr_port=8080
rsyncd_port=20280
data_dir=/var/solr2-v1.2.0/home/data
webapp_name=solr
master_host=10.133.132.159
master_data_dir=/var/solr3-v1.2.0/home/data
master_status_dir=/var/solr3-v1.2.0/home/logs

I have two solr instances running on the same machine, each one has its own
data dir.

My cron configuration is:
# master
2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1
2 4 * * * /var/solr3-v1.2.0/home/bin/optimize
# slave
*/5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller
*/9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2

It's pretty weird to me because it always asks me for a password when I run
any of them manually (and after the passwords are provided they work properly).
Where should I add this password in order to avoid it? I couldn't find it in
the documentation. When I run snappuller manually it asks for a
password 5 times when it has a new snapshot to get and 2 times when it
doesn't have a new snapshot.

Here goes the error log in the snappuller.log file:
2007/09/21 13:00:01 started by solr
2007/09/21 13:00:01 command: /var/solr2-v1.2.0/home/bin/snappuller
2007/09/21 13:00:01 failed to ssh to master 10.133.132.159

My OS is RHEL.

Regards,
Daniel

On 20/9/07 07:14, Yu-Hui Jin [EMAIL PROTECTED] wrote:

 Thanks, it works now.
 
 
 regards,
 -Hui
 
 
 On 9/19/07, Pieter Berkel [EMAIL PROTECTED]  wrote:
 
 If you don't need to pass any command line arguments to snapshooter,
 remove (or comment out) this line from solrconfig.xml:
 
 <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
 
 By the same token, if you're not setting environment variables either,
 remove the following line as well:
 
 <arr name="env"> <str>MYVAR=val1</str> </arr>
 
 Once you alter / remove those two lines, snapshooter should function as
 expected.
 
 cheers,
 Piete
 
 
 
 On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote:
 
 Hi, Pieter,
 
 Thanks!  Now the exception is gone. However, there's no snapshot file
 created in the data directory. Strangely, the snapshooter.log seems to
 complete successfully.  Any idea what else I'm missing?
 
 $ cat var/SolrHome/solr/logs/snapshooter.log
 2007/09/19 20:16:17 started by solruser
 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2
 2007/09/19 20:16:17 taking snapshot var/SolrHome/solr/data/snapshot.20070919201617
 2007/09/19 20:16:17 ended (elapsed time: 0 sec)
 
 Thanks,
 
 -Hui
 
 
 
 
 On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote:
 
 See this recent thread for some helpful info:
 
 
 http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792
 
 
 You'll probably want to configure your exe with an absolute path rather
 than the dir:
 
   <str name="exe">/var/SolrHome/solr/bin/snapshooter</str>
   <str name="dir">.</str>
 
 in order to get the snapshooter working correctly.
 
 cheers,
 Piete
 
 
 
 On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote:
 
 Hi, there,
 
 I used an absolute path for the dir param in the solrconfig.xml as
 below:
 
 <listener event="postCommit" class="solr.RunExecutableListener">
   <str name="exe">snapshooter</str>
   <str name="dir">/var/SolrHome/solr/bin</str>
   <bool name="wait">true</bool>
   <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
   <arr name="env"> <str>MYVAR=val1</str> </arr>
 </listener>
 
 However, I got a "snapshooter: not found" exception thrown in
 catalina.out. I don't see why this doesn't work. Anything I'm missing?
 
 
 Many thanks,
 
 -Hui
 
 
 
 
 
 --
 Regards,
 
 -Hui
 
 
 
 


http://www.bbc.co.uk/
This e-mail (and any attachments) is confidential and may contain personal 
views which are not the views of the BBC unless specifically stated.
If you have received it in error, please delete it from your system.
Do not use, copy or disclose the information in any way nor act in reliance on 
it and notify the sender immediately.
Please note that the BBC monitors e-mails sent or received.
Further communication will signify your consent to this.



Re: Scripts not working on cron - always asking for password

2007-09-21 Thread Thorsten Scherler
On Fri, 2007-09-21 at 13:02 +0100, Daniel Alheiros wrote:
 Hi
 
 I'm having problems trying to set up my scheduled tasks. Sorry if it's
 something Linux related, as I'm not a Linux expert...
 
 I created a scripts.conf file (for my slave server) containing:
 user=solr
 solr_hostname=10.133.132.159
 solr_port=8080
 rsyncd_port=20280
 data_dir=/var/solr2-v1.2.0/home/data
 webapp_name=solr
 master_host=10.133.132.159
 master_data_dir=/var/solr3-v1.2.0/home/data
 master_status_dir=/var/solr3-v1.2.0/home/logs
 
 I have two solr instances running on the same machine, each one has its own
 data dir.
 
 My cron configuration is:
 # master
 2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1
 2 4 * * * /var/solr3-v1.2.0/home/bin/optimize
 # slave
 */5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller
 */9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2
 
 It's pretty weird for me because it always asks me for a password when I run
 any of them manually (and after passwords are provided they work properly).

Well you just gave the answer. Make sure the user that is executing the
cron has sufficient rights. The cron job will not be able to have a
dialog.

The brute-force method is:

sudo chown USER:USER /var/solr3-v1.2.0

That should do the job.

salu2

   
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



RE: Synonyms expressions sens

2007-09-21 Thread Laurent Gilles
Thanks for the advice Grant,

I've tried putting '_' into synonyms, but step by step I've realised that it
required more and more intrusive changes to the Solr source code...
But I've found another solution, which I want to describe here to get
outside advice, and perhaps have someone point out bugs or side effects I
haven't seen.
I don't touch the source code; I only change my synonym.txt and the way
I manage indexes in schema.xml.

Given a synonyms list like:

capital punishment, death sentence, death penalty
10, dix, X
17, Dix sept, XVII
18, dix huit, XVIII
Rock, jazz, modern music => modern music
Coluche, colucci => colucci
Coluche, coluci => coluci
Coluche, colucchi => colucchi
coluche, michel colucci => michel colucci

I was faced with two major problems with index-time synonym expansion
(expand=true):
- Synonyms getting mixed up (10, dix, X with 17, Dix sept, XVII or
18, dix huit, XVIII)
- Queries matching unexpected results because of language ambiguity or,
more generally, because expansion puts new tokens into the document that
will be matched at query time (e.g. the query capital will match a
document containing death sentence)

So here is what I've done:

A single line in the synonyms file can be seen as a family of synonyms, or
of interchangeable terms and expressions.
So instead of injecting into the document at index time, for each single
match, all the possibilities found in the synonyms list, I've changed the
list to give an ID to each synonym family, and the index-time synonym
filter is no longer configured with expand=true but with expand=false, so
that a matched term is replaced with the ID of its family.

Then at query time, I reintroduced the synonym filter with expand=false in
order to replace the matched synonyms in the query with their corresponding
ID.

Here is my synonyms list, used with expand=false:

SynFamily1, capital punishment, death sentence, death penalty
SynFamily2, 10, dix, x
SynFamily89, 17, xvii, dix sept
SynFamily112, 18, xviii, dix huit
rock, modern music => HierFamily2017
jazz, modern music => HierFamily2014
coluche, collucci => HierFamily1537
coluche, colluche => HierFamily1538
coluche, colucchi => HierFamily1541
coluche, colucci => HierFamily1542
coluche, coluchi => HierFamily1543
coluche, coluci => HierFamily1544

It seems to work fine: a query for capital no longer matches a document
that originally contains death sentence, since the synonym expansion is
limited to the one-token ID SynFamily1, and to match such a document a
query like capital punishment must be made.

The synonym mixing also seems to have disappeared (a document containing dix
huit will not match a query for 10).
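A small Python simulation of Laurent's scheme, assuming expand=false semantics in which every entry on a comma-separated line is rewritten to the line's first entry (here the family ID), with the same rules applied to documents and queries. Matching in this sketch is greedy longest-first, so "dix huit" is consumed as a whole before "dix" alone could map to the family of 10. The sketch deliberately leaves out the `=>` lines, because "modern music" appears on the left of two of them and a plain dict cannot model one phrase mapping to two IDs. All names are hypothetical.

```python
def parse_families(lines):
    """expand=false style: each entry on a line is rewritten to the line's first entry."""
    rules = {}
    for line in lines:
        entries = [e.strip() for e in line.split(",")]
        family_id = entries[0]
        for entry in entries:
            rules[tuple(entry.lower().split())] = family_id
    return rules


def rewrite(tokens, rules):
    """Replace the longest matching phrase at each position with its family ID."""
    longest = max(len(key) for key in rules)
    tokens = [t.lower() for t in tokens]
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(longest, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:  # no phrase starts here; keep the token as-is
            out.append(tokens[i])
            i += 1
    return out


RULES = parse_families([
    "SynFamily1, capital punishment, death sentence, death penalty",
    "SynFamily2, 10, dix, x",
    "SynFamily112, 18, xviii, dix huit",
])

print(rewrite("the death sentence had been set".split(), RULES))
print(rewrite(["capital"], RULES))
print(rewrite("capital punishment".split(), RULES))
```

Because the indexed document only contains `SynFamily1`, the query `capital` (left unchanged) cannot match it, while `capital punishment` is rewritten to the same ID and does.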

My question is: have I missed something? The solution seems too simple,
and since I'm working on a full-text search engine I've always faced side
effects after changing the logic, so I'm a little sceptical... :)

Voilà!

Thanks for your time

Laurent



-Original Message-
From: Grant Ingersoll [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 11, 2007 14:53
To: solr-user@lucene.apache.org
Subject: Re: Synonyms expressions sens

Inline...
On Sep 11, 2007, at 7:27 AM, Laurent Gilles wrote:

 Hi,

 I'm currently facing a relevancy issue with multiword synonyms.

 Let me illustrate it with a test case.

 Given the following synonyms definition:

 capital punishment, death sentence, death penalty

 And a [EMAIL PROTECTED] defined at index time, the
 document:

 The prisoner escaped just before the death sentence had been set.

 will be indexed like:

 The prisoner escaped just before the (death sentence | death penalty |
 capital punishment) had been set.

 Now, if a user asks for capital, the system will match capital (which
 could mean 'Paris, capital of France') against the index-time expanded
 document, which makes no sense.

 I was expecting that I would have to give the entire expression capital
 punishment to match a document that contains death sentence, and not
 only part of the expression.

 This seems to be normal Solr behaviour, but what I'm facing is a
 relevance problem with the returned results, since a word contained in
 an expression can have a completely different meaning from the same word
 in isolation.

 Is there a trick or a way to match complete synonym expressions, and not
 the individual words the expansion has added to documents?


Ah, the ambiguity of language :-)

I can think of a couple of different suggestions to try:
1. Index your phrase 

Re: Problem getting the FacetCount

2007-09-21 Thread Yonik Seeley
On 9/21/07, Amitha Talasila [EMAIL PROTECTED] wrote:
 But when we make a facet query like
 http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.query=weight:{0m TO 100m},
 the facet count comes back as 0. We are indexing it as a string field
 because if the user searches for 12m he needs to see that result. Can
 anyone suggest a better way of querying this?

In a string field, 12m is greater than 100m, so won't be in the range.
You need to index that field as a numeric type where range queries
work: use type sint or sfloat.

As for the "m", you should have a frontend that accepts input in the
desired form and converts it to a valid Solr query.
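A short Python sketch of both halves of this answer: it shows why "12m" falls outside the string range from "0m" to "100m", and how a frontend might strip the unit and emit a numeric range query instead. It assumes the field has been reindexed as a numeric type (e.g. sfloat) with the unit removed; `parse_weight` and `weight_facet_query` are hypothetical names, and the sketch emits an inclusive `[ ]` range where the original query used exclusive `{ }`.

```python
import re


def lexicographic_in_range(value: str, low: str, high: str) -> bool:
    """How a string field compares values: character by character."""
    return low <= value <= high


def parse_weight(text: str) -> float:
    """Hypothetical frontend helper: turn user input like '12m' into 12.0."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*m?\s*", text)
    if match is None:
        raise ValueError(f"unrecognised weight: {text!r}")
    return float(match.group(1))


def weight_facet_query(low: str, high: str, field: str = "weight") -> str:
    """Build an inclusive range facet.query against a numeric field."""
    return f"{field}:[{parse_weight(low)} TO {parse_weight(high)}]"


print(lexicographic_in_range("12m", "0m", "100m"))  # False: "12m" sorts after "100m"
print(weight_facet_query("0m", "100m"))             # weight:[0.0 TO 100.0]
```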

-Yonik


Re: Term extraction

2007-09-21 Thread Yonik Seeley
On 9/21/07, Pieter Berkel [EMAIL PROTECTED] wrote:
 Yonik: This is the approach I had in mind, will it still work if I put the
 SynonymFilter after the word-delimiter filter in the schema config?

SynonymFilter doesn't currently have the capability to handle multiple
tokens at the same position in the input.  You could simply remove the
WordDelimiterFilter unless you need it.

 Ideally
 I want to strip out the underscore char before it gets indexed

Why's that?

You could just define your synonyms like that initially:
Bill Gates, William Gates => billgates

-Yonik


Re: Scripts not working on cron - always asking for password

2007-09-21 Thread Daniel Alheiros
Hi Thorsten,

Thanks for your answer, but I've done it before and it still didn't work. I
was running everything before as root and it didn't work either.

Now I've created a solr user, part of the root group, changed the ownership
of all solr stuff, and changed file permissions to 775 (so any user on the
root group should be able to do anything on any files)

Any other suggestion?

Regards,
Daniel

On 21/9/07 13:12, Thorsten Scherler
[EMAIL PROTECTED] wrote:

 On Fri, 2007-09-21 at 13:02 +0100, Daniel Alheiros wrote:
 Hi
 
 I'm having problems trying to set up my scheduled tasks. Sorry if it's
 something Linux related, as I'm not a Linux expert...
 
 I created a scripts.conf file (for my slave server) containing:
 user=solr
 solr_hostname=10.133.132.159
 solr_port=8080
 rsyncd_port=20280
 data_dir=/var/solr2-v1.2.0/home/data
 webapp_name=solr
 master_host=10.133.132.159
 master_data_dir=/var/solr3-v1.2.0/home/data
 master_status_dir=/var/solr3-v1.2.0/home/logs
 
 I have two solr instances running on the same machine, each one has its own
 data dir.
 
 My cron configuration is:
 # master
 2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1
 2 4 * * * /var/solr3-v1.2.0/home/bin/optimize
 # slave
 */5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller
 */9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2
 
 It's pretty weird for me because it always asks me for a password when I run
 any of them manually (and after passwords are provided they work properly).
 
 Well you just gave the answer. Make sure the user that is executing the
 cron has sufficient rights. The cron job will not be able to have a
 dialog.
 
 The brute-force method is:
 
 sudo chown USER:USER /var/solr3-v1.2.0
 
 That should do the job.
 
 salu2
 

Re: Faceting question

2007-09-21 Thread Cric Digs
lol I agree with you Hoss - sorry for that

Here's the thing:
I need additional information from the index - such as the id related to a
facet field. For example, say I am faceting on author names for a book
store, I would also like to get the author id along with the author name to
show a link (next to the author name) to say the author's bio page. The
author id is stored in the index, but how do I get that back with the facet
results?


On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 : I'm using faceting to get some results. I also want to get another field -
 : the id field along with it. Is it possible to get that somehow in the facet
 : results?

 you're going to have to elaborate on what it is you are trying to do ... i
 genuinely have no idea what you are asking (and i think i'm usually pretty
 good at reading between the lines and guessing what people mean).



 -Hoss




olap with solr (math operations on facets)

2007-09-21 Thread Rafael Rossini
Hi all,

I'm considering building something like a lightweight OLAP server with
lucene/solr. To achieve that I'd have to do some math operations on facets.
Is that possible?
For example, my documents would be purchase rows, like (id,
value, id_department, id_store, id_region ...). If I did a facet query on
id_department the server would return something like: department1: 500,
department2: 400... Is it possible to get the sum, avg or any other math
operation on the field value? Then the server would return: department1:
100 (the sum of each value). Is it clear?
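Faceting as discussed in these threads returns document counts per value, not sums or averages of another field. One workaround is to fetch the matching documents' stored fields (for example by listing id_department and value in the fl parameter) and aggregate on the client. A minimal Python sketch, assuming the rows are already parsed into dicts; the data and the `aggregate` name are made up, and this approach gets expensive for large result sets.

```python
from collections import defaultdict


def aggregate(rows, group_field, value_field):
    """Group rows by group_field and compute count, sum and average of value_field."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[group_field]
        totals[key] += float(row[value_field])
        counts[key] += 1
    return {
        key: {"count": counts[key], "sum": totals[key], "avg": totals[key] / counts[key]}
        for key in totals
    }


rows = [  # made-up purchase rows for illustration
    {"id": 1, "value": 60.0, "id_department": "department1"},
    {"id": 2, "value": 40.0, "id_department": "department1"},
    {"id": 3, "value": 25.0, "id_department": "department2"},
]
print(aggregate(rows, "id_department", "value"))
```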


[]s
Rossini


RE: Faceting question

2007-09-21 Thread Binkley, Peter
Faceting works on the terms in an index, so you can't get information
beyond those terms without doing extra work. You could build an extra
index used only for faceting that concatenates the information you need
from other fields, and then parse it out in your application: e.g.
Tolkien, J.R.R.|35421.
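A minimal Python sketch of this workaround, assuming the concatenated values are indexed in a separate untokenized (string) field and that facet counts arrive as a flat [value, count, value, count, ...] list; the separator, function names and the second author's id are illustrative assumptions. `rpartition` splits on the last separator, so commas and dots in the name are left alone.

```python
SEP = "|"


def encode(name: str, author_id: str) -> str:
    """Value to index in the extra facet-only field, e.g. 'Tolkien, J.R.R.|35421'."""
    if SEP in name:
        raise ValueError("separator must not occur in the name")
    return f"{name}{SEP}{author_id}"


def decode_facets(flat):
    """Turn a flat [value, count, value, count, ...] list into (name, id, count) triples."""
    triples = []
    for value, count in zip(flat[0::2], flat[1::2]):
        name, _, author_id = value.rpartition(SEP)
        triples.append((name, author_id, count))
    return triples


facet_field = [encode("Tolkien, J.R.R.", "35421"), 12, encode("Lewis, C.S.", "1001"), 3]
for name, author_id, count in decode_facets(facet_field):
    print(f"{name} ({count}) -> bio page for author {author_id}")
```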

If you're doing this so that you can do precise follow-on searches,
though (where a user clicks on a link in a list of facets), you might
want to think about whether the author name gives you everything you
need. You may have two authors with the same name, who would show up as
a single facet if you don't tack the id on; but even if you do, how is
the user going to distinguish them? They'll just see two links, maybe
with opaque id numbers. So maybe the bare author name is good enough. (I
had a similar situation and found that getting away from a
relational-database approach and going with what Solr does best was the
best solution).

Hope that helps,

Peter 

-Original Message-
From: Cric Digs [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 7:36 AM
To: solr-user@lucene.apache.org
Subject: Re: Faceting question

lol I agree with you Hoss - sorry for that

Here's the thing:
I need additional information from the index - such as the id related to
a facet field. For example, say I am faceting on author names for a book
store, I would also like to get the author id along with the author name
to show a link (next to the author name) to say the author's bio page.
The author id is stored in the index but how do i get that back with the
facet results?


On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 : I'm using faceting to get some results. I also want to get another 
 field
 -
 : the id field along with it. Is it possible to get that somehow in 
 the facet
 : results?

 you're going to have to elaborate on what it is you are trying to do 
 ... i genuinely have no idea what you are asking (and i think i'm 
 usually pretty good at reading between the lines and guessing what
people mean).



 -Hoss




RE: Faceting question

2007-09-21 Thread cricdigs


Thanks Peter. That will be my workaround, but I was hoping to find a more
elegant solution ;)
I am not that knowledgeable about the Solr architecture, but if it can be
done in a more elegant way I might be willing to put in the extra time to
code it.


Binkley, Peter wrote:
 
 Faceting works on the terms in an index, so you can't get information
 beyond those terms without doing extra work. You could build an extra
 index used only for faceting that concatenates the information you need
 from other fields, and then parse it out in your application: e.g.
 Tolkien, J.R.R.|35421.
 
 If you're doing this so that you can do precise follow-on searches,
 though (where a user clicks on a link in a list of facets), you might
 want to think about whether the author name gives you everything you
 need. You may have two authors with the same name, who would show up as
 a single facet if you don't tack the id on; but even if you do, how is
 the user going to distinguish them? They'll just see two links, maybe
 with opaque id numbers. So maybe the bare author name is good enough. (I
 had a similar situation and found that getting away from a
 relational-database approach and going with what Solr does best was the
 best solution).
 
 Hope that helps,
 
 Peter 
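Peter's concatenation work-around can be sketched on the application side roughly as follows. This is a minimal Python sketch, not Solr code: it assumes a hypothetical facet-only field whose values look like `Tolkien, J.R.R.|35421` (name, a `|` separator, then the id), assumes the facet counts have already been read from the response into (value, count) pairs, and uses a made-up `/authors/<id>` URL pattern.

```python
def split_facet_value(value, sep="|"):
    """Split a concatenated facet value like 'Tolkien, J.R.R.|35421'
    into (display name, author id). rpartition splits on the LAST
    separator, so a '|' inside a name does not break the id."""
    name, found, author_id = value.rpartition(sep)
    if not found:
        # No separator present: treat the whole value as the name.
        return value, None
    return name, author_id


def facet_links(facet_counts):
    """Turn (value, count) pairs into (name, count, link) tuples.
    The /authors/<id> URL pattern is a hypothetical example."""
    links = []
    for value, count in facet_counts:
        name, author_id = split_facet_value(value)
        link = f"/authors/{author_id}" if author_id else None
        links.append((name, count, link))
    return links


if __name__ == "__main__":
    counts = [("Tolkien, J.R.R.|35421", 12), ("Lewis, C.S.|1170", 7)]
    for name, count, link in facet_links(counts):
        print(f"{name} ({count}) -> {link}")
```

Peter's caveat still applies: two authors with the same name become two facet entries, and the page has to make them distinguishable to the user.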
 
 -Original Message-
 From: Cric Digs [mailto:[EMAIL PROTECTED] 
 Sent: Friday, September 21, 2007 7:36 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Faceting question
 
 lol I agree with you Hoss - sorry for that
 
 Here's the thing:
 I need additional information from the index, such as the id related to
 a facet field. For example, say I am faceting on author names for a book
 store: I would also like to get the author id along with the author name
 to show a link (next to the author name) to, say, the author's bio page.
 The author id is stored in the index, but how do I get that back with the
 facet results?
 
 
 On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 : I'm using faceting to get some results. I also want to get another field -
 : the id field along with it. Is it possible to get that somehow in the facet
 : results?

 you're going to have to elaborate on what it is you are trying to do 
 ... i genuinely have no idea what you are asking (and i think i'm 
 usually pretty good at reading between the lines and guessing what
 people mean).



 -Hoss


 
 




Re: Triggering snapshooter through web admin interface

2007-09-21 Thread Bill Au
Lance, please do start a new thread if you run into this problem again, and
include as much info as possible.
Once a snapshot has been taken, the files it contains should not change, so I
am not sure why tar was telling you that a file had changed while it was
being copied.

Bill

On 9/19/07, Chris Hostetter [EMAIL PROTECTED] wrote:


 lance: since the topic you are describing is not directly related to
 triggering a snapshot from the web interface, can you please start a new
 thread with a unique subject describing in more detail exactly what it
 was you were doing and the problem you encountered?

 this will make it easier for your problem to get visibility (some people
 don't read every thread, and archive searching is frequently done by
 thread, so people looking for similar problems may not realize this new
 thread is buried inside an old one)

 -Hoss

 : Date: Wed, 19 Sep 2007 11:33:30 -0700
 : From: Lance Norskog [EMAIL PROTECTED]
 : Reply-To: solr-user@lucene.apache.org
 : To: solr-user@lucene.apache.org
 : Subject: RE: Triggering snapshooter through web admin interface
 :
 : Is there a ticket for this yet? I have a bug report and request: I just did
 : a snapshot while indexing 700 records/sec. and got an inconsistency. I was
 : tarring off the snapshot and tar reported that a file changed while it was
 : being copied. The error rolled off my screen, so I cannot report the file
 : name or extension.
 :
 : If a solr command to do a snapshot is implemented, please make sure that it
 : is 100% consistent.
 :
 : Thanks,
 :
 : Lance Norskog




Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-21 Thread Bill Au
You are welcome.

Bill

On 9/21/07, Yu-Hui Jin [EMAIL PROTECTED] wrote:

 Bill,

 Thanks for the explanation. That helps my understanding on rsync and the
 replication in general.


 regards,

 -Hui

 On 9/20/07, Bill Au [EMAIL PROTECTED] wrote:
 
  The 'solr' that you are referring to in your third question is the
  name of the rsync area, which is mapped to the Solr data directory.  This
  is defined in the rsyncd configuration file, which is generated on the
  fly as Chris has pointed out.  Take a look at rsyncd-start.
 
  snappuller rsyncs the index from this 'solr' area (the command you have
  quoted) on the master.  The name of the rsync area has nothing to do
  with the name of the index.  We set up this area for rsyncd so that
  one is restricted to this area when trying to access files on the
  master through rsyncd.
 
  The name of the rsyncd area does not have to be 'solr'.  It can be
  anything as long as the value in rsyncd-start matches the value in
  snappuller.
 
  Bill
 
  On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
  
   : So just to help my knowledge, where does this virtual setting of this
   : 'solr' string happen? Should it be in some config file or sth?
  
   rsyncd-start creates an rsync config file on the fly ... much of it is
   constants, but it fills in the rsync port using a variable from your
   config.
  
  
  
  
   -Hoss
  
  
 



 --
 Regards,

 -Hui
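The point that the rsync area name is arbitrary as long as both ends agree can be illustrated with a short sketch. This is not the rsyncd-start or snappuller script; it is a hypothetical Python example that writes an rsyncd.conf-style module section (`[name]` followed by `path = ...`, which is standard rsyncd.conf syntax) and builds the matching `rsync://host:port/module/...` source a client would pull from. The host, port, data directory and snapshot name are all made up.

```python
def rsyncd_conf(module, data_dir):
    """Return a minimal rsyncd.conf fragment that exports data_dir
    under the given module name (e.g. 'solr')."""
    return "\n".join([
        "use chroot = no",
        "list = no",
        f"[{module}]",
        f"    path = {data_dir}",
        "    read only = yes",
    ]) + "\n"


def pull_source(host, port, module, snapshot):
    """Source URL a client would hand to rsync. Paths are resolved
    relative to the module's path, so a client cannot reach files
    outside the exported directory."""
    return f"rsync://{host}:{port}/{module}/{snapshot}/"


if __name__ == "__main__":
    # The module name only has to be identical on both sides.
    print(rsyncd_conf("solr", "/var/solr/data"))
    print(pull_source("master", 18983, "solr", "snapshot.example"))
```

Renaming the module to anything else works the same way, provided the daemon's config and the client's URL are changed together.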



Re: a bug in commit script?

2007-09-21 Thread Bill Au
You should be able to run the latest version of the scripts against Solr 1.2.
Just grab a copy from Subversion:

http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/

Bill

On 9/21/07, Yu-Hui Jin [EMAIL PROTECTED] wrote:

 Got it. So what's the easiest way to get this patch?  Sorry, I'm new to this.


 regards,
 -Hui

 On 9/20/07, Bill Au [EMAIL PROTECTED] wrote:
 
  That would be my bad.  I noticed the problem while fixing SOLR-282
  which is not related.  I fixed both problems instead of opening a
  different bug for the response format issue.  I will update the change
  log.
 
  Bill
 
  On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote:
   :
   : It seems there's a small bug in the bin/commit script for solr 1.2.
  
   A fix was already committed to the trunk for this as part of SOLR-282 (but
   there doesn't seem to be a note about it in the changelog)
  
  
   -Hoss
  
  
 



 --
 Regards,

 -Hui



Re: clarification needed for the Ranking score

2007-09-21 Thread Walter Underwood
This would probably work, but the approach has a subtle flaw.
If a query has one word that matches a lot of titles, but a
phrase that matches a description, the best result will be shown
far too low, after all the titles.

A better approach is to weight the titles a bit higher than the
description, probably 2X to 10X higher. At Infoseek, we weighted
the title 8X higher. At Inktomi, with a completely different search
engine, we used 7.5X. So I'd start with this:

  courseTitle^8 courseTag^4 courseDescription
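To make that starting point concrete, the hypothetical helper below expands one search term into boosted per-field clauses in the Lucene query syntax (`field:term^boost`). The field names and the 8/4/1 weights come from the message above; the helper does no escaping of special characters and assumes a single-word term.

```python
# Starting weights suggested above: title 8x, tag 4x, description 1x.
DEFAULT_BOOSTS = [("courseTitle", 8), ("courseTag", 4), ("courseDescription", 1)]


def boosted_query(term, boosts=DEFAULT_BOOSTS):
    """Build space-separated per-field clauses, e.g. 'courseTitle:java^8'.
    A boost of 1 is the default, so it is left off."""
    clauses = []
    for field, boost in boosts:
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


print(boosted_query("java"))
```

With the default OR operator each clause is optional, so a course matching in several fields collects score from each of them; results need to be ordered by score, not alphabetically, for the weights to matter.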

Also, I see that you are displaying the titles alphabetically,
so the weights are meaningless. Maybe you should be using LIKE
in MySQL if you want to do set matching and sorting.

wunder

On 9/20/07 10:10 PM, Dilip.TS [EMAIL PROTECTED] wrote:

 Hi,
 I need a clarification regarding the SOLR Ranking.
 
 
 Consider the following scenario for searching for courses, ranked by
 relevance:
 
 a. Courses with the term in the courseTitle, courseTag and in the
 courseDescription would appear first
 b. Courses with the term in the courseTitle and in the courseDescription
 would appear next
 c. Courses with the term only in the courseTitle appear next.
 d. Courses with the term only in the courseDescription appear next.
 e. Courses with the term only in the courseTag appear last.
 
  Let me know if my understanding is correct with the following solution
 
  + (basequery) courseTitle^1 courseTag^1000 courseDescription^100;
 courseTitle asc,  courseDescription asc,courseTag asc;
 
 How do we set the relevancy while performing a search? Is there any
 configuration to set it in the solrconfig files?
 Also how do we set the Term Proximity?
 Could you clarify?
 
 Thanks in advance
 
 
 Regards,
 Dilip TS
 



Re: Weird bug in query

2007-09-21 Thread Mike Klaas


On 21-Sep-07, at 1:44 AM, Alexandru Badiu wrote:


Hello,

I have a problem I'm not sure how to debug. I am running Solr 1.2.1  
under Jetty. I have the following two queries:
- q:articol_tag:pilonul ii AND articol_tag:facultative which  
returns x rows
- q:articol_tag:facultative AND articol_tag:pilonul ii which  
doesn't return any rows


These are different queries.  Your query consists of three clauses (the
'ii' is not part of articol_tag).  This is what you are querying, in
(the much clearer) REQUIRED/OPTIONAL syntax (+ == clause is required):


articol_tag:pilonul ii AND articol_tag:facultative
==
+<default field>:ii +articol_tag:facultative articol_tag:pilonul

articol_tag:facultative AND articol_tag:pilonul ii
==
+articol_tag:facultative +articol_tag:pilonul <default field>:ii

try:
articol_tag:facultative AND articol_tag:"pilonul ii"

-Mike
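The clause assignment Mike describes can be reproduced with a small model. This is a simplified Python sketch of how a query parser using the default OR operator treats `AND`, not the actual Lucene QueryParser: it handles only whitespace-separated terms and `AND` (no quotes, parentheses, `OR`, `NOT` or `+`/`-` prefixes), and it assumes the default search field is named `text`.

```python
def required_clauses(query, default_field="text"):
    """Mark clauses the way a default-OR parser does for AND:
    'x AND y' makes both x and y required (+); a bare clause stays
    optional. Unfielded terms go to the (assumed) default field."""
    clauses = []          # list of [required, "field:term"]
    pending_and = False
    for tok in query.split():
        if tok == "AND":
            pending_and = True
            if clauses:
                clauses[-1][0] = True  # AND also makes the previous clause required
            continue
        field_term = tok if ":" in tok else f"{default_field}:{tok}"
        clauses.append([pending_and, field_term])
        pending_and = False
    return " ".join(("+" if req else "") + ft for req, ft in clauses)


print(required_clauses("articol_tag:pilonul ii AND articol_tag:facultative"))
print(required_clauses("articol_tag:facultative AND articol_tag:pilonul ii"))
```

Clause order aside, both outputs match the analysis above: in the first query `ii` must appear in the default field, while in the second it is optional and `articol_tag:pilonul` is required instead.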



RE: Strange behavior when searching with accents

2007-09-21 Thread Lance Norskog
I have no links but it can all be done with synonym tables.

I'm sure somewhere on the net there are full lists of the Spanish regular
and irregular verbs (verbs which do not follow the conjugation rules). Then,
using basic text processing, you could generate all of the conjugated forms
of the most common regular verbs.

And then a custom stemmer would do the basics, like adjective-mente ->
adjective.

Lance
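A rough Python illustration of generating verb forms with basic text processing: it produces the present-indicative forms of regular -ar/-er/-ir verbs, writes each verb as one line in the `a, b, c => target` mapping format of Solr's synonyms file, and applies a naive `-mente` rule. Only one tense is covered, irregular verbs would still need hand-built tables, and all function names are hypothetical.

```python
# Present-indicative endings for regular verbs
# (yo, tú, él/ella, nosotros, vosotros, ellos).
ENDINGS = {
    "ar": ["o", "as", "a", "amos", "áis", "an"],
    "er": ["o", "es", "e", "emos", "éis", "en"],
    "ir": ["o", "es", "e", "imos", "ís", "en"],
}


def present_forms(infinitive):
    """Conjugate a regular verb, e.g. 'hablar' -> ['hablo', 'hablas', ...]."""
    stem, ending = infinitive[:-2], infinitive[-2:]
    return [stem + e for e in ENDINGS[ending]]


def synonym_line(infinitive):
    """One synonyms-file line mapping every generated form to the infinitive."""
    return ", ".join(present_forms(infinitive)) + " => " + infinitive


def strip_mente(word):
    """Naive adverb rule: 'rápidamente' -> 'rápida'."""
    return word[:-5] if word.endswith("mente") and len(word) > 5 else word


for verb in ["hablar", "comer", "vivir"]:
    print(synonym_line(verb))
print(strip_mente("rápidamente"))
```

Generating the table offline and loading it as synonyms keeps the lookup fast at query time, which is the table-in-RAM trade-off described earlier in the thread.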

-Original Message-
From: Thorsten Scherler [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 12:08 AM
To: solr-user@lucene.apache.org
Subject: RE: Strange behavior when searching with accents

On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote:
 English and French are messy, so heuristic methods are the only possible approach.
 Spanish is rigorously clean, and stemming should be done from the 
 declension rules and irregular conjugation tables. This involves large 
 (fast) tables in ram rather than small (slow) string-shuffling.
 

Interesting. Do you have a link to some documentation on how to implement this?

salu2

 Lance Norskog
 
 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf 
 Of Bertrand Delacretaz
 Sent: Thursday, September 20, 2007 8:11 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Strange behavior when searching with accents
 
 On 9/20/07, Thorsten Scherler 
 [EMAIL PROTECTED]
 wrote:
  ...Bertrand, does the French Snowball work fine?...
 
 I've seen some weirdnesses, like tennis and tenir (means to hold) 
 both stemmed to ten, but in all of our (simple) tests it was ok.
 
 The application where we're using it does not require high precision
 though, so it looked good enough and we didn't create very
 extensive tests for it.
 
 -Bertrand
 
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Re: Scripts not working on cron - always asking for password

2007-09-21 Thread Mike Klaas

On 21-Sep-07, at 7:44 AM, Daniel Alheiros wrote:


Hi

Problem solved... I had to create a private/public key pair for my users
and add the public key to the authorized_keys file on my server...

I've used the instructions on this page, quite simple actually (once
you know what you need to do...).

http://www.ece.uci.edu/~chou/ssh-key.html

Shouldn't this kind of information be in the SOLR documentation? I'm
going to write it in my installation procedures, so I can contribute it
back to the SOLR wiki if you think it's appropriate.


I wouldn't mind listing a brief note and a link, but trying to cover  
too many unix basics will clutter up the documentation.


-Mike


Re: olap with solr (math operations on facets)

2007-09-21 Thread Mike Klaas

On 21-Sep-07, at 8:27 AM, Rafael Rossini wrote:


Hi all,

I'm considering doing something like a light-weight OLAP server with
lucene/solr. To achieve that I'd have to do some math operations on facets.
Is that possible?
For example, my documents would be a purchase row, like (id,
value, id_department, id_store, id_region ...). If I did a facet query for
id_department the server would return me something like: department1: 500,
department2: 400... Is it possible to get the sum, or avg, or any math
operation on the field value? Then the server would return me: department1:
100 (the sum of each value). Is it clear?


Currently this is not possible out of the box with Solr.

-Mike

Re: olap with solr (math operations on facets)

2007-09-21 Thread Mike Klaas

On 21-Sep-07, at 2:42 PM, Rafael Rossini wrote:

Thanks for the reply Mike. Are there any plans to do something like this?
Or could anyone give me some direction?


Probably the easiest thing to do is write a custom request handler
that iterates over the field cache and computes the statistics you
want (loading the docs would probably be too slow).


Check out SimpleFacets.java to see how it uses the FieldCache.
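In outline, the suggestion is to avoid loading stored documents and instead walk per-document value arrays (what Lucene's FieldCache holds for a field) for the matching documents, accumulating statistics per facet value. The Python sketch below only models that idea, with plain lists standing in for the cached arrays; it is not Solr or Lucene API, and the field names follow Rafael's example.

```python
from collections import defaultdict


def facet_stats(matching_docs, department_of, value_of):
    """For each department, compute count, sum and average of `value`
    over the matching documents. department_of and value_of stand in
    for per-document arrays indexed by internal doc id."""
    count = defaultdict(int)
    total = defaultdict(float)
    for doc in matching_docs:
        dept = department_of[doc]
        count[dept] += 1
        total[dept] += value_of[doc]
    return {d: {"count": count[d], "sum": total[d], "avg": total[d] / count[d]}
            for d in count}


if __name__ == "__main__":
    department_of = ["d1", "d2", "d1", "d1"]  # doc id -> id_department
    value_of = [10.0, 5.0, 30.0, 20.0]        # doc id -> value
    print(facet_stats([0, 1, 2], department_of, value_of))
```

Because each lookup is an array index rather than a stored-field read, the cost grows with the number of matching documents, not with document size.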

-Mike


Re: olap with solr (math operations on facets)

2007-09-21 Thread Rafael Rossini
Thanks for the reply Mike. Are there any plans to do something like this?
Or could anyone give me some direction?

[]s
Rossini


On 9/21/07, Mike Klaas [EMAIL PROTECTED] wrote:

 On 21-Sep-07, at 8:27 AM, Rafael Rossini wrote:

  Hi all,
 
  I'm considering doing something like a light-weight OLAP server with
  lucene/solr. To achieve that I'd have to do some math operations on facets.
  Is that possible?
  For example, my documents would be a purchase row, like (id,
  value, id_department, id_store, id_region ...). If I did a facet query for
  id_department the server would return me something like: department1: 500,
  department2: 400... Is it possible to get the sum, or avg, or any math
  operation on the field value? Then the server would return me: department1:
  100 (the sum of each value). Is it clear?

 Currently this is not possible out of the box with Solr.

 -Mike