Re: a bug in commit script?
Got it. So what's the easiest way to get this patch? Sorry, I'm new to this. regards, -Hui On 9/20/07, Bill Au [EMAIL PROTECTED] wrote: That would be my bad. I noticed the problem while fixing SOLR-282, which is not related. I fixed both problems instead of opening a different bug for the response format issue. I will update the change log. Bill On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : : It seems there's a small bug in the bin/commit script for solr 1.2. A fix was already committed to the trunk for this as part of SOLR-282 (but there doesn't seem to be a note about it in the changelog) -Hoss -- Regards, -Hui
Re: rsync start and enable for multiple solr instances within one tomcat
Bill, Thanks for the explanation. That helps my understanding of rsync and of replication in general. regards, -Hui On 9/20/07, Bill Au [EMAIL PROTECTED] wrote: The solr that you are referring to in your third question is the name of the rsync area, which is mapped to the solr data directory. This is defined in the rsyncd configuration file, which is generated on the fly as Chris has pointed out. Take a look at rsyncd-start. snappuller rsyncs the index from this 'solr' area (the command you have quoted) on the master. The name of the rsync area has nothing to do with the name of the index. We set up this area for rsyncd so that one is restricted to this area when trying to access files on the master through rsyncd. The name of the rsyncd area does not have to be 'solr'. It can be anything as long as the value in rsyncd-start matches the value in snappuller. Bill On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : So just to help my knowledge, where does this virtual setting of this solr : string happen? Should it be in some config file or sth? rsyncd-start creates an rsync config file on the fly ... much of it is constants, but it fills in the rsync port using a variable from your config. -Hoss -- Regards, -Hui
RE: clarification needed for the Ranking score
Hi, Can we use the ranking as follows when searching for the term 'Java' present in different fields, as per the relevance scenarios mentioned in the previous mail? q= courseTitle:Java^1 AND courseTag:Java^1000 AND courseDescription:Java^100; courseTitle asc, courseDescription asc, courseTag asc; -Original Message- From: Dilip.TS [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 10:40 AM To: SOLR Subject: clarification needed for the Ranking score Hi, I need a clarification regarding the SOLR ranking. Consider the scenario of searching for courses based on the following relevance: a. Courses with the term in the courseTitle, courseTag and in the courseDescription would appear first b. Courses with the term in the courseTitle and in the courseDescription would appear next c. Courses with the term only in the courseTitle appear next. d. Courses with the term only in the courseDescription appear next. e. Courses with the term only in the courseTag appear last. Let me know if my understanding is correct with the following solution: + (basequery) courseTitle^1 courseTag^1000 courseDescription^100; courseTitle asc, courseDescription asc,courseTag asc; How do we set the relevancy while performing a search? Is there any configuration to set it in the solrconfig files? Also, how do we set the Term Proximity? Could you clarify? Thanks in advance Regards, Dilip TS
RE: Strange behavior when searching with accents
On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote: English and French are messy, so heuristic methods are the only ones possible. Spanish is rigorously clean, and stemming should be done from the declension rules and irregular conjugation tables. This involves large (fast) tables in RAM rather than small (slow) string-shuffling. Interesting, do you have a link to some documentation on how to implement this? salu2 Lance Norskog -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bertrand Delacretaz Sent: Thursday, September 20, 2007 8:11 AM To: solr-user@lucene.apache.org Subject: Re: Strange behavior when searching with accents On 9/20/07, Thorsten Scherler [EMAIL PROTECTED] wrote: ...Bertrand, does the French Snowball work fine?... I've seen some weirdnesses, like tennis and tenir (means to hold) both stemmed to ten, but in all of our (simple) tests it was ok. The application where we're using it does not require high precision though, so it looked good enough and we didn't create very extensive tests for it. -Bertrand -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
Re: Term extraction
Thanks for the response guys: Grant: I had a brief look at LingPipe, it looks quite interesting but I'm concerned that the licensing may prevent me from using it in my project. Michael: I have used the Yahoo API in the past but due to its generic nature, I wasn't entirely happy with the results in my test cases. Yonik: This is the approach I had in mind, will it still work if I put the SynonymFilter after the word-delimiter filter in the schema config? Ideally I want to strip out the underscore char before it gets indexed, is that possible by using a PatternReplaceFilterFactory after the SynonymFilter? Cheers, Piete On 21/09/2007, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: However, I'd like to be able to analyze documents more intelligently to recognize phrase keywords such as open source, Microsoft Office, Bill Gates rather than splitting each word into separate tokens (the field is never used in search queries so matching is not an issue). I've been looking at SynonymFilterFactory as a possible solution to this problem but haven't been able to work out the specifics of how to configure it for phrase mappings. SynonymFilter works out-of-the-box with multi-token synonyms... Microsoft Office = microsoft_office Bill Gates, William Gates = bill_gates Just don't use a word-delimiter filter if you use underscore to join words. -Yonik
Weird bug in query
Hello, I have a problem I'm not sure how to debug. I am running Solr 1.2.1 under Jetty. I have the following two queries: - q:articol_tag:pilonul ii AND articol_tag:facultative which returns x rows - q:articol_tag:facultative AND articol_tag:pilonul ii which doesn't return any rows I'm really stumped by this issue. Is this a Solr bug? I can provide offlist the url of the Solr installation if someone wants to see this behaviour. Thanks, Alexandru Badiu
Scripts not working on cron - always asking for password
Hi I'm having problems trying to set up my scheduled tasks. Sorry if it's something Linux related, as I'm not a Linux expert... I created a scripts.conf file (for my slave server) containing: user=solr solr_hostname=10.133.132.159 solr_port=8080 rsyncd_port=20280 data_dir=/var/solr2-v1.2.0/home/data webapp_name=solr master_host=10.133.132.159 master_data_dir=/var/solr3-v1.2.0/home/data master_status_dir=/var/solr3-v1.2.0/home/logs I have two solr instances running on the same machine, each one has its own data dir. My cron configuration is: # master 2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1 2 4 * * * /var/solr3-v1.2.0/home/bin/optimize # slave */5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller */9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2 It's pretty weird for me because it always asks me for a password when I run any of them manually (and after passwords are provided they work properly). Where should I add this password in order to avoid it? I couldn't find it in the documentation. When I run snappuller manually it asks for a password 5 times when it has a new snapshot to get and 2 times when it doesn't have a new snapshot. Here is the error log from the snappuller.log file: 2007/09/21 13:00:01 started by solr 2007/09/21 13:00:01 command: /var/solr2-v1.2.0/home/bin/snappuller 2007/09/21 13:00:01 failed to ssh to master 10.133.132.159 My OS is RHEL. Regards, Daniel On 20/9/07 07:14, Yu-Hui Jin [EMAIL PROTECTED] wrote: Thanks, it works now. regards, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: If you don't need to pass any command line arguments to snapshooter, remove (or comment out) this line from solrconfig.xml: <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> By the same token, if you're not setting environment variables either, remove the following line as well: <arr name="env"> <str>MYVAR=val1</str> </arr> Once you alter / remove those two lines, snapshooter should function as expected.
cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, Pieter, Thanks! Now the exception is gone. However, there's no snapshot file created in the data directory. Strangely, the snapshooter.log seems to complete successfully. Any idea what else I'm missing? $ cat var/SolrHome/solr/logs/snapshooter.log 2007/09/19 20:16:17 started by solruser 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2 2007/09/19 20:16:17 taking snapshot var/SolrHome/solr/data/snapshot.20070919201617 2007/09/19 20:16:17 ended (elapsed time: 0 sec) Thanks, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: See this recent thread for some helpful info: http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792 You'll probably want to configure your exe with an absolute path rather than the dir: <str name="exe">/var/SolrHome/solr/bin/snapshooter</str> <str name="dir">.</str> In order to get the snapshooter working correctly. cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, there, I used an absolute path for the dir param in the solrconfig.xml as below: <listener event="postCommit" class="solr.RunExecutableListener"> <str name="exe">snapshooter</str> <str name="dir">/var/SolrHome/solr/bin</str> <bool name="wait">true</bool> <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> <arr name="env"> <str>MYVAR=val1</str> </arr> </listener> However, I got snapshooter: not found exception thrown in catalina.out. I don't see why this doesn't work. Anything I'm missing? Many thanks, -Hui -- Regards, -Hui http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Re: Scripts not working on cron - always asking for password
On Fri, 2007-09-21 at 13:02 +0100, Daniel Alheiros wrote: Hi I'm having problems trying to set up my scheduled tasks. Sorry if it's something Linux related, as I'm not a Linux expert... I created a scripts.conf file (for my slave server) containing: user=solr solr_hostname=10.133.132.159 solr_port=8080 rsyncd_port=20280 data_dir=/var/solr2-v1.2.0/home/data webapp_name=solr master_host=10.133.132.159 master_data_dir=/var/solr3-v1.2.0/home/data master_status_dir=/var/solr3-v1.2.0/home/logs I have two solr instances running on the same machine, each one has its own data dir. My cron configuration is: # master 2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1 2 4 * * * /var/solr3-v1.2.0/home/bin/optimize # slave */5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller */9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2 It's pretty weird for me because it always asks me for a password when I run any of them manually (and after passwords are provided they work properly). Well, you just gave the answer. Make sure the user that is executing the cron job has sufficient rights. The cron job will not be able to have a dialog. The brute force method is: sudo chown USER:USER /var/solr3-v1.2.0 That should do the job. salu2 Where should I add this password in order to avoid it? I couldn't find it in the documentation. When I run snappuller manually it asks for a password 5 times when it has a new snapshot to get and 2 times when it doesn't have a new snapshot. Here is the error log from the snappuller.log file: 2007/09/21 13:00:01 started by solr 2007/09/21 13:00:01 command: /var/solr2-v1.2.0/home/bin/snappuller 2007/09/21 13:00:01 failed to ssh to master 10.133.132.159 My OS is RHEL. Regards, Daniel On 20/9/07 07:14, Yu-Hui Jin [EMAIL PROTECTED] wrote: Thanks, it works now.
regards, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: If you don't need to pass any command line arguments to snapshooter, remove (or comment out) this line from solrconfig.xml: <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> By the same token, if you're not setting environment variables either, remove the following line as well: <arr name="env"> <str>MYVAR=val1</str> </arr> Once you alter / remove those two lines, snapshooter should function as expected. cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, Pieter, Thanks! Now the exception is gone. However, there's no snapshot file created in the data directory. Strangely, the snapshooter.log seems to complete successfully. Any idea what else I'm missing? $ cat var/SolrHome/solr/logs/snapshooter.log 2007/09/19 20:16:17 started by solruser 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2 2007/09/19 20:16:17 taking snapshot var/SolrHome/solr/data/snapshot.20070919201617 2007/09/19 20:16:17 ended (elapsed time: 0 sec) Thanks, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: See this recent thread for some helpful info: http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792 You'll probably want to configure your exe with an absolute path rather than the dir: <str name="exe">/var/SolrHome/solr/bin/snapshooter</str> <str name="dir">.</str> In order to get the snapshooter working correctly. cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, there, I used an absolute path for the dir param in the solrconfig.xml as below: <listener event="postCommit" class="solr.RunExecutableListener"> <str name="exe">snapshooter</str> <str name="dir">/var/SolrHome/solr/bin</str> <bool name="wait">true</bool> <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> <arr name="env"> <str>MYVAR=val1</str> </arr> </listener> However, I got snapshooter: not found exception thrown in catalina.out. I don't see why this doesn't work. Anything I'm missing?
Many thanks, -Hui -- Regards, -Hui -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
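Since a cron job cannot answer a password prompt, and the log above shows snappuller failing at the ssh step, it can help to check whether the slave can reach the master without any interaction. The following is only a sketch of such a check, not part of the Solr scripts: it assumes OpenSSH's `BatchMode` option (which makes ssh fail rather than prompt), and the function names are made up for illustration.

```python
import subprocess


def batch_ssh_command(host, user=None):
    """Build an ssh command that fails instead of prompting for a password."""
    target = f"{user}@{host}" if user else host
    # BatchMode=yes disables interactive prompts; "true" is a no-op remote command.
    return ["ssh", "-o", "BatchMode=yes", target, "true"]


def can_ssh_without_prompt(host, user=None, timeout=15):
    """Return True if ssh to the host succeeds with no interactive prompt."""
    try:
        result = subprocess.run(
            batch_ssh_command(host, user),
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example call, using the host and user from the scripts.conf quoted above:
#   can_ssh_without_prompt("10.133.132.159", user="solr")
```

If the check returns False when run as the cron user, the scheduled scripts will most likely hit the same prompt that shows up when they are run by hand.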
RE: Synonyms expressions sens
Thanks for the advice Grant, I've tried putting '_' into synonyms, but step by step I've realised that it was always more intrusive into the Solr source code... But I've found another solution, which I want to present here in order to get outside advice and perhaps have some bugs or side effects pointed out that I've not seen. I do not touch the source code; I only change my synonym.txt and the way I manage indexes in schema.xml. Given a synonyms list like: capital punishement, death sentence, death penalty 10, dix, X 17, Dix sept, XVII 18, dix huit, XVIII Rock, jazz, modern music = modern music Coluche, colucci = colucci Coluche, coluci = coluci Coluche, colucchi = colucchi coluche, michel colucci = michel colucci I was faced with two major problems with index time synonym expansion (expand=true): - Possibility of synonyms mixing (10, dix, X with 17, Dix sept, XVII or 18, dix huit, XVIII) - Possibility of a query matching some unexpected result due to language ambiguity and, more generally, due to the fact that expansion puts new tokens in the document that will be matched at query time (ex: query capitale will match a document with death sentence ..) So here is what I've done: A single line in the synonym file can be seen as a family of synonyms, or interchangeable terms and expressions. So instead of injecting (into the document at index time) for a single match all the possibilities found in the synonyms list, I've changed the list in order to give an ID to each synonym family, and the index time synonyms filter is no longer configured with expand=true but with expand=false in order to replace a matched term with the ID of its family.
Then at query time, I reintroduced the synonyms filter with expand=false in order to replace the matched synonyms in the query with their corresponding ID. Here is my synonyms list used with expand=false: SynFamily1, capital punishement, death sentence, death penalty SynFamily2, 10, dix, x SynFamily89, 17, xvii, dix sept SynFamily112, 18, xviii, dix huit rock, modern music = HierFamily2017 jazz, modern music = HierFamily2014 coluche, collucci = HierFamily1537 coluche, colluche = HierFamily1538 coluche, colucchi = HierFamily1541 coluche, colucci = HierFamily1542 coluche, coluchi = HierFamily1543 coluche, coluci = HierFamily1544 It seems to work fine, since now a query capital will not match a document that originally contains death sentence: the synonym expansion is limited to the one-token ID SynFamily1, and in order to match such a document, a query like capital punishement must be made. The synonyms mixing also seems to have disappeared (a document containing dix huit will not match a query for 10). My question is, have I missed something? The solution seems too simple, and since I've been working on fulltext search engines I've always faced side-effect problems after logic modifications, so I'm a little sceptical... :) Voila! Thanks for your time Laurent -Original Message- From: Grant Ingersoll [mailto:[EMAIL PROTECTED] Sent: Tuesday, 11 September 2007 14:53 To: solr-user@lucene.apache.org Subject: Re: Synonyms expressions sens Inline... On Sep 11, 2007, at 7:27 AM, Laurent Gilles wrote: Hi, I'm currently facing a relevancy issue with multiword synonyms. Let me illustrate it with a test case: Given the following synonym definitions: capital punishement, death sentence, death penalty And a [EMAIL PROTECTED] defined at index time, the document: The prisoner escaped just before the death sentence had been set. Will be indexed like The prisoner escaped just before the (death sentence | death penalty | capital punishment) had been set.
Now, if a user asks for capital, the system will match capital (which could mean 'Paris, capital of France') in the index time synonym-expanded document, which doesn't make sense. I was expecting that, in order to match, I would have to give the entire expression capital punishment to match a document that contains death sentence, and not only a part of the expression. It seems to be the normal Solr behaviour, but what I'm actually facing is a relevance problem with the given results, since a given word contained in an expression could have a completely different meaning compared with the same isolated word. Is there a trick or a way to match the complete synonym expression and not the words the expansion has added into documents? Ah, the ambiguity of language :-) I can think of a couple of different suggestions to try: 1. Index your phrase
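The family-ID idea Laurent describes above can be tried out in a few lines. This is a toy model only, not Solr's SynonymFilter: it assumes whitespace tokenisation, greedy longest-phrase matching, and a query that matches when all of its analysed tokens occur in the analysed document. The function names are invented for the illustration; the family IDs and phrases are taken from the message.

```python
FAMILIES = {
    "SynFamily1": ["capital punishement", "death sentence", "death penalty"],
    "SynFamily2": ["10", "dix", "x"],
    "SynFamily112": ["18", "xviii", "dix huit"],
}


def build_phrase_table(families):
    """Map each synonym, as a tuple of lowercase tokens, to its family ID."""
    table = {}
    for family_id, phrases in families.items():
        for phrase in phrases:
            table[tuple(phrase.lower().split())] = family_id
    return table


def replace_with_ids(text, table):
    """Replace the longest matching phrase at each position with its family ID."""
    tokens = text.lower().split()
    longest = max(len(key) for key in table)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first so "dix huit" wins over "dix".
        for size in range(min(longest, len(tokens) - i), 0, -1):
            family_id = table.get(tuple(tokens[i:i + size]))
            if family_id is not None:
                out.append(family_id)
                i += size
                break
        else:
            out.append(tokens[i])
            i += 1
    return out


def matches(query, document, table):
    """True when every analysed query token occurs in the analysed document."""
    doc_tokens = set(replace_with_ids(document, table))
    return all(token in doc_tokens for token in replace_with_ids(query, table))


table = build_phrase_table(FAMILIES)
doc = "The prisoner escaped just before the death sentence had been set"
print(matches("capital", doc, table))              # False
print(matches("capital punishement", doc, table))  # True
print(matches("10", "dix huit", table))            # False
```

Under these assumptions the two problems from the message go away: the bare word capital no longer matches the death sentence document, and dix huit is not confused with 10. Whether the real filter behaves the same way with overlapping entries such as dix and dix huit is something to verify against Solr itself.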
Re: Problem getting the FacetCount
On 9/21/07, Amitha Talasila [EMAIL PROTECTED] wrote: But when we make a facet query like, http://localhost:8983/solr/select?q=ipod&rows=0&facet=true&facet.limit=-1&facet.query=weight:{0m TO 100m}, the facet count is coming as 0. We are indexing it as a string field because if the user searches for 12m he needs to see that result. Can anyone suggest a better way of querying this? In a string field, 12m is greater than 100m, so won't be in the range. You need to index that field as a numeric type where range queries work: use type sint or sfloat. As for the m, you should have a frontend that allows input in the form desired and converts it to a valid query to solr. -Yonik
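To see why the string range comes back empty, and what the frontend conversion Yonik mentions might look like, here is a small sketch. The helper names are hypothetical, and it assumes the weight field has been re-indexed as a numeric type and that user input is a whole number optionally followed by m.

```python
import re


def parse_weight(text):
    """Turn user input such as '12m' or '100' into an integer."""
    match = re.fullmatch(r"\s*(\d+)\s*m?\s*", text)
    if match is None:
        raise ValueError(f"not a weight: {text!r}")
    return int(match.group(1))


def weight_range_query(low, high, field="weight"):
    """Build an exclusive range clause over a numeric field."""
    return f"{field}:{{{parse_weight(low)} TO {parse_weight(high)}}}"


# String comparison is character by character, so "12m" sorts after "100m".
print("12m" > "100m")                                   # True
# Numeric comparison gives the ordering the user expects.
print(0 < parse_weight("12m") < parse_weight("100m"))   # True
print(weight_range_query("0m", "100m"))                 # weight:{0 TO 100}
```

The user can still type 12m; only the query sent to Solr uses the bare number.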
Re: Term extraction
On 9/21/07, Pieter Berkel [EMAIL PROTECTED] wrote: Yonik: This is the approach I had in mind, will it still work if I put the SynonymFilter after the word-delimiter filter in the schema config? SynonymFilter doesn't currently have the capability to handle multiple tokens at the same position in the input. You could simply remove the WordDelimiterFilter unless you need it. Ideally I want to strip out the underscore char before it gets indexed Why's that? You could just define your synonyms like that initially: Bill Gates, William Gates = billgates -Yonik
Re: Scripts not working on cron - always asking for password
Hi Thorsten, Thanks for your answer, but I've done it before and it still didn't work. I was running everything before as root and it didn't work either. Now I've created a solr user, part of the root group, changed the ownership of all solr stuff, and changed file permissions to 775 (so any user in the root group should be able to do anything on any files). Any other suggestion? Regards, Daniel On 21/9/07 13:12, Thorsten Scherler [EMAIL PROTECTED] wrote: On Fri, 2007-09-21 at 13:02 +0100, Daniel Alheiros wrote: Hi I'm having problems trying to set up my scheduled tasks. Sorry if it's something Linux related, as I'm not a Linux expert... I created a scripts.conf file (for my slave server) containing: user=solr solr_hostname=10.133.132.159 solr_port=8080 rsyncd_port=20280 data_dir=/var/solr2-v1.2.0/home/data webapp_name=solr master_host=10.133.132.159 master_data_dir=/var/solr3-v1.2.0/home/data master_status_dir=/var/solr3-v1.2.0/home/logs I have two solr instances running on the same machine, each one has its own data dir. My cron configuration is: # master 2 5 * * * /var/solr3-v1.2.0/home/bin/snapcleaner -D 1 2 4 * * * /var/solr3-v1.2.0/home/bin/optimize # slave */5 * * * * /var/solr2-v1.2.0/home/bin/snappuller;/var/solr2-v1.2.0/home/bin/snapinstaller */9 * * * * /var/solr3-v1.2.0/home/bin/snapcleaner -N 2 It's pretty weird for me because it always asks me for a password when I run any of them manually (and after passwords are provided they work properly). Well, you just gave the answer. Make sure the user that is executing the cron job has sufficient rights. The cron job will not be able to have a dialog. The brute force method is: sudo chown USER:USER /var/solr3-v1.2.0 That should do the job. salu2 Where should I add this password in order to avoid it? I couldn't find it in the documentation. When I run snappuller manually it asks for a password 5 times when it has a new snapshot to get and 2 times when it doesn't have a new snapshot.
Here is the error log from the snappuller.log file: 2007/09/21 13:00:01 started by solr 2007/09/21 13:00:01 command: /var/solr2-v1.2.0/home/bin/snappuller 2007/09/21 13:00:01 failed to ssh to master 10.133.132.159 My OS is RHEL. Regards, Daniel On 20/9/07 07:14, Yu-Hui Jin [EMAIL PROTECTED] wrote: Thanks, it works now. regards, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: If you don't need to pass any command line arguments to snapshooter, remove (or comment out) this line from solrconfig.xml: <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> By the same token, if you're not setting environment variables either, remove the following line as well: <arr name="env"> <str>MYVAR=val1</str> </arr> Once you alter / remove those two lines, snapshooter should function as expected. cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, Pieter, Thanks! Now the exception is gone. However, there's no snapshot file created in the data directory. Strangely, the snapshooter.log seems to complete successfully. Any idea what else I'm missing? $ cat var/SolrHome/solr/logs/snapshooter.log 2007/09/19 20:16:17 started by solruser 2007/09/19 20:16:17 command: /var/SolrHome/solr/bin/snapshooter arg1 arg2 2007/09/19 20:16:17 taking snapshot var/SolrHome/solr/data/snapshot.20070919201617 2007/09/19 20:16:17 ended (elapsed time: 0 sec) Thanks, -Hui On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: See this recent thread for some helpful info: http://www.nabble.com/solr-doesn%27t-find-exe-in-postCommit-event-tf4264879.html#a12167792 You'll probably want to configure your exe with an absolute path rather than the dir: <str name="exe">/var/SolrHome/solr/bin/snapshooter</str> <str name="dir">.</str> In order to get the snapshooter working correctly.
cheers, Piete On 20/09/2007, Yu-Hui Jin [EMAIL PROTECTED] wrote: Hi, there, I used an absolute path for the dir param in the solrconfig.xml as below: <listener event="postCommit" class="solr.RunExecutableListener"> <str name="exe">snapshooter</str> <str name="dir">/var/SolrHome/solr/bin</str> <bool name="wait">true</bool> <arr name="args"> <str>arg1</str> <str>arg2</str> </arr> <arr name="env"> <str>MYVAR=val1</str> </arr> </listener> However, I got snapshooter: not found exception thrown in catalina.out. I don't see why this doesn't work. Anything I'm missing? Many thanks, -Hui -- Regards, -Hui
Re: Faceting question
lol I agree with you Hoss - sorry for that Here's the thing: I need additional information from the index - such as the id related to a facet field. For example, say I am faceting on author names for a book store, I would also like to get the author id along with the author name to show a link (next to the author name) to say the author's bio page. The author id is stored in the index but how do i get that back with the facet results? On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : I'm using faceting to get some results. I also want to get another field - : the id field along with it. Is it possible to get that somehow in the facet : results? you're going to have to elaborate on what it is you are trying to do ... i genuinely have no idea what you are asking (and i think i'm usually pretty good at reading between the lines and guessing what people mean). -Hoss
olap with solr (math operations on facets)
Hi all, I'm considering building something like a light-weight OLAP server with lucene/solr. To achieve that I'd have to do some math operations on facets. Is that possible? For example, my documents would be purchase rows, like (id, value, id_department, id_store, id_region ...). If I did a facet query for id_department the server would return me something like: department1: 500, department2: 400... Is it possible to get the sum, or avg, or any other math operation on the field value? Then the server would return me: department1: 100 (the sum of each value) Is it clear? []s Rossini
RE: Faceting question
Faceting works on the terms in an index, so you can't get information beyond those terms without doing extra work. You could build an extra index used only for faceting that concatenates the information you need from other fields, and then parse it out in your application: e.g. Tolkien, J.R.R.|35421. If you're doing this so that you can do precise follow-on searches, though (where a user clicks on a link in a list of facets), you might want to think about whether the author name gives you everything you need. You may have two authors with the same name, who would show up as a single facet if you don't tack the id on; but even if you do, how is the user going to distinguish them? They'll just see two links, maybe with opaque id numbers. So maybe the bare author name is good enough. (I had a similar situation and found that getting away from a relational-database approach and going with what Solr does best was the best solution). Hope that helps, Peter -Original Message- From: Cric Digs [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 7:36 AM To: solr-user@lucene.apache.org Subject: Re: Faceting question lol I agree with you Hoss - sorry for that Here's the thing: I need additional information from the index - such as the id related to a facet field. For example, say I am faceting on author names for a book store, I would also like to get the author id along with the author name to show a link (next to the author name) to say the author's bio page. The author id is stored in the index but how do i get that back with the facet results? On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : I'm using faceting to get some results. I also want to get another field - : the id field along with it. Is it possible to get that somehow in the facet : results? you're going to have to elaborate on what it is you are trying to do ... 
i genuinely have no idea what you are asking (and i think i'm usually pretty good at reading between the lines and guessing what people mean). -Hoss
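Peter's suggestion of faceting on a concatenated value such as Tolkien, J.R.R.|35421 leaves the splitting to the application. A minimal sketch of that application side follows; the function names, the sample counts and the /authors/ link format are all made up for illustration, and it assumes the separator never occurs in an author id.

```python
def encode_facet_value(name, author_id, sep="|"):
    """Join an author name and id into one value to index in the facet field."""
    if sep in str(author_id):
        raise ValueError("separator must not occur in the id")
    return f"{name}{sep}{author_id}"


def decode_facet_value(value, sep="|"):
    """Split a facet value back into (name, id), splitting on the last separator."""
    if sep not in value:
        raise ValueError(f"no separator in facet value: {value!r}")
    name, _, author_id = value.rpartition(sep)
    return name, author_id


# Hypothetical facet counts as the application might hold them: value -> count.
facet_counts = {
    encode_facet_value("Tolkien, J.R.R.", 35421): 12,
    encode_facet_value("Smith, John", 101): 3,
    encode_facet_value("Smith, John", 202): 1,
}

for value, count in facet_counts.items():
    name, author_id = decode_facet_value(value)
    # The link format is an assumption; use whatever the bio pages really need.
    print(f"{name} ({count}) -> /authors/{author_id}")
```

As Peter points out, two authors with the same name then show up as two separate facet entries, so the display still needs some way to tell them apart.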
RE: Faceting question
Thanks Peter. That will be my work-around, but I was hoping to find a more elegant solution ;) I am not that knowledgeable about the solr architecture but if there is a way it can be done in a more elegant way I might be willing to put the extra time to code it.. Binkley, Peter wrote: Faceting works on the terms in an index, so you can't get information beyond those terms without doing extra work. You could build an extra index used only for faceting that concatenates the information you need from other fields, and then parse it out in your application: e.g. Tolkien, J.R.R.|35421. If you're doing this so that you can do precise follow-on searches, though (where a user clicks on a link in a list of facets), you might want to think about whether the author name gives you everything you need. You may have two authors with the same name, who would show up as a single facet if you don't tack the id on; but even if you do, how is the user going to distinguish them? They'll just see two links, maybe with opaque id numbers. So maybe the bare author name is good enough. (I had a similar situation and found that getting away from a relational-database approach and going with what Solr does best was the best solution). Hope that helps, Peter -Original Message- From: Cric Digs [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 7:36 AM To: solr-user@lucene.apache.org Subject: Re: Faceting question lol I agree with you Hoss - sorry for that Here's the thing: I need additional information from the index - such as the id related to a facet field. For example, say I am faceting on author names for a book store, I would also like to get the author id along with the author name to show a link (next to the author name) to say the author's bio page. The author id is stored in the index but how do i get that back with the facet results? On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : I'm using faceting to get some results. 
I also want to get another field - : the id field along with it. Is it possible to get that somehow in the facet : results? you're going to have to elaborate on what it is you are trying to do ... i genuinely have no idea what you are asking (and i think i'm usually pretty good at reading between the lines and guessing what people mean). -Hoss -- View this message in context: http://www.nabble.com/Faceting-question-tf4489342.html#a12824623 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Triggering snapshooter through web admin interface
Lance, do start a new thread if you run into this problem again and please include as much info as possible. Once a snapshot has been taken, the files it contains should not change so I am not sure why tar was telling you a file had changed while it was being copied. Bill On 9/19/07, Chris Hostetter [EMAIL PROTECTED] wrote: lance: since the topic you are describing is not directly related to triggering a snapshot from the web interface can you please start a new thread with a unique subject describing in more detail exactly what it was you were doing and the problem you encountered? this will make it easier for your problem to get visibility (some people don't read every thread, and archive searching is frequently done by thread, so people looking for similar problems may not realize this new thread is buried inside an old one) -Hoss : Date: Wed, 19 Sep 2007 11:33:30 -0700 : From: Lance Norskog [EMAIL PROTECTED] : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: RE: Triggering snapshooter through web admin interface : : Is there a ticket for this yet? I have a bug report and request: I just did : a snapshot while indexing 700 records/sec. and got an inconsistency. I was : tarring off the snapshot and tar reported that a file changed while it was : being copied. The error rolled off my screen, so I cannot report the file : name or extension. : : If a solr command to do a snapshot is implemented, please make sure that it : is 100% consistent. : : Thanks, : : Lance Norskog
Re: rsync start and enable for multiple solr instances within one tomcat
You are welcome. Bill On 9/21/07, Yu-Hui Jin [EMAIL PROTECTED] wrote: Bill, Thanks for the explanation. That helps my understanding of rsync and replication in general. regards, -Hui On 9/20/07, Bill Au [EMAIL PROTECTED] wrote: The solr that you are referring to in your third question is the name of the rsync area, which is mapped to the solr data directory. This is defined in the rsyncd configuration file, which is generated on the fly as Chris has pointed out. Take a look at rsyncd-start. snappuller rsyncs the index from this 'solr' area (the command you have quoted) on the master. The name of the rsync area has nothing to do with the name of the index. We set up this area for rsyncd so that one is restricted to this area when trying to access files on the master through rsyncd. The name of the rsyncd area does not have to be 'solr'. It can be anything as long as the value in rsyncd-start matches the value in snappuller. Bill On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : So just to help my knowledge, where does this virtual setting of this solr : string happen? Should it be in some config file or something? rsyncd-start creates an rsync config file on the fly ... much of it is constants, but it fills in the rsync port using a variable from your config. -Hoss -- Regards, -Hui
Re: a bug in commit script?
You should be able to run the latest version of the scripts against Solr 1.2. Just grab a copy from Subversion: http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/ Bill On 9/21/07, Yu-Hui Jin [EMAIL PROTECTED] wrote: Got it. So what's the easiest way to get this patch? Sorry i'm new to this. regards, -Hui On 9/20/07, Bill Au [EMAIL PROTECTED] wrote: That would be my bad. I noticed the problem while fixing SOLR-282 which is not related. I fixed both problems instead of opening a different bug for the response format issue. I will update the change log. Bill On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : : It seems there's a small bug in the bin/commit script for solr 1.2. A fix was already committed to the trunk for this as part of SOLR-282 (but there doesn't seem to be a note about it in the changelog) -Hoss -- Regards, -Hui
Re: clarification needed for the Ranking score
This would probably work, but the approach has a subtle flaw. If a query has one word that matches a lot of titles, but a phrase that matches a description, the best result will be shown far too low, after all the titles. A better approach is to weight the titles a bit higher than the description, probably 2X to 10X higher. At Infoseek, we weighted the title 8X higher. At Inktomi, with a completely different search engine, we used 7.5X. So I'd start with this: courseTitle^8 courseTag^4 courseDescription Also, I see that you are displaying the titles alphabetically, so the weights are meaningless. Maybe you should be using LIKE in MySQL if you want to do set matching and sorting. wunder On 9/20/07 10:10 PM, Dilip.TS [EMAIL PROTECTED] wrote: Hi, I need a clarification regarding the SOLR Ranking. Consider the scenario of searching for courses based on the following relevance: a. Courses with the term in the courseTitle, courseTag and in the courseDescription would appear first b. Courses with the term in the courseTitle and in the courseDescription would appear next c. Courses with the term only in the courseTitle appear next. d. Courses with the term only in the courseDescription appear next. e. Courses with the term only in the courseTag appear last. Let me know if my understanding is correct with the following solution + (basequery) courseTitle^1 courseTag^1000 courseDescription^100; courseTitle asc, courseDescription asc,courseTag asc; How do we set the relevancy while performing a search? Is there any configuration to set it in the solrconfig files? Also how do we set the Term Proximity? Could you clarify? Thanks in advance Regards, Dilip TS
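A minimal sketch of the suggested weighting, written as plain string building in Python. It only assembles a Lucene-syntax query with per-field boosts (courseTitle^8 courseTag^4 courseDescription); the helper name `boosted_query` and the search term "java" are assumptions made for illustration and are not part of Solr or of the original thread.

```python
def boosted_query(term, boosts):
    """Build a Lucene-syntax query searching `term` in each field.

    `boosts` is a list of (field, boost) pairs; a boost of 1 is left
    implicit, matching the "courseDescription" clause in the suggestion.
    """
    clauses = []
    for field, boost in boosts:
        clause = f"{field}:{term}"
        if boost != 1:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


# Hypothetical search term; the field names come from the thread.
query = boosted_query(
    "java",
    [("courseTitle", 8), ("courseTag", 4), ("courseDescription", 1)],
)
print(query)
```

The boosts only influence the relevance score, so they have an effect only when results are sorted by score rather than by an explicit field sort.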
Re: Weird bug in query
On 21-Sep-07, at 1:44 AM, Alexandru Badiu wrote: Hello, I have a problem I'm not sure how to debug. I am running Solr 1.2.1 under Jetty. I have the following two queries: - q:articol_tag:pilonul ii AND articol_tag:facultative which returns x rows - q:articol_tag:facultative AND articol_tag:pilonul ii which doesn't return any rows These are different queries. Your query consists of three clauses (the 'ii' is not part of articol_tag). This is what you are querying, in (the much clearer) REQUIRED/OPTIONAL syntax (+ == clause is required): articol_tag:pilonul ii AND articol_tag:facultative == +<default field>:ii +articol_tag:facultative articol_tag:pilonul articol_tag:facultative AND articol_tag:pilonul ii == +articol_tag:facultative +articol_tag:pilonul <default field>:ii try: articol_tag:facultative AND articol_tag:"pilonul ii" -Mike
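The point about 'ii' ending up in the default field can be avoided on the client side by quoting any multi-word value before it goes into the query string. The following is a rough sketch in Python; `field_clause` is a hypothetical helper, not a Solr API, and it assumes the tag should be matched as a phrase.

```python
def field_clause(field, value):
    """Return a field:value clause, quoting values that contain whitespace.

    Without the quotes, the query parser treats everything after the
    first space as a separate clause on the default field.
    """
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes inside the phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


query = " AND ".join([
    field_clause("articol_tag", "facultative"),
    field_clause("articol_tag", "pilonul ii"),
])
print(query)
```

With both clauses fully qualified this way, the order of the two terms around AND no longer changes which clauses are required.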
RE: Strange behavior when searching with accents
I have no links but it can all be done with synonym tables. I'm sure somewhere on the net there are full lists of the Spanish regular and irregular verbs (verbs which do not follow the conjugation rules). Then using basic text processing you could generate all of the declensions for the most common regular verbs. And then a custom stemmer would do the basics like adjective-mente -> adjective. Lance -Original Message- From: Thorsten Scherler [mailto:[EMAIL PROTECTED] Sent: Friday, September 21, 2007 12:08 AM To: solr-user@lucene.apache.org Subject: RE: Strange behavior when searching with accents On Thu, 2007-09-20 at 11:13 -0700, Lance Norskog wrote: English and French are messy, so heuristic methods are the only ones possible. Spanish is rigorously clean, and stemming should be done from the declension rules and irregular conjugation tables. This involves large (fast) tables in ram rather than small (slow) string-shuffling. Interesting, do you have a link for some documentation on how to implement this? salu2 Lance Norskog -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Bertrand Delacretaz Sent: Thursday, September 20, 2007 8:11 AM To: solr-user@lucene.apache.org Subject: Re: Strange behavior when searching with accents On 9/20/07, Thorsten Scherler [EMAIL PROTECTED] wrote: ...Bertrand, does the French Snowball work fine?... I've seen some weirdnesses, like tennis and tenir (means to hold) both stemmed to ten, but in all of our (simple) tests it was ok. The application where we're using it does not require high precision though, so it looked good enough and we didn't create very extensive tests for it. -Bertrand -- Thorsten Scherler thorsten.at.apache.org Open Source Java consulting, training and solutions
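To make the table-driven idea concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions: the irregular-form table and the list of regular verbs are tiny hand-picked samples, only present-tense -ar endings are generated, and none of the names (`IRREGULAR`, `build_regular_table`, `normalize`) come from Solr, Lucene or Snowball. A real table would be generated from full verb lists.

```python
# Hand-picked sample of irregular forms mapped to their infinitive.
IRREGULAR = {"fui": "ir", "voy": "ir", "tengo": "tener", "tuve": "tener"}

# Present indicative endings for regular -ar verbs.
AR_ENDINGS = ("o", "as", "a", "amos", "áis", "an")


def build_regular_table(infinitives):
    """Generate a lookup table: conjugated form -> infinitive."""
    table = {}
    for infinitive in infinitives:
        if not infinitive.endswith("ar"):
            continue  # this sketch only handles -ar verbs
        stem = infinitive[:-2]
        table[infinitive] = infinitive
        for ending in AR_ENDINGS:
            table[stem + ending] = infinitive
    return table


REGULAR = build_regular_table(["hablar", "cantar"])


def normalize(word):
    """Map a word to its base form using tables first, then one suffix rule."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    if word in REGULAR:
        return REGULAR[word]
    # adjective-mente -> adjective (skip very short words such as "mente")
    if word.endswith("mente") and len(word) > len("mente") + 2:
        return word[: -len("mente")]
    return word


for w in ["hablamos", "Tengo", "rápidamente", "casa"]:
    print(w, "->", normalize(w))
```

The lookups are plain dictionary hits, which is the "large (fast) tables in ram" trade-off described in the thread: memory grows with the vocabulary, but no string manipulation is needed for known forms.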
Re: Scripts not working on cron - always asking for password
On 21-Sep-07, at 7:44 AM, Daniel Alheiros wrote: Hi Problem solved... I had to create a private/public key for my users and add it to the authorized_keys on my server... I've used instructions on this page, quite simple actually (after you know what you need to do...). http://www.ece.uci.edu/~chou/ssh-key.html Shouldn't this kind of information be present in the SOLR documentation? I'm going to write it in my installation procedures, so I can contribute it back to the SOLR wiki if you think it's appropriate. I wouldn't mind listing a brief note and a link, but trying to cover too many unix basics will clutter up the documentation. -Mike
Re: olap with solr (math operations on facets)
On 21-Sep-07, at 8:27 AM, Rafael Rossini wrote: Hi all, I'm considering doing something like a light-weight olap server with lucene/solr. To achieve that I'd have to do some math operations on facets. Is that possible? For example, my documents would be a purchase row, like (id, value, id_department, id_store, id_region ...). If I did a facet query for id_department the server would return me something like: department1: 500, department2: 400... Is it possible to get the sum, or avg or any math operation on the field value? Then the server would return me: department1: 100 (the sum of each value) Is it clear? Currently this is not possible out of the box with Solr. -Mike
Re: olap with solr (math operations on facets)
On 21-Sep-07, at 2:42 PM, Rafael Rossini wrote: Thanks for the reply Mike. Are there any plans to do something like this? Or some direction anyone could give? Probably the easiest thing to do is write a custom request handler that iterates over the field cache and computes the statistics you want (loading the docs would probably be too slow). Check out SimpleFacets.java to see how it uses the FieldCache. -Mike
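The computation such a handler would perform can be sketched outside Solr. The Python below is not Solr or Lucene code: it assumes two per-document arrays indexed by document id (standing in for what a field cache would provide for id_department and value) plus a list of matching document ids, and the function name `facet_stats` and the sample data are made up for illustration. A real handler would be written in Java against the FieldCache.

```python
from collections import defaultdict


def facet_stats(doc_ids, facet_values, numeric_values):
    """Compute count, sum and average of a numeric field per facet value.

    doc_ids        -- ids of the documents matching the query
    facet_values   -- facet_values[doc] is the facet field value of a doc
    numeric_values -- numeric_values[doc] is the numeric field value of a doc
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for doc in doc_ids:
        key = facet_values[doc]
        sums[key] += numeric_values[doc]
        counts[key] += 1
    return {
        key: {
            "count": counts[key],
            "sum": sums[key],
            "avg": sums[key] / counts[key],
        }
        for key in sums
    }


# Made-up sample data: one entry per document id 0..4.
id_department = ["department1", "department2", "department1",
                 "department2", "department1"]
value = [10.0, 20.0, 30.0, 5.0, 60.0]
matching = [0, 1, 2, 4]  # document 3 does not match the query

stats = facet_stats(matching, id_department, value)
for department, s in sorted(stats.items()):
    print(department, s)
```

Working from per-document arrays rather than stored documents is the reason for the field cache suggestion: each matching document costs two array reads instead of a document load.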
Re: olap with solr (math operations on facets)
Thanks for the reply Mike. Are there any plans to do something like this? Or some direction anyone could give? []s Rossini On 9/21/07, Mike Klaas [EMAIL PROTECTED] wrote: On 21-Sep-07, at 8:27 AM, Rafael Rossini wrote: Hi all, I'm considering doing something like a light-weight olap server with lucene/solr. To achieve that I'd have to do some math operations on facets. Is that possible? For example, my documents would be a purchase row, like (id, value, id_department, id_store, id_region ...). If I did a facet query for id_department the server would return me something like: department1: 500, department2: 400... Is it possible to get the sum, or avg or any math operation on the field value? Then the server would return me: department1: 100 (the sum of each value) Is it clear? Currently this is not possible out of the box with Solr. -Mike