PHP/Solr library
Hi all, I've been exploring http://www.php.net/manual/en/book.solr.php as a way to maintain my index. I already have a PHP script that I use to update a database, so I was hoping to update the database and the index at the same time. However, I've been getting the following error when running $solr_client->commit():

Unsuccessful update request. Response Code 0. (null)

I've tried to find out why I'm getting the error but I cannot find a reasonable explanation. My guess is that because my index is rather large (22 million records) the request is timing out or something like that, but I cannot confirm that is the case, nor do I know how to fix it even if it were. Any help here would be greatly appreciated. Thanks, Brian Lamb
Re: MySQL data import
Hi all, Any tips on this one? Thanks, Brian Lamb On Sun, Dec 11, 2011 at 3:54 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this? By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items. An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? Lastly, is it possible to use copyField to copy three regular fields into one multiValued field and have all the data show up? Thanks, Brian Lamb
URLDataSource delta import
Hi all, According to http://wiki.apache.org/solr/DataImportHandler#Usage_with_XML.2BAC8-HTTP_Datasource a delta-import is not currently implemented for URLDataSource. I say currently because I've noticed that such documentation is out of date in many places. I wanted to see if this feature had been added yet or if there were plans to do so. Thanks, Brian Lamb
Re: MySQL data import
Thanks all. Erick, is there documentation on doing things with SolrJ and a JDBC connection?

On Mon, Dec 12, 2011 at 1:34 PM, Erick Erickson erickerick...@gmail.com wrote: You might want to consider just doing the whole thing in SolrJ with a JDBC connection. When things get complex, it's sometimes more straightforward. Best, Erick... P.S. Yes, it's pretty standard to have a single field be the destination for several copyField directives.

On Mon, Dec 12, 2011 at 12:48 PM, Gora Mohanty g...@mimirtech.com wrote:
On Mon, Dec 12, 2011 at 2:24 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this?

Not sure, but I do not think that it is possible. However, from your description below, I think that you are unnecessarily multiplying entities.

By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items?

Not quite sure as to what you mean. Would it be possible for you to post your schema.xml and the DIH configuration file? Preferably, put these on pastebin.com and send us links. Also, you should obfuscate details like access passwords.

An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? [...]

This is how we have been handling it. A complete description would be long, but here is the gist of it:

* A transformer will be needed. In this case, we found it easiest to use a Java-based transformer. Thus, your entity should include something like:
  <entity name="myname" dataSource="mysource" transformer="com.mycompany.search.solr.handler.JobsNumericTransformer" ...> ... </entity>
  Here, the class name used for the transformer attribute follows the usual Java rules, and the .jar needs to be made available to Solr.
* The SELECT statement for the entity looks something like: SELECT GROUP_CONCAT(myfield SEPARATOR '@||@') ... The separator should be something that does not occur in your normal data stream.
* Within the entity, define <field column="myfield"/>.
* There are complications involved if NULL values are allowed for the field, in which case you would need to use COALESCE, maybe along with CAST.
* The transformer would look up myfield, split along the separator, and populate the multi-valued field.

This *is* a little complicated, so I would also like to hear about possible alternatives. Regards, Gora
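The split step that Gora's Java transformer performs can be sketched in a few lines: MySQL returns the GROUP_CONCAT result as one string, and the transformer breaks it back into individual values for the multivalued Solr field. This is an illustrative Python sketch of that logic, not the actual DIH transformer API; the separator, column name, and row shape are assumptions.

```python
# Sketch of the transformer's split step: turn a GROUP_CONCAT'ed column back
# into a list of values for a multivalued field. Names are illustrative.
SEPARATOR = "@||@"

def split_multivalued(row, column="myfield"):
    """Split a GROUP_CONCAT'ed column into a list of values."""
    raw = row.get(column)
    if raw is None:
        return row  # NULL in SQL: leave the field absent
    row[column] = [v for v in raw.split(SEPARATOR) if v]
    return row

row = {"id": 1, "myfield": "dogs@||@cats@||@birds"}
print(split_multivalued(row)["myfield"])  # ['dogs', 'cats', 'birds']
```

The same idea is why the separator must never occur in the real data: the split has no other way to tell values apart.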
MySQL data import
Hi all, I have a few questions about how the MySQL data import works. It seems it creates a separate connection for each entity I create. Is there any way to avoid this? By nature of my schema, I have several multivalued fields. Each one I populate with a separate entity. Is there a better way to do it? For example, could I pull in all the singular data in one sitting and then come back in later and populate with the multivalued items. An alternate approach in some cases would be to do a GROUP_CONCAT and then populate the multivalued column with some transformation. Is that possible? Lastly, is it possible to use copyField to copy three regular fields into one multiValued field and have all the data show up? Thanks, Brian Lamb
Re: Boosting is slow
Any ideas on this one?

On Thu, Nov 17, 2011 at 3:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Sorry, the query is actually: http://localhost:8983/solr/mycore/search/?q=test{!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl

On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
Boosting is slow
Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
Re: Boosting is slow
Sorry, the query is actually: http://localhost:8983/solr/mycore/search/?q=test{!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))}&start=&sort=score+desc,mydate_field+desc&wt=xslt&tr=mysite.xsl

On Thu, Nov 17, 2011 at 2:59 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I have about 20 million records in my solr index. I'm running into a problem now where doing a boost drastically slows down my search application. A typical query for me looks something like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=product(sum(log(sum(myfield,1)),1),recip(ms(NOW,mydate_field),3.16e-11,1,8))} I've tried several variations on the boost to see if that was the problem but even when doing something simple like: http://localhost:8983/solr/mycore/search/?q=test {!boost b=2} it is still really slow. Is there a different approach I should be taking? Thanks, Brian Lamb
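For reference, the recip piece of this boost is Lucene's recip(x,m,a,b) = a/(m*x + b); with m=3.16e-11 and x = ms(NOW,mydate_field) in milliseconds, m*x grows by roughly 1 per year of document age. A Python sketch of the whole boost formula (Solr's log() function is base 10; the field value and ages below are illustrative, not from the thread):

```python
import math

def recip(x, m, a, b):
    """Lucene's recip function query: a / (m*x + b)."""
    return a / (m * x + b)

def boost(myfield_value, age_ms):
    """The boost from the thread:
    product(sum(log(sum(myfield,1)),1), recip(ms(NOW,date),3.16e-11,1,8))."""
    return (math.log10(myfield_value + 1) + 1) * recip(age_ms, 3.16e-11, 1, 8)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # note 1/3.16e-11 is roughly one year in ms

# A fresh document vs. a one-year-old one with the same myfield value:
print(boost(99, 0))            # (log10(100)+1) * 1/8 = 0.375
print(boost(99, MS_PER_YEAR))  # smaller: the recip term has decayed to ~1/9
```

One commonly cited reason such boosts are slow (a general observation, not a diagnosis from this thread) is that the function is evaluated per matching document, and NOW changes on every request, which defeats query caching.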
Autocomplete
Hi all, I've read numerous guides on how to set up autocomplete on solr and it works great the way I have it now. However, my only complaint is that it only matches the beginning of the word. For example, if I try to autocomplete dober, I would only get Doberman, Doberman Pincher but not Pincher, Doberman. Here is how my schema is configured:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="autocomplete_text" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

How can I update my autocomplete so that it will match the middle of a word as well as the beginning of the word? Thanks, Brian Lamb
Re: Autocomplete
I found that if I change

<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>

to

<filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="25"/>

I can do autocomplete in the middle of a term. Thanks! Brian Lamb

On Thu, Sep 1, 2011 at 11:27 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've read numerous guides on how to set up autocomplete on solr and it works great the way I have it now. However, my only complaint is that it only matches the beginning of the word. For example, if I try to autocomplete dober, I would only get Doberman, Doberman Pincher but not Pincher, Doberman. Here is how my schema is configured:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="autocomplete_text" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

How can I update my autocomplete so that it will match the middle of a word as well as the beginning of the word? Thanks, Brian Lamb
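The difference between the two filters can be illustrated outside Solr: EdgeNGramFilter emits only prefixes of each token, while NGramFilter emits every substring of each length. A minimal Python model of just the gram generation (the real analysis chain also lowercases, and NGramFilter will grow the index considerably, which is the usual trade-off):

```python
def edge_ngrams(term, lo=1, hi=25):
    """EdgeNGramFilter-style grams: prefixes only."""
    return [term[:n] for n in range(lo, min(hi, len(term)) + 1)]

def ngrams(term, lo=1, hi=25):
    """NGramFilter-style grams: every substring of each length."""
    out = []
    for n in range(lo, min(hi, len(term)) + 1):
        out.extend(term[i:i + n] for i in range(len(term) - n + 1))
    return out

term = "doberman"
print("berm" in edge_ngrams(term))  # False: only prefixes are indexed
print("berm" in ngrams(term))       # True: mid-word grams are indexed too
```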
Re: Exact match not the first result returned
I implemented both solutions Hoss suggested and was able to achieve the desired results. I would like to go with defType=dismax qf=myname pf=myname_str^100 q=Frank, but that doesn't seem to work if I have a query like myname:Frank otherfield:something. So I think I will go with q=+myname:Frank myname_str:Frank^100. Thanks for the help everyone! Brian Lamb

On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher even though it (as you put it) matches 'Fred' exactly is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:Frank Stalone would.

in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this...
defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
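Hoss's dismax suggestion, assembled into an actual request URL with Python's standard library (the host and core name are placeholders; the parameters are exactly the ones from his reply):

```python
from urllib.parse import urlencode

# dismax: query terms must match qf; an exact-phrase match on the string
# copy (pf) adds a large boost, pushing whole-value matches to the top.
params = {
    "defType": "dismax",
    "qf": "myname",           # field the query terms are matched against
    "pf": "myname_str^100",   # boosted phrase match on the string copy
    "q": "Frank",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```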
Re: Exact match not the first result returned
That's a clever idea. I'll put something together and see how it turns out. Thanks for the tip.

On Wed, Jul 27, 2011 at 10:55 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: With your solution, RECORD 1 does appear at the top but I think thats just
: blind luck more than anything else because RECORD 3 shows as having the same
: score. So what more can I do to push RECORD 1 up to the top. Ideally, I'd
: like all three records returned with RECORD 1 being the first listing.

with omitNorms RECORD1 and RECORD3 have the same score because only the tf() matters, and both docs contain the term frank exactly twice. the reason RECORD1 isn't scoring higher even though it (as you put it) matches 'Fred' exactly is that from a term perspective, RECORD1 doesn't actually match myname:Fred exactly, because there are in fact other terms in that field because it's multivalued.

one way to indicate that you *only* want documents where the entire field value matches your input (ie: RECORD1 but no other records) would be to use a StrField instead of a TextField, or an analyzer that doesn't split up tokens (ie: something using KeywordTokenizer). that way a query on myname:Frank would not match a document where you had indexed the value Frank Stalone, but a query for myname:Frank Stalone would.

in your case, you don't want *only* the exact field value matches, but you want them boosted, so you could do something like copyField myname into myname_str and then do... q=+myname:Frank myname_str:Frank^100 ...in which case a match on myname is required, but a match on myname_str will greatly increase the score. dismax (and edismax) are really designed for situations like this... defType=dismax qf=myname pf=myname_str^100 q=Frank -Hoss
Re: Exact match not the first result returned
Thanks Emmanuel for that explanation. I implemented your solution but I'm not quite there yet. Suppose I also have a record:

RECORD 3
<arr name="myname">
  <str>Fred G. Anderson</str>
  <str>Fred Anderson</str>
</arr>

With your solution, RECORD 1 does appear at the top, but I think that's just blind luck more than anything else because RECORD 3 shows as having the same score. So what more can I do to push RECORD 1 up to the top? Ideally, I'd like all three records returned with RECORD 1 being the first listing. Thanks, Brian Lamb

On Tue, Jul 26, 2011 at 6:03 PM, Emmanuel Espina espinaemman...@gmail.com wrote: That is caused by the size of the documents. The principle is pretty intuitive: if one of your documents is the entire three volumes of The Lord of the Rings, and you search for tree, I know that The Lord of the Rings will be in the results, and I haven't memorized the entire text of that book :p It is a matter of probability that in a big (big!) text any word has a greater chance of being found than in a shorter one, so one can infer that the shorter text is more relevant than the big one. That is the principle applied here, and Lucene does this when building the ranking. The first document is bigger than the second one (remember that all the values of a multivalued field are merged into one field in the index, so you cannot tell one value from another apart). In the first one you have [Fred, coolest, guy, town] and in the second [Fred, Anderson], so the second document is more relevant than the first one.

To avoid all this you can set omitNorms to true, and that should make the first document more relevant because Fred appears twice (not because Fred appears alone in a value). Regards, Emmanuel

2011/7/26 Brian Lamb brian.l...@journalexperts.com: Hi all, I am a little confused as to why the scoring is working the way it is. I have a field defined as:

<field name="myname" type="text" indexed="true" stored="true" required="false" multiValued="true"/>

And I have several documents where that value is:

RECORD 1
<arr name="myname">
  <str>Fred</str>
  <str>Fred (the coolest guy in town)</str>
</arr>

OR

RECORD 2
<arr name="myname">
  <str>Fred Anderson</str>
</arr>

When I do a search for http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2 returned before RECORD 1.

RECORD 2
5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
  1.0 = tf(termFreq(myname:Fred)=1)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.625 = fieldNorm(field=myname, doc=256575)

RECORD 1
4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
  1.4142135 = tf(termFreq(myname:Fred)=2)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.375 = fieldNorm(field=myname, doc=215)

So the difference is obviously fieldNorm, but I think that's only part of the story. Why is RECORD 2 returned with a higher score than RECORD 1 even though RECORD 1 matches Fred exactly? And how should I do this differently so that I am getting the results I am expecting? Thanks, Brian Lamb
Re: Rounding errors in solr
Is this possible to do? If so, how?

On 7/25/11, Brian Lamb brian.l...@journalexperts.com wrote: Yes and that's causing some problems in my application. Is there a way to truncate the 7th decimal place in regards to sorting by the score?

On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929.

32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
Exact match not the first result returned
Hi all, I am a little confused as to why the scoring is working the way it is. I have a field defined as:

<field name="myname" type="text" indexed="true" stored="true" required="false" multiValued="true"/>

And I have several documents where that value is:

RECORD 1
<arr name="myname">
  <str>Fred</str>
  <str>Fred (the coolest guy in town)</str>
</arr>

OR

RECORD 2
<arr name="myname">
  <str>Fred Anderson</str>
</arr>

When I do a search for http://localhost:8983/solr/search/?q=myname:Fred I get RECORD 2 returned before RECORD 1.

RECORD 2
5.282213 = (MATCH) fieldWeight(myname:Fred in 256575), product of:
  1.0 = tf(termFreq(myname:Fred)=1)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.625 = fieldNorm(field=myname, doc=256575)

RECORD 1
4.482106 = (MATCH) fieldWeight(myname:Fred in 215), product of:
  1.4142135 = tf(termFreq(myname:Fred)=2)
  8.451541 = idf(docFreq=7306, maxDocs=12586425)
  0.375 = fieldNorm(field=myname, doc=215)

So the difference is obviously fieldNorm, but I think that's only part of the story. Why is RECORD 2 returned with a higher score than RECORD 1 even though RECORD 1 matches Fred exactly? And how should I do this differently so that I am getting the results I am expecting? Thanks, Brian Lamb
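The explain output above can be reproduced from Lucene's classic scoring pieces: fieldWeight = tf x idf x fieldNorm, where tf = sqrt(termFreq) and fieldNorm shrinks with field length (and is encoded lossily to a byte, which is why round values like 0.625 and 0.375 appear). A quick check in Python against the numbers from the thread:

```python
import math

def field_weight(term_freq, idf, field_norm):
    """Lucene's classic fieldWeight: sqrt(termFreq) * idf * fieldNorm."""
    return math.sqrt(term_freq) * idf * field_norm

idf = 8.451541  # from the explain output; same for both records

record2 = field_weight(1, idf, 0.625)  # "Fred Anderson": one match, short field
record1 = field_weight(2, idf, 0.375)  # two matches, but more terms => smaller norm

print(record2)  # matches the 5.282213 in the explain output
print(record1)  # matches the 4.482106: the higher tf loses to the length norm
```

So the ordering is not a bug: the extra terms in RECORD 1's multivalued field shrink its norm faster than the second "Fred" occurrence raises its tf.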
Re: Rounding errors in solr
Yes and that's causing some problems in my application. Is there a way to truncate the 7th decimal place in regards to sorting by the score?

On Fri, Jul 22, 2011 at 4:27 PM, Yonik Seeley yo...@lucidimagination.com wrote:
On Fri, Jul 22, 2011 at 4:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929.

32 bit floats only have 7 decimal digits of precision, and in floating point land (a+b+c) can be slightly different than (c+b+a) -Yonik http://www.lucidimagination.com
Ignore records that are missing a value in a field
Hi all, I have an optional field called common_names. I would like to keep this field optional but at the same, occasionally do a search where I do not include results where there is no value set for this field. Is this possible to do within solr? In other words, I would like to do a search where if there is no value set for common_names, I would not want that record included in the search result. Thanks, Brian Lamb
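One standard Solr idiom for this (an assumption here, since no reply appears in the thread): the open-ended range query common_names:[* TO *] matches only documents that have some value in the field, so attaching it as a filter query (fq) excludes records where the field is unset, without changing the main query or making the field required. A small sketch of building such a request:

```python
from urllib.parse import urlencode, parse_qs

# Assumed idiom: fq=common_names:[* TO *] keeps only docs with a value in
# common_names; the field itself can stay optional in the schema.
def with_required_field(user_query, field="common_names"):
    return urlencode({"q": user_query, "fq": "%s:[* TO *]" % field})

print(with_required_field("genus:felis"))
# The inverse (only records *missing* the field) would be fq=-common_names:[* TO *]
```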
Rounding errors in solr
Hi all, I've noticed some peculiar scoring issues going on in my application. For example, I have a field that is multivalued and has several records that have the same value. For example,

<arr name="references">
  <str>National Society of Animal Lovers</str>
  <str>Nat. Soc. of Ani. Lov.</str>
</arr>

I have about 300 records with that exact value. Now, when I do a search for references:(national society animal lovers), I get the following results: <id>252</id> <id>159</id> <id>82</id> <id>452</id> <id>105</id> When I do a search for references:(nat soc ani lov), I get the results ordered differently: <id>510</id> <id>122</id> <id>501</id> <id>82</id> <id>252</id> When I load all the records that match, I notice that at some point the scores aren't the same but differ by only a little: 1.471928 in one, and the one before it was 1.471929. I turned on debugQuery=on and the scores for each of those two records are exactly the same. Therefore, I think there is some kind of rounding error going on. Is there a way I can fix this? Alternatively, can I sort by a rounded version of the score? I tried sort=round(score,5) but I get the following message: Can't determine Sort Order: 'round(score,5) ', pos=5 I also tried sort=sum(score,1) just to see if I was using round incorrectly, but I get an error message there too saying score is not a recognized field. Please help! Thanks, Brian Lamb
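Yonik's two points (later in this thread) can both be demonstrated in a few lines: float addition is not associative, so summing identical per-clause scores in a different order can flip the last digit, and a 32-bit float only carries about seven significant decimal digits, which is exactly the 1.471928 vs 1.471929 scale seen here:

```python
import struct

def to_f32(x):
    """Round a 64-bit Python float to 32-bit precision (Lucene scores are float32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Non-associativity: the same three addends, grouped differently.
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))   # False, already true of 64-bit floats

# A float32 near 1.47 has a spacing (ulp) of 2**-23, about 1.2e-7, so the
# two observed scores are only ~8 representable values apart.
print(to_f32(1.471929) - to_f32(1.471928))
```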
Records disappearing
Hi all, I'm having some weird behavior with my dataimport script. Because of memory issues, I've taken to doing my delta import as a full-import with clean=false. My dataimport config file is set up like:

<entity name="findDelta" rootEntity="false"
        query="SELECT id FROM mytable WHERE date_added &gt; '${dataimporter.last_index_time}' OR last_updated &gt; '${dataimporter.last_index_time}'">
  <entity name="mytable" pk="id"
          query="SELECT * FROM mytable WHERE id = '${findDelta.id}'"
          deletedPkQuery="SELECT id FROM my_delete_table"
          deltaImportQuery="SELECT id FROM mytable WHERE id='${dataimporter.delta.id}'"
          deltaQuery="SELECT id FROM mytable WHERE date_added &gt; '${dataimporter.last_index_time}' OR last_updated &gt; '${dataimporter.last_index_time}'">
    <field column="id" name="id"/>
    <field column="title" name="title"/>
    <field column="name" name="name"/>
    <field column="summary" name="summary"/>
  </entity>
</entity>

I've found that one record (possibly more that I haven't noticed) keeps disappearing from the index. I will do a full-import with clean=false, search, and the record will be there. I'll search again a few hours later and it's there. But then all of a sudden, it's gone. I don't know what is triggering that one record's disappearance, but it is quite annoying. Any ideas what's going on? Thanks, Brian Lamb
Reject URL requests unless from localhost for dataimport
Hi all, My solr server is currently set up at www.mysite.com:8983/solr. I would like to keep this for the time being but I would like to restrict users from going to www.mysite.com:8983/solr/dataimport. In that case, I would only want to be able to do localhost:8983/solr/dataimport. Is this possible? If so, where should I look for a guide? Thanks, Brian Lamb
Re: Default query parser operator
It could; it would be a little bit clunky, but that's the direction I'm heading.

On Tue, Jun 7, 2011 at 6:05 PM, lee carroll lee.a.carr...@googlemail.com wrote: Hi Brian, could your front end app do this field query logic? (assuming you have an app in front of solr)

On 7 June 2011 18:53, Jonathan Rochkind rochk...@jhu.edu wrote: There's no feature in Solr to do what you ask, no. I don't think.

On 6/7/2011 1:30 PM, Brian Lamb wrote: Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would be written as http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR field2:bom But if they were written together like: http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom) I would want it to be http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom) But it sounds like you are saying that would not be possible. Thanks, Brian Lamb

On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The operators are BETWEEN the clauses that specify fields; they don't belong to a field. In general, the operators are part of the query as a whole, not any specific field. In fact, I'd be careful of your example query: q=field1:foo bar field2:baz I don't think that means what you think it means; I don't think the field1 applies to the bar in that case. Although I could be wrong, you definitely want to check it.
You need field1:foo field1:bar, or set the default field for the query to field1, or use parens (although that will change the execution strategy and ranking): q=field1:(foo bar) At any rate, even if there's a way to specify this so it makes sense, no, Solr/lucene doesn't support any such thing. On 6/7/2011 10:56 AM, Brian Lamb wrote: I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
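Lee's suggestion of doing the per-field operator logic in the front-end app can be sketched like this. The field names, the OR default, and the simple join are illustrative assumptions (real user input would also need Lucene special-character escaping); the output matches the rewrite Brian asked for earlier in the thread:

```python
# Sketch of app-side query building: AND the terms within field1, OR the
# terms within field2, then OR the field clauses together before sending
# the string to Solr as q. Names and defaults are illustrative.
def build_query(field1_terms, field2_terms):
    clauses = []
    if field1_terms:
        clauses.append("field1:(%s)" % " AND ".join(field1_terms))
    if field2_terms:
        clauses.append("field2:(%s)" % " OR ".join(field2_terms))
    return " OR ".join(clauses)

print(build_query(["foo", "baz"], ["bar", "bom"]))
# field1:(foo AND baz) OR field2:(bar OR bom)
```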
Re: Default query parser operator
I feel like this should be fairly easy to do but I just don't see anywhere in the documentation on how to do this. Perhaps I am using the wrong search parameters. On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Re: Default query parser operator
Hi Jonathan, Thank you for your reply. Your point about my example is a good one. So let me try to restate using your example. Suppose I want to apply AND to any search terms within field1. Then field1:foo field2:bar field1:baz field2:bom would be written as http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR field2:bom But if they were written together like: http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom) I would want it to be http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom) But it sounds like you are saying that would not be possible. Thanks, Brian Lamb

On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Nope, not possible. I'm not even sure what it would mean semantically. If you had default operator OR ordinarily, but default operator AND just for field2, then what would happen if you entered: field1:foo field2:bar field1:baz field2:bom Where the heck would the ANDs and ORs go? The operators are BETWEEN the clauses that specify fields; they don't belong to a field. In general, the operators are part of the query as a whole, not any specific field. In fact, I'd be careful of your example query: q=field1:foo bar field2:baz I don't think that means what you think it means; I don't think the field1 applies to the bar in that case. Although I could be wrong, you definitely want to check it. You need field1:foo field1:bar, or set the default field for the query to field1, or use parens (although that will change the execution strategy and ranking): q=field1:(foo bar) At any rate, even if there's a way to specify this so it makes sense, no, Solr/lucene doesn't support any such thing.

On 6/7/2011 10:56 AM, Brian Lamb wrote: I feel like this should be fairly easy to do but I just don't see anywhere in the documentation how to do this. Perhaps I am using the wrong search parameters.
On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Default query parser operator
Hi all, Is it possible to change the query parser operator for a specific field without having to explicitly type it in the search field? For example, I'd like to use: http://localhost:8983/solr/search/?q=field1:word token field2:parser syntax instead of http://localhost:8983/solr/search/?q=field1:word AND token field2:parser syntax But, I only want it to be applied to field1, not field2 and I want the operator to always be AND unless the user explicitly types in OR. Thanks, Brian Lamb
Re: Searching using a PDF
I mean instead of typing http://localhost:8983/?q=mysearch, I would send a PDF file with the contents of mysearch and search based on that. I am leaning toward handling this before it hits Solr, however. Thanks, Brian Lamb

On Wed, Jun 1, 2011 at 3:52 PM, Erick Erickson erickerick...@gmail.com wrote: I'm not quite sure what you mean by regular search. When you index a PDF (presumably through Tika or Solr Cell) the text is indexed into your index and you can certainly search that. Additionally, there may be metadata indexed in specific fields (e.g. author, date modified, etc). But what does search based on a PDF file mean in your context? Best, Erick

On Wed, Jun 1, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.com wrote: Is it possible to do a search based on a PDF file? I know it's possible to update the index with a PDF, but can you do just a regular search with it? Thanks, Brian Lamb
Re: Edgengram
Hi Tomás, Thank you very much for your suggestion. I took another crack at it using your recommendation and it worked ideally. The only thing I had to change was

<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>

to

<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>

The first did not produce any results but the second worked beautifully. Thanks! Brian Lamb

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com ...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter.

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work. It would be something like:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
  </analyzer>
</fieldType>

This way, at query time abcdefg won't be turned into a ab abc abcd abcde abcdef abcdefg. At index time it will. Regards, Tomás

On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example, however, it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf, and in the case of an edgengram, it returns 1 * the length of the search string.
Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. 
What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
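The idf of 7 in the thread above comes from the query analyzer also edge-ngramming the search term: "abcdefg" expands into seven front grams, and with a custom similarity returning 1 per term, the summed idf is 7. A minimal sketch of what EdgeNGramFilterFactory with side="front" emits for one token (this is our own reimplementation of the gram-generation rule for illustration, not Lucene's code):

```java
import java.util.ArrayList;
import java.util.List;

public class FrontGrams {
    // Every leading substring of the token, from minGramSize up to
    // maxGramSize characters. "abcdefg" with min=1, max=100 expands to
    // a, ab, abc, abcd, abcde, abcdef, abcdefg: seven query terms.
    static List<String> frontGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        for (int len = min; len <= Math.min(max, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }
}
```

This is why the fix suggested in the thread, using a KeywordTokenizer-style query analyzer without the edge ngram filter, makes the query contribute a single term instead of seven.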
Re: Edgengram
I think in my case LowerCaseTokenizerFactory will be sufficient because there will never be spaces in this particular field. But thank you for the useful link! Thanks, Brian Lamb

On Wed, Jun 1, 2011 at 11:44 AM, Erick Erickson erickerick...@gmail.com wrote: Be a little careful here. LowerCaseTokenizerFactory is different from KeywordTokenizerFactory. LowerCaseTokenizerFactory will give you more than one term. E.g. the string "Intelligence can't be MeaSurEd" will give you 5 terms, any of which may match, i.e. intelligence, can, t, be, measured; whereas KeywordTokenizerFactory followed by, say, LowerCaseFilter would give you exactly one token: "intelligence can't be measured". So searching for measured would get a hit in the first case but not in the second. Searching for intellig* would hit both. Neither is better, just make sure they do what you want! This page will help a lot: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory as will the admin/analysis page. Best Erick

On Wed, Jun 1, 2011 at 10:43 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi Tomás, Thank you very much for your suggestion. I took another crack at it using your recommendation and it worked ideally. The only thing I had to change was

<analyzer type="query">
  <tokenizer class="solr.KeywordTokenizerFactory" />
</analyzer>

to

<analyzer type="query">
  <tokenizer class="solr.LowerCaseTokenizerFactory" />
</analyzer>

The first did not produce any results but the second worked beautifully. Thanks! Brian Lamb

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com ...or also use the LowerCaseTokenizerFactory at query time for consistency, but not the edge ngram filter.

2011/5/31 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Brian, I don't know if I understand what you are trying to achieve. You want the term query abcdefg to have an idf of 1 instead of 7? I think using the KeywordTokenizerFactory at query time should work.
I would be something like: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory / /analyzer /fieldType this way, at query time abcdefg won't be turned to a ab abc abcd abcde abcdef abcdefg. At index time it will. Regards, Tomás On Tue, May 31, 2011 at 1:07 PM, Brian Lamb brian.l...@journalexperts.com wrote: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer /fieldType I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and in the case of an edgengram, it returns 1 * length of the search string. Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. 
Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May
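Erick's distinction between the two tokenizers can be sketched in a few lines. These are rough stand-in models written for illustration, not the Lucene implementations: LowerCaseTokenizerFactory is approximated as "split on anything that is not a letter, then lowercase", and KeywordTokenizerFactory plus a LowerCaseFilter as "one lowercased token".

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class TokenizerSketch {
    // Rough model of LowerCaseTokenizerFactory: letters-only runs become
    // tokens, lowercased. "Intelligence can't be MeaSurEd" -> five terms.
    static List<String> lowerCaseTokenize(String text) {
        return Arrays.stream(text.split("[^\\p{L}]+"))
                .filter(t -> !t.isEmpty())
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Rough model of KeywordTokenizerFactory + LowerCaseFilter: the entire
    // input survives as a single lowercased token.
    static List<String> keywordLowerCase(String text) {
        return Collections.singletonList(text.toLowerCase(Locale.ROOT));
    }
}
```

As Erick notes, a search for "measured" matches a term from the first model but not the single token from the second, while a prefix search like intellig* can match both.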
Searching using a PDF
Is it possible to do a search based on a PDF file? I know it's possible to update the index with a PDF, but can you do just a regular search with it? Thanks, Brian Lamb
Re: Edgengram
In this particular case, I will be doing a solr search based on user preferences, so I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces, and since I am creating the search parameters, case isn't important either. Thanks, Brian Lamb

On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick

On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb

On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: Edgengram
fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=25 side=front / /analyzer /fieldType I believe I used that link when I initially set up the field and it worked great (and I'm still using it in other places). In this particular example however it does not appear to be practical for me. I mentioned that I have a similarity class that returns 1 for the idf and in the case of an edgengram, it returns 1 * length of the search string. Thanks, Brian Lamb On Tue, May 31, 2011 at 11:34 AM, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com wrote: Can you specify the analyzer you are using for your queries? May be you could use a KeywordAnalyzer for your queries so you don't end up matching parts of your query. http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/ This should help you. On Tue, May 31, 2011 at 8:24 PM, Brian Lamb brian.l...@journalexperts.comwrote: In this particular case, I will be doing a solr search based on user preferences. So I will not be depending on the user to type abcdefg. That will be automatically generated based on user selections. The contents of the field do not contain spaces and since I am created the search parameters, case isn't important either. Thanks, Brian Lamb On Tue, May 31, 2011 at 9:44 AM, Erick Erickson erickerick...@gmail.com wrote: That'll work for your case, although be aware that string types aren't analyzed at all, so case matters, as do spaces etc. What is the use-case here? If you explain it a bit there might be better answers Best Erick On Fri, May 27, 2011 at 9:17 AM, Brian Lamb brian.l...@journalexperts.com wrote: For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. 
Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb -- Thanks and Regards, DakshinaMurthy BM
Explain the difference in similarity and similarityProvider
I'm looking over the patch notes from https://issues.apache.org/jira/browse/SOLR-2338 and I do not understand the difference between

<similarity class="com.example.solr.CustomSimilarityFactory">
  <str name="paramkey">param value</str>
</similarity>

and

<similarityProvider class="org.apache.solr.schema.CustomSimilarityProviderFactory">
  <str name="echo">is there an echo?</str>
</similarityProvider>

When would I use one over the other? Thanks, Brian Lamb
Re: Edgengram
For this, I ended up just changing it to string and using abcdefg* to match. That seems to work so far. Thanks, Brian Lamb On Wed, May 25, 2011 at 4:53 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as: fieldType name=edgengram class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=100 side=front / /analyzer /fieldType I've also set up my own similarity class that returns 1 as the idf score. What I've found this does is if I match a string abcdefg against a field containing abcdefghijklmnop, then the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid that? Do I need to do a new field type to achieve the desired affect? Thanks, Brian Lamb
Re: Similarity per field
I'm still not having any luck with this. Has anyone actually gotten this to work so far? I feel like I've followed the directions to the letter but it just doesn't work. Thanks, Brian Lamb On Wed, May 25, 2011 at 2:48 PM, Brian Lamb brian.l...@journalexperts.comwrote: I looked at the patch page and saw the files that were changed. I went into my install and looked at those same files and found that they had indeed been changed. So it looks like I have the correct version of solr. On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I sent a mail in about this topic a week ago but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType: fieldType name=edgengram_cust class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=1 side=front / /analyzer similarity class=my.package.similarity.MySimilarity/ /fieldType And then I assign a specific field to that fieldType: field name=myfield multiValued=true type=edgengram_cust indexed=true stored=true required=false omitNorms=true / Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1 and that is not the case. 
To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file: similarity class=my.package.similarity.MySimilarity/ Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class but in trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Similarity per field
Hi all, I sent a mail in about this topic a week ago, but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly.

I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType:

<fieldType name="edgengram_cust" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="1" side="front" />
  </analyzer>
  <similarity class="my.package.similarity.MySimilarity"/>
</fieldType>

And then I assign a specific field to that fieldType:

<field name="myfield" multiValued="true" type="edgengram_cust" indexed="true" stored="true" required="false" omitNorms="true" />

Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1, and that is not the case.

To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file:

<similarity class="my.package.similarity.MySimilarity"/>

Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class, but in trying to apply it to a specific fieldType. According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now, yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per-field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Re: Similarity per field
I looked at the patch page and saw the files that were changed. I went into my install and looked at those same files and found that they had indeed been changed. So it looks like I have the correct version of solr. On Wed, May 25, 2011 at 1:01 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I sent a mail in about this topic a week ago but now that I have more information about what I am doing, as well as a better understanding of how the similarity class works, I wanted to start a new thread with a bit more information about what I'm doing, what I want to do, and how I can make it work correctly. I have written a similarity class that I would like applied to a specific field. This is how I am defining the fieldType: fieldType name=edgengram_cust class=solr.TextField positionIncrementGap=1000 analyzer tokenizer class=solr.LowerCaseTokenizerFactory / filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=1 side=front / /analyzer similarity class=my.package.similarity.MySimilarity/ /fieldType And then I assign a specific field to that fieldType: field name=myfield multiValued=true type=edgengram_cust indexed=true stored=true required=false omitNorms=true / Then, I restarted solr and did a fullimport. However, the changes I have made do not appear to be taking hold. For simplicity, right now I just have the idf function returning 1. When I do a search with debugQuery=on, the idf behaves as it normally does. However, when I search on this field, the idf should be 1 and that is not the case. To try and nail down where the problem occurs, I commented out the similarity class definition in the fieldType and added it globally to the schema file: similarity class=my.package.similarity.MySimilarity/ Then, I restarted solr and did a fullimport. This time, the idf scores were all 1. So it seems to me the problem is not with my similarity class but in trying to apply it to a specific fieldType. 
According to https://issues.apache.org/jira/browse/SOLR-2338, this should be in the trunk now yes? I have run svn up on both my lucene and solr installs and it still is not recognizing it on a per field basis. Is the tag different inside a fieldType? Did I not update solr correctly? Where is my mistake? Thanks, Brian Lamb
Edgengram
Hi all, I'm running into some confusion with the way edgengram works. I have the field set up as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="100" side="front" />
  </analyzer>
</fieldType>

I've also set up my own similarity class that returns 1 as the idf score. What I've found is that if I match a string abcdefg against a field containing abcdefghijklmnop, the idf will score that as a 7: 7.0 = idf(myfield: a=51 ab=23 abc=2 abcd=2 abcde=2 abcdef=2 abcdefg=2) I get why that's happening, but is there a way to avoid it? Do I need to create a new field type to achieve the desired effect? Thanks, Brian Lamb
Re: Similarity
This did the trick. Thanks! On Mon, May 23, 2011 at 5:03 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm. I don't add code to Apache packages but create my own packages and namespaces, build a jar and add it to the lib directory as specified in solrconfig. Then you can use the FQCN to in the similarity config to point to the class. May be it can work when messing inside the apache namespace but then you have to build Lucene as well. Okay well this is encouraging. I changed SweetSpotSimilarity to MyClassSimilarity. I created this class in: lucene/contrib/misc/src/java/org/apache/lucene/misc/ I am getting a ClassNotFoundException when I try to start solr. Here is the contents of the MyClassSimilarity file: package org.apache.lucene.misc; import org.apache.lucene.search.DefaultSimilarity; public class MyClassSimilarity extends DefaultSimilarity { public MyClassSimilarity() { super(); } public float idf(int a1, int a2) { return 1; } } So then this raises two questions. Why am I getting a classNotFoundException and how can I go about fixing it? Thanks, Brian Lamb On Mon, May 23, 2011 at 3:41 PM, Markus Jelsma markus.jel...@openindex.iowrote: As far as i know, SweetSpotSimilarty needs be configured. I did use it once but wrapped a factory around it to configure the sweet spot. It worked just as expected and explained in that paper about the subject. If you use a custom similarity that , for example, caps tf to 1. Does it then work? Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out: !-- similarity class=org.apache.lucene.search.DefaultSimilarity/ -- I uncomment that line and replace it with the following: similarity class=org.apache.lucene.misc.SweetSpotSimilarity/ Which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times but that didn't seem to help. 
I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpot Similarity change? Thanks, Brian Lamb
Similarity
Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out:

<!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->

I uncomment that line and replace it with the following:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times, but that didn't seem to help. I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn, so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpotSimilarity change? Thanks, Brian Lamb
Re: Similarity
Okay, well this is encouraging. I changed SweetSpotSimilarity to MyClassSimilarity. I created this class in: lucene/contrib/misc/src/java/org/apache/lucene/misc/ I am getting a ClassNotFoundException when I try to start solr. Here is the contents of the MyClassSimilarity file:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class MyClassSimilarity extends DefaultSimilarity {
    public MyClassSimilarity() {
        super();
    }

    public float idf(int a1, int a2) {
        return 1;
    }
}

So then this raises two questions: why am I getting a ClassNotFoundException, and how can I go about fixing it? Thanks, Brian Lamb

On Mon, May 23, 2011 at 5:03 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hmm. I don't add code to Apache packages but create my own packages and namespaces, build a jar and add it to the lib directory as specified in solrconfig. Then you can use the FQCN in the similarity config to point to the class. Maybe it can work when messing inside the apache namespace, but then you have to build Lucene as well.

On Mon, May 23, 2011 at 3:41 PM, Markus Jelsma markus.jel...@openindex.io wrote: As far as i know, SweetSpotSimilarity needs to be configured. I did use it once but wrapped a factory around it to configure the sweet spot. It worked just as expected and explained in that paper about the subject. If you use a custom similarity that, for example, caps tf to 1, does it then work?

Hi all, I'm having trouble getting the basic similarity example to work. If you notice at the bottom of the schema.xml file, there is a line there that is commented out:

<!-- <similarity class="org.apache.lucene.search.DefaultSimilarity"/> -->

I uncomment that line and replace it with the following:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

which comes natively with lucene. However, the scores before and after making this change are the same. I did a full import both times, but that didn't seem to help. I ran svn up on both my solr directory and my lucene directory. Actually, my lucene directory was not previously under svn, so I removed everything in there and did svn co http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/ So why isn't my installation taking the SweetSpotSimilarity change? Thanks, Brian Lamb
Re: Similarity class for an individual field
Yes. Was that not what I was supposed to do? On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: (11/05/20 3:45), Brian Lamb wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cd your Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass Brian, I'm confused about what you did, because SOLR-2338 was resolved in March and committed in trunk, but you did svn up and then applied the patch in your trunk? Koji -- http://www.rondhuit.com/en/
Re: Similarity class for an individual field
So what was my mistake? I still have not resolved this issue. On Fri, May 20, 2011 at 11:22 AM, Brian Lamb brian.l...@journalexperts.comwrote: Yes. Was that not what I was supposed to do? On Thu, May 19, 2011 at 8:26 PM, Koji Sekiguchi k...@r.email.ne.jpwrote: (11/05/20 3:45), Brian Lamb wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands: $ cdyour Solr trunk checkout dir $ svn up $ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch $ patch -p0 -i SOLR-2338.patch And I did not get any errors. I then created my own SimilarityClass Brian, I'm confused what you did because SOLR-2338 has been resolved in March and committed in trunk, but you did svn up apply patch in your trunk? Koji -- http://www.rondhuit.com/en/
Similarity class for an individual field
Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields. I ran the following commands:

$ cd your Solr trunk checkout dir
$ svn up
$ wget https://issues.apache.org/jira/secure/attachment/12475027/SOLR-2338.patch
$ patch -p0 -i SOLR-2338.patch

And I did not get any errors. I then created my own SimilarityClass, listed below because it isn't very large:

package org.apache.lucene.misc;

import org.apache.lucene.search.DefaultSimilarity;

public class SimpleSimilarity extends DefaultSimilarity {
    public SimpleSimilarity() {
        super();
    }

    public float idf(int dont, int care) {
        return 1;
    }
}

As you can see, it isn't very complicated. I'm just trying to remove the idf from the scoring equation in certain cases. Next, I make a change to the schema.xml file:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SimpleSimilarity"/>
</fieldType>

And apply that to the field in question:

<field name="string_noidf" multiValued="true" type="string_noidf" indexed="true" stored="true" required="false" omitNorms="true" />

But I think something did not get applied correctly to the patch. I restarted and did a full import, but the scores are exactly the same. Also, I tried using the existing SweetSpotSimilarity:

<fieldType name="string_noidf" class="solr.StrField" sortMissingLast="true" omitNorms="true">
  <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
</fieldType>

But the scores remained unchanged even in that case. At this point, I'm not quite sure how to debug this to see whether the problem is with the patch or the similarity class, but given that the SweetSpotSimilarity class didn't work either, I'm inclined to think it was a problem with the patch. Any thoughts on this one? Thanks, Brian Lamb
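To see what the idf override actually changes, it helps to compare it against Lucene's classic formula. The sketch below models the classic DefaultSimilarity idf, idf = 1 + ln(numDocs / (docFreq + 1)), next to the flat idf the custom class returns; this is an illustrative reimplementation, not the Lucene code itself.

```java
public class IdfSketch {
    // Classic Lucene idf (DefaultSimilarity): rare terms (small docFreq)
    // get a larger weight than common ones.
    static double classicIdf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // What the SimpleSimilarity override produces: term rarity no longer
    // contributes to the score at all.
    static double flatIdf(int docFreq, int numDocs) {
        return 1.0;
    }
}
```

With the override in effect, a term matching 1 document and a term matching 50 documents score identically on the idf factor, which is easy to spot in debugQuery=on output once the per-field similarity is actually being picked up.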
Re: Similarity class for an individual field
Also, I've tried adding:

<similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>

to the end of the schema file so that it is applied globally, but it does not appear to change the score either. What am I doing incorrectly?

Thanks, Brian Lamb

On Thu, May 19, 2011 at 2:45 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, Based on advice I received on a previous email thread, I applied patch https://issues.apache.org/jira/browse/SOLR-2338. My goal was to be able to apply a similarity class to certain fields but not all fields.
Re: Similarity class for an individual field
I tried editing the SweetSpotSimilarity class located at lucene/contrib/misc/src/java/org/apache/lucene/misc/SweetSpotSimilarity.java to just return 1 for each function and the score does not change at all. This has led me to believe that it does not recognize similarity at all. At this point, all I have for similarity is the line at the end of the file to apply similarity to all searches but that does not even work. So where am I going wrong?

Thanks, Brian Lamb

On Thu, May 19, 2011 at 3:41 PM, Brian Lamb brian.l...@journalexperts.com wrote: Also, I've tried adding <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/> to the end of the schema file so that it is applied globally but it does not appear to change the score either.
Re: Disable IDF scoring on certain fields
I believe I have applied the patch correctly. However, I cannot seem to figure out where the similarity class I create should reside. Any tips on that?

Thanks, Brian Lamb

On Tue, May 17, 2011 at 4:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-)
Re: MoreLikeThis PDF search
Would I be better off trying to use something like PHP to read the PDF file and extrapolate the information and then pass it on to the MoreLikeThis handler, or is there a way it can be done by giving it the PDF directly?

On Fri, May 13, 2011 at 4:54 PM, Brian Lamb brian.l...@journalexperts.com wrote: Any thoughts on this one?
Disable IDF scoring on certain fields
Hi all, I have a field defined in my schema.xml file as:

<fieldType name="edgengram" class="solr.TextField" positionIncrementGap="1000">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25" side="front" />
  </analyzer>
</fieldType>

<field name="myfield" multiValued="true" type="edgengram" indexed="true" stored="true" required="false" omitNorms="true" />

I would like to disable IDF scoring on this field. I am not interested in how rare the term is; I only care whether the term is present or not. The idea is that if a user does a search for myfield:dog OR myfield:pony, any document containing dog or pony would be scored identically. In the case that both showed up, that record would be moved to the top, but all the records where they both showed up would have the same score. So long story short, how can I disable the IDF score for this particular field?

Thanks, Brian Lamb
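The desired behavior can be illustrated with a toy scorer (a pure-Python sketch; document frequencies and corpus size are invented, tf and norms are omitted). With idf held constant, documents matching a single query term tie regardless of how rare the term is, while a document matching both terms still rises to the top:

```python
import math

def score(query_terms, doc_terms, doc_freq, num_docs, use_idf=True):
    # Toy tf-idf-style scorer: sum the idf of each matched query term.
    total = 0.0
    for term in query_terms:
        if term in doc_terms:
            idf = math.log(num_docs / (doc_freq[term] + 1)) + 1 if use_idf else 1.0
            total += idf
    return total

doc_freq = {"dog": 50_000, "pony": 200}  # "pony" is much rarer
num_docs = 1_000_000
q = ["dog", "pony"]
docs = (["dog"], ["pony"], ["dog", "pony"])

with_idf = [score(q, d, doc_freq, num_docs) for d in docs]
no_idf = [score(q, d, doc_freq, num_docs, use_idf=False) for d in docs]

print(with_idf[0] == with_idf[1])  # False: rarity separates the single matches
print(no_idf[0] == no_idf[1])      # True: single-term matches tie
print(no_idf[2] > no_idf[0])       # True: matching both terms still ranks highest
```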
Re: Disable IDF scoring on certain fields
Hi Markus, I was just looking at overriding DefaultSimilarity so your email was well timed. The problem I have with it is, as you mentioned, it does not seem possible to do it on a field by field basis. Has anyone had any luck doing some of the similarity functions on a field by field basis? I need to do more than one of them, and from what I can find, it seems that only computeNorm accounts for the name of the field.

Thanks, Brian Lamb

On Tue, May 17, 2011 at 3:34 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, Although you can configure per-field TF (by omitTermFreqAndPositions) you can't do this for IDF. If your index is only used for this specific purpose (seems like an auto-complete index) then you can override DefaultSimilarity and return a static value for IDF. If you still want IDF for other fields then I think you have a problem because Solr doesn't yet support per-field similarity. http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/src/java/org/apache/lucene/search/DefaultSimilarity.java?view=markup Cheers,
Re: Disable IDF scoring on certain fields
Thank you Robert for pointing this out. This is not being used for autocomplete. I already have another core set up for that :-) The idea is as I outlined above: I just want a multivalued field that treats every term in the field the same, so that the only way documents separate themselves is by an unrelated boost and/or matching on multiple terms in that field.

On Tue, May 17, 2011 at 3:55 PM, Markus Jelsma markus.jel...@openindex.io wrote: Well, if you're experimental you can try trunk; as Robert points out, it has been fixed there. If not, I guess you're stuck with creating another core. Is this fieldType specifically used for auto-completion? If so, another core, preferably on another machine, is in my opinion the way to go. Auto-completion is tough in terms of performance. Thanks Robert for pointing to the Jira ticket. Cheers
Re: MoreLikeThis PDF search
Any thoughts on this one?

On Thu, May 12, 2011 at 10:46 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've become more and more familiar with the MoreLikeThis handler over the last several months. I'm curious whether it is possible to do a MoreLikeThis search by uploading a PDF?
MoreLikeThis PDF search
Hi all, I've become more and more familiar with the MoreLikeThis handler over the last several months. I'm curious whether it is possible to do a MoreLikeThis search by uploading a PDF? I looked at the ExtractingRequestHandler, which looks like it is used to process PDF files and the like, but is it possible to combine the two? Just to be clear, I don't want to send a PDF and have that become part of the index. Rather, I'd like to be able to use the PDF as a MoreLikeThis search. Thanks, Brian Lamb
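One conceivable two-step combination (a sketch only; the host, port, and field names are assumptions): first ask the ExtractingRequestHandler for the PDF's parsed text without indexing it, via extractOnly=true, then feed that text to the MoreLikeThis handler as a content stream via stream.body. The snippet below only constructs the two request URLs to show the shape of the idea:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr"  # assumed Solr location

# Step 1: extractOnly=true makes /update/extract return the parsed
# text of the uploaded file instead of adding it to the index.
extract_url = base + "/update/extract?" + urlencode({"extractOnly": "true"})

# Step 2: hand the extracted text to the MLT handler as a content stream.
extracted_text = "text pulled out of the PDF in step 1"
mlt_url = base + "/mlt?" + urlencode({
    "stream.body": extracted_text,
    "mlt.fl": "title",  # assumed field to compare against
    "rows": 100,
})

print("extractOnly=true" in extract_url)  # True
print("stream.body=" in mlt_url)          # True
```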
Changing the schema
If I change the field type in my schema, do I need to rebuild the entire index? I'm at a point now where it takes over a day to do a full import due to the sheer size of my application and I would prefer not having to reindex just because I want to make a change somewhere. Thanks, Brian Lamb
Re: Solr security
Great posts all. I will give these a look and come up with something based on these recommendations. I'm sure as I begin implementing something, more questions will arise.

On Tue, May 10, 2011 at 9:00 AM, Anthony Wlodarski anth...@tinkertownlabs.com wrote: The wiki has a loose interpretation of how to set up Jetty securely. Please take a look at the article I wrote here: http://anthonyw.net/2011/04/securing-jetty-and-solr-with-php-authentication/. Even if PHP is not the language that sits on top of your Solr, you can still use the first part of the tutorial. If you are using Tomcat I would recommend looking here: http://blog.comtaste.com/2009/02/securing_your_solr_server_on_t.html Regards, -Anthony

On 05/09/2011 05:28 PM, Jan Høydahl wrote: Hi, You can simply configure a firewall on your Solr server to only allow access from your frontend server. Whether you use the built-in software firewall of Linux/Windows/whatever or some other FW utility is a choice you need to make. This is by design - you should never ever expose your backend services, whether it's a search server or a database server, to the public. Read more about Solr security on the wiki: http://wiki.apache.org/solr/SolrSecurity -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 9. mai 2011, at 20.57, Brian Lamb wrote: Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost?

-- Anthony Wlodarski Lead Software Engineer Get2Know.me (http://www.get2know.me) Office: 646-285-0500 x217 Fax: 646-285-0400
Solr security
Hi all, Is it possible to set up solr so that it will only execute dataimport commands if they come from localhost? Right now, my application and my solr installation are on different servers so any requests are formatted http://domain:8983 instead of http://localhost:8983. I am concerned that when I launch my application, there will be the potential for abuse. Is the best solution to have everything reside on the same server? What are some other solutions? Thanks, Brian Lamb
Negative boost
Hi all, I understand that the only way to simulate a negative boost is to positively boost the inverse. I have looked at http://wiki.apache.org/solr/SolrRelevancyFAQ but I think I am missing something in the formatting of my query. I am using:

http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

In this case, I am trying to search for records about dog but to put records containing Sheltie closer to the bottom, as I am not really interested in those. However, the following queries:

http://localhost:8983/solr/search?q=dog
http://localhost:8983/solr/search?q=dog&bq=(*:* -species:Sheltie)^1

return the exact same set of results, with a record about a Sheltie as the top result each time. What am I doing incorrectly? Thanks, Brian Lamb
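The boost-the-inverse idea can be shown with a toy simulation (species values and scores are invented; a real bq in dismax contributes additively to the score, which this sketch imitates): every document NOT matching species:Sheltie gets an extra boost, so Sheltie documents sink without being excluded.

```python
# Toy documents with invented base relevance scores for the query "dog".
docs = [
    {"id": 1, "species": "Sheltie", "base_score": 5.0},
    {"id": 2, "species": "Beagle", "base_score": 4.0},
    {"id": 3, "species": "Labrador", "base_score": 3.0},
]

boost = 10.0  # weight attached to the inverse clause (*:* -species:Sheltie)
for d in docs:
    # Non-Sheltie documents match the inverse clause and get the boost.
    d["score"] = d["base_score"] + (boost if d["species"] != "Sheltie" else 0.0)

ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
print([d["id"] for d in ranked])  # [2, 3, 1] -- the Sheltie sinks to the bottom
```

If the boost has no effect on the ranking at all, it is worth checking that the request handler in use actually supports the bq parameter.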
Re: MoreLikeThis
It finds something under match but just nothing under response. I tried turning on debugQuery=on but I did not see anything that jumped out at me as a bug or anything. Is there some kind of threshold setting that I can tinker with to see if that is the problem?

On Sun, Apr 24, 2011 at 2:37 AM, Grant Ingersoll gsing...@apache.org wrote:

On Apr 21, 2011, at 8:46 PM, Brian Lamb wrote: Hi all, I have an mlt search set up on my site with over 2 million records in the index. Normally, my results look like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">204</int>
  </lst>
  <result name="match" numFound="41750" start="0">
    <doc>
      <str name="title">Some result.</str>
    </doc>
  </result>
  <result name="response" numFound="130872" start="0">
    <doc>
      <str name="title">A similar result</str>
    </doc>
    ...
  </result>
</response>

And there are 100 results under response. However, in some cases, there are no results under response. Why is this the case and is there anything I can do about it?

Is it because it couldn't find anything? Or are you thinking there is a bug? You might try adding debugQuery=true and see what gets parsed, etc. and then try running that query.

Here is my mlt configuration:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,score</str>
    <int name="mlt.mindf">1</int>
    <int name="rows">100</int>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

And here is the URL I use to get results:

http://localhost:8983/solr/mlt/?q=title:Some random title

Any help on this matter would be greatly appreciated. Thanks!

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
MoreLikeThis
Hi all, I have an mlt search set up on my site with over 2 million records in the index. Normally, my results look like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">204</int>
  </lst>
  <result name="match" numFound="41750" start="0">
    <doc>
      <str name="title">Some result.</str>
    </doc>
  </result>
  <result name="response" numFound="130872" start="0">
    <doc>
      <str name="title">A similar result</str>
    </doc>
    ...
  </result>
</response>

And there are 100 results under response. However, in some cases, there are no results under response. Why is this the case and is there anything I can do about it? Here is my mlt configuration:

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">title,score</str>
    <int name="mlt.mindf">1</int>
    <int name="rows">100</int>
    <str name="fl">*,score</str>
  </lst>
</requestHandler>

And here is the URL I use to get results:

http://localhost:8983/solr/mlt/?q=title:Some random title

Any help on this matter would be greatly appreciated. Thanks! Brian Lamb
Re: MoreLikeThis match
Does anyone have any thoughts on this one?

On Fri, Apr 8, 2011 at 9:26 AM, Brian Lamb brian.l...@journalexperts.com wrote: I've looked at both wiki pages and none really clarify the difference between these two.
Re: MoreLikeThis match
I've looked at both wiki pages and none really clarify the difference between these two. If I copy and paste an existing index value for field and do an mlt search, it shows up under match but not results. What is the difference between these two?

On Thu, Apr 7, 2011 at 2:24 PM, Brian Lamb brian.l...@journalexperts.com wrote: Actually, what is the difference between match and response?
MoreLikeThis match
Hi all, I've been using MoreLikeThis for a while through select:

http://localhost:8983/solr/select/?q=field:more like this&mlt=true&mlt.fl=field&rows=100&fl=*,score

I was looking over the wiki page today and saw that you can also do this:

http://localhost:8983/solr/mlt/?q=field:more like this&mlt=true&mlt.fl=field&rows=100

which seems to run faster and do a better job overall. When the results are returned, they are formatted like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc>
      <float name="score">3.0438285</float>
      <str name="id">5</str>
    </doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc>
      <float name="score">0.1125823</float>
      <str name="id">3</str>
    </doc>
    <doc>
      <float name="score">0.10231556</float>
      <str name="id">8</str>
    </doc>
    ...
  </result>
</response>

It seems that it always returns just one result under match, and response is set by the rows parameter. How can I get more than one result under match? What I'm trying to do here is: whatever is set for field:, I would like to return the top 100 records that match that search based on more like this. Thanks, Brian Lamb
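The two sections can be pulled apart programmatically. Below is a sketch that parses a well-formed reconstruction of the response shown above (the XML string is an assumption based on the post): "match" holds the document(s) the query itself matched, while "response" holds the documents MoreLikeThis found similar to it.

```python
import xml.etree.ElementTree as ET

# Reconstructed /mlt response, shortened to the parts that matter here.
xml = """<response>
  <result name="match" numFound="24" start="0" maxScore="3.0438285">
    <doc><float name="score">3.0438285</float><str name="id">5</str></doc>
  </result>
  <result name="response" numFound="4077" start="0" maxScore="0.12775186">
    <doc><float name="score">0.1125823</float><str name="id">3</str></doc>
    <doc><float name="score">0.10231556</float><str name="id">8</str></doc>
  </result>
</response>"""

root = ET.fromstring(xml)
match = root.find("result[@name='match']")
similar = root.find("result[@name='response']")

# The query's own best hit vs. the MoreLikeThis suggestions.
print(len(match.findall("doc")))    # 1
print(len(similar.findall("doc")))  # 2
print(similar.find("doc/str[@name='id']").text)  # 3
```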
Re: MoreLikeThis match
Actually, what is the difference between match and response? It seems that match always returns one result, but I've thrown a few cases at it where the score of the highest response is higher than the score of match. And then there are cases where the match score dwarfs the highest response score.

On Thu, Apr 7, 2011 at 1:30 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I've been using MoreLikeThis for a while through select: http://localhost:8983/solr/select/?q=field:more like this&mlt=true&mlt.fl=field&rows=100&fl=*,score
Re: Matching the beginning of a word within a term
Thank you both for your replies. It looks like EdgeNGramFilter will do the job nicely. Time to reindex...again.

On Fri, Apr 1, 2011 at 8:31 AM, Jan Høydahl jan@cominvent.com wrote: Check out http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory Don't know if it works with phrases though -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 31. mars 2011, at 16.49, Brian Lamb wrote: No, I don't really want to break down the words into subwords. In the example I provided, I would not want kind to match either record because it is not at the beginning of the word, even though kind appears in both records as part of a word.
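A rough model (not the actual Lucene code) of what the suggested EdgeNGramFilter emits makes clear why it matches word beginnings but never interior substrings:

```python
def edge_ngrams(token, min_gram=1, max_gram=25, side="front"):
    # Rough model of EdgeNGramFilterFactory with side="front":
    # emit every leading substring from min_gram to max_gram characters.
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]

grams = edge_ngrams("mankind")
print("man" in grams)   # True: a query for "man" matches the start of the word
print("kind" in grams)  # False: interior substrings are never emitted
```

So with this filter applied at index time, myfield:man matches "mankind" but kind matches neither record, which is exactly the behavior asked for in the thread.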
Re: Matching on a multi valued field
I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results at the top, but is it possible to have only the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb brian.l...@journalexperts.com wrote: Thank you all for your responses. The field had already been set up with positionIncrementGap="100" so I just needed to add in the slop.

On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora jua...@informa.es wrote: A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That is true, but you cannot do things like q=bar* foo*~10 with the default query search, and if you use dismax you will have the same problems with multivalued fields. Imagine the situation:

Doc1: field A: [foo bar, dooh] (2 values)
Doc2: field A: [bar dooh, whatever] (another 2 values)

The query qt=dismax qf=fieldA q=( bar dooh ) will return both Doc1 and Doc2. The only thing you can do in this situation is boost the phrase query in Doc2 with the pf parameter in order to get Doc2 in the first position of the results: pf=fieldA^1 Thanks, JP.

On 29/03/2011, at 23:14, Markus Jelsma wrote: orly, all replies came in while sending =) Hi, Your filter query is looking for a match of man's friend in a single field. Regardless of analysis of the common_names field, all terms are present in the common_names field of both documents. A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That should work. Cheers,

Hi all, I have a field set up like this:

<field name="common_names" multiValued="true" type="text" indexed="true" stored="true" required="false" />

And I have some records:

RECORD1
<arr name="common_names">
  <str>man's best friend</str>
  <str>pooch</str>
</arr>

RECORD2
<arr name="common_names">
  <str>man's worst enemy</str>
  <str>friend to no one</str>
</arr>

Now if I do a search such as:

http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND df=common_names}man's friend

both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned, but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
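The positionIncrementGap trick above can be sketched with a toy model (pure Python; the tokenization is simplified): tokens within one value get consecutive positions, each new value starts a large gap later, and a proximity query with a modest slop therefore cannot match across two different values.

```python
def index_positions(values, gap=100):
    # Rough model of a multiValued field: tokens of each value get
    # consecutive positions; each new value starts `gap` positions later.
    positions = {}
    pos = 0
    for value in values:
        for token in value.split():
            positions.setdefault(token, []).append(pos)
            pos += 1
        pos += gap
    return positions

def phrase_within_slop(positions, t1, t2, slop):
    # A proximity query like "t1 t2"~slop needs the two terms within
    # `slop` positions of each other, so it cannot span the gap.
    return any(abs(a - b) <= slop
               for a in positions.get(t1, [])
               for b in positions.get(t2, []))

doc1 = index_positions(["man's best friend", "pooch"])
doc2 = index_positions(["man's worst enemy", "friend to no one"])

print(phrase_within_slop(doc1, "man's", "friend", 10))  # True: same value
print(phrase_within_slop(doc2, "man's", "friend", 10))  # False: separated by the gap
```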
Re: Matching the beginning of a word within a term
No, I don't really want to break down the words into subwords. In the example I provided, I would not want kind to match either record because it is not at the beginning of the word even though kind appears in both records as part of a word. On Wed, Mar 30, 2011 at 4:42 PM, lboutros boutr...@gmail.com wrote: Do you want to tokenize subwords based on dictionaries ? A bit like disagglutination of german words ? If so, something like this could help : DictionaryCompoundWordTokenFilter http://search.lucidimagination.com/search/document/CDRG_ch05_5.8.8 Ludovic http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilter.html 2011/3/30 Brian Lamb [via Lucene] ml-node+2754668-300063934-383...@n3.nabble.com Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strcompanion to mankind/str strpooch/str /arr RECORD2 arr name=common_names strcompanion to womankind/str strman's worst enemy/str /arr I would like to write a query that will match the beginning of a word within the term. Here is the query I would use as it exists now: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND%20df=common_names} companion man~10 In the above example. I would want to return only RECORD1. The query as it exists right now is designed to only match records where both words are present in the same term. So if I changed man to mankind in the query, RECORD1 will be returned. Even though the phrases companion and man exist in the same term in RECORD2, I do not want RECORD2 to be returned because 'man' is not at the beginning of the word. How can I achieve this? 
Thanks, Brian Lamb - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Matching-the-beginning-of-a-word-within-a-term-tp2754668p2755561.html Sent from the Solr - User mailing list archive at Nabble.com.
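For the "match only at the beginning of a word" requirement, one approach the thread doesn't get to is indexing edge n-grams. The fieldType below is a sketch, not something from the thread; the type name and gram sizes are assumptions, and the exact filter attributes vary by Solr version:

```xml
<!-- schema.xml sketch: "mankind" is indexed as ma, man, mank, ... so a plain
     query term "man" matches word prefixes; "womankind" yields grams starting
     wo, wom, woma, ... and is not matched -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```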
Re: Matching on a multi valued field
Thank you all for your responses. The field had already been set up with positionIncrementGap=100 so I just needed to add in the slop. On Tue, Mar 29, 2011 at 6:32 PM, Juan Pablo Mora jua...@informa.es wrote: A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That is true but you cannot do things like: q=bar* foo*~10 with default query search. and if you use dismax you will have the same problems with multivalued fields. Imagine the situation: Doc1: field A: [foo bar,dooh] 2 values Doc2: field A: [bar dooh, whatever] Another 2 values the query: qt=dismax qf= fieldA q = ( bar dooh ) will return both Doc1 and Doc2. The only thing you can do in this situation is boost phrase query in Doc2 with parameter pf in order to get Doc2 in the first position of the results: pf = fieldA^1 Thanks, JP. On 29/03/2011, at 23:14, Markus Jelsma wrote: orly, all replies came in while sending =) Hi, Your filter query is looking for a match of man's friend in a single field. Regardless of analysis of the common_names field, all terms are present in the common_names field of both documents. A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery. That should work. Cheers, Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strman's best friend/str strpooch/str /arr RECORD2 arr name=common_names strman's worst enemy/str strfriend to no one/str /arr Now if I do a search such as: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND df=common_names}man's friend Both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
Matching the beginning of a word within a term
Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strcompanion to mankind/str strpooch/str /arr RECORD2 arr name=common_names strcompanion to womankind/str strman's worst enemy/str /arr I would like to write a query that will match the beginning of a word within the term. Here is the query I would use as it exists now: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND%20df=common_names}companion man~10 In the above example, I would want to return only RECORD1. The query as it exists right now is designed to only match records where both words are present in the same term. So if I changed man to mankind in the query, RECORD1 will be returned. Even though the words companion and man exist in the same term in RECORD2, I do not want RECORD2 to be returned because 'man' is not at the beginning of the word. How can I achieve this? Thanks, Brian Lamb
String field
Hi all, I'm a little confused about the string field. I read somewhere that if I want to do an exact match, I should use a string field. So I made a few modifications to my schema file: field name=id type=string indexed=true stored=true required=false / field name=common_names multiValued=true type=string indexed=true stored=true required=false / field name=genus type=string indexed=true stored=true required=false / field name=species type=string indexed=true stored=true required=false / And did a full import but when I do a search and return all fields, only id is showing up. The only difference is that id is my primary key field so that could be why it is showing up but why aren't the others showing up? Thanks, Brian Lamb
Re: String field
The full import wasn't spitting out any errors on the web page but in looking at the logs, there were errors. Correcting those errors solved that issue. Thanks, Brian Lamb On Tue, Mar 29, 2011 at 2:44 PM, Erick Erickson erickerick...@gmail.comwrote: try the schema browser from the admin page to be sure the fields you *think* are in the index really are. Did you do a commit after indexing? Did you re-index after the schema changes? Are you 100% sure that, if you did re-index, the new fields were in the docs submitted? Best Erick On Tue, Mar 29, 2011 at 11:46 AM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I'm a little confused about the string field. I read somewhere that if I want to do an exact match, I should use an exact match. So I made a few modifications to my schema file: field name=id type=string indexed=true stored=true required=false / field name=common_names multiValued=true type=string indexed=true stored=true required=false / field name=genus type=string indexed=true stored=true required=false / field name=species type=string indexed=true stored=true required=false / And did a full import but when I do a search and return all fields, only id is showing up. The only difference is that id is my primary key field so that could be why it is showing up but why aren't the others showing up? Thanks, Brian Lamb
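Setting the import errors aside, a note on the schema change itself: a string field is indexed completely unanalyzed, so it only matches the whole stored value verbatim. A common pattern (a sketch, not something from this thread; field names are assumptions) is to search an analyzed copy and keep the string field for exact matching and faceting:

```xml
<!-- schema.xml sketch: analyzed field for searching, raw string copy for exact match -->
<field name="genus"       type="text"   indexed="true" stored="true"/>
<field name="genus_exact" type="string" indexed="true" stored="false"/>
<copyField source="genus" dest="genus_exact"/>
```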
Matching on a multi valued field
Hi all, I have a field set up like this: field name=common_names multiValued=true type=text indexed=true stored=true required=false / And I have some records: RECORD1 arr name=common_names strman's best friend/str strpooch/str /arr RECORD2 arr name=common_names strman's worst enemy/str strfriend to no one/str /arr Now if I do a search such as: http://localhost:8983/solr/search/?q=*:*fq={!q.op=AND df=common_names}man's friend Both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned but how can I structure my query so that only RECORD1 is returned? Thanks, Brian Lamb
Re: Default operator
Thank you both for your input. I ended up using Ahmet's way because it seems to fit better with the rest of the application. On Sat, Mar 26, 2011 at 6:02 AM, lboutros boutr...@gmail.com wrote: The other way could be to extend the SolrQueryParser to read a per field default operator in the solr config file. Then it should be possible to override this functions : setDefaultOperator getDefaultOperator and this two which are using the default operator : getFieldQuery addClause The you just have to declare it in the solr config file and configure your default operators. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Default-operator-tp2732237p2734931.html Sent from the Solr - User mailing list archive at Nabble.com.
Default operator
Hi all, I know that I can change the default operator in two ways: 1) *solrQueryParser defaultOperator*=AND|OR/ 2) Add q.op=AND I'm wondering if it is possible to change the default operator for a specific field only? For example, if I use the URL: http://localhost:8983/solr/search/?q=animal:german shepherdtype:dog canine I would want it to effectively be: http://localhost:8983/solr/search/?q=animal:german AND shepherdtype:dog OR canine Other than parsing the URL before I send it out, is there a way to do this? Thanks, Brian Lamb
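One way to get a per-field operator without parsing the URL yourself or writing a custom parser is the lucene parser's `_query_` nested-query syntax; each nested clause carries its own q.op local param. A sketch (shown unescaped for readability):

```
q=_query_:"{!lucene q.op=AND df=animal}german shepherd" _query_:"{!lucene q.op=OR df=type}dog canine"
```

Here the animal clause requires both terms while the type clause accepts either, which is the effect the two URLs above describe.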
Re: Adding the suggest component
I'm still confused as to why I'm getting this error. To me it reads that the .java file was declared incorrectly but I shouldn't need to change those files so where am I doing something incorrectly? On Tue, Mar 22, 2011 at 3:40 PM, Brian Lamb brian.l...@journalexperts.comwrote: That fixed that error as well as the could not initialize Dataimport class error. Now I'm getting: org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a org.apache.solr.request.SolrRequestHandler I can't find anything on this one. What I've added to the solrconfig.xml file matches whats in example-DIH so I don't quite understand what the issue is here. It sounds to me like it is not declared properly somewhere but I'm not sure where/why. Here is the relevant portion of my solrconfig.xml file: requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configdb-data-config.xml/str /lst /requestHandler Thanks for all the help so far. You all have been great. Brian Lamb On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) You can find slf4j- related jars in \trunk\solr\lib, but this error is weird.
Re: Adding the suggest component
Thank you for the suggestion. I followed your advice and was able to get a version up and running. Thanks again for all the help! On Wed, Mar 23, 2011 at 1:55 PM, Ahmet Arslan iori...@yahoo.com wrote: I'm still confused as to why I'm getting this error. To me it reads that the .java file was declared incorrectly but I shouldn't need to change those files so where am I doing something incorrectly? Brian, I think best thing to do is checkout a new clean copy from subversion and then do things step by step on this clean copy.
Re: Adding the suggest component
Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' It looks like those files are there: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/ But for some reason, they aren't able to be found. Where would I update this setting and what would I update it to? Thanks, Brian Lamb On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson erickerick...@gmail.comwrote: OK, I think you're jumping ahead and trying to do too many things at once. What did you download? Source? The distro? The error you posted usually happens for me when I haven't compiled the example target from source. So I'd guess you don't have the proper targets built. This assumes you downloaded the source via SVN. If you downloaded a distro, I'd start by NOT copying anything anywhere, just go to the example code and start Solr. Make sure you have what you think you have. I've seen interesting things get cured by removing the entire directory where your servlet container unpacks war files, but that's usually in development environments. When I get in these situations, I usually find it's best to back up, do one thing at a time and verify that I get the expected results at each step. It's tedious, but Best Erick On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote: downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: How do you start solr? using java -jar start.jar? Did you run 'ant clean example' in the solr folder?
Re: Adding the suggest component
I found the following in the build.xml file: invoke-javadoc destdir=${build.javadoc} sources packageset dir=${src}/common / packageset dir=${src}/solrj / packageset dir=${src}/java / packageset dir=${src}/webapp/src / packageset dir=contrib/dataimporthandler/src/main/java / packageset dir=contrib/clustering/src/main/java / packageset dir=contrib/extraction/src/main/java / packageset dir=contrib/uima/src/main/java / packageset dir=contrib/analysis-extras/src/java / group title=Core packages=org.apache.* / group title=Common packages=org.apache.solr.common.* / group title=SolrJ packages=org.apache.solr.client.solrj* / group title=contrib: DataImportHandler packages=org.apache.solr.handler.dataimport* / group title=contrib: Clustering packages=org.apache.solr.handler.clustering* / group title=contrib: Solr Cell packages=org.apache.solr.handler.extraction* / group title=contrib: Solr UIMA packages=org.apache.solr.uima* / /sources /invoke-javadoc It looks like the dataimport handler path is correct in there so I don't understand why it's not being compiled. I ran ant example again today but I'm still getting the same error. Thanks, Brian Lamb On Tue, Mar 22, 2011 at 11:28 AM, Brian Lamb brian.l...@journalexperts.com wrote: Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' It looks like those files are there: contrib/dataimporthandler/src/main/java/org/apache/solr/handler/dataimport/ But for some reason, they aren't able to be found. Where would I update this setting and what would I update it to? Thanks, Brian Lamb On Mon, Mar 21, 2011 at 10:15 AM, Erick Erickson erickerick...@gmail.com wrote: OK, I think you're jumping ahead and trying to do too many things at once. What did you download? Source? The distro? 
The error you posted usually happens for me when I haven't compiled the example target from source. So I'd guess you don't have the proper targets built. This assumes you downloaded the source via SVN. If you downloaded a distro, I'd start by NOT copying anything anywhere, just go to the example code and start Solr. Make sure you have what you think you have. I've seen interesting things get cured by removing the entire directory where your servlet container unpacks war files, but that's usually in development environments. When I get in these situations, I usually find it's best to back up, do one thing at a time and verify that I get the expected results at each step. It's tedious, but Best Erick On Fri, Mar 18, 2011 at 4:18 PM, Ahmet Arslan iori...@yahoo.com wrote: downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: How do you start solr? using java -jar start.jar? Did you run 'ant clean example' in the solr folder?
Re: Adding the suggest component
Awesome! That fixed that problem. I'm getting another class not found error but I'll see if I can fix it on my own first. On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote: From: Brian Lamb brian.l...@journalexperts.com Subject: Re: Adding the suggest component To: solr-user@lucene.apache.org Cc: Erick Erickson erickerick...@gmail.com Date: Tuesday, March 22, 2011, 5:28 PM Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' run 'ant clean dist' and copy trunk/solr/dist/ apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar apache-solr-dataimporthandler-4.0-SNAPSHOT.jar to solrHome/lib directory.
Re: Adding the suggest component
I fixed a few other exceptions it threw when I started the server but I don't know how to fix this one: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) I've searched Google but haven't been able to find a reason why this happens and how to fix it. Thanks, Brian Lamb On Tue, Mar 22, 2011 at 12:54 PM, Brian Lamb brian.l...@journalexperts.comwrote: Awesome! That fixed that problem. I'm getting another class not found error but I'll see if I can fix it on my own first. On Tue, Mar 22, 2011 at 11:56 AM, Ahmet Arslan iori...@yahoo.com wrote: --- On Tue, 3/22/11, Brian Lamb brian.l...@journalexperts.com wrote: From: Brian Lamb brian.l...@journalexperts.com Subject: Re: Adding the suggest component To: solr-user@lucene.apache.org Cc: Erick Erickson erickerick...@gmail.com Date: Tuesday, March 22, 2011, 5:28 PM Thanks everyone for the advice. I checked out a recent version from SVN and ran: ant clean example This worked just fine. However when I went to start the solr server, I get this error message: SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.dataimport.DataImportHandler' run 'ant clean dist' and copy trunk/solr/dist/ apache-solr-dataimporthandler-extras-4.0-SNAPSHOT.jar apache-solr-dataimporthandler-4.0-SNAPSHOT.jar to solrHome/lib directory.
Re: Adding the suggest component
That fixed that error as well as the could not initialize Dataimport class error. Now I'm getting: org.apache.solr.common.SolrException: Error Instantiating Request Handler, org.apache.solr.handler.dataimport.DataImportHandler is not a org.apache.solr.request.SolrRequestHandler I can't find anything on this one. What I've added to the solrconfig.xml file matches what's in example-DIH so I don't quite understand what the issue is here. It sounds to me like it is not declared properly somewhere but I'm not sure where/why. Here is the relevant portion of my solrconfig.xml file: requestHandler name=/dataimport class=org.apache.solr.handler.dataimport.DataImportHandler lst name=defaults str name=configdb-data-config.xml/str /lst /requestHandler Thanks for all the help so far. You all have been great. Brian Lamb On Tue, Mar 22, 2011 at 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote: java.lang.NoClassDefFoundError: Could not initialize class org.apache.solr.handler.dataimport.DataImportHandler at java.lang.Class.forName0(Native Method) java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory at org.apache.solr.handler.dataimport.DataImportHandler.clinit(DataImportHandler.java:72) Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory at java.net.URLClassLoader$1.run(URLClassLoader.java:217) You can find slf4j- related jars in \trunk\solr\lib, but this error is weird.
Re: Adding the suggest component
That does seem like a better solution. I downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: 2011-03-18 14:11:02.016:INFO::Logging to STDERR via org.mortbay.log.StdErrLog 2011-03-18 14:11:02.240:INFO::jetty-6.1-SNAPSHOT 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Whereas before I got a bunch of messages indicating various libraries had been loaded. Additionally, when I go to http://localhost/solr/admin/, I get the following message: HTTP ERROR: 404 Problem accessing /solr/admin. Reason: NOT_FOUND What did I do incorrectly? Thanks, Brian Lamb On Fri, Mar 18, 2011 at 9:04 AM, Erick Erickson erickerick...@gmail.com wrote: What do you mean you copied the contents...to the right place? If you checked out trunk and copied the files into 1.4.1, you have mixed source files between disparate versions. All bets are off. Or do you mean jar files? or??? I'd build the source you checked out (at the Solr level) and use that rather than try to mix-n-match. BTW, if you're just starting (as in not in production), you may want to consider using 3.1, as it's being released even as we speak and has many improvements over 1.4. You can get a nightly build from here: https://builds.apache.org/hudson/view/S-Z/view/Solr/ Best Erick On Thu, Mar 17, 2011 at 3:36 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. 
I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
Re: Adding the suggest component
Sorry, that was a typo on my part. I was using http://localhost:8983/solr/admin and getting the above error messages. On Fri, Mar 18, 2011 at 2:57 PM, Geert-Jan Brits gbr...@gmail.com wrote: 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Solr started on port 8983 instead of this: http://localhost/solr/admin/ try this instead: http://localhost:8983/solr/admin/ http://localhost/solr/admin/ Cheers, Geert-Jan 2011/3/18 Brian Lamb brian.l...@journalexperts.com That does seem like a better solution. I downloaded a recent version and there were the following files/folders: build.xml dev-tools LICENSE.txt lucene NOTICE.txt README.txt solr So I did cp -r solr/* /path/to/solr/stuff/ and started solr. I didn't get any error message but I only got the following messages: 2011-03-18 14:11:02.016:INFO::Logging to STDERR via org.mortbay.log.StdErrLog 2011-03-18 14:11:02.240:INFO::jetty-6.1-SNAPSHOT 2011-03-18 14:11:02.284:INFO::Started SocketConnector@0.0.0.0:8983 Where as before I got a bunch of messages indicating various libraries had been loaded. Additionally, when I go to http://localhost/solr/admin/, I get the following message: HTTP ERROR: 404 Problem accessing /solr/admin. Reason: NOT_FOUND What did I do incorrectly? Thanks, Brian Lamb On Fri, Mar 18, 2011 at 9:04 AM, Erick Erickson erickerick...@gmail.com wrote: What do you mean you copied the contents...to the right place? If you checked out trunk and copied the files into 1.4.1, you have mixed source files between disparate versions. All bets are off. Or do you mean jar files? or??? I'd build the source you checked out (at the Solr level) and use that rather than try to mix-n-match. BTW, if you're just starting (as in not in production), you may want to consider using 3.1, as it's being released even as we speak and has many improvements over 1.4. 
You can get a nightly build from here: https://builds.apache.org/hudson/view/S-Z/view/Solr/ Best Erick On Thu, Mar 17, 2011 at 3:36 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
Adding the suggest component
Hi all, When I installed Solr, I downloaded the most recent version (1.4.1) I believe. I wanted to implement the Suggester ( http://wiki.apache.org/solr/Suggester). I copied and pasted the information there into my solrconfig.xml file but I'm getting the following error: Error loading class 'org.apache.solr.spelling.suggest.Suggester' I read up on this error and found that I needed to checkout a newer version from SVN. I checked out a full version and copied the contents of src/java/org/apache/spelling/suggest to the same location on my set up. However, I am still receiving this error. Did I not put the files in the right place? What am I doing incorrectly? Thanks, Brian Lamb
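The resolution this thread converged on was copying the two dataimporthandler jars into solrHome/lib; the same effect can be had declaratively with lib directives in solrconfig.xml. A sketch; the relative paths are assumptions that depend on where the checkout's dist and lib directories sit relative to solr home:

```xml
<!-- solrconfig.xml sketch: load the DIH jars without copying them around -->
<lib dir="../../dist/" regex="apache-solr-dataimporthandler-.*\.jar"/>
<!-- DIH logs through slf4j, so those jars must be reachable as well -->
<lib dir="../../lib/" regex="slf4j-.*\.jar"/>
```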
Multicore
Hi all, I am setting up multicore and the schema.xml file in the core0 folder says not to use that one because it's very stripped down. So I copied the schema from example/solr/conf but now I am getting a bunch of class not found exceptions: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.KeywordMarkerFilterFactory' For example. I also copied over the solrconfig.xml from example/solr/conf and changed all the lib dir=xxx paths to go up one directory higher (lib dir=../xxx / instead). I've found that when I use my solrconfig file with the stripped down schema.xml file, it runs correctly. But when I use the full schema xml file, I get those errors. Now this says to me I am not loading a library or two somewhere but I've looked through the configuration files and cannot see any other place other than solrconfig.xml where that would be set so what am I doing incorrectly? Thanks, Brian Lamb
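When every core needs the same jars, an alternative to adjusting each core's lib dir paths is the sharedLib attribute in solr.xml; jars placed there are visible to all cores. A sketch; the directory and core names are assumptions:

```xml
<!-- solr.xml sketch: everything under <solr home>/lib is shared by all cores -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```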
Re: Dynamically boost search scores
Thank you for the advice. I looked at the page you recommended and came up with: http://localhost:8983/solr/search/?q=dogfl=boost_score,genus,species,scorerows=15bf=%22ord%28sum%28boost_score,1%29%29 ^10%22 But it appeared to have no effect. The results were in the same order as they were when I left off the bf parameter. So what am I doing incorrectly? Thanks, Brian Lamb On Mon, Mar 14, 2011 at 11:45 AM, Markus Jelsma markus.jel...@openindex.io wrote: See boosting documents by function query. This way you can use document's boost_score field to affect the final score. http://wiki.apache.org/solr/FunctionQuery On Monday 14 March 2011 16:40:42 Brian Lamb wrote: Hi all, I have a field in my schema called boost_score. I would like to set it up so that if I pass in a certain flag, each document score is boosted by the number in boost_score. For example if I use: http://localhost/solr/search/?q=dog I would get search results like normal. But if I use: http://localhost/solr/search?q=dogboost=true The score of each document would be boosted by the number in the field boost_score. Unfortunately, I have no idea how to implement this actually but I'm hoping that's where you all can come in. Thanks, Brian Lamb -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
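Two likely reasons that URL had no effect: bf is only read by the dismax/edismax parsers (the default lucene parser silently ignores it), and ord() takes a field name, not another function, so ord(sum(...)) is invalid; the quotes around the value don't belong there either. A corrected sketch (the qf values are assumptions):

```
http://localhost:8983/solr/search/?q=dog&defType=dismax&qf=genus+species&bf=sum(boost_score,1)^10&fl=boost_score,genus,species,score
```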
Re: Sorting
It doesn't necessarily need to go through an XSLT but the idea remains the same. I want to have the highest scores first no matter which result they match with. So if the results are like this: lst name=moreLikeThis result name=3 numFound=2 start=0 maxScore=0.439 doc float name=score0.439/float str name=id1/str /doc doc float name=score0.215/float str name=id2/str /doc doc float name=score0.115/float str name=id3/str /doc /result result name=2 numFound=3 start=0 maxScore=0.539 doc float name=score0.539/float str name=id4/str /doc doc float name=score0.338/float str name=id5/str /doc /result /lst I would want them to be formatted like this: lst name=moreLikeThis doc float name=score0.539/float str name=id4/str /doc doc float name=score0.439/float str name=id1/str /doc doc float name=score0.338/float str name=id5/str /doc doc float name=score0.215/float str name=id2/str /doc doc float name=score0.115/float str name=id3/str /doc /lst The way I do it now is to fetch the results and then parse them with PHP to simulate that but it seems horribly inefficient so I'd like to do it within Solr if at all possible. On Thu, Mar 10, 2011 at 4:02 PM, Brian Lamb brian.l...@journalexperts.com wrote: Any ideas on this one? On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response which returns records like this: lst name=moreLikeThis result name=3 numFound=113611 start=0 maxScore=0.4392774 result name=2 numFound= start=0 maxScore=0.5392774 /lst I don't want them grouped by result; I would just like have them all thrown together and then sorted according to score. I have an XSLT which does put them altogether and returns the following: moreLikeThis similar scorex./score idsome_id/id /similar /moreLikeThis However it appears that it basically applies the stylesheet to result name=3 then result name=2. 
How can I make it so that with my XSLT, the results appear sorted by score?
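The flattening can happen inside the stylesheet itself: select the doc elements of all result blocks in a single for-each and let xsl:sort order them by score. A sketch against the response format shown above; it assumes the usual xsl namespace declaration on the stylesheet root:

```xml
<!-- merge docs from every moreLikeThis result, highest score first -->
<xsl:template match="lst[@name='moreLikeThis']">
  <moreLikeThis>
    <xsl:for-each select="result/doc">
      <xsl:sort select="float[@name='score']" data-type="number" order="descending"/>
      <similar>
        <score><xsl:value-of select="float[@name='score']"/></score>
        <id><xsl:value-of select="str[@name='id']"/></id>
      </similar>
    </xsl:for-each>
  </moreLikeThis>
</xsl:template>
```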
Dynamically boost search scores
Hi all, I have a field in my schema called boost_score. I would like to set it up so that if I pass in a certain flag, each document score is boosted by the number in boost_score. For example if I use: http://localhost/solr/search/?q=dog I would get search results like normal. But if I use: http://localhost/solr/search?q=dogboost=true The score of each document would be boosted by the number in the field boost_score. Unfortunately, I have no idea how to implement this actually but I'm hoping that's where you all can come in. Thanks, Brian Lamb
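Since an index-time boost can't be switched per request, a boost=true flag is usually approximated with two request handlers whose defaults differ, and the application picks the handler instead of passing a flag. A sketch; the handler names, qf, and the bf expression are assumptions:

```xml
<!-- solrconfig.xml sketch: /search scores normally, /searchboost folds boost_score in -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">genus species</str>
  </lst>
</requestHandler>
<requestHandler name="/searchboost" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">genus species</str>
    <str name="bf">sum(boost_score,1)^10</str>
  </lst>
</requestHandler>
```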
Re: docBoost
Okay I think I have the idea: dataConfig dataSource type=JdbcDataSource name=animals batchSize=-1 driver=com.mysql.jdbc.Driver url=jdbc:mysql://localhost/animals?characterEncoding=UTF8amp;zeroDateTimeBehavior=convertToNull user=user password=pass/ script![CDATA[ function BoostScores(row) { // if searching for recommendations add in the boost score if(some_condition) { row.put('$docBoost', row.get('boost_score')); } // end if(some_condition) return row; } // end function BoostRecommendations(row) ]]/script document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id=${ animal.id} field column=boost_score name=boost_score / /entity /entity /document /dataConfig Now, am I right in thinking that the boost score is only when the data is loaded? If so, that's close to what I want to do but not exactly. I would like to load all the data without boosting any scores but storing what the boost score would be. And then, depending on the search, boost scores by the value. For example, if a user searches for dog, they would get search results that were unboosted. However, I would also want the option to pass in a flag of some kind so that if a user searches for dog, they would get search results with the boost score factored in. Ideally it would be something like: Regular search: http://localhost/solr/search/?q=dog Boosted search: http://localhost/solr/search?q=dogboost=true To achieve this, would it be applied in the data import handler? If so, what would I need to put in for some_condition? Thanks for all the help so far. I truly do appreciate it. Thanks, Brian Lamb On Wed, Mar 9, 2011 at 11:50 PM, Bill Bell billnb...@gmail.com wrote: Yes just add if statement based on a field type and do a row.put() only if that other value is a certain value. 
On 3/9/11 1:39 PM, Brian Lamb brian.l...@journalexperts.com wrote: That makes sense. As a follow up, is there a way to only conditionally use the boost score? For example, in some cases I want to use the boost score and in other cases I want all documents to be treated equally. On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: you can use the ScriptTransformer to perform the boost calcualtion and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer dataConfig script![CDATA[ function f1(row) { // Add boost row.put('$docBoost',1.5); return row; } ]]/script document entity name=e pk=id transformer=script:f1 query=select * from X /entity /document /dataConfig Regards, Jayendra On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Anyone have any clue on this on? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign some higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file: document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id = ${ animal.id} field column=boost_score name=boost_score / /entity /entity /document How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: Sorting
Any ideas on this one? On Wed, Mar 9, 2011 at 2:00 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response, which returns records like this:

<lst name="moreLikeThis">
  <result name="3" numFound="113611" start="0" maxScore="0.4392774"/>
  <result name="2" numFound="" start="0" maxScore="0.5392774"/>
</lst>

I don't want them grouped by result; I would just like to have them all thrown together and then sorted according to score. I have an XSLT which does put them all together and returns the following:

<moreLikeThis>
  <similar>
    <score>x.</score>
    <id>some_id</id>
  </similar>
</moreLikeThis>

However, it appears that it basically applies the stylesheet to result name="3" and then to result name="2". How can I make it so that with my XSLT the results appear sorted by score?
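In the meantime, the merge-and-sort can be done before (or instead of) the XSLT step. Below is a minimal stdlib-Python sketch that flattens all moreLikeThis result blocks into one list sorted by score; the XML shape is assumed from the snippet above, with hypothetical doc entries filled in for illustration. (In XSLT itself, the equivalent move is an <xsl:sort select="score" data-type="number" order="descending"/> inside the template that emits the similar elements, applied over all docs at once rather than per result block.)

```python
import xml.etree.ElementTree as ET

# A trimmed Solr response containing a moreLikeThis section (shape assumed
# from the thread; real responses carry more attributes and doc fields).
xml_response = """
<response>
  <lst name="moreLikeThis">
    <result name="3" numFound="113611" start="0" maxScore="0.4392774">
      <doc><float name="score">0.43</float><str name="id">10</str></doc>
      <doc><float name="score">0.12</float><str name="id">11</str></doc>
    </result>
    <result name="2" numFound="98" start="0" maxScore="0.5392774">
      <doc><float name="score">0.53</float><str name="id">20</str></doc>
    </result>
  </lst>
</response>
"""

root = ET.fromstring(xml_response)
docs = []
for result in root.findall(".//lst[@name='moreLikeThis']/result"):
    for doc in result.findall("doc"):
        score = float(doc.find("float[@name='score']").text)
        doc_id = doc.find("str[@name='id']").text
        docs.append((score, doc_id))

# Merge every result block and sort by score, highest first.
docs.sort(reverse=True)
for score, doc_id in docs:
    print(doc_id, score)
```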
Re: dataimport
This has since been fixed. The problem was that there was not enough memory on the machine. It works just fine now. On Tue, Mar 8, 2011 at 6:22 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : INFO: Creating a connection for entity id with URL: : jdbc:mysql://localhost/researchsquare_beta_library?characterEncoding=UTF8&zeroDateTimeBehavior=convertToNull : Feb 24, 2011 8:58:25 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 : call : INFO: Time taken for getConnection(): 137 : Killed : : So it looks like for whatever reason, the server crashes trying to do a full : import. When I add a LIMIT clause on the query, it works fine when the LIMIT : is only 250 records but if I try to do 500 records, I get the same message. ...wow. that's ... weird. I've never seen a java process just log Killed like that. The only time i've ever seen a process log Killed is if it was terminated by the os (ie: kill -9 pid) What OS are you using? how are you running solr? (ie: are you using the simple jetty example java -jar start.jar or are you using a different servlet container?) ... are you absolutely certain your machine doesn't have some sort of monitoring in place that kills jobs if they take too long, or use too much CPU? -Hoss
Sorting
Hi all, I know that I can add sort=score desc to the url to sort in descending order. However, I would like to sort a MoreLikeThis response, which returns records like this:

<lst name="moreLikeThis">
  <result name="3" numFound="113611" start="0" maxScore="0.4392774"/>
  <result name="2" numFound="" start="0" maxScore="0.5392774"/>
</lst>

I don't want them grouped by result; I would just like to have them all thrown together and then sorted according to score. I have an XSLT which does put them all together and returns the following:

<moreLikeThis>
  <similar>
    <score>x.</score>
    <id>some_id</id>
  </similar>
</moreLikeThis>

However, it appears that it basically applies the stylesheet to result name="3" and then to result name="2". How can I make it so that with my XSLT the results appear sorted by score?
Re: docBoost
Anyone have any clue on this one? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file:

<document>
  <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
    <field column="id" name="id" />
    <field column="genus" name="genus" />
    <field column="species" name="species" />
    <entity name="boosters" dataSource="boosts"
        query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score" />
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: docBoost
That makes sense. As a follow up, is there a way to only conditionally use the boost score? For example, in some cases I want to use the boost score and in other cases I want all documents to be treated equally. On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com wrote: you can use the ScriptTransformer to perform the boost calcualtion and addition. http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer dataConfig script![CDATA[ function f1(row) { // Add boost row.put('$docBoost',1.5); return row; } ]]/script document entity name=e pk=id transformer=script:f1 query=select * from X /entity /document /dataConfig Regards, Jayendra On Wed, Mar 9, 2011 at 2:01 PM, Brian Lamb brian.l...@journalexperts.com wrote: Anyone have any clue on this on? On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.comwrote: Hi all, I am using dataimport to create my index and I want to use docBoost to assign some higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file: document entity name=animal dataSource=animals pk=id query=SELECT * FROM animals field column=id name=id / field column=genus name=genus / field column=species name=species / entity name=boosters dataSource=boosts query=SELECT boost_score FROM boosts WHERE animal_id = ${ animal.id} field column=boost_score name=boost_score / /entity /entity /document How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Excluding results from more like this
Hi all, I'm using MoreLikeThis to find similar results but I'd like to exclude records by id number. For example, I use the following URL: http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score How would I exclude record 4 from the MoreLikeThis results? I tried http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score&mlt.q=!4 but that still returned record 4 in the MoreLikeThis results.
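One thing that may help here: with the MoreLikeThis component (mlt=true on a regular search handler), fq filters the main result list, not the similar-document lists. The dedicated MoreLikeThisHandler, however, does apply fq to the similar documents it returns. A sketch, assuming a handler registered at /mlt:

```
http://localhost:8983/solr/mlt?q=id:2&mlt.fl=description&fq=-id:4&fl=*,score
```

The negative filter -id:4 then drops record 4 from the similar-document list itself.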
Re: Excluding results from more like this
That doesn't seem to do it. Record 4 is still showing up in the MoreLikeThis results. On Wed, Mar 9, 2011 at 4:12 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Brian, ...?q=id:(2 3 5) -4 Otis --- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Brian Lamb brian.l...@journalexperts.com To: solr-user@lucene.apache.org Sent: Wed, March 9, 2011 4:05:10 PM Subject: Excluding results from more like this Hi all, I'm using MoreLikeThis to find similar results but I'd like to exclude records by id number. For example, I use the following URL: http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score How would I exclude record 4 from the MoreLikeThis results? I tried http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score&mlt.q=!4 but that still returned record 4 in the MoreLikeThis results.
docBoost
Hi all, I am using dataimport to create my index and I want to use docBoost to assign higher weights to certain docs. I understand the concept behind docBoost but I haven't been able to find an example anywhere that shows how to implement it. Assuming the following config file:

<document>
  <entity name="animal" dataSource="animals" pk="id" query="SELECT * FROM animals">
    <field column="id" name="id" />
    <field column="genus" name="genus" />
    <field column="species" name="species" />
    <entity name="boosters" dataSource="boosts"
        query="SELECT boost_score FROM boosts WHERE animal_id = ${animal.id}">
      <field column="boost_score" name="boost_score" />
    </entity>
  </entity>
</document>

How do I add in a docBoost score? The boost score is currently in a separate table as shown above.
Re: Indexed, but cannot search
Here are the relevant parts of schema.xml: field name=globalField type=text indexed=true stored=true multiValued=true/ defaultSearchFieldglobalField/defaultSearchField copyField source=* dest=globalField / This is what is returned when I search: response - lst name=responseHeader int name=status0/int int name=QTime1/int - lst name=params str name=qMammal/str str name=debugQuerytrue/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/ - lst name=debug str name=rawquerystringMammal/str str name=querystringMammal/str str name=parsedqueryglobalField:mammal/str str name=parsedquery_toStringglobalField:mammal/str lst name=explain/ str name=QParserLuceneQParser/str - lst name=timing double name=time1.0/double - lst name=prepare double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time0.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response On Tue, Mar 1, 2011 at 
7:57 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm, please provide analyzer of text and output of debugQuery=true. Anyway, if field type is fieldType text and the catchall field text is fieldType text as well and you reindexed, it should work as expected. Oh if only it were that easy :-). I have reindexed since making that change which is how I was able to get the regular search working. I have not however been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. 
For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on I get the following as a response: result name=response numFound=249943 start=0 doc str name=typeMammal/str str name=id1/str str name=genusCanis/str /doc /response (plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3A http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows=10in dent=on Mammal I only get: result name=response numFound=0 start=0 But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml
Re: Indexed, but cannot search
So here's something interesting. I did a delta import this morning and it looks like I can do a global search across those fields. I'll do another full import and see if that fixed the problem. I had done a fullimport after making this change but it seems like another reindex is in order. On Wed, Mar 2, 2011 at 10:31 AM, Markus Jelsma markus.jel...@openindex.iowrote: Please also provide analysis part of fieldType text. You can also use Luke to inspect the index. http://localhost:8983/solr/admin/luke?fl=globalFieldnumTerms=100 On Wednesday 02 March 2011 16:09:33 Brian Lamb wrote: Here are the relevant parts of schema.xml: field name=globalField type=text indexed=true stored=true multiValued=true/ defaultSearchFieldglobalField/defaultSearchField copyField source=* dest=globalField / This is what is returned when I search: response - lst name=responseHeader int name=status0/int int name=QTime1/int - lst name=params str name=qMammal/str str name=debugQuerytrue/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/ - lst name=debug str name=rawquerystringMammal/str str name=querystringMammal/str str name=parsedqueryglobalField:mammal/str str name=parsedquery_toStringglobalField:mammal/str lst name=explain/ str name=QParserLuceneQParser/str - lst name=timing double name=time1.0/double - lst name=prepare double name=time1.0/double - lst name=org.apache.solr.handler.component.QueryComponent double name=time1.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst - lst name=process double name=time0.0/double - lst 
name=org.apache.solr.handler.component.QueryComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.FacetComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.MoreLikeThisComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.HighlightComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.StatsComponent double name=time0.0/double /lst - lst name=org.apache.solr.handler.component.DebugComponent double name=time0.0/double /lst /lst /lst /lst /response On Tue, Mar 1, 2011 at 7:57 PM, Markus Jelsma markus.jel...@openindex.iowrote: Hmm, please provide analyzer of text and output of debugQuery=true. Anyway, if field type is fieldType text and the catchall field text is fieldType text as well and you reindexed, it should work as expected. Oh if only it were that easy :-). I have reindexed since making that change which is how I was able to get the regular search working. I have not however been able to get the search across all fields to work. On Tue, Mar 1, 2011 at 3:01 PM, Markus Jelsma markus.jel...@openindex.iowrote: Traditionally, people forget to reindex ;) Hi all, The problem was that my fields were defined as type=string instead of type=text. Once I corrected that, it seems to be fixed. The only part that still is not working though is the search across all fields. For example: http://localhost:8983/solr/select/?q=type%3AMammal Now correctly returns the records matching mammal. But if I try to do a global search across all fields: http://localhost:8983/solr/select/?q=Mammal http://localhost:8983/solr/select/?q=text%3AMammal I get no results returned. Here is how the schema is set up: field name=text type=text indexed=true stored=false multiValued=true/ defaultSearchFieldtext/defaultSearchField copyField source=* dest=text / Thanks to everyone for your help so far. I think this is the last hurdle I have to jump over. 
On Tue, Mar 1, 2011 at 12:34 PM, Upayavira u...@odoko.co.uk wrote: Next question, do you have your type field set to index=true in your schema? Upayavira On Tue, 01 Mar 2011 11:06 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A* http://localhost:8983/solr/select/?q=*%3A*version=2.2start=0rows
Formatting the XML returned
Hi all, This list has proven itself quite useful since I got started with Solr. I'm wondering if it is possible to dictate the XML that is returned by a search? Right now it seems very inefficient in that it is formatted like:

<str name="field1">Val</str>
<str name="field2">Val</str>

Etc. I would like to change it so that it reads something like:

<field1>Val</field1>
<field2>Val</field2>

Is this possible? If so, how? Thanks, Brian Lamb
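Solr can do this server-side with the XSLT response writer (wt=xslt&tr=yourstylesheet.xsl, with the stylesheet placed in conf/xslt/), which lets you emit whatever element names you like. If you would rather post-process on the client, here is a stdlib-Python sketch of the renaming, assuming the default wt=xml shape from the example above:

```python
import xml.etree.ElementTree as ET

# A trimmed Solr <doc> in the default wt=xml shape (field names live in
# the name attribute, as in the thread's example).
solr_doc = """
<doc>
  <str name="field1">Val1</str>
  <str name="field2">Val2</str>
</doc>
"""

src = ET.fromstring(solr_doc)
out = ET.Element("doc")
for child in src:
    # Turn <str name="field1">Val</str> into <field1>Val</field1>.
    ET.SubElement(out, child.attrib["name"]).text = child.text

print(ET.tostring(out, encoding="unicode"))
```

The same element-renaming loop works for any of Solr's typed value tags (str, int, float), since only the name attribute is consulted.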
Re: Sub entities
Yes, it looks like I had left off the field (misspelled it, actually). I reran the full import and the fields did properly show up. However, it is still not working as expected. Using the example below, a returned result only lists one specie instead of a list of species. I have the following in my schema.xml file:

<field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" />

I reran the full import but it is still only listing one specie instead of multiple. Is my above declaration incorrect? On Tue, Mar 1, 2011 at 3:41 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Brian, except for your sql-syntax error in the specie_relations query, SELECT specie_id FROMspecie_relations .. (missing whitespace after FROM), your config looks okay. Following questions: * is there a field named specie in your schema? (otherwise dih will silently ignore it) * did you check your mysql query log? to see which queries were executed and what their result is? And, just as a quick notice: there is no need to use <field column="foo" name="foo" /> (while both attributes have the same value). Regards Stefan On Mon, Feb 28, 2011 at 9:52 PM, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my dataimport to work correctly but I'm a little unclear as to how the entity within an entity works in regards to search results. When I do a search for all results, it seems only the outermost responses are returned.
For example, I have the following in my db config file:

<dataConfig>
  <dataSource type="JdbcDataSource" name="mystuff" batchSize="-1"
      driver="com.mysql.jdbc.Driver"
      url="jdbc:mysql://localhost/db?characterEncoding=UTF8&amp;zeroDateTimeBehavior=convertToNull"
      user="user" password="password"/>
  <document>
    <entity name="animal" dataSource="mystuff" query="SELECT * FROM animals">
      <field column="id" name="id" />
      <field column="type" name="type" />
      <field column="genus" name="genus" />
      <!-- Add in the species -->
      <entity name="specie_relations" dataSource="mystuff"
          query="SELECT specie_id FROM specie_relations WHERE animal_id=${animal.id}">
        <entity name="species" dataSource="mystuff"
            query="SELECT specie FROM species WHERE id=${specie_relations.specie_id}">
          <field column="specie" name="specie" />
        </entity>
      </entity>
    </entity>
  </document>
</dataConfig>

However, specie never shows up in my search results:

<doc>
  <str name="type">Mammal</str>
  <str name="id">1</str>
  <str name="genus">Canis</str>
</doc>

I had hoped the results would include the species. Can it? If so, what is my malfunction?
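As an aside, each sub-entity fires one query per parent row. A common way to avoid that (a sketch, assuming MySQL; table and column names are taken from the config above) is to collapse the species into one delimited column and split it with DIH's RegexTransformer:

```
<entity name="animal" dataSource="mystuff" transformer="RegexTransformer"
        query="SELECT a.id, a.type, a.genus,
                      GROUP_CONCAT(s.specie SEPARATOR '|') AS specie
               FROM animals a
               LEFT JOIN specie_relations r ON r.animal_id = a.id
               LEFT JOIN species s ON s.id = r.specie_id
               GROUP BY a.id">
  <field column="specie" splitBy="\|" />
</entity>
```

splitBy takes a regex, hence the escaped pipe; each split value then lands as one entry in the multiValued specie field.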
Re: Indexed, but cannot search
Thank you for your reply but the searching is still not working out. For example, when I go to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on I get the following as a response:

<result name="response" numFound="249943" start="0">
  <doc>
    <str name="type">Mammal</str>
    <str name="id">1</str>
    <str name="genus">Canis</str>
  </doc>
</result>

(plus some other docs but one is enough for this example) But if I go to http://localhost:8983/solr/select/?q=type%3AMammal I only get:

<result name="response" numFound="0" start="0"/>

But it seems that should return at least the result I have listed above. What am I doing incorrectly? On Mon, Feb 28, 2011 at 6:57 PM, Upayavira u...@odoko.co.uk wrote: q=dog is equivalent to q=text:dog (where the default search field is defined as text at the bottom of schema.xml). If you want to specify a different field, well, you need to tell it :-) Is that it? Upayavira On Mon, 28 Feb 2011 15:38 -0500, Brian Lamb brian.l...@journalexperts.com wrote: Hi all, I was able to get my installation of Solr indexed using dataimport. However, I cannot seem to get search working. I can verify that the data is there by going to: http://localhost:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on This gives me the response: <result name="response" numFound="234961" start="0"/> But when I go to http://localhost:8983/solr/select/?q=dog&version=2.2&start=0&rows=10&indent=on I get the response: <result name="response" numFound="0" start="0"/> I know that dog should return some results because it is the first result when I select all the records. So what am I doing incorrectly that would prevent me from seeing results? --- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
Re: Sub entities
Thanks for the help Stefan. It seems removing column=specie fixed it. On Tue, Mar 1, 2011 at 11:18 AM, Stefan Matheis matheis.ste...@googlemail.com wrote: Brian, On Tue, Mar 1, 2011 at 4:52 PM, Brian Lamb brian.l...@journalexperts.com wrote: <field column="specie" multiValued="true" name="specie" type="string" indexed="true" stored="true" required="false" /> Not sure, but iirc field in this context has no column attribute .. that should normally not break your solr configuration. Are you sure that your animal has multiple species assigned? Checked the query from the MySQL query log and verified that it returns more than one record? Otherwise you could enable http://wiki.apache.org/solr/DataImportHandler#LogTransformer for your dataimport, which outputs a log row for every record .. just to ensure that your query results are correctly imported. HTH, Regards Stefan
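For the archives, the working schema.xml declaration then ends up as follows (field elements in schema.xml take a name attribute, not column — column belongs in the DIH config):

```
<field name="specie" type="string" indexed="true" stored="true"
       multiValued="true" required="false" />
```

After changing a field declaration like this, a full reindex is needed for existing documents to pick it up.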