Re: Inconsistent results in Solr Search with Lucene Index
I fixed the problem by reconfiguring schema.xml. Thanks for your help. Jak

Grant Ingersoll wrote: Have you set up your Analyzers, etc. so they correspond to the exact ones you were using in Lucene? Under the Solr Admin you can try the analysis tool to see how your index and queries are treated. What happens if you do a *:* query from the Admin query screen? If your index is reasonably sized, I would just reindex, but you shouldn't have to do this. -Grant

On Nov 27, 2007, at 8:18 AM, trysteps wrote: Hi All, I am trying to use Solr Search with an existing Lucene index, so I set up everything in schema.xml (tokenizers, required fields, and so on). But I cannot get the same results as with Lucene. For example, a search for 'dog' returns lots of results with Lucene, but in Solr I get no results at all; a search for 'dog*', however, returns the same results as Lucene. What is the best way to integrate a Lucene index into Solr? Are there any well-documented sources? Thanks for your attention, Trysteps

-- Grant Ingersoll http://lucene.grantingersoll.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
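To make the symptom concrete (wildcard 'dog*' matches but plain 'dog' does not usually means the query side is analyzed while the index side was not, or vice versa): the index-time and query-time chains in schema.xml must reproduce whatever Analyzer the original Lucene indexer used. A sketch of such a field type; the name and the particular tokenizer/filter chain are illustrative assumptions, not Trysteps' actual config:

```xml
<!-- Hypothetical field type: adjust the chain to match the Analyzer
     your Lucene indexing code actually used. -->
<fieldType name="text" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The Admin analysis tool mentioned above will show, token by token, how each chain treats a given input, which makes mismatches like this easy to spot.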
RE: Tips for searching
If you want any letter and any possible substring, you might be better off breaking every word into single letters with special tokens between words, i.e. "the quick brown fox" becomes "t h e ZZ q u i c k ZZ b r o w n ZZ f o x". Then you can do all the single-letter searches, and multi-letter searches turn into phrase searches, i.e. "uic" (from "quick") would be rewritten as "u i c", and so on. This should give you better performance and more predictable results than wildcard searches, depending on the size and complexity of your data. Relevancy would be horrible, since the tf/idf would always have a common denominator depending on the character set, but there are ways around that as well. - will

-----Original Message----- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Friday, November 30, 2007 7:51 PM To: solr-user@lucene.apache.org Subject: Re: Tips for searching

On 30-Nov-07, at 4:43 PM, Dave C. wrote: Thanks for the quick response, Mike. Ideally it should match more than just a single character, i.e. "the" in "weather", or "pro" in "profile", or "000" in "18000". Would these cases be taken care of by the StopFilterFactory?

No... you are looking for a variant of WildcardQuery. Prefix wildcards are supported (pro* -> profile), but generalized wildcard queries aren't enabled by default. There has been lots of discussion on the list if you do a search. -Mike
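A minimal sketch of the transform Will describes; the ZZ separator is from his example, while the helper function itself and its name are mine:

```python
def explode(text, sep="ZZ"):
    """Split every word into single-letter tokens, inserting a special
    separator token between words. Applied at index time, this makes a
    substring search (e.g. 'uic') into an ordinary phrase search ('u i c')."""
    words = text.split()
    out = []
    for i, word in enumerate(words):
        out.extend(list(word))       # one token per character
        if i < len(words) - 1:
            out.append(sep)          # word-boundary marker
    return " ".join(out)

print(explode("the quick brown fox"))
# t h e ZZ q u i c k ZZ b r o w n ZZ f o x
```

The separator token keeps phrase searches from matching across word boundaries; a query for "efo" will not match "the fox" because ZZ sits between "e" and "f".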
Tomcat6?
The Solr wiki does not describe how to install Solr on Tomcat 6, and I have not managed it myself :( The chapter "Configuring Solr Home with JNDI" mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6. Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success (I can query the top-level page, but the Solr Admin link then does not work). Can anybody help?

-- Dipl.-Inf. Jörg Kiegeland ikv++ technologies ag Bernburger Strasse 24-25, D-10963 Berlin e-mail: [EMAIL PROTECTED], web: http://www.ikv.de phone: +49 30 34 80 77 18, fax: +49 30 34 80 78 0 Handelsregister HRB 81096; Amtsgericht Berlin-Charlottenburg board of directors: Dr. Olaf Kath (CEO); Dr. Marc Born (CTO) supervising board: Prof. Dr. Bernd Mahr (chairman)
Re: Tomcat6?
In context.xml I added:

<Environment name="solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String"/>

I think that's all I did to get it working in Tomcat 6. --Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote: The Solr wiki does not describe how to install Solr on Tomcat 6, and I have not managed it myself :( The chapter "Configuring Solr Home with JNDI" mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6. Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success (I can query the top-level page, but the Solr Admin link then does not work). Can anybody help?
RE: Tomcat6?
$CATALINA_HOME/conf/Catalina/localhost doesn't exist by default, but you can create it and it will work exactly the same way it did in Tomcat 5. It's not created by default because it's no longer needed by the manager webapp.

-----Original Message----- From: Matthew Runo [mailto:[EMAIL PROTECTED] Sent: Monday, December 03, 2007 10:15 AM To: solr-user@lucene.apache.org Subject: Re: Tomcat6?

In context.xml I added:

<Environment name="solr/home" value="/Users/mruno/solr-src/example/solr" type="java.lang.String"/>

I think that's all I did to get it working in Tomcat 6. --Matthew Runo

On Dec 3, 2007, at 7:58 AM, Jörg Kiegeland wrote: The Solr wiki does not describe how to install Solr on Tomcat 6, and I have not managed it myself :( The chapter "Configuring Solr Home with JNDI" mentions the directory $CATALINA_HOME/conf/Catalina/localhost, which does not exist in Tomcat 6. Alternatively I tried the folder $CATALINA_HOME/work/Catalina/localhost, but with no success (I can query the top-level page, but the Solr Admin link then does not work). Can anybody help?
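Putting the two answers in this thread together, a per-application context descriptor for Tomcat 6 might look like the following, saved as $CATALINA_HOME/conf/Catalina/localhost/solr.xml (creating the directory if necessary). The docBase path and the override attribute are illustrative assumptions:

```xml
<!-- Sketch only: adjust docBase and the solr/home value for your install. -->
<Context docBase="/path/to/solr.war" debug="0" crossContext="true">
  <!-- JNDI environment entry pointing Solr at its home directory -->
  <Environment name="solr/home" type="java.lang.String"
               value="/Users/mruno/solr-src/example/solr" override="true"/>
</Context>
```

With a per-app descriptor like this, the Environment entry does not need to go into the global context.xml.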
RE: Solr Highlighting, word index
You can tell Lucene to store token offsets using TermVectors (configurable via schema.xml). Then you can customize the request handler to return the token offsets (and/or positions) by retrieving the TVs.

I think that is the best plan of action. How do I create a custom request handler that will use the existing indexed fields? As I see it there will be two requests: one for the search, and one to retrieve the offsets when you view one of the found items. Any advice you can give me will be much appreciated, as I've had no luck with Google so far. Thanks for your help so far, Best Regards, Martin Owens
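For the schema side of the suggestion above, term vector storage is enabled per field in schema.xml with attribute flags; a sketch, where the field name and type are assumptions:

```xml
<!-- Hypothetical field: termVectors/termPositions/termOffsets tell Lucene
     to store, per document, each token with its position and character
     offsets, so a custom handler can retrieve and return them. -->
<field name="body" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Note this increases index size, since the vectors are stored alongside the inverted index.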
How to delete records that don't contain a field?
I was wondering if there was a way to post a delete query using curl to delete all records that do not contain a certain field--something like this:

curl http://localhost:8080/solr/update --data-binary '<delete><query>-_title:[* TO *]</query></delete>' -H 'Content-type:text/xml; charset=utf-8'

The minus syntax seems to return the correct list of ids (that is, all records that do not contain the _title field) when I use the Solr administrative console to do the above query, so I'm wondering if Solr just doesn't support this type of delete. Thanks for any help...
Re: How to delete records that don't contain a field?
On Dec 3, 2007 5:22 PM, Jeff Leedy [EMAIL PROTECTED] wrote: I was wondering if there was a way to post a delete query using curl to delete all records that do not contain a certain field--something like this: curl http://localhost:8080/solr/update --data-binary '<delete><query>-_title:[* TO *]</query></delete>' -H 'Content-type:text/xml; charset=utf-8' The minus syntax seems to return the correct list of ids (that is, all records that do not contain the _title field) when I use the Solr administrative console to do the above query, so I'm wondering if Solr just doesn't support this type of delete.

Not yet... it makes sense to support this in the future though. -Yonik
1.2 commit script chokes on 1.2 response format
Like others before me, I stumbled across this bug while getting collection distribution up and running today: solr/bin/commit warns that a commit failed when in fact it succeeded quite nicely: http://www.mail-archive.com/solr-user@lucene.apache.org/msg04585.html It's a trivial fix, and it seems it's already been done in trunk: http://svn.apache.org/viewvc/lucene/solr/trunk/src/scripts/commit?r1=543259&r2=555612&view=patch The change has not been applied to 1.2. It might be nice if it were. -Charlie
RE: How to delete records that don't contain a field?
Wouldn't this be: *:* AND the negative query?

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Monday, December 03, 2007 2:23 PM To: solr-user@lucene.apache.org Subject: Re: How to delete records that don't contain a field?

On Dec 3, 2007 5:22 PM, Jeff Leedy [EMAIL PROTECTED] wrote: I was wondering if there was a way to post a delete query using curl to delete all records that do not contain a certain field--something like this: curl http://localhost:8080/solr/update --data-binary '<delete><query>-_title:[* TO *]</query></delete>' -H 'Content-type:text/xml; charset=utf-8' The minus syntax seems to return the correct list of ids (that is, all records that do not contain the _title field) when I use the Solr administrative console to do the above query, so I'm wondering if Solr just doesn't support this type of delete.

Not yet... it makes sense to support this in the future though. -Yonik
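If the rewrite Lance suggests is accepted by the delete handler, the request body would look something like the sketch below. This is untested speculation: as Yonik notes, purely negative delete queries were not supported at the time, and whether prefixing *:* gets around that in the delete path (rather than only in search) is an assumption:

```xml
<delete>
  <query>*:* AND -_title:[* TO *]</query>
</delete>
```

The idea is that adding the match-all *:* clause turns a pure-negative query, which Lucene cannot execute on its own, into an ordinary boolean query with a positive clause.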
Re: CJK Analyzers for Solr
It seems good.

On Dec 3, 2007 1:01 AM, Ken Krugler [EMAIL PROTECTED] wrote: Wunder - are you aware of any free dictionaries for either C or J or K? When I dealt with this in the past, I looked for something free, but found only commercial dictionaries.

I would use data files from: http://ftp.monash.edu.au/pub/nihongo/00INDEX.html -- Ken

Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----- From: Walter Underwood [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, November 28, 2007 5:43:32 PM Subject: Re: CJK Analyzers for Solr

With Ultraseek, we switched to a dictionary-based segmenter for Chinese because the N-gram highlighting wasn't acceptable to our Chinese customers. I guess it is something to check for each application. wunder

On 11/27/07 10:46 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: For what it's worth, I worked on indexing and searching a *massive* pile of data, a good portion of which was in C and J, and some K. The n-gram approach was used for all three languages, and the quality of search results, including highlighting, was evaluated and okayed by native speakers of these languages. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----- From: Walter Underwood [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Tuesday, November 27, 2007 2:41:38 PM Subject: Re: CJK Analyzers for Solr

Dictionaries are surprisingly expensive to build and maintain, and bi-grams are surprisingly effective for Chinese. See this paper: http://citeseer.ist.psu.edu/kwok97comparing.html I expect that n-gram indexing would be less effective for Japanese, because it is an inflected language. Korean is even harder. It might work to break Korean into the phonetic subparts and use n-grams on those. You should not do term highlighting with any of the n-gram methods. The relevance can be very good, but the highlighting just looks dumb.
wunder

On 11/27/07 8:54 AM, Eswar K [EMAIL PROTECTED] wrote: Is there any specific reason why the CJK analyzers in Solr were chosen to be n-gram based, instead of a morphological analyzer of the kind implemented by Google, which is considered to be more effective than the n-gram ones? Regards, Eswar

On Nov 27, 2007 7:57 AM, Eswar K [EMAIL PROTECTED] wrote: Thanks, James. How much time does it take to index 18M docs? - Eswar

On Nov 27, 2007 7:43 AM, James liu [EMAIL PROTECTED] wrote: I don't use the HYLANDA analyzer. I use je-analyzer, indexing at least 18M docs. I'm sorry, I have only used a Chinese analyzer.

On Nov 27, 2007 10:01 AM, Eswar K [EMAIL PROTECTED] wrote: What is the performance of these CJK analyzers (the one in Lucene, and hylanda)? We would potentially be indexing millions of documents. James, we will have a look at hylanda too. What about Japanese and Korean analyzers, any recommendations? - Eswar

On Nov 27, 2007 7:21 AM, James liu [EMAIL PROTECTED] wrote: I don't think n-gram is a good method for Chinese. CJKAnalyzer in Lucene is 2-gram. Eswar K: if it is a Chinese analyzer, I recommend hylanda (www.hylanda.com); it is the best Chinese analyzer, but it is not free. If you want a free Chinese analyzer, maybe you can try je-analyzer, though it has some problems in use.

On Nov 27, 2007 5:56 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote: Eswar, we've used the NGram stuff that exists in Lucene's contrib/analyzers instead of CJK. Doesn't that allow you to do everything that the Chinese and CJK analyzers do? It's been a few months since I've looked at the Chinese and CJK Analyzers, so I could be off. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----- From: Eswar K [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Monday, November 26, 2007 8:30:52 AM Subject: CJK Analyzers for Solr

Hi, Does Solr come with language analyzers for CJK? If not, can you please direct me to some good CJK analyzers?
Regards, Eswar

-- regards jl

-- Ken Krugler Krugle, Inc. +1 530-210-6378 "If you can't find it, you can't fix it"
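For readers following the n-gram discussion, the core of the 2-gram approach that CJKAnalyzer takes can be sketched as overlapping character bigrams. This is a deliberate simplification (the real analyzer treats Latin runs, digits, and punctuation differently); the helper is mine:

```python
def cjk_bigrams(text):
    """Produce overlapping character bigrams, the essence of 2-gram
    ("CJK") indexing: every adjacent character pair becomes a token,
    so any two-character word in the text is guaranteed to match."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        # Too short for a bigram: emit the lone character, if any.
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]

print(cjk_bigrams("中文分词"))  # ['中文', '文分', '分词']
```

This also illustrates Walter's highlighting complaint: the tokens are arbitrary character pairs, not words, so highlighted spans rarely line up with what a reader perceives as the matching term.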