Re: Problem using replication in 8/25/09 nightly build of 1.4
On Wed, Aug 26, 2009 at 11:53 PM, Ron Ellis r...@benetech.org wrote:

> Hi Everyone, When trying to utilize the new HTTP based replication built into Solr 1.4 I encounter a problem. When I view the replication admin page on the slave, all of the master values are null, i.e. Replicatable Index Version: null, Generation: null | Latest Index Version: null, Generation: null.

If the master has just been started, it has no index which can be replicated to the slave. If you do a commit on the master, then a replicatable index version will be shown on the slave and replication will proceed. Alternately, you can add the following to the master configuration:

<str name="replicateAfter">startup</str>

> Despite these missing values the two seem to be talking over HTTP successfully (if I shut down the master, the slave replication page starts exploding with an NPE).

The slave replication page should not show an NPE if the master is down. I'll look into it.

> When I hit http://solr/replication?command=indexversion&wt=xml I get the following...
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">13</int>
>   </lst>
>   <long name="indexversion">0</long>
>   <long name="generation">0</long>
> </response>
>
> However in the admin/replication UI on the master I see:
>
> Index Version: 1250525534711, Generation: 1778
>
> Any idea what I'm doing wrong or how I could begin to diagnose? I am using the 8/25 nightly build of Solr with the example solrconfig.xml provided. The only modifications to the config have been to uncomment the master/slave replication sections and remove the data directory location line so it falls back to solr.home/data. Also, if it's relevant, this index was originally created in Solr 1.3.

I think that should be fine. I assume both master and slave are the same Solr version, 1.4?

--
Regards,
Shalin Shekhar Mangar.
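For reference, a sketch of the two solrconfig.xml sections being discussed. The host name, port and poll interval are placeholders, not values from the thread; only the replicateAfter entries are the point here.

```xml
<!-- on the master: "startup" makes a freshly started master replicable
     before any commit has happened, which avoids the null-values symptom -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on the slave: masterUrl points at the master's replication handler -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```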
Re: Pattern matching in Solr
Hi, In schema.xml I am not able to find splitOnCaseChange="1". I am not looking for case sensitive search. Let me know what file you are referring to. I am looking for exact match search only. Moreover, for scenario 2, which link in the Solr wiki covers the KeywordTokenizerFactory and EdgeNGramFilterFactory?

Regards
Bhaskar

--- On Wed, 8/26/09, Avlesh Singh avl...@gmail.com wrote:

From: Avlesh Singh avl...@gmail.com
Subject: Re: Pattern matching in Solr
To: solr-user@lucene.apache.org
Date: Wednesday, August 26, 2009, 11:31 AM

You could have used your previous thread itself (http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr), Bhaskar.

In your scenario one, you need an exact token match, right? You are getting the expected results if your field type is "text". Look for the WordDelimiterFilterFactory in the field type definition for the "text" field inside schema.xml. You'll find an attribute splitOnCaseChange="1". Because of this, "ChandarBhaskar" is converted into two tokens, "Chandar" and "Bhaskar", and hence the matches. You may choose to remove this attribute if the behaviour is not desired.

For your scenario two, you may want to look at the KeywordTokenizerFactory and EdgeNGramFilterFactory on the Solr wiki. Generally, for all such use cases, people create multiple fields in their schema storing the same data analyzed in different ways.

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:

Hi, Can any one help me with the below scenario?

Scenario 1: Assume that I give "Google" as the input string. I am using Carrot with Solr; Carrot is for front end display purposes. The issue is: assuming I give BHASKAR as the input string, it should give me search results pertaining to BHASKAR only, i.e.

Select * from MASTER where name = 'Bhaskar';

Example: it should not display search results such as ChandarBhaskar or BhaskarC. It should display Bhaskar only.

Scenario 2:

Select * from MASTER where name like '%BHASKAR%';

It should display records containing the word BHASKAR. Ex: Bhaskar, ChandarBhaskar, BhaskarC, Bhaskarabc.

How to achieve Scenario 1 in Solr?

Regards
Bhaskar
Re: ${solr.abortOnConfigurationError:false} - does it default to false
On Thu, Aug 27, 2009 at 1:05 AM, Ryan McKinley ryan...@gmail.com wrote:

On Aug 26, 2009, at 3:33 PM, djain101 wrote:

I have one quick question... If solrconfig.xml says

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set as a system property?

correct

Should that be changed to be true by default in the example solrconfig?

--
Regards,
Shalin Shekhar Mangar.
Re: ${solr.abortOnConfigurationError:false} - does it default to false
On Thu, Aug 27, 2009 at 12:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Aug 27, 2009 at 1:05 AM, Ryan McKinley ryan...@gmail.com wrote:

On Aug 26, 2009, at 3:33 PM, djain101 wrote:

I have one quick question... If solrconfig.xml says

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set as a system property?

correct

Should that be changed to be true by default in the example solrconfig?

I just checked the 1.3 release. It was true by default. Somewhere in between, the default was changed. I think we should revert this change.

--
Regards,
Shalin Shekhar Mangar.
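A minimal illustration of the ${property:default} substitution syntax under discussion, with true restored as the default (as in the 1.3 example config):

```xml
<!-- ${name:default} expands to the JVM system property "name",
     falling back to the literal default when the property is unset -->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
```

Starting the JVM with -Dsolr.abortOnConfigurationError=false would then override the default for that run.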
Re: Solr Replication
When you say a slice, do you mean one instance of Solr? So your JMX console is connecting to only one Solr?

On Thu, Aug 27, 2009 at 3:19 AM, J G skinny_joe...@hotmail.com wrote:

Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler JMX mbean. It looks like it is defaulting to the first slice it finds on its path. Is there any way to have multiple replication handlers, or at least obtain replication information on a per slice/instance basis via JMX, like how you can see attributes for each slice/instance via each replication admin JSP page? Thanks again.

From: noble.p...@corp.aol.com
Date: Wed, 26 Aug 2009 11:05:34 +0530
Subject: Re: Solr Replication
To: solr-user@lucene.apache.org

The ReplicationHandler is not enforced as a singleton, but for all practical purposes it is a singleton for one core. If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup, the configuration should be as follows:

MASTER
|_ SLAVE (I am a slave of MASTER)
|_ REPEATER (I am a slave of MASTER and master to my slaves)
   |_ REPEATER_SLAVE (slave of REPEATER)

The point is that REPEATER will have a slave section whose masterUrl points to MASTER, and REPEATER_SLAVE will have a slave section whose masterUrl points to REPEATER.

On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote:

Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server, and it's picking an arbitrary slice to report on. So I'm curious whether every slice gets its own replication handler mbean? This is important because I have no way of knowing, in this specific server, any information about the other slices, in particular the master/slave value for the other slices.
Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be both a master and a slave, i.e. a repeater. I'm wondering how repeaters work, because let's say I have a slice named 'A', and the master is on server 1 and the slave is on server 2; then how are these two servers communicating to replicate? Looking at the JMX information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this Solr slice know whether it's the master or the slave? I'm a bit confused. Thanks.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
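A hedged sketch of what a repeater's solrconfig.xml looks like, following the MASTER -> REPEATER -> REPEATER_SLAVE layout Noble describes above (host names are placeholders). The repeater carries both sections at once, which is why isMaster and isSlave both show true in JMX:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master role: this instance serves its own slaves -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <!-- slave role: this instance pulls from the true master -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

A REPEATER_SLAVE would then have only a slave section, with masterUrl pointing at the repeater instead of the master.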
Trie Date question
Hello everyone,

After reading Grant's article about TrieRange capabilities on the Lucid blog I did some experimenting, but I have some trouble with the tdate type, and I was hoping that you guys could point me in the right direction. Basically, I index a regular Solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different...

The definitions in my schema:

<field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
<field name="tdatetime" type="tdate" indexed="true" stored="false"/>

So if I do a query on my test index:

q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=1031524 (don't worry about the ordering yet). Then, if I do the following on my trie date field:

q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=0. Where did I go wrong? (And yes, both fields are indexed with exactly the same data...) Thanks for any guidance here!

Cheers, Aleks

--
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail
Solr project statistics
Hi, Where can I find general statistics about the Solr project? The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these numbers include all of Lucene's sub-projects (including Solr). If that's the case, is there a way to find out Solr's part in these numbers? Otherwise, are there any other publicly available statistics about Solr? Cheers, Uri
Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
Guys, thanks to everyone who helped or tried to help me out with this issue.

After talking with a buddy of mine who uses Solr, he said that the XPath exception seemed familiar. It turns out that right at the bottom of the Solr wiki install page is a troubleshooting section with one entry... and it was regarding XPath. Tomcat did not have Xalan in its classpath, and the easiest way to fix that was to create a symlink to the file in the /usr/share/tomcat/shared/lib directory. My version of Xalan was located under /usr/share/java.

For future reference, point anyone complaining about this same issue (XPath etc.) to this page:

http://wiki.apache.org/solr/SolrTomcat#head-7fe06bf7aac41f6307f0290a2150b365227e1074

and at the bottom they will get the same instructions. Guys, again... thanks so much!

--Aaron

On Wed, Aug 26, 2009 at 8:47 PM, Fuad Efendi f...@efendi.ca wrote:

Looks like you totally ignored my previous post... Who is the vendor of this openjdk-1.6.0.0? Who is the vendor of the JVM which this JDK runs on? ... Such installs of Java are a total mess; you may have an incompatible Servlet API loaded by the bootstrap classloader before the Tomcat classes. First of all, please try to install the standard Java from Sun on your development box and run some samples...

> This is due to your tomcat instance not having the xalan jar file in the classpath

P.S. Don't rely on CentOS 'approved' Java libraries.
Re: HTML decoder is splitting tokens
Hello. Thanks for the hints. Still some trouble, though.

I added just the HTMLStripCharFilterFactory because, according to the documentation, it should also replace HTML entities. It did, but it still left a space after the entity, so I got two tokens from "G&uuml;nther". That seems like a bug?

Adding MappingCharFilterFactory in front of the HTML stripper (so that the latter will not see the entity) does work as expected. That is, until I try strings like 'use &lt;p&gt; to mark a paragraph', where the HTML stripper will then remove parts of the actual text. So this approach will not work.

Finally, I was happy that I could now use an arbitrary tokenizer with HTML input. The PatternTokenizer, however, seems to be using character offsets corresponding to the output of the char filters, and so the highlighting markers end up in the wrong place. Is that a bug, or a configuration issue?

Cheers, Anders.

Koji Sekiguchi wrote:

Hi Anders, Sorry, I don't know if this is a bug or a feature, but I'd like to show an alternate way if you'd like. In Solr trunk, HTMLStripWhitespaceTokenizerFactory is marked as deprecated. Instead, HTMLStripCharFilterFactory plus an arbitrary TokenizerFactory are encouraged. And I'd recommend you use MappingCharFilterFactory to convert character references to real characters. That is, you have:

<fieldType name="textHtml" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

where the contents of mapping.txt are:

"&uuml;" => "ü"
"&auml;" => "ä"
"&iuml;" => "ï"
"&euml;" => "ë"
"&ouml;" => "ö"

Then run analysis.jsp and see the result.

Thank you, Koji

Anders Melchiorsen wrote:

Hi. When indexing the string "G&uuml;nther" with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, "Gü" and "nther". Is this a bug, or am I doing something wrong? (Using a Solr nightly from 2009-05-29.)

Anders.
Re: Pattern matching in Solr
> In schema.xml I am not able to find splitOnCaseChange="1".

Unless you have modified the stock field type definition of the "text" field in your core's schema.xml, you should be able to find this property set on the WordDelimiterFilterFactory. Read more here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

> Moreover, for scenario 2, which link in the Solr wiki covers the KeywordTokenizerFactory and EdgeNGramFilterFactory?

Google for these two.

Cheers
Avlesh

On Thu, Aug 27, 2009 at 12:21 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:

Hi, In schema.xml I am not able to find splitOnCaseChange="1". I am not looking for case sensitive search. I am looking for exact match search only. ...
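To make the multi-field approach from this thread concrete, here is a hedged schema.xml sketch; all field and type names are invented for the example. The "exact" type answers scenario 1 (whole-string match only); the edge-n-gram type answers prefix queries, while a true SQL-LIKE "contains" match would need NGramFilterFactory instead:

```xml
<!-- scenario 1: KeywordTokenizer keeps the whole value as one token,
     so only the full string "bhaskar" matches -->
<fieldType name="exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- scenario 2 (prefix flavour): index edge n-grams so "bhas" matches
     "bhaskarabc"; query side stays un-grammed -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="exact" indexed="true" stored="false"/>
<field name="name_ngram" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
<copyField source="name" dest="name_ngram"/>
```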
Re: Lucene Search Performance Analysis Workshop
Fuad -

http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/

Use fq=filter instead, generally speaking.

Erik

On Aug 26, 2009, at 10:24 PM, Fuad Efendi wrote:

I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}? Why can't we improve Lucene then?

Fuad

P.S.
https://issues.apache.org/jira/browse/SOLR-1169
https://issues.apache.org/jira/browse/SOLR-1179

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: August-26-09 8:50 PM
To: solr-user@lucene.apache.org
Subject: Fwd: Lucene Search Performance Analysis Workshop

While Andrzej's talk will focus on things at the Lucene layer, I'm sure there'll be some great tips and tricks useful to Solrians too. Andrzej is one of the sharpest folks I've met, and he's also a very impressive presenter. Tune in if you can.

Erik

Begin forwarded message:

From: Andrzej Bialecki a...@getopt.org
Date: August 26, 2009 5:44:40 PM EDT
To: java-u...@lucene.apache.org
Subject: Lucene Search Performance Analysis Workshop

Hi all, I am giving a free talk/workshop next week on how to analyze and improve Lucene search performance for native Lucene apps. If you've ever been challenged to get your Java Lucene search apps running faster, I think you might find the talk of interest.

Free online workshop: Thursday, September 3rd 2009, 11:00-11:30 AM PDT / 14:00-14:30 EDT. Follow this link to sign up:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About: Lucene Performance Workshop: Understanding Lucene Search Performance with Andrzej Bialecki. Experienced Java developers know how to use the Apache Lucene library to build powerful search applications natively in Java. LucidGaze for Lucene from Lucid Imagination, just released this week, provides a powerful utility for making transparent the underlying indexing and search operations, and analyzing their impact on search performance.

Agenda:
* Understanding sources of variability in Lucene search performance
* LucidGaze for Lucene APIs for performance statistics
* Applying LucidGaze for Lucene performance statistics to real-world performance problems

Join us for a free online workshop. Sign up via the link below:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About the Presenter: Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid Imagination Technical Advisory Board; he also serves as the project lead for Nutch, and as a committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as information retrieval, systems architecture, embedded systems kernels, networking and business process/e-commerce modeling. He's also the author of the popular Luke index inspection utility. Andrzej holds a master's degree in Electronics from Warsaw Technical University, speaks four languages and programs in many, many more.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
Re: Lucene Search Performance Analysis Workshop
On Aug 26, 2009, at 10:24 PM, Fuad Efendi wrote:

> I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}?

The new filtering features in Solr are just doing what Lucene started doing in 2.4, that is, using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed.

> Why can't we improve Lucene then?
>
> Fuad
>
> P.S.
> https://issues.apache.org/jira/browse/SOLR-1169
> https://issues.apache.org/jira/browse/SOLR-1179

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Solr project statistics
On Aug 27, 2009, at 4:00 AM, Uri Boness wrote:

> Hi, Where can I find general statistics about the Solr project? The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these numbers include all of Lucene's sub-projects (including Solr). If that's the case, is there a way to find out Solr's part in these numbers? Otherwise, are there any other publicly available statistics about Solr?

Those are pretty much it. It is further complicated by the fact that the ASF has a really large mirroring system, which complicates the downloads picture quite a bit. Nor does it account for the many people using distributions from Ubuntu, etc.
Re: Lucene Search Performance Analysis Workshop
On Thu, Aug 27, 2009 at 6:30 AM, Grant Ingersoll gsing...@apache.org wrote:

> > I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}?
>
> The new filtering features in Solr are just doing what Lucene started doing in 2.4, that is, using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed.

I think the performance of filtering can still be further improved, within Lucene... it's still very much a work in progress. E.g. if a filter is random access (e.g. RAM resident as a bit set), which I think is frequently the case for Solr (?), it ought to be applied just like we now apply deleted documents (LUCENE-1536 is open for this). This can result in sizable performance gains, especially for more complex queries and not-so-dense filters.

Mike
Announcing Dutch Lucene User Group
Hi, We started a new Lucene user group in The Netherlands. In the last couple of years we've noticed an increasing demand and interest in Lucene and Solr. We thought it was about time to have a centralized place where people can have open discussions, trainings, and periodic meet-ups to share knowledge and experience relating to these technologies. The website is up and running and you're welcome to register (even if you're not living in The Netherlands - the content is in English :-)). Check out: http://www.lucene-nl.org

Cheers, Uri
Thanks
Hello,

Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site, and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house).

I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen.

Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask.

Regards, gwk
RE: Thanks
Hi Gwk,

It's a nice clean site, easy to use and seems very fast, well done! How well does it do in regards to SEO though? I noticed there's a lot of ajax going on in the background to help speed things up for the user (love the sliders), but it seems to be lacking structure for the search engines. I'm not sure if this is your intention or not, but you could massively increase the number of pages the crawlers see by extending your url rewrites to be a bit more static, i.e.

http://www.mysecondhome.co.uk/search/country/France#/s?s=date_desc&p=1&t=object&ta=[]&pmin=0&pmax=%3E&country[]=France&apmin=0&apmax=%3E&samin=0&samax=%3E

could become:

http://www.mysecondhome.co.uk/search/country/France/region/Auvergne/minprice/20/maxprice/3/page/2

This is what we do with our Solr implemented search system across all our sites, which in turn has increased general traffic and organic traffic (e.g. www.visordown.com, www.madeformums.com).

Cheers
Dave

-----Original Message-----
From: gwk [mailto:g...@eyefi.nl]
Sent: 27 August 2009 13:04
To: solr-user@lucene.apache.org
Subject: Thanks
Re: Thanks
Dave Searle wrote:

> It's a nice clean site, easy to use and seems very fast, well done! How well does it do in regards to SEO though? I noticed there's a lot of ajax going on in the background to help speed things up for the user (love the sliders), but it seems to be lacking structure for the search engines. I'm not sure if this is your intention or not, but you could massively increase the number of pages the crawlers see by extending your url rewrites to be a bit more static

Hi Dave,

Thanks for the reply. Actually, we did think about SEO: turn off javascript in your browser and you'll see the site still works (at least, it's supposed to). We added all the AJAXy interaction after we implemented the functionality to work without Javascript. So you'll get no fancy sliders, but two drop-downs to select a range.

Regards, gwk
Re: Trie Date question
I can't reproduce any problem. Are you using a recent nightly build? See the example schema of a recent nightly build for the correct way to define a Trie based field - the article / blog may be out of date. Here's what I used to test the example data:

http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 3:49 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

> ... then, if I do the following on my trie date field: q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY] I get numFound=0. Where did I go wrong? ...
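For comparison with the schema in the original question, this is a sketch of how the Solr 1.4 example schema defines its Trie date type; the precisionStep value is the example's default and can be tuned (smaller steps index more terms but speed up range queries):

```xml
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="tdatetime" type="tdate" indexed="true" stored="false"/>
```

If the tdate type in a schema predates the current nightly's definition, range queries can silently return nothing, which matches the numFound=0 symptom described above.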
RE: JDWP Error
: JPDA/JDWP are for remote debugging of the Sun JVM...
: It shouldn't be SOLR related... check the configs of Resin...

right, it sounds like you probably already have another process that is listening on that port (an older execution of resin that was never shut down cleanly?) ...

: then, when we want to stop resin it doesn't work, any advice?

-Hoss
Optimal Cache Settings, complicated by regular commits
Hi all,

I'm trying to work out the optimum cache settings for our Solr server. I'll begin by outlining our usage:

Number of documents: approximately 25,000
Commit frequency: sometimes we do massive amounts of sequential commits; most of the time it's less frequent, but still several times an hour

We make heavy use of faceting and sorting, and the number of possible facets led to choosing a filterCache size of about 50,000.

The problem we have is that the default cache settings resulted in very low hit rates (less than 30% for documents, less than 1% for the filterCache), so we upped the cache sizes gradually until the hit rates were in the 80s-90s. Now we have the issue of commits being very slow (more than 5 seconds for a document), to the point where it causes a timeout elsewhere in our systems. This is made worse by the fact that committing seems to empty the cache; given that it takes about an hour to get the cache into a good state, this is obviously very problematic. Is there a way for commits to selectively empty the cache?

Any advice regarding the config would be appreciated. The server load is relatively low; ideally we're looking to minimize the response time rather than aim for CPU or memory efficiency.

Regards,
Andrew Ingram
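A commit always opens a new searcher with fresh caches, so Solr cannot selectively keep entries; the usual lever is autowarming, which replays the most-used entries from the old searcher's caches into the new one. A hedged solrconfig.xml sketch for the scenario above; the numbers are starting points to tune, not recommendations:

```xml
<!-- autowarmCount trades commit latency for post-commit hit rate:
     higher values mean slower commits but a warmer cache -->
<filterCache class="solr.LRUCache" size="50000"
             initialSize="10000" autowarmCount="5000"/>

<queryResultCache class="solr.LRUCache" size="10000"
                  initialSize="1000" autowarmCount="1000"/>

<!-- the documentCache cannot be autowarmed: internal doc ids
     change between searchers -->
<documentCache class="solr.LRUCache" size="25000"
               initialSize="5000" autowarmCount="0"/>
```

Since autowarming itself adds to commit time, very frequent commits usually mean lowering autowarmCount rather than raising it.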
Re: SortableFloatFieldSource not accessible? (1.3)
Yes it will. Thanks. On Wed, Aug 26, 2009 at 8:51 PM, Yonik Seeley yo...@lucidimagination.comwrote: SortableFloatField works in function queries... it's just that everyone goes through SortableFloatField.getValueSource() to create them. Will that work for you? -Yonik http://www.lucidimagination.com On Wed, Aug 26, 2009 at 6:23 PM, Christophe Bioccachristo...@openplaces.org wrote: The class SortableFloatFieldSource cannot be accessed from outside its package. So it can't be used as part of a FunctionQuery. Is there a workaround to this, or should I roll my own? Will it be fixed in 1.4?
how to selectively sort records keeping some at the bottom always.. ?
Hi, If I have documents of type a, b and c, and I sort by some criteria, let's say date, can I make documents of kind c always appear at the bottom? Effectively I want one kind of record to always appear at the bottom, since they don't have valid data, whether the sort is ascending or descending. Would a function query help here? Or is it even possible? Thanks Preetam
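One common workaround (a sketch, not something from this thread): index a boolean flag marking the documents that have valid data, and make it the primary sort key so the invalid kind always sinks, regardless of the direction of the secondary sort. The field name here is hypothetical:

```
sort=has_valid_data desc, date asc    (ascending date, type-c docs last)
sort=has_valid_data desc, date desc   (descending date, type-c docs still last)
```

This requires adding the flag at index time, but avoids any function-query trickery.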
Searching with or without diacritics
Hello, I started to use Solr only recently, using the ruby/rails sunspot-solr client. I use Solr on a Slovak/Czech data set and noticed one unwanted behaviour of the search. When the user searches for an expression or word which contains diacritics, letters like š, č, ť, ä, ô, ... the special characters are usually omitted in the search query. In this case Solr does not return records which contain the expression the user intended to find. How can I configure Solr so that it finds records containing special characters, even if they are written without the accents in the query? Some info about my Solr instance: Solr Specification Version: 1.3.0, Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47, Lucene Specification Version: 2.4-dev, Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16. Thanks for your help, regards, Georg
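One way to sketch this in schema.xml is to fold accents to their base letters at both index and query time. Note that ISOLatin1AccentFilterFactory (available in 1.3) only folds characters in the Latin-1 range, such as ä and ô; letters outside Latin-1 like š, č and ť need the later ASCIIFoldingFilterFactory or a custom character mapping:

```xml
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds Latin-1 accented characters to their ASCII base letters -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs on both sides, "čaj" and "caj" in the query match the same indexed token (to the extent the filter covers the character).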
Re: Thanks
This looks great! Congratulations! Feel free to add your site to the Powered by Solr page at http://wiki.apache.org/solr/PublicServers On Thu, Aug 27, 2009 at 5:34 PM, gwk g...@eyefi.nl wrote: Hello, Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house). I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen. Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask. Regards, gwk -- Regards, Shalin Shekhar Mangar.
RE: Thanks
Great site (fast from Canada), multilingual, hope you will get millions of ads quickly and share your findings on SOLR faceting performance (don't forget about SOLR HTTP-caching support!) I am currently developing something similar in Canada, http://www.casaGURU.com (and hope to improve http://www.zoocasa.com) -Original Message- From: gwk [mailto:g...@eyefi.nl] Sent: August-27-09 8:04 AM To: solr-user@lucene.apache.org Subject: Thanks Hello, Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house). I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen. Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask. Regards, gwk
Distributed Search nightly delete
Hi All, I need to build a search system using Solr. I need to keep 30 days of data, which will be around 400GB. I will be using Distributed Search with masters/slaves (data will be published to each shard on a round-robin basis). My challenge is that I need to delete data older than 30 days (around 12GB) every night. Search volume will be very high on the current day's data as well as the last week's data. How many shards with masters/slaves should I have to handle the search load as well as the nightly deletes? Thanks in Advance. -- View this message in context: http://www.nabble.com/Distributed-Search---nightly-delete-tp25173735p25173735.html Sent from the Solr - User mailing list archive at Nabble.com.
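The nightly cleanup itself can be a delete-by-query posted to each shard's /update handler. A sketch, assuming each document carries an indexed date field named date:

```xml
<delete><query>date:[* TO NOW/DAY-30DAYS]</query></delete>
<commit/>
```

Delete-by-query only marks documents deleted; the disk space is reclaimed on segment merges or an optimize, so the nightly job usually needs to schedule that too, off-peak.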
Re: Solr admin url for example gives 404
:Try running ant example and then run Solr. right ... on a clean checkout, the solr.war needs to be built and copied to the example directory, otherwise you are just running an empty jetty server. do you see anything in example/webapps? : 1 get the latest Solr from svn (R 808058) : 2 run ant clean test (all tests pass) : 3 cd ./example : 4. start solr : $ java -jar start.jar -Hoss
Re: com.ctc.wstx.exc.WstxUnexpectedCharException error
: I have a valid xml document that begins: how are you inspecting the document? I suspect that what you actually have is a document containing the literal bytes R&D, but some tool you are using to view the document is displaying the & to you as &amp; ... OR ... your source document has the literal bytes R&amp;D in it, but some code you are using is parsing that as XML and then writing it (over the wire) to Solr as a string literal without re-encoding (R&D). Try running nc -l in place of Solr, and have your indexing code post to it -- then see what you get. Solr certainly doesn't have a problem with properly escaped ampersands, but it will complain about illegal XML escape sequences... $ java -Ddata=args -jar post.jar '<add><doc><field name="id">R&amp;D</field></doc></add>' SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing args to http://localhost:8983/solr/update.. SimplePostTool: COMMITting Solr index changes.. $ java -Ddata=args -jar post.jar '<add><doc><field name="id">R&D</field></doc></add>' SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing args to http://localhost:8983/solr/update.. SimplePostTool: FATAL: Solr returned an error: comctcwstxexcWstxLazyException_Unexpected_character__code_60_expected_a_semicolon_after_the_reference_for_entity_D__at_ -Hoss
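The safe fix on the indexing side is to re-encode field values before embedding them in the update XML. A sketch using Python's standard library (the helper name is illustrative, not from the thread):

```python
from xml.sax.saxutils import escape

def field_xml(name, value):
    """Render one <field> element, escaping &, < and > in the value."""
    return '<field name="%s">%s</field>' % (escape(name), escape(value))

print(field_xml("id", "R&D"))  # <field name="id">R&amp;D</field>
```

Any real XML library (or SolrJ) does this escaping for you; the bug usually comes from building update messages by string concatenation.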
Updating a solr record
I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
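A common workaround, assuming every field you care about is stored=true, is to read the document back from Solr, overlay the changed fields, and re-add it. A minimal sketch (the HTTP fetch/post plumbing is omitted; `stored_doc` stands in for the document as returned by a Solr query):

```python
def rebuild_doc(stored_doc, changes):
    """Overlay edited fields onto the stored copy of a Solr document.

    This only recovers fields marked stored="true" in schema.xml;
    indexed-only fields are lost and cannot be rebuilt this way.
    """
    doc = dict(stored_doc)  # copy what Solr returned for this document
    doc.update(changes)     # apply the small tweaks
    return doc

# Hypothetical document as returned by a query with fl=*
old = {"id": "doc1", "title": "Old title", "url": "http://example.com/a"}
new = rebuild_doc(old, {"title": "New title"})
# 'new' would then be re-posted to /update, replacing doc1 wholesale
```

The caveat raised later in this thread applies: anything indexed but not stored disappears on the round trip.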
Re: Updating a solr record
On Thu, Aug 27, 2009 at 1:27 PM, Eric Pughep...@opensourceconnections.com wrote: You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. Correct me if I'm wrong, but I think you'd end up losing the fields that are indexed but not stored. -- http://www.linkedin.com/in/paultomblin
Re: Lucene Search Performance Analysis Workshop
Agreed, Solr uses random access bitsets everywhere, so I'm thinking this could be an improvement, or at least a great option to enable and try out. I'll update LUCENE-1536 so we can benchmark. On Thu, Aug 27, 2009 at 4:06 AM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Aug 27, 2009 at 6:30 AM, Grant Ingersoll gsing...@apache.org wrote: I am wondering... are new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}??? The new filtering features in Solr are just doing what Lucene started doing in 2.4, and that is using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed. I think performance of filtering can still be further improved, within Lucene... it's still very much a work in progress. E.g. if a filter is random access (e.g. RAM resident as a bit set), which I think for Solr is frequently the case (?), it ought to be applied just like we now apply deleted documents (LUCENE-1536 is opened for this). This can result in sizable performance gains, especially for more complex queries and not-so-dense filters. Mike
Re: Problem using replication in 8/25/09 nightly build of 1.4
On Thu, Aug 27, 2009 at 12:27 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Aug 26, 2009 at 11:53 PM, Ron Ellis r...@benetech.org wrote: Hi Everyone, When trying to utilize the new HTTP based replication built into Solr 1.4 I encounter a problem. When I view the replication admin page on the slave, all of the master values are null, i.e. Replicatable Index Version: null, Generation: null | Latest Index Version: null, Generation: null. If the master has just been started, it has no index which can be replicated to the slave. If you do a commit on the master, then a replicatable index version will be shown on the slave and replication will proceed. Alternately, you can add the following to the master configuration: <str name="replicateAfter">startup</str> Despite these missing values the two seem to be talking over HTTP successfully (if I shut down the master, the slave replication page starts exploding with an NPE). The slave replication page should not show an NPE if the master is down. I'll look into it. This should be fixed in the trunk. When I hit http://solr/replication?command=indexversion&wt=xml I get the following... <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst><long name="indexversion">0</long><long name="generation">0</long></response> However in the admin/replication UI on the master I see... Index Version: 1250525534711, Generation: 1778. Any idea what I'm doing wrong or how I could begin to diagnose? I am using the 8/25 nightly build of Solr with the example solrconfig.xml provided. The only modifications to the config have been to uncomment the master/slave replication sections and remove the data directory location line so it falls back to solr.home/data. Also, if it's relevant, this index was originally created in Solr 1.3. I think that should be fine. I assume both master and slave are the same Solr version 1.4? -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer | AOL | http://aol.com
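The master-side configuration being discussed would look something like this (a sketch of the Solr 1.4 ReplicationHandler config; replicateAfter startup exposes the index already on disk to slaves before any new commit happens, which avoids the null version values):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>
```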
Re: Updating a solr record
Eric Pugh wrote: Do you have to reindex? Are you meaning an optimize operation? You can do an update by just sending Solr a new record, and letting Solr deal with the removing and adding of the data. The problem is that I can't easily create the new record. There is some data that I no longer have access to, but did at the time I created the record to begin with. You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. This is what I'd like to know how to do. I'll experiment with this, but I thought that I wouldn't be able to get back all the info I need to recreate the doc. Or am I missing something? Are these documents so huge that you don't want to pull back an entire record for some reason? I would like to get the record from solr because I just can't create the record the same way as I originally did. (Besides the time involved in crawling all those websites, some of them only allow us access for a limited amount of time, so to reindex, we need to call them up and schedule a time for them to whitelist us.) Eric On Thu, Aug 27, 2009 at 1:21 PM, Paul Rosenp...@performantsoftware.com wrote: I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) 
But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
extended documentation on analyzers
is there an online resource or a book that contains a thorough list of the tokenizers and filters available and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful, but I would like to go through additional filters to make sure I'm not reinventing the wheel by adding my own --joe
RE: How to reduce the Solr index size..
stored=true means that this piece of info will be stored in the filesystem, so your index will contain the 1MB of pure log PLUS some info related to indexing itself: terms, etc. Search speed is more important than index size... And note this: the message field contains the actual log text and is stored=true, so this field alone will account for the 1MB even if it were not indexed. -Original Message- From: Silent Surfer [mailto:silentsurfe...@yahoo.com] Sent: August-20-09 11:01 AM To: Solr User Subject: How to reduce the Solr index size.. Hi, I am a newbie to Solr. We recently started using Solr. We are using Solr to process server logs. We create indexes for each line of the logs, so that users can do a fine-grained search down to the second/ms. Now what we are observing is that the index being created is almost double the size of the actual logs, i.e. if the log size is say 1 MB, the index size is around 2 MB. Could anyone let us know what can be done to reduce the index size? Do we need to change any configuration, or delete any files which are created during the indexing process but are not required for searching? Our schema is as follows: <field name="pkey" type="string" indexed="true" stored="true" required="false"/> <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/> <field name="level" type="string" indexed="true" stored="true"/> <field name="app" type="string" indexed="true" stored="true"/> <field name="server" type="string" indexed="true" stored="true"/> <field name="port" type="string" indexed="true" stored="true"/> <field name="class" type="string" indexed="true" stored="true"/> <field name="method" type="string" indexed="true" stored="true"/> <field name="filename" type="string" indexed="true" stored="true"/> <field name="linenumber" type="string" indexed="true" stored="true"/> <field name="message" type="text" indexed="true" stored="true"/> The message field holds the actual log text. Thanks, sS
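If index size matters more than convenience, two hedged options for the bulky message field (sketches only; compressed="true" on string/text fields was supported in this era of Solr, though deprecated in later versions):

```xml
<!-- Option 1: searchable but not retrievable from Solr;
     display the line from the original log file instead -->
<field name="message" type="text" indexed="true" stored="false"/>

<!-- Option 2: stored, but compressed on disk -->
<field name="message" type="text" indexed="true" stored="true"
       compressed="true"/>
```

Both trade retrieval convenience or CPU for disk space; reindexing is required after either change.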
Re: Solr project statisitics
Hmmm.. I see, too bad. So, here's a crazy question: if you had to guess, how much of these numbers come from Solr nowadays (compared to lucene java and the other related projects)? (I know.. it is a crazy question, but I had to ask :-)) Grant Ingersoll wrote: On Aug 27, 2009, at 4:00 AM, Uri Boness wrote: Hi, Where can I find general statistics about the Solr project. The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these number include all lucene's sub-projects (including Solr). If that's the case, then is there a way find out Solr's part in these numbers, otherwise are there any other publicly available statistics about Solr? Those are pretty much it. It is further complicated by the fact that the ASF has a really large mirroring system, which complicates the downloads picture quite a bit. Nor does it account for the many people using distributions from Ubuntu, etc.
Re: extended documentation on analyzers
If you have a specific need, ask on this list. That worked for me. I don't think I would have recognized KeywordAnalyzer as the one I wanted. wunder On Aug 27, 2009, at 11:32 AM, Joe Calderon wrote: is there an online resource or a book that contains a thorough list of tokenizers and filters available and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful but i would like to go through additional filters to make sure im not reinventing the wheel adding my own --joe
Re: facets: case and accent insensitive sort
Hi Sébastien, I've experienced the same issue, but when using range queries; maybe this might help you too. I was trying to filter a query using a range like [B TO F], case and accent insensitive, while still getting back the case and accents in the results. The solution has been to NOT TOKENIZE the field: emit a SINGLE token, as if it were a STRING field, and index it without case and accents. The KeywordTokenizer did the job; at query time the indexed value (without accents and case insensitive) is used, but the stored value is returned in the response. As far as I know facets use the indexed value during processing, but I'm not sure which of the two (indexed or stored) is returned. KeywordTokenizer is not clearly described in the Solr docs. See what Lucene says: KeywordTokenizer - Emits the entire input as a single token. <fieldType name="text_insensitive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> </analyzer> </fieldType> Cheers, Michel Bottan On Mon, Jun 29, 2009 at 10:17 AM, Sébastien Lamy lamys...@free.fr wrote: Thanks for your reply. I will have a look at this. Peter Wolanin wrote: Seems like this might be approached using a Lucene payload? For example where the original string is stored as the payload and available in the returned facets for display purposes? Payloads are byte arrays stored with Terms on Fields.
See https://issues.apache.org/jira/browse/LUCENE-755 Solr seems to have support for a few example payloads already, like NumericPayloadTokenFilter. Almost any way you approach this, it seems like there are potential problems, since you might have multiple combinations of case and accent mapping to the same case-less, accent-less value that you want to use for sorting (and I assume for counting) your facets? -Peter On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy lamys...@free.fr wrote: Shalin Shekhar Mangar wrote: On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote: If I use a copyField to store into a string type, and facet on that, my problem remains: the facets are sorted case and accent sensitive, and I want an *insensitive* sort. If I use a copyField to store into a type with no accents and case (e.g. alphaOnlySort), then Solr returns me facet values with no accents and no case, and I want the facet values returned by Solr to *have accents and case*. Ah, of course you are right. There is no way to do this right now except on the client side. Thank you for your response. Would it be easy to modify Solr to behave like I want? Where should I start to investigate?
tag cloud with solr 1.3
Hi all, How would I go about implementing a 'tag cloud' with Solr 1.3? All I want to do is display a list of the most frequently occurring terms in the corpus. Is there an easy way to do that in 1.3? I saw a couple of postings about implementing it with TermVectorComponent, but that's in 1.4. I'd really appreciate some help. Thanks in advance. Paul
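A 1.3-friendly sketch is to facet on the main tokenized text field and treat the top facet counts as the cloud (field name and limits are illustrative, and faceting on a high-cardinality text field can be memory-hungry):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=50&facet.mincount=2
```

The response's facet counts give term/frequency pairs that can be scaled into font sizes on the client side.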
Re: Return 2 fields per facet.. name and id, for example? / facet value search
Hi, I have a similar requirement to Matthew (from his post 2 years ago). Is this still the way to go for storing both the ID and name/value of facet values? I'm planning to use the id#name format if this is still the case, and doing a prefix query. I believe this is a common requirement, so I'd appreciate it if any of you can share the best way to do it. Also, I'm indexing the facet values for text search as well. Should the field declaration below satisfy that requirement? <field name="category" type="text" indexed="true" stored="true" required="true" multiValued="true"/> Thanks, R Re: Return 2 fields per facet.. name and id, for example? Matthew Runo Fri, 07 Sep 2007 13:15:12 -0700 Ahh... sneaky. I'll probably do the combined-name#id method. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Sep 7, 2007, at 12:38 PM, Yonik Seeley wrote: On 9/7/07, Matthew Runo [EMAIL PROTECTED] wrote: I've found something which is either already in SOLR, or should be (as I can see it being very helpful). I couldn't figure out how to do it though.. Let's say I'm trying to print out a page of products, and I want to provide a list of brands to filter by. It would be great if in my facets I could get this sort of xml... <int name="adidas" id="145"></int> That way, I'd be able to know the brand id of adidas without having to run a second query somewhere for each facet to look it up. If you can get the name from the id in your webapp, then index the id to begin with (instead of the name). <int name="145"></int> Or, if you need both the name and the id, index them both together, separated by a special character that you can strip out on the webapp side... <int name="adidas#145"></int> -Yonik
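On the client side, splitting the combined value back apart is trivial; a sketch in Python (the id#name convention here is just the one suggested in this thread, not anything Solr-specific):

```python
def parse_facet_value(token):
    """Split a combined 'id#name' facet value back into (id, name).

    Assumes the id itself never contains '#'; pick a separator that
    cannot occur in either part.
    """
    id_part, name = token.split("#", 1)
    return id_part, name

print(parse_facet_value("145#adidas"))  # ('145', 'adidas')
```

Putting the id first also makes the facet prefix query (facet.prefix on the id) straightforward.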
Case insensitive search and original string
Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R
Re: Optimal Cache Settings, complicated by regular commits
Andrew, Which version of Solr are you using? There's an open issue to fix caching filters at the segment level, which will not clear the caches on each commit, you can vote to indicate your interest. http://issues.apache.org/jira/browse/SOLR-1308 -J On Thu, Aug 27, 2009 at 7:06 AM, Andrew Ingrama...@andrewingram.net wrote: Hi all, I'm trying to work out the optimum cache settings for our Solr server, I'll begin by outlining our usage. Number of documents: approximately 25,000 Commit frequency: sometimes we do massive amounts of sequential commits, most of the time its less frequent but still several times an hour We make heavy use of faceting and sorting, and the number of possible facets led to choosing a filterCache size of about 50,000 The problem we have is that the default cache settings resulting in very low hit rates (less than 30% for documents, less than 1% for filterCache), so we upped the cache size up gradually until the hit rates were in the 80s-90s, now we have the issue of commits being very slow (more than 5 seconds for a document), to the point where it causes a timeout elsewhere in our systems. This is made worse by the fact that committing seems to empty the cache, given that it takes about an hour to get the cache to a good state this is obviously very problematic. Is there a way for commits to selectively empty the cache? Any advice regarding the config would be appreciated. The server load is relatively low, ideally we're looking to minimize the response time rather than aim for CPU or memory efficiency. Regards, Andrew Ingram
Re: Updating a solr record
Hi Eric, I think I understand what you are saying but I'm not sure how it would work. I think you are saying to have two different indexes, each one has the same documents, but one has the hard-to-get fields and the other has the easy-to-get fields. Then I would make the same query twice, once to each index. So, let's say I'm looking for all documents that contain the word poem and I want to initially display the the 10 most relevant matches. I think I'd have to ask each index for its 10 most relevant matches, then merge them myself, and display the appropriate ones. Well, the same document could appear in both lists so I'd have to get rid of duplicates. Also, wouldn't the relevancy of the duplicate doc go up? But I wouldn't know by how much. That's the first problem, but then what if the user wants to see page 2? I certainly wouldn't query for documents #10-19 on each server. Eric Pugh wrote: Right... You know, if some of your data needs to updated frequently, but other is updated once per year, and is really massive dataset, then maybe splitting it up into separate cores? Since you mentioned that you can't get the raw data again, you could just duplicate your existing index by doing a filesytem copy. Leave that alone so you don't update it and lose your data, and start a new core that you can update and ignore the fact is has all the website data in it. And tie the two cores data sets together outside of Solr. Eric On Thu, Aug 27, 2009 at 1:46 PM, Paul Tomblinptomb...@xcski.com wrote: On Thu, Aug 27, 2009 at 1:27 PM, Eric Pughep...@opensourceconnections.com wrote: You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. Correct me if I'm wrong, but I think you'd end up losing the fields that are indexed but not stored. -- http://www.linkedin.com/in/paultomblin
Alfresco has internal index - integrating into Solr
I am currently prototyping the use of Alfresco Document Management that has an internal Lucene to index all the documents managed by Alfresco. What would I need to understand in order to integrate that Lucene Index into a separate Solr installation? I am new to Solr and am trying to use Solr to index WCM produced files on a file system and then federate (integrate) the Alfresco Lucene Index. So I want to understand how I should do this from Solr and what I need to get from Alfresco. Thanks...jay blanton -- View this message in context: http://www.nabble.com/Alfresco-has-internal-index---integrating-into-Solr-tp25179342p25179342.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Replication
We have multiple solr webapps all running from the same WAR file. Each webapp is running under the same Tomcat container and I consider each webapp the same thing as a slice (or instance). I've configured the Tomcat container to enable JMX and when I connect using JConsole I only see the replication handler for one of the webapps in the server. I was under the impression each webapp gets its own replication handler. Is this not true? It would be nice to be able to have a JMX MBean for each replication handler in the container so we can get all the same replication information using JMX as in using the replication admin page for each web app. Thanks. From: noble.p...@corp.aol.com Date: Thu, 27 Aug 2009 13:04:38 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org when you say a slice you mean one instance of solr? So your JMX console is connecting to only one solr? On Thu, Aug 27, 2009 at 3:19 AM, J Gskinny_joe...@hotmail.com wrote: Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler jmx mbean. It looks like it is defaulting to the first slice it finds on its path. Is there anyway to have multiple replication handlers or at least obtain replication on a per slice/instance via JMX like how you can see attributes for each slice/instance via each replication admin jsp page? Thanks again. From: noble.p...@corp.aol.com Date: Wed, 26 Aug 2009 11:05:34 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org The ReplicationHandler is not enforced as a singleton , but for all practical purposes it is a singleton for one core. 
If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup the configuration should be as follows:

MASTER
|_ SLAVE (I am a slave of MASTER)
|_ REPEATER (I am a slave of MASTER and master to my slaves)
   |_ REPEATER_SLAVE (of REPEATER)

The point is that REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to REPEATER. On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote: Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server and it's picking an arbitrary slice to report on. So I'm curious if every slice gets its own replication handler mbean? This is important because I have no way of knowing in this specific server any information about the other slices, in particular, information about the master/slave value for the other slices. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be a master and a slave, i.e. a repeater. I'm wondering how repeaters work because let's say I have a slice named 'A' and the master is on server 1 and the slave is on server 2, then how are these two servers communicating to replicate? Looking at the jmx information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this solr slice know if it's the master or slave? I'm a bit confused. Thanks. -- - Noble Paul | Principal Engineer | AOL | http://aol.com
-- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: How to reduce the Solr index size..
2009/8/27 Fuad Efendi f...@efendi.ca: stored=true means that this piece of info will be stored in a filesystem. So that your index will contain 1Mb of pure log PLUS some info related to indexing itself: terms, etc. Search speed is more important than index size... Not if you run out of space for the index. :-) And note this: message field contains actual log, stored=true, so that only this field will make 1Mb if not indexed -Original Message- From: Silent Surfer [mailto:silentsurfe...@yahoo.com] Sent: August-20-09 11:01 AM To: Solr User Subject: How to reduce the Solr index size.. Hi, I am newbie to Solr. We recently started using Solr. We are using Solr to process the server logs. We are creating the indexes for each line of the logs, so that users would be able to do a fine grain search upto second/ms. Now what we are observing is , the index size that is being created is almost double the size of the actual log size. i.e if the logs size is say 1 MB, the actual index size is around 2 MB. Could anyone let us know what can be done to reduce the index size. Do we need to change any configurations/delete any files which are created during the indexing processes, but not required for searching.. Our schema is as follows: field name=pkey type=string indexed=true stored=true required=false / field name=date type=date indexed=true stored=true omitNorms=true/ field name=level type=string indexed=true stored=true/ field name=app type=string indexed=true stored=true/ field name=server type=string indexed=true stored=true/ field name=port type=string indexed=true stored=true/ field name=class type=string indexed=true stored=true/ field name=method type=string indexed=true stored=true/ field name=filename type=string indexed=true stored=true/ field name=linenumber type=string indexed=true stored=true/ field name=message type=text indexed=true stored=true/ message field holds the actual logtext. Thanks, sS -- -
RE: Updating a solr record
I haven't read all messages in this thread yet, but I probably have an answer to some questions...

1. You want to change schema.xml and to reindex, but you don't have access to source documents (stored somewhere on the Internet). But you probably use stored=true in your schema. Then, use SOLR as your storage device, use id:[* TO *] to retrieve documents from SOLR and reindex it in another SOLR schema...

2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create a fake field with the same term vector in an updated document... just an idea, may be I am wrong...

-Original Message-
From: Paul Rosen [mailto:p...@performantsoftware.com]
Sent: August-27-09 1:22 PM
To: solr-user@lucene.apache.org
Subject: Updating a solr record

I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
Re: Case insensitive search and original string
--- On Thu, 8/27/09, Rihaed Tan tanrihae...@gmail.com wrote:

From: Rihaed Tan tanrihae...@gmail.com
Subject: Case insensitive search and original string
To: solr-user@lucene.apache.org
Date: Thursday, August 27, 2009, 10:10 PM

Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R

Are you asking for display purposes? If yes, by default Solr gives you the original string of a field in the response. Stemming, lowercasing, etc. do not affect this behaviour. You can always display the original documents to the users. If you want to capture the original words -that matched the query terms- from the original documents, then use highlighting (hl=true&hl.fragsize=0). You will find those words between <em></em> tags in the response.
RE: Alfresco has internal index - integrating into Solr
Check also Liferay trunk and WIKI pages, it had similar problem - and they have plugin for SOLR now, just a matter of configuration change - and search implementation is SOLR... They use SolrJ to do this task, and generic wrappers around search implementation (which could be anything)... -Fuad http://www.linkedin.com/in/liferay -Original Message- From: jaybytez [mailto:jayby...@gmail.com] Sent: August-27-09 4:27 PM To: solr-user@lucene.apache.org Subject: Alfresco has internal index - integrating into Solr I am currently prototyping the use of Alfresco Document Management that has an internal Lucene to index all the documents managed by Alfresco. What would I need to understand in order to integrate that Lucene Index into a separate Solr installation? I am new to Solr and am trying to use Solr to index WCM produced files on a file system and then federate (integrate) the Alfresco Lucene Index. So I want to understand how I should do this from Solr and what I need to get from Alfresco. Thanks...jay blanton -- View this message in context: http://www.nabble.com/Alfresco-has-internal-index---integrating-into-Solr-tp 25179342p25179342.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: tag cloud with solr 1.3
Hi all, How would I go about implementing a 'tag cloud' with Solr 1.3? All I want to do is to display a list of the most occurring terms in the corpus. Is there an easy way to go about that in 1.3?

Yes: http://localhost:8983/solr/admin/luke?fl=text&numTerms=100 will give you the top 100 most occurring terms from the field named text.
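The Luke request above returns a topTerms list per requested field. An illustrative (abbreviated) shape of the response for a field named text, with invented term values and counts, looks roughly like:

```xml
<lst name="text">
  <!-- ...field metadata omitted... -->
  <lst name="topTerms">
    <!-- each entry is term -> document frequency; values here are made up -->
    <int name="solr">1024</int>
    <int name="lucene">987</int>
    <int name="search">950</int>
  </lst>
</lst>
```

Parsing the topTerms entries gives both the labels and the relative weights needed to size the tag cloud.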
SnowballPorterFilterFactory stemming word question
I have a field defined in my schema.xml file:

<fieldtype name="stemField" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldtype>

If I analyse this field type in analysis.jsp, the following are the results: if I give running, it stems to run, which is fine. If I give machine, why does it stem to machin, and where does that word come from? If I give revolutionary, it stems to revolutionari; I thought it should stem to revolution. How does stemming work? Does it reduce an adverb to a verb etc., or do we have to customize it? Please let me know. Thanks -- View this message in context: http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25180310.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: encoding problem
Hi Shalin, strangely, things still aren't working. I've set the JAVA_OPTS through either the GUI or to startup.bat, but absolutely no impact. Have tried reindexing also, but still no impact - results such as - “My Universe is Here� bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
Re: encoding problem
Have you determined if the problem is on the indexing side or the query side? I don't see any reason you should have to set/change any encoding in the JVM. -Yonik http://www.lucidimagination.com On Thu, Aug 27, 2009 at 7:03 PM, Bernadette Houghtonbernadette.hough...@deakin.edu.au wrote: Hi Shalin, strangely, things still aren't working. I've set the JAVA_OPTS through either the GUI or to startup.bat, but absolutely no impact. Have tried reindexing also, but still no impact - results such as - “My Universe is Here� bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
RE: Searching and Displaying Different Logical Entities
Funtick wrote: then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. SOLR can automatically provide you P's with Counts, and it will be _unique_... I assume you mean to facet by P in the C index. My next problem is to sort those P's based on some attribute of P (as opposed to alphabetically or by occurrence in C). Funtick wrote: Even if cardinality of P is 10,000+ SOLR is very fast now (expect few seconds response time for initial request). You need single query with faceting... Is there a practical limit for maxBooleanClauses? The default is 1024, but I need at least 10,000. Funtick wrote: (!) You do not need P's ID. Single document will have unique ID, and fields such as P, C (with possible attributes). Do not think in terms of RDBMS... Lucene does all 'normalization' behind the scenes, and SOLR will give you Ps with Cs... If I put both P's and C's into a single index, then I agree, I don't need P's ID. If I have P and C in separate indices then I still need to maintain the logical relationship between P and C. It wasn't clear to me if you suggested I continue with either of my 2 proposed solutions. Can you clarify? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25181664.html Sent from the Solr - User mailing list archive at Nabble.com.
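On the maxBooleanClauses question above: the limit is configurable in solrconfig.xml, so a query with 10,000 clauses is possible, at the cost of extra memory and scoring work per clause. A sketch (placement follows the stock example config):

```xml
<query>
  <!-- Default is 1024; raise it to allow very large OR queries.
       Each clause adds memory use and query time. -->
  <maxBooleanClauses>10240</maxBooleanClauses>
</query>
```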
RE: encoding problem
Shalin, the XML from solr admin for the relevant field is displaying as - str name=citation_ta title=Browse by Author Name for Moncrieff, Joan href=/fez/list/author/Moncrieff%2C+Joan/Moncrieff, Joan/a, a title=Browse by Author Name for Macauley, Peter href=/fez/list/author/Macauley%2C+Peter/Macauley, Peter/a and a title=Browse by Author Name for Epps, Janine href=/fez/list/author/Epps%2C+Janine/Epps, Janine/a a title=Browse by Year 2006 href=/fez/list/year/2006/2006/a, a title=Click to view Journal, Media Article: ldquo;My Universe is Hererdquo;: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers href=/fez/view/changeme:156“My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers/ai/i, vol. 38, no. 2, pp. 71-83./str The weird thing is that the title displays OK in one place, but not in the href bit. bern
Can solr do the equivalent of select distinct(field)?
Can I get all the distinct values from the Solr database, or do I have to select everything and aggregate it myself? -- http://www.linkedin.com/in/paultomblin
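One common way to get the equivalent of SELECT DISTINCT(field) is faceting, which returns every distinct indexed value of a field together with its count. A sketch (the field name myfield is hypothetical):

```text
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1
```

Note that faceting returns indexed (analyzed) values, so for verbatim distinct values the field should be a string type or otherwise unanalyzed.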
Re: Updating a solr record
I guess if you have stored=true then there is no problem. 2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create fake field with same term vector in an updated document... just an idea, may be I am wrong... Reconstructing a the field value from a term enum might work... of course the value won't be as the original value, but when indexed, if you don't have any really special filters (e.g. shingle filter), most likely the tokens will be re-indexed as they are (that is, it is most likely that the filters will not have any effect). just make sure to take the position increments in account! for example, if you have synonym filter set up, then you'll need to choose only one term in a single position (otherwise the term frequency of the document will increase on every update). Uri Fuad Efendi wrote: I haven't read all messages in this thread yet, but I probably have an answer to some questions... 1. You want to change schema.xml and to reindex, but you don't have access to source documents (stored somewhere on Internet). But you probably use stored=true in your schema. Then, use SOLR as your storage device, use id:[* TO *] to retrieve documents from SOLR and reindex it in another SOLR schema... 2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create fake field with same term vector in an updated document... just an idea, may be I am wrong... -Original Message- From: Paul Rosen [mailto:p...@performantsoftware.com] Sent: August-27-09 1:22 PM To: solr-user@lucene.apache.org Subject: Updating a solr record I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. 
That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
Re: Can Apache Solr have more than one schema?
Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
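Multi-core setups like Uri describes are enabled by a solr.xml file at the Solr home. A minimal sketch (core names are hypothetical):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Each core has its own instanceDir containing conf/schema.xml
         and conf/solrconfig.xml, plus its own index -->
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```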
Re: Case insensitive search and original string
Hi Ahmet, Yes, for display purpose. Okay, so I don't have to copy fields then. Thank you very much. R On Fri, Aug 28, 2009 at 4:57 AM, AHMET ARSLAN iori...@yahoo.com wrote: --- On Thu, 8/27/09, Rihaed Tan tanrihae...@gmail.com wrote: From: Rihaed Tan tanrihae...@gmail.com Subject: Case insensitive search and original string To: solr-user@lucene.apache.org Date: Thursday, August 27, 2009, 10:10 PM Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R Are you asking for displaying purpose? If yes by default Solr gives you original string of a field in the response. Stemming, lowercasing, etc do not effect this behaviour. You can allways display original documents to the users. If you want to capture original words -that matched the query terms- from original documents, then use highlighting. ( hl=truehl.fragsize=0 ) You will find those words between em /em tags in the response.
Re: Can Apache Solr have more than one schema?
Thanks Uri, Now my question is: how can I specify which schema to query against? Thanks! Khai On Thu, Aug 27, 2009 at 5:43 PM, Uri Boness ubon...@gmail.com wrote: Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
Ok, why isn't this working?
I've loaded some data into my solr using the embedded server, and I can see the data using Luke. I start up the web app, and it says cwd=/Users/ptomblin/apache-tomcat-6.0.20 SolrHome=/Users/ptomblin/src/lucidity/solr/ I hit the schema button and it shows the correct schema. However, if I type anything into the query window, it never returns anything. I've tried things that I know for sure are in the default search field, but all I get back is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">scientist</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

How can I figure out why I'm not getting any results back? Any log files I can look at? -- http://www.linkedin.com/in/paultomblin
Re: Can Apache Solr have more than one schema?
If you have configured multi-core, then all you need to do is use the following url pattern: http://hostname:port/solr/core_name/select?q=... where core_name is the name of the core you wish to query. Uri Khai Doan wrote: Thanks Uri, Now my question is: how can I specify which schema to query against? Thanks! Khai On Thu, Aug 27, 2009 at 5:43 PM, Uri Boness ubon...@gmail.com wrote: Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
UpdateRequestProcessor config location
I've read through the wiki for this and it explains most everything except where in the solrconfig.xml the updateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor
Count of records
Hi, We have integrated Solr index with Carrot2 Search Engine and are able to get search results. In my search results page, by default the total number of records matched for the particular query is not getting displayed.

http://localhost:8089/carrot2-webapp-3.0.1/search?source=Solr&view=tree&skin=simple&query=java&results=100&algorithm=lingo&SolrDocumentSource.solrTitleFieldName=title&SolrDocumentSource.solrSummaryFieldName=description&SolrDocumentSource.solrUrlFieldName=url

Currently I am getting: Results 1 - 100 of about 100 for java. Consider I searched for Java; in my Solr index, the total number of matches found is 1000. I am interested to display only the top 100 results, but I should also get the total match count for the search query. Display should be similar to: Results 1 - 100 of about 1000 for java. Regards Bhaskar
Re: Ok, why isn't this working?
On Thu, Aug 27, 2009 at 9:24 PM, Paul Tomblin ptomb...@xcski.com wrote: cwd=/Users/ptomblin/apache-tomcat-6.0.20 SolrHome=/Users/ptomblin/src/lucidity/solr/ Ok, I've spotted the problem - while SolrHome is in the right place, it's still looking for the data in /Users/ptomblin/apache-tomcat-6.0.20/solr/data/ How can I change that? -- http://www.linkedin.com/in/paultomblin
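The index location comes from the dataDir setting in solrconfig.xml; when it is removed, Solr falls back to ./solr/data relative to the current working directory, which is why the index is being looked for under the Tomcat directory above. A sketch (the absolute path shown is an assumption based on the SolrHome printed earlier):

```xml
<!-- Point the index at an absolute path instead of the cwd-relative ./solr/data -->
<dataDir>/Users/ptomblin/src/lucidity/solr/data</dataDir>
```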
Re: Pattern matching in Solr
Hi, In Schema.xml file,I am not able ot find splitOnCaseChange=1. I am not looking for case sensitive search. Let me know what file you are refering to?. I am looking for exact match search only Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Regards Bhaskar --- On Thu, 8/27/09, Avlesh Singh avl...@gmail.com wrote: From: Avlesh Singh avl...@gmail.com Subject: Re: Pattern matching in Solr To: solr-user@lucene.apache.org Date: Thursday, August 27, 2009, 2:10 AM In Schema.xml file,I am not able ot find splitOnCaseChange=1. Unless you have modified the stock field type definition of text field in your core's schema.xml you should be able to find this property set for the WordDelimiterFilterFactory. Read more here - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089 Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Google for these two. Cheers Avlesh On Thu, Aug 27, 2009 at 12:21 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, In Schema.xml file,I am not able ot find splitOnCaseChange=1. I am not looking for case sensitive search. Let me know what file you are refering to?. I am looking for exact match search only Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Regards Bhaskar --- On Wed, 8/26/09, Avlesh Singh avl...@gmail.com wrote: From: Avlesh Singh avl...@gmail.com Subject: Re: Pattern matching in Solr To: solr-user@lucene.apache.org Date: Wednesday, August 26, 2009, 11:31 AM You could have used your previous thread itself ( http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr ), Bhaskar. In your scenario one, you need an exact token match, right? You are getting expected results if your field type is text. 
Look for the WordDelimiterFilterFactory in your field type definition for the text field inside schema.xml. You'll find an attribute splitOnCaseChange=1. Because of this, ChandarBhaskar is converted into two tokens Chandra and Bhaskar and hence the matches. You may choose to remove this attribute if the behaviour is not desired. For your scenario two, you may want to look at the KeywordTokenizerFactory and EdgeNGramFilterFactory on Solr wiki. Generally, for all such use cases people create multiple fields in their schema storing the same data analyzed in different ways. Cheers Avlesh On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can any one help me with the below scenario?. Scenario 1: Assume that I give Google as input string i am using Carrot with Solr Carrot is for front end display purpose the issue is Assuming i give BHASKAR as input string It should give me search results pertaining to BHASKAR only. Select * from MASTER where name =Bhaskar; Example:It should not display search results as ChandarBhaskar or BhaskarC. Should display Bhaskar only. Scenario 2: Select * from MASTER where name like %BHASKAR%; It should display records containing the word BHASKAR Ex: Bhaskar ChandarBhaskar BhaskarC Bhaskarabc How to achieve Scenario 1 in Solr ?. Regards Bhaskar __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
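For Bhaskar's scenario one (match the whole value exactly, like name = 'Bhaskar'), a common approach is a field type that keeps the entire field value as a single token. A sketch using stock Solr factories (the type name exactString is made up):

```xml
<fieldType name="exactString" class="solr.TextField">
  <analyzer>
    <!-- Emits the whole field value as one token, so a query for "Bhaskar"
         will not match "ChandarBhaskar" or "BhaskarC" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Optional: remove this filter if matching must be case sensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Scenario two (LIKE '%BHASKAR%') is the one where an NGram-style filter on a second copy of the field would come in.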
Why isn't this working?
Yesterday or the day before, I asked specifically if I would need to restart the Solr server if somebody else loaded data into the Solr index using the EmbeddedServer, and I was told confidently that no, the Solr server would see the new data as soon as it was committed. So today I fired up the Solr server (and after making apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really lives and restarting the web server), and did some queries. Then I ran a program that loaded a bunch of data and committed it. Then I did the queries again. And the new data is NOT showing. Using Luke, I can see 10022 documents in the index, but the Solr statistics page (http://localhost:8080/solrChunk/admin/stats.jsp) is still showing 8677, which is how many there were before I reloaded the data. So am I doing something wrong, or was the assurance I got yesterday that this is possible wrong? -- http://www.linkedin.com/in/paultomblin
Single Configuration for Master/Slave Replication - SOLR-1355
Hello, I noticed that the documentation around Solr Replication in the wiki has recently changed to take Paul's patch into account (SOLR-1355). I now see that with the current trunk of SOLR 1.4 it is possible to use a single solrconfig.xml to define both master and slave configurations, with environment variables determining which mode is selected. Can settings outside of the replication handler be set differently based on which mode is enabled as well? For example, settings such as cache sizes might differ between a master and a slave configuration (ie autowarming, cache sizes, etc). Can those similarly be wrapped in a lst tag with a name of master or slave set? Thanks, Ilan -- Ilan Rabinovitch i...@fonz.net --- SCALE 8x: 2010 Southern California Linux Expo Feb 19-21, 2010 Los Angeles, CA http://www.socallinuxexpo.org
Re: Why isn't this working?
On Aug 27, 2009, at 10:35 PM, Paul Tomblin wrote: Yesterday or the day before, I asked specifically if I would need to restart the Solr server if somebody else loaded data into the Solr index using the EmbeddedServer, and I was told confidently that no, the Solr server would see the new data as soon as it was committed. So today I fired up the Solr server (and after making apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really lives and restarting the web server), and did some queries. Then I ran a program that loaded a bunch of data and committed it. Then I did the queries again. And the new data is NOT showing. Using Luke, I can see 10022 documents in the index, but the Solr statistics page (http://localhost:8080/solrChunk/admin/stats.jsp) is still showing 8677, which is how many there were before I reloaded the data. So am I doing something wrong, or was the assurance I got yesterday that this is possible wrong?

I did not follow the advice from yesterday... but... the commit word can be a bit misleading; it could also be called reload. Say you have an embedded solr server and an http solr server pointed to the same location.

1. make sure one is read only! otherwise you can make a mess.
2. calling commit on the embedded solr instance will not have any effect on the http instance UNTIL you call commit (reload) on the http instance.

ryan
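Following Ryan's point, after the embedded writer commits, the HTTP instance still has to be told to reopen its searcher by receiving a commit of its own. One way to do that (the URL reuses the webapp path from the message above) is to POST this body to the update handler:

```xml
<!-- POST to http://localhost:8080/solrChunk/update with Content-Type: text/xml,
     e.g. via curl -d '<commit/>'. This makes the HTTP instance reopen its
     searcher and see the segments the embedded server wrote. -->
<commit/>
```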
Re: Single Configuration for Master/Slave Replication - SOLR-1355
any attribute specified in solrcore.properties can be referenced in solrconfig.xml/schema.xml. this has nothing specific with replication. On Fri, Aug 28, 2009 at 8:19 AM, Ilan Rabinovitchi...@fonz.net wrote: Hello, I noticed the the documentation around Solr Replication in the wiki has recently changed to take Paul's patch into account (SOLR-1355). I now see that with the current trunk of SOLR 1.4 it is possible to use a single solrconfig.xml to define both master and slave configurations, with environment variables determining which mode is selected. Can settings outside of the replication handler be set different based on which mode is enabled as well? For example, settings such as cache sizes might differ between a master and a slave configuration (ie autowarming, cache sizes, etc). Can those similarly be wrapped in a lst tag with a name of master or slave set? Thanks, Ilan -- Ilan Rabinovitch i...@fonz.net --- SCALE 8x: 2010 Southern California Linux Expo Feb 19-21, 2010 Los Angeles, CA http://www.socallinuxexpo.org -- - Noble Paul | Principal Engineer| AOL | http://aol.com
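Concretely, a property defined in solrcore.properties can parameterize any part of solrconfig.xml, not just the replication handler, so cache sizes really can differ per role. A sketch (the property names are made up):

```xml
<!-- solrcore.properties on this core might contain:
       filterCache.size=512
       enable.master=true
     Values after the colon are defaults used when the property is unset. -->
<filterCache class="solr.LRUCache"
             size="${filterCache.size:1024}"
             initialSize="512"
             autowarmCount="128"/>
```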
Re: UpdateRequestProcessor config location
could you provide more details on what exactly is that you have done? On Fri, Aug 28, 2009 at 7:08 AM, Erik Earleerikea...@yahoo.com wrote: I've read through the wiki for this and it explains most everything except where in the solrconfig.xml theupdateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Solr Replication
Each instance has its own ReplicationHandler instance/MBean. I guess the problem is with the jmx implementation. both MBeans may be registered with the same name On Fri, Aug 28, 2009 at 2:04 AM, J Gskinny_joe...@hotmail.com wrote: We have multiple solr webapps all running from the same WAR file. Each webapp is running under the same Tomcat container and I consider each webapp the same thing as a slice (or instance). I've configured the Tomcat container to enable JMX and when I connect using JConsole I only see the replication handler for one of the webapps in the server. I was under the impression each webapp gets its own replication handler. Is this not true? It would be nice to be able to have a JMX MBean for each replication handler in the container so we can get all the same replication information using JMX as in using the replication admin page for each web app. Thanks. From: noble.p...@corp.aol.com Date: Thu, 27 Aug 2009 13:04:38 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org when you say a slice you mean one instance of solr? So your JMX console is connecting to only one solr? On Thu, Aug 27, 2009 at 3:19 AM, J Gskinny_joe...@hotmail.com wrote: Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler jmx mbean. It looks like it is defaulting to the first slice it finds on its path. Is there anyway to have multiple replication handlers or at least obtain replication on a per slice/instance via JMX like how you can see attributes for each slice/instance via each replication admin jsp page? Thanks again. From: noble.p...@corp.aol.com Date: Wed, 26 Aug 2009 11:05:34 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org The ReplicationHandler is not enforced as a singleton , but for all practical purposes it is a singleton for one core. 
If an instance (a slice as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup the configuration should be as follows:

MASTER
 |_ SLAVE (I am a slave of MASTER)
 |_ REPEATER (I am a slave of MASTER and master to my slaves)
     |_ REPEATER_SLAVE (of REPEATER)

The point is that REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to the repeater.

On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote: Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server and it's picking an arbitrary slice to report on. So I'm curious if every slice gets its own replication handler mbean? This is important because I have no way of knowing in this specific server any information about the other slices, in particular, information about the master/slave value for the other slices. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be a master and a slave, i.e. a repeater. I'm wondering how repeaters work because let's say I have a slice named 'A' and the master is on server 1 and the slave is on server 2, then how are these two servers communicating to replicate? Looking at the jmx information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this solr slice know if it's the master or slave? I'm a bit confused. Thanks.
-- Noble Paul | Principal Engineer | AOL | http://aol.com
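The repeater layout Noble describes corresponds to a ReplicationHandler configured with both a master and a slave section. A sketch based on that pattern (host names are hypothetical):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- The repeater serves its own slaves after each replicated commit -->
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <!-- The repeater pulls from the real master; REPEATER_SLAVE in turn
         sets its masterUrl to point at this repeater -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```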
Re: SnowballPorterFilterFactory stemming word question
If I analyse this field type in analysis.jsp: running stems to run, which is fine; machine stems to machin (where does this word come from?); and revolutionary stems to revolutionari, though I thought it should stem to revolution.

Stemmers used in Information Retrieval are not for human consumption. Reducing revolutionary to revolutionari does not change the fact that the query revolutionary will return documents containing revolutionary.

How does stemming work? Does it reduce an adverb to a verb etc., or do we have to customize it?

Stemmers aim to remove inflectional suffixes from words. Snowball stemmers are rule-based stemmers: rules and endings are defined, e.g. if a word ends in s, remove it (apples - apple). It will be difficult to customize existing snowball stemmers, I guess. If you are looking for a less aggressive stemmer then you can use KStem.
Re: UpdateRequestProcessor config location
I've implemented a fairly simple UpdateRequestProcessor much like the example here: http://wiki.apache.org/solr/UpdateRequestProcessor I attempted the below configuration in solrconfig.xml (like the above link shows) but nothing happens, no errors... nothing. Is this configuration supposed to be under the config tag?

<config>
  <updateRequestProcessorChain>
    <processor class="com.erik.earle.MyUpdateRequestProcessor">
      <lst name="default">
        <str name="param">list, of, comma, sep, values</str>
      </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
  </updateRequestProcessorChain>
</config>

- Original Message From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com To: solr-user@lucene.apache.org Sent: Thursday, August 27, 2009 9:57:54 PM Subject: Re: UpdateRequestProcessor config location could you provide more details on what exactly is that you have done? On Fri, Aug 28, 2009 at 7:08 AM, Erik Earle erikea...@yahoo.com wrote: I've read through the wiki for this and it explains most everything except where in the solrconfig.xml the updateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor -- Noble Paul | Principal Engineer | AOL | http://aol.com
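A likely reason "nothing happens" is that the chain is anonymous and never referenced: in Solr 1.4 a chain is normally given a name and then selected via the update.processor request parameter, or made the default on the update handler; custom processors are also plugged in through an UpdateRequestProcessorFactory rather than the processor itself. A hedged sketch (the class and chain names here follow the message above and are assumptions):

```xml
<updateRequestProcessorChain name="mychain">
  <!-- A custom processor is registered via its factory class -->
  <processor class="com.erik.earle.MyUpdateRequestProcessorFactory">
    <lst name="default">
      <str name="param">list, of, comma, sep, values</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory performs the actual add/delete; without it
       documents are never written to the index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Make the named chain the default for the XML update handler -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>
```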