Re: using DataImportHandler with ExtractRequestHandler ?
Thanks Steven for the quick reply. On Wed, Oct 14, 2009 at 1:56 AM, Steven A Rowe sar...@syr.edu wrote: See http://issues.apache.org/jira/browse/SOLR-1358 Steve -----Original Message----- From: abhay kumar [mailto:abhay...@gmail.com] Sent: Tuesday, October 13, 2009 8:59 AM To: solr-user@lucene.apache.org Subject: using DataImportHandler with ExtractRequestHandler ? Hi, We are using Solr 1.4 for our search module. We have a long schema (35 fields); most field values come from a database, while one field's value comes from files in various formats. We are able to index the different file formats using Solr Cell (ExtractingRequestHandler), and data from the database can be indexed using DataImportHandler. Now I want to invoke both request handlers (DataImportHandler and ExtractingRequestHandler) for each document. Is it possible, and how? Can DataImportHandler call ExtractingRequestHandler, or vice versa? Or can the two request handlers be combined for one document? If yes, how? For example, take two fields: resumeContent, whose value is stored in a file (PDF, Word, etc.), so we need ExtractingRequestHandler to get its value; and resumeTitle, whose value is stored in the database, so we need DataImportHandler to fetch it. These two fields make up one document. How can DataImportHandler be used with ExtractingRequestHandler (or vice versa) for the same document, where some field values come from a database and others from different document formats? I don't want to extract the different document formats and store their content (body) in the database before indexing. We are in agile development, so a quick response would be appreciated. Regards, Abhay
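One approach (short of the SOLR-1358 integration referenced above) is to fetch the database values yourself and pass them to the extracting handler as literal field values, so a single request builds the whole document. A sketch only, assuming a running Solr with /update/extract enabled; the id value and the resumeTitle/resumeContent field names are taken from the example, the database lookup is not shown, and the exact literal.*/fmap.* parameter names should be verified against the ExtractingRequestHandler wiki for your Solr version:

```shell
# Fetch resumeTitle from the database first (not shown), then send the file
# and the literal field value in one extract request. literal.* params
# become stored field values on the extracted document; fmap.content maps
# Tika's extracted body to the resumeContent field.
curl "http://localhost:8983/solr/update/extract?literal.id=resume-1&literal.resumeTitle=Senior+Engineer&fmap.content=resumeContent&commit=true" \
     -F "myfile=@resume.pdf"
```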
Re: Error when indexing XML files
Hi, I am trying to index XML files using SolrJ. The original XML file contains nested elements; for example, the following is a snippet of the XML file:

<entry>
  <name>SOMETHING</name>
  <facility>SOME_OTHER_THING</facility>
</entry>

I have added the elements name and facility to the schema.xml file to make these elements indexable. I have changed the XML document above to look like:

<add>
  <doc>
    ..
    <field name="name">SOMETHING</field>
    ..
  </doc>
</add>

Can you send us the schema.xml file you created? I suspect that one of the fields should be multivalued. -- Fergus.
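For reference, a sketch of what the schema.xml declarations might look like if <entry> elements (and hence name/facility values) can repeat within one document; the field type and attribute values here are assumptions, not taken from the poster's schema:

```xml
<!-- sketch: declare the fields multiValued so repeated <name>/<facility>
     values in one document don't cause an indexing error -->
<field name="name" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="facility" type="text" indexed="true" stored="true" multiValued="true"/>
```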
One more happy Solr user ...
I am pleased to announce the latest release of a popular Indian local search portal, burrp! (http://www.burrp.com, http://mumbai.burrp.com). In prior versions of this web application, search was Lucene-driven and we had to write our own implementation of search facets, among other painful tasks. I couldn't be happier to inform everyone on this list that the search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. Very soon, I'll be back to report another usage of Solr (a grand one by scale). Thank you Solr developers. Cheers Avlesh
Re: One more happy Solr user ...
Hi Nice site. First search I tried was for 'italien' in 'Mumbai' which returned zero results. Are you using spellcheck suggestions? Apart from that it's nice and fast. Regards Andrew McCombe iWebsolutions.co.uk
Re: One more happy Solr user ...
Ah! I knew that was coming :) We are planning a spell-checker integration pretty soon. Thanks for trying out the site Andrew. Cheers Avlesh
Re: One more happy Solr user ...
Hi Avlesh, it's mean to send something like http://mumbai.burrp.com/pack/list/kolkata-on-a-roll around at lunch time - in Germany(!). Very, very sadly, there are many places in Mumbai that have mastered the art of making authentic Kolkata rolls, but I don't know of any here in Munich. Congratulations on launching successfully! Chantal
Re: One more happy Solr user ...
If burrp! can keep pace with Solr enhancements, we are not too far from a munich.burrp.com ;) Thanks for checking out the site, Chantal. Cheers Avlesh
Sorting on Multiple fields
We have come up against a situation we are trying to resolve in our Solr implementation project. It revolves mostly around how to sort results when the index data is stored in multiple fields, and at query time we need to sort on whichever of those fields is most relevant. A brief example: we have product catalog information in the index with multiple prices, dependent on the logged-in user and other scenarios. Simplified, this looks something like this:

price_id101 = 100.00
price_id102 = 105.00
price_id103 = 110.00
price_id104 = 95.00
(etc.)

At runtime we want to know which one of several selected prices is the minimum (or maximum) - not across all prices, just a selected set of, say, 2 or 3 ids - in order to determine a sort order for the results. Against a SQL repository we would feed the query some logic saying "find me the least amount among this set of ids", so the search approach here raises some questions:

- Do we attempt some sort of function query to find the least amount among the requested price ids? This would seem to imply some playing around in the query handler to allow a function of this sort.
- Rather than some internal method to handle the query and sort actions, do we treat this as a matter of relevancy on a calculated field? If so, the methods of determining the fields included in the calculated field are eluding me at the moment, so pointers are welcome.
- Does this ultimately involve implementing some sort of custom type and handler for this task?

I am open to any response; if someone has come across a similar problem before and can suggest an approach, we are willing to open up a patch branch or similar to do some work on the issue. Though if there are no suggestions, this will likely move out of our current stream and into future development. Neil
hadoop configuarions for SOLR-1301 patch
Hi, I am using the SOLR-1301 patch. I have built Solr with the given patch, but I am not able to configure Hadoop for the resulting war. I want to run Solr (index creation) on a 3-node (1+2) cluster. How do I do the Hadoop configuration for this patch? How do I set master and slave? Thanks -Pravin
Re: Boosting of words
Hi Clark, Thanks for your input. I have a query. My XML contains the following:

<add>
  <doc>
    <field name="url">http://www.sun.com</field>
    <field name="title">information</field>
    <field name="description">java plays a important role in computer industry for web users</field>
  </doc>
  <doc>
    <field name="url">http://www.askguru.com</field>
    <field name="title">homepage</field>
    <field name="description">Information about technology is stored in the web sites</field>
  </doc>
  <doc>
    <field name="url">http://www.techie.com</field>
    <field name="title">post queries</field>
    <field name="description">This web site have more java technology related to web</field>
  </doc>
</add>

When I give "java technology" as my input on the Solr admin page, at present I get only the techie.com document as output. Now I also need the docs that contain just "technology": when I give "java technology" I want to boost docs containing "technology", so the results should come in this order: first the techie.com doc (contains java and technology), then the askguru.com doc (contains technology), then the sun.com doc (contains java). Let me know how to achieve this. Regards Bhaskar --- On Tue, 10/13/09, Nicholas Clark clark...@gmail.com wrote: From: Nicholas Clark clark...@gmail.com Subject: Re: Boosting of words To: solr-user@lucene.apache.org Date: Tuesday, October 13, 2009, 1:01 PM Bhaskar, Read this page, specifically how to query data.
http://lucene.apache.org/solr/tutorial.html#Querying+Data It sounds like you are very new to Solr, so I would also suggest reading the wiki. http://wiki.apache.org/solr/ -Nick On Mon, Oct 12, 2009 at 10:02 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi Nicholas, Thanks for your input. Where exactly should the query q=product:red color:red^10 be used and defined? Help me. Regards Bhaskar --- On Mon, 10/12/09, Nicholas Clark clark...@gmail.com wrote: From: Nicholas Clark clark...@gmail.com Subject: Re: Boosting of words To: solr-user@lucene.apache.org Date: Monday, October 12, 2009, 2:13 PM The easiest way to boost your query is to modify your query string: q=product:red color:red^10 In the above example, I have boosted the color field. If "red" is found in that field, it will get a boost of 10. If it is only found in the product field, there will be no boost. Here's more information: http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms Once you're comfortable with that, I suggest you look into using the DisMax request handler. It will allow you to easily search across multiple fields with custom boost values. http://wiki.apache.org/solr/DisMaxRequestHandler -Nick On Sun, Oct 11, 2009 at 12:26 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, I would like to know how I can give boosting to search input in Solr. Where exactly should I make the changes? Regards Bhaskar
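To make the DisMax suggestion concrete, here is a sketch of the kind of handler configuration the cookbook describes, using the hypothetical product/color fields from the example above; the handler name, field names, and boost values are illustrative, not a definitive setup:

```xml
<!-- solrconfig.xml sketch: search both fields by default, boosting color 10x -->
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">product^1.0 color^10.0</str>
  </lst>
</requestHandler>
```

With this in place, a plain q=red searches both fields with the configured per-field boosts, instead of spelling the boosts out in every query string.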
lazy loading error using Solr Cell
Hi, I'm new to Solr and Java in general. I'd like to index rich documents with Solr Cell for my intranet, so I downloaded the latest Solr nightly build (solr-2009-10-14.tgz) and tried to follow the Solr Cell tutorial at wiki.apache.org/solr/ExtractingRequestHandler. I started Solr, copied a simple HTML file (prova.html) into the example directory, moved to that directory, and from there tried:

curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F myfi...@prova.html

But I received a lazy loading error. I copy the output here in case someone can help me. Thanks in advance. Ste

Output:

HTTP ERROR: 500
lazy loading error
org.apache.solr.common.SolrException: lazy loading error
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.IllegalStateException: Unable to create a XmlRootExtractor
    at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:135)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:58)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:75)
    at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:96)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:85)
    at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:76)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:173)
    at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:165)
    at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244)
    ... 21 more
Caused by: org.xml.sax.SAXNotSupportedException: http://javax.xml.XMLConstants/feature/secure-processing
    at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90)
    at org.apache.tika.detect.XmlRootExtractor.<init>(XmlRootExtractor.java:47)
    at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:133)
    ... 31 more

RequestURI=/solr/update/extract
Re: lazy loading error using Solr Cell
Hmmm, I just tried the first steps of the Solr Cell tutorial, and it worked fine for me (well, with the exception that there is no site directory... I went to docs instead - I'll fix that). Oh wait - I see your problem: at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90) Your PATH picked up gcj, which is not supported by Solr. You need to use a different JVM. If you didn't have anything else in mind, I'd recommend just going with what's most widely used - the latest released Sun JVM (currently 1.6_16) or OpenJDK. -Yonik http://www.lucidimagination.com On Wed, Oct 14, 2009 at 9:09 AM, Stefano Nannetti stefano.nanne...@gmail.com wrote: Hi, I'm new to Solr and Java in general. I'd like to index rich documents with Solr Cell for my intranet, so I downloaded the latest Solr nightly build (solr-2009-10-14.tgz) and tried to follow the Solr Cell tutorial at wiki.apache.org/solr/ExtractingRequestHandler. I started Solr, copied a simple HTML file (prova.html) into the example directory, moved to that directory, and from there tried: curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true' -F myfi...@prova.html But I received a lazy loading error. I copy the output here in case someone can help me. Thanks in advance.
Ste
Re: Boosting of words
Now I need to get the doc which has "technology" also. When I give "java technology" I need to give boosting to docs which have "technology". Let me know how to achieve the same? The query java^1 OR technology^100 will do it.
Results will be in this order:
1) This web site have more java technology related to web (contains both java and technology)
2) Information about technology is stored in the web sites (contains only technology)
3) java plays a important role in computer industry for web users (contains only java)
Is that what you want? Note that there are no quotes in the query above. You can adjust the boost factors (1 and 100) according to your needs: use the OR operator between terms, and set an individual term's boost with the ^ operator. Hope this helps.
Re: Sorting on Multiple fields
Do we attempt to raise some sort of function query to find the least amount of the requested price ids? This would seem to imply some playing around in the query handler to allow a function of this sort.

Unless I am missing something, this information can always be obtained by post-processing the data obtained from the search results, can't it?

Do we look at this rather than some internal method to handle the query and sort actions as a matter of relevancy on a calculated field? If so, the methods of determining the fields included in the calculated field are eluding me at the moment, so pointers are welcome.

I really did not understand the question. Is it related to sorting of results?

Does this ultimately involve the implementation of some sort of custom type and handler to do this sort of task?

If the answer to my previous question is affirmative, then yes, you would need to implement custom sorting behavior. It can be achieved in multiple ways depending upon your requirement: from something as simple as function queries, to using the power of dynamic fields, to writing a custom field type, to writing a custom implementation of Lucene's Similarity - any of these can be a potential answer to custom sorting. Cheers Avlesh On Wed, Oct 14, 2009 at 5:53 PM, Neil Lunn neil.l...@trixan.com wrote: We have come up against a situation we are trying to resolve in our Solr implementation project.
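As a sketch of the function-query direction mentioned above (whether this works out of the box depends on the Solr version - sorting directly on a function and a multi-argument min() are not available in older releases, which is where custom code would come in); the field names are taken from the price example:

```shell
# Score documents by the smaller of two selected price fields, then sort
# by that score (the {!func} query parser turns a function into the query):
q={!func}min(price_id101,price_id102)&sort=score asc

# Newer Solr versions also allow sorting on a function directly:
q=*:*&sort=min(price_id101,price_id102) asc
```

If neither form is available in your version, the same logic can be applied client-side by post-processing the returned price fields, as suggested above.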
Solr 1.4 release candidate
Folks, we've been in code freeze since Monday and a test release candidate was created yesterday, however it already had to be updated last night due to a serious bug found in Lucene. For now you can use the latest nightly build to get any recent changes like this: http://people.apache.org/builds/lucene/solr/nightly/ We'll probably release the final bits next week, so in the meantime, download the latest nightly build and give it a spin! -Yonik http://www.lucidimagination.com
Lucene's CachingTokenFilter in index analyzer chain
Hi all, I'm trying to add a CachingTokenFilter-derived filter to the index analyzer chain for the field type text. I need to work with CachingTokenFilter because I need to look ahead in the token stream (my filter is a stop-phrases filter: I look ahead in the stream to see if a stop phrase is found and then remove it from the token stream). When I test the correctness of the chain using this query:

/solr/analysis/field?analysis.fieldname=description&analysis.fieldtype=text&analysis.fieldvalue=...

everything seems OK (I see that the stop phrases are removed from the token stream). But when I index documents, the index is totally empty: all searches on text fields give no results at all! Here is my index chain, where StopPhrasesFilterFactory is my custom filter factory whose filter derives from CachingTokenFilter:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal. Add enablePositionIncrements=true
         in both the index and query analyzers to leave a 'gap' for more
         accurate phrase queries. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.apache.solr.analysis.StopPhrasesFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Is it wrong to use CachingTokenFilter in the index chain? Regards Enrico
Re: FACET_SORT_INDEX descending?
Thanks for the answer and the alternative idea. --Gerald Chris Hostetter wrote: : Reverse alphabetical ordering. The option index provides alphabetical : ordering. Be careful: index doesn't mean alphabetical - it means the natural ordering of terms as they exist in the index. For non-ASCII characters this is not necessarily something that could be considered alphabetical (or sensible in terms of the locale). The short answer is: no, there is no way to get reverse index order at the moment. : I have a year_facet field, that I would like to display in reverse order (most : recent years first). Perhaps there is some other way to accomplish this. The simplest way is to encode the year in some format that will cause it to naturally sort in the order you want - so instead of indexing 1976 and 2007 you could index 8024:1976 and 7993:2007 and then only display the part that comes after the ':'. -Hoss
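Hoss's encoding trick can be sketched in a few lines. The 10000-minus-year prefix (which produces his 8024:1976 and 7993:2007 examples) is an assumption that works for four-digit years and makes the plain index (lexicographic) order of the facet values come out reverse-chronological:

```python
def encode_year(year: int) -> str:
    # prefix with (10000 - year) so lexicographic order is newest-first
    return "%04d:%d" % (10000 - year, year)

def display_year(token: str) -> str:
    # at display time, show only the original year after the ':'
    return token.split(":", 1)[1]

# facet.sort=index order of the encoded values:
tokens = sorted(encode_year(y) for y in (1976, 2007, 1999))
# -> ['7993:2007', '8001:1999', '8024:1976'], i.e. 2007, 1999, 1976
```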
POST queries to Solr instead of HTTP Gets with query string parameters
Is there a way to POST queries to Solr instead of supplying query string parameters? Some of our queries may hit up against URL size limits. If so, can someone provide an example? Thanks in advance
Re: hadoop configuarions for SOLR-1301 patch
On Wed, Oct 14, 2009 at 6:15 PM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am using SOLR-1301 path. I have build the solr with given patch. But I am not able to configure Hadoop for above war. I want to run solr(create index) with 3 nodes (1+2) cluster. How to do the Hadoop configurations for above patch? How to set master and slave? Pravin, questions on specific patches are best asked on the Jira issue. -- Regards, Shalin Shekhar Mangar.
Re: One more happy Solr user ...
On Wed, Oct 14, 2009 at 2:16 PM, Avlesh Singh avl...@gmail.com wrote: I am pleased to announce the latest release of a popular Indian local search portal called http://www.burrp.com http://mumbai.burrp.com. In prior versions of this web application, search was Lucene driven and we had to write our own implementation of search facets amongst other painful tasks. I can't be happier to inform everyone on this list that search/suggest features on the burrp! site are now powered by Solr. Please use it and let me know if we can make it better. This is great! Can you please add burrp to http://wiki.apache.org/solr/PublicServers? -- Regards, Shalin Shekhar Mangar.
Re: POST queries to Solr instead of HTTP Gets with query string parameters
On Wed, Oct 14, 2009 at 8:06 PM, Glock, Thomas thomas.gl...@pfizer.comwrote: Is a way to POST queries to Solr instead of supplying query string parameters ? All Solr requests are normal HTTP requests. Most HTTP client libraries in various languages have a way to select POST instead of GET. If you are using Solrj client, then you can use QueryRequest#setMethod(SolrRequest.METHOD.POST) -- Regards, Shalin Shekhar Mangar.
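For clients not using SolrJ, the same thing can be done with plain HTTP using only the JDK. This is a sketch, not SolrJ code: the URL, query, and class name are illustrative (adjust host/port/core to your setup). The point is simply that the parameters move from the query string into a form-encoded request body, so URL length limits no longer apply:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PostQuery {
    // Build the same "q=...&rows=..." string you would have put after '?',
    // but it will travel in the POST body instead of the URL.
    static String formBody(String q, int rows) {
        return "q=" + URLEncoder.encode(q, StandardCharsets.UTF_8) + "&rows=" + rows;
    }

    // Not invoked from main (it needs a running Solr); shows the POST mechanics.
    static int post(String solrSelectUrl, String body) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(solrSelectUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) {
        System.out.println(formBody("title:foo bar", 20)); // q=title%3Afoo+bar&rows=20
        // e.g. post("http://localhost:8983/solr/select", formBody("title:foo bar", 20));
    }
}
```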
Re: lazy loading error usin Solr Cell
I removed the existing JVM from my Ubuntu 9.04 and installed OpenJDK. Now it's working fine. Thanks, now I can go deeper in the use of Solr!! Ste Yonik Seeley ha scritto: Hmmm, I just tried the first steps of the Solr Cell tutorial, and it worked fine for me (well, with the exception that there is no site directory... I went to docs instead - I'll fix that). Oh wait - I see your problem: at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90) Your path picked up gcj, which is not supported by Solr. You need to use a different JVM. If you didn't have anything else in mind, I'd recommend just going with what's most widely used - the latest released Sun JVM (currently 1.6_16) or OpenJDK. -Yonik http://www.lucidimagination.com On Wed, Oct 14, 2009 at 9:09 AM, Stefano Nannetti stefano.nanne...@gmail.com wrote: Hi, I'm new to Solr and Java in general. I'd like to index rich documents with Solr Cell for my Intranet, so I downloaded the last Solr nightly build (solr-2009-10-14.tgz) and tried to follow the Solr Cell tutorial at wiki.apache.org/solr/ExtractingRequestHandler. I started Solr, copied a simple html file (prova.html) into the example directory, moved to that directory and from there tried: curl 'http://localhost:8983/solr/update/extract?literal.id=doc1commit=true' -F myfi...@prova.html But I received a lazy loading error. If someone could help me I copy here the output. Thanks in advance. 
Ste Output:

HTTP ERROR: 500

lazy loading error org.apache.solr.common.SolrException: lazy loading error at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.IllegalStateException: Unable to create a XmlRootExtractor at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:135) at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:58) at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:75) at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:90) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:96) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:85) at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:76) at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:173) at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:165) at org.apache.solr.handler.extraction.ExtractingRequestHandler.inform(ExtractingRequestHandler.java:80) at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:244) ...21 more
Caused by: org.xml.sax.SAXNotSupportedException: http://javax.xml.XMLConstants/feature/secure-processing at gnu.xml.stream.SAXParserFactory.setFeature(libgcj.so.90) at org.apache.tika.detect.XmlRootExtractor.<init>(XmlRootExtractor.java:47) at org.apache.tika.mime.MimeTypes.<init>(MimeTypes.java:133) ...31 more

RequestURI=/solr/update/extract
RE: POST queries to Solr instead of HTTP Gets with query string parameters
Solrj1.4 supports QueryRequest#setMethod(SolrRequest.METHOD.POST) but Solrj1.3 does not. -Ankit From: Shalin Shekhar Mangar [shalinman...@gmail.com] Sent: Wednesday, October 14, 2009 11:08 AM To: solr-user@lucene.apache.org Subject: Re: POST queries to Solr instead of HTTP Gets with query string parameters On Wed, Oct 14, 2009 at 8:06 PM, Glock, Thomas thomas.gl...@pfizer.comwrote: Is a way to POST queries to Solr instead of supplying query string parameters ? All Solr requests are normal HTTP requests. Most HTTP client libraries in various languages have a way to select POST instead of GET. If you are using Solrj client, then you can use QueryRequest#setMethod(SolrRequest.METHOD.POST) -- Regards, Shalin Shekhar Mangar.
Re: Solr 1.4 release candidate
maybe im just not familiar with the way the version numbers works in trunk but when i build the latest nightly the jars have names like *-1.5-dev.jar, is that normal? On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley yo...@lucidimagination.com wrote: Folks, we've been in code freeze since Monday and a test release candidate was created yesterday, however it already had to be updated last night due to a serious bug found in Lucene. For now you can use the latest nightly build to get any recent changes like this: http://people.apache.org/builds/lucene/solr/nightly/ We'll probably release the final bits next week, so in the meantime, download the latest nightly build and give it a spin! -Yonik http://www.lucidimagination.com
Solr/Lucene keeps eating up memory while idling
I'm curious why this is occurring and whether i can prevent it. This is my scenario: Locally I have an idle running solr 1.3 service using lucene 2.4.1 which has an index of ~330K documents containing ~10 fields each(total size ~12GB). Currently I've turned off all caching, lazy field loading, however i do have facet fields set for some request handlers. What i'm seeing is heap space usage increasing by ~1.2MB per 2 sec (by java.lang.String objects). I'm assuming they're being used by lucene but i may be wrong about that, since i have no actual data to confirm it. Why exactly is this happening, considering no requests are being serviced? Shouldn't the memory usage stabilise with a certain set of information and only be affected on requests? Additionally there is a full GC every half hour, which seems very unreasonable on a machine that isn't actually being used as a service. I really hope there's just a certain setting that i've overlooked, or a concept i'm not understanding because otherwise this behaviour seems very unreasonable... Thanks beforehand, Tony -- View this message in context: http://www.nabble.com/Solr-Lucene-keeps-eating-up-memory-while-idling-tp25894357p25894357.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.4 release candidate
On Wed, Oct 14, 2009 at 12:04 PM, Joe Calderon calderon@gmail.com wrote: maybe im just not familiar with the way the version numbers works in trunk but when i build the latest nightly the jars have names like *-1.5-dev.jar, is that normal? Looks like Grant switched the version number a little early - nothing to worry about though. When we build official releases, we explicitly specify the version number anyway. -Yonik http://www.lucidimagination.com On Wed, Oct 14, 2009 at 7:01 AM, Yonik Seeley yo...@lucidimagination.com wrote: Folks, we've been in code freeze since Monday and a test release candidate was created yesterday, however it already had to be updated last night due to a serious bug found in Lucene. For now you can use the latest nightly build to get any recent changes like this: http://people.apache.org/builds/lucene/solr/nightly/ We'll probably release the final bits next week, so in the meantime, download the latest nightly build and give it a spin! -Yonik http://www.lucidimagination.com
Re: Letters with accent in query
Correct. Apparently, Firefox is the only browser that translates é to %E9. On Wed, Oct 14, 2009 at 3:06 AM, Chris Hostetter hossman_luc...@fucit.orgwrote: : I'm querying with an accented keyword such as café but the debug info : shows that it is only searching for caf. I'm using the ISOLatin1Accent ... : http://localhost:8983/solr/select?q=%E9debugQuery=true : : Params return shows this: : lst name=params : str name=q/ ...that's a pretty good tip off that you aren't URL encoding the character the way your servlet container is expecting it. I suspect what you really want is... http://localhost:8983/solr/select?q=%C3%A9debugQuery=true -Hoss
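The difference between %E9 and %C3%A9 is just which byte encoding of é gets percent-encoded: Latin-1 is one byte (0xE9), UTF-8 is two (0xC3 0xA9). A quick JDK check (the class name is illustrative):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AccentEncoding {
    public static void main(String[] args) {
        // Latin-1: 'é' (\u00e9) is the single byte 0xE9
        System.out.println(URLEncoder.encode("\u00e9", StandardCharsets.ISO_8859_1)); // %E9
        // UTF-8 (what the servlet container expects here): two bytes 0xC3 0xA9
        System.out.println(URLEncoder.encode("\u00e9", StandardCharsets.UTF_8));      // %C3%A9
    }
}
```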
Re: POST queries to Solr instead of HTTP Gets with query string parameters
On Wed, Oct 14, 2009 at 8:54 PM, Ankit Bhatnagar abhatna...@vantage.comwrote: Solrj1.4 supports QueryRequest#setMethod(SolrRequest.METHOD.POST) but Solrj1.3 does not. I just checked the 1.3 release. It most definitely exists in 1.3 -- Regards, Shalin Shekhar Mangar.
Adding callback url to data import handler...Is this possible?
Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: capitalization and delimiters
On Mon, Oct 12, 2009 at 9:09 PM, Audrey Foo au...@hotmail.com wrote: In my search docs, I have content such as 'powershot' and 'powerShot'. I would expect 'powerShot' would be searched as 'power', 'shot' and 'powershot', so that results for all these are returned. Instead, only results for 'power' and 'shot' are returned. Any suggestions? In the schema, index analyzer:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
<filter class="solr.LowerCaseFilterFactory"/>

In the schema, query analyzer:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

I find your index-time and query-time configuration very strange. Assuming that you also have a lowercase filter, it seems that a token powerShot will not be split and indexed as powershot. Then during query, both power and shot will match nothing. I suggest you start with the configuration given in the example schema. Else, it'd be easier for us if you can help us understand the reasons behind changing these parameters. -- Regards, Shalin Shekhar Mangar.
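To see why consistent index-time and query-time settings matter, here is a rough, illustrative simulation of what splitOnCaseChange=1 plus lowercasing does to a single token. This is not the actual WordDelimiterFilterFactory code, just the splitting rule it applies: if the index side never produces these parts, the query side's parts have nothing to match.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseChangeSplit {
    // Rough simulation of splitOnCaseChange=1 followed by lowercasing:
    // split before each lower-to-upper transition, then lowercase the parts.
    static List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 1; i < token.length(); i++) {
            if (Character.isLowerCase(token.charAt(i - 1)) && Character.isUpperCase(token.charAt(i))) {
                parts.add(token.substring(start, i).toLowerCase());
                start = i;
            }
        }
        parts.add(token.substring(start).toLowerCase());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("powerShot")); // [power, shot]
        System.out.println(split("powershot")); // [powershot]
    }
}
```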
Re: Adding callback url to data import handler...Is this possible?
Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.comwrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
Re: http replication transfer speed
Queries on the slave could be one reason. However, I see that the perf test on the wiki also shows the same transfer speed (with rsync too!). Not sure what's up. 2009/10/12 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com Did you try w/o firing queries on the slave? On Sun, Oct 11, 2009 at 6:05 AM, Mark Miller markrmil...@gmail.com wrote: On a drive that can do 40+ that's getting query load might have it's writes knocked down to that? - Mark http://www.lucidimagination.com (mobile) On Oct 10, 2009, at 6:41 PM, Mark Miller markrmil...@gmail.com wrote: Anyone know why you would see a transfer speed of just 10-20MB over a gigbit network connection? Even with standard drives, I would expect to at least see around 40MB. Has anyone seen over 10-20 using replication? Any ideas on what the bottleneck should be? I think even a standard drive can do writes of a bit of 40MB/s, and certainly reads over that. Thoughts? -- - Mark http://www.lucidimagination.com -- - Noble Paul | Principal Engineer| AOL | http://aol.com -- Regards, Shalin Shekhar Mangar.
Re: Adding callback url to data import handler...Is this possible?
Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.comwrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
RE: Lucene Merge Threads
Does anyone know the correct syntax to specify the maximum number of threads for the ConcurrentMergeScheduler? Also, is there any concrete way to know when the merge is actually complete (aside from profiling the machine)? Thanks, Gio. -Original Message- From: Giovanni Fernandez-Kincade Sent: Tuesday, October 13, 2009 7:59 PM To: Giovanni Fernandez-Kincade; 'solr-user@lucene.apache.org'; 'noble.p...@gmail.com' Subject: RE: Lucene Merge Threads I'm still getting the error after getting the latest from trunk and building it. This is what I added to the solrconfig.xml:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>

Any other ideas? Thanks, Gio. SEVERE: org.apache.solr.common.SolrException: Error loading class ' 5 ' at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:81) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:178) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400) at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689) at java.lang.Thread.run(Unknown Source) Caused by: java.lang.ClassNotFoundException: 5 at java.net.URLClassLoader$1.run(Unknown Source) at java.security.AccessController.$$YJP$$doPrivileged(Native Method) at java.security.AccessController.doPrivileged(Unknown Source) at java.net.URLClassLoader.findClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.net.FactoryURLClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClass(Unknown Source) at java.lang.ClassLoader.loadClassInternal(Unknown Source) at java.lang.Class.$$YJP$$forName0(Native Method) at java.lang.Class.forName0(Unknown Source) at java.lang.Class.forName(Unknown Source) at 
org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294) ... 28 more -Original Message- From: Giovanni Fernandez-Kincade Sent: Tuesday, October 13, 2009 10:50 AM To: solr-user@lucene.apache.org; 'noble.p...@gmail.com' Subject: RE: Lucene Merge Threads Here's the version information from the admin page: Solr Specification Version: 1.3.0.2009.07.28.18.51.06 Solr Implementation Version: 1.4-dev ${svnversion} - gkincade - 2009-07-28 18:51:06 Lucene Specification Version: 2.9-dev Lucene Implementation Version: 2.9-dev 794238 - 2009-07-15 18:05:08 -Original Message- From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf
Re: Adding callback url to data import handler...Is this possible?
Hmmm ... I think this is a valid use case and it might be a good idea to support it in someway. I will post this thread on the dev-mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.comwrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned this would require us to write Java code. Our app(s) are entirely windows/asp.net/C# so while we could add Java in a pinch, we'd prefer to stick to using SOLR using its convenient REST-style interfaces which makes no demand on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH?http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code. We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill
how to get field contents out of Document object
hello *, sorry if this seems like a dumb question, im still fairly new to working with lucene/solr internals. given a Document object, what is the proper way to fetch an integer value for a field called num_in_stock? it is both indexed and stored. thx much --joe
Opaque replication failures
Hi, I have a multicore Solr 1.4 setup. core_master is a 3.7G master for replication, and core_slave is a 500 byte slave pointing to the master. I'm using the example replication configuration from solrconfig.xml, with ${enable.master} and ${enable.slave} properties so that the master and slave can use the same solrconfig.xml. When I attempt to replicate (every 60 seconds or by pressing the button on the slave replication admin page), it doesn't work. Unfortunately, neither the admin page nor the REST API details command show anything useful, and the logs show no errors. How can I get insight into what is causing the failure? I assume it's some configuration problem but don't know where to start. Thanks in advance for any help! Config files are below. Michael

Here is my solr.xml:

<?xml version='1.0' encoding='UTF-8'?>
<solr sharedLib="lib" persistent="true">
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core_master" instanceDir="." dataDir="/home/search/solr/data/5">
      <property name="enable.master" value="true"/>
    </core>
    <core name="core_slave" instanceDir="." dataDir="/home/search/solr/data/1">
      <property name="enable.slave" value="true"/>
    </core>
  </cores>
</solr>

And here's the relevant chunk of my solrconfig.xml:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="enable">${enable.master:false}</str>
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <str name="enable">${enable.slave:false}</str>
    <str name="masterUrl">http://localhost:31000/solr/core_master/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>

Here's what the details command on the slave has to say -- nothing explanatory that I can see. Is the isReplicating=false worrying? 
<lst name="details">
  <str name="indexSize">589 bytes</str>
  <str name="indexPath">/home/search/solr/data/1/index</str>
  <arr name="commits"/>
  <str name="isMaster">false</str>
  <str name="isSlave">true</str>
  <long name="indexVersion">1254772638413</long>
  <long name="generation">2</long>
  <lst name="slave">
    <lst name="masterDetails">
      <str name="indexSize">3.75 GB</str>
      <str name="indexPath">/home/search/solr/data/5/index</str>
      <arr name="commits"/>
      <str name="isMaster">true</str>
      <str name="isSlave">false</str>
      <long name="indexVersion">1254772639291</long>
      <long name="generation">156</long>
    </lst>
    <str name="masterUrl">http://localhost:31000/solr/core_master/replication</str>
    <str name="pollInterval">00:00:60</str>
    <str name="indexReplicatedAt">Wed Oct 14 14:25:22 EDT 2009</str>
    <arr name="indexReplicatedAtList">
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:21 EDT 2009</str>
      <str>Wed Oct 14 14:24:27 EDT 2009</str>
      (etc)
    </arr>
    <arr name="replicationFailedAtList">
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:22 EDT 2009</str>
      <str>Wed Oct 14 14:25:21 EDT 2009</str>
      <str>Wed Oct 14 14:24:27 EDT 2009</str>
      (etc)
    </arr>
    <str name="timesIndexReplicated">1481</str>
    <str name="lastCycleBytesDownloaded">0</str>
    <str name="timesFailed">1481</str>
    <str name="replicationFailedAt">Wed Oct 14 14:25:22 EDT 2009</str>
    <str name="previousCycleTimeInSeconds">0</str>
    <str name="isReplicating">false</str>
  </lst>
</lst>
(Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?
I've downloaded solr-2009-10-12.zip and tried to compile my TokenizerFactory implementation against this version of Solr. Compilation failed. One of the causes is that the compiler couldn't find org.apache.solr.common.ResourceLoader. I discovered this class in apache-solr-solrj-nightly.jar. I didn't add this classpath at the first time because this jar sounds like the jar for building Java clients. I needed ResourceLoader to write my TokenizerFactory. I wonder why the common classes are in the solrj JAR? Is the solrj JAR not just for the clients? BTW, is there some sort of transition guide for Solr 1.4? I see there are changes in how classes are divided into JARs like above, and there are some incompatible API changes. It'll be great if such information can be part of CHANGES.txt. -kuro
Re: solr IOException
Hi Yonik, I tried the POST method in my ajax request in javascript. It does not work. I still get the same error message. Elaine On Tue, Oct 13, 2009 at 5:12 PM, Yonik Seeley ysee...@gmail.com wrote: Jetty has a maximum request size for HTTP-GET... can you use POST instead? -Yonik http://www.lucidimagination.com On Tue, Oct 13, 2009 at 4:33 PM, Elaine Li elaine.bing...@gmail.com wrote: Hi, In my query, i have around 80 boolean clauses. I don't know if it is because the number of boolean clauses are too big, so I got into this problem. My solr config file actually says the max number to be 1024. Can any one help? _header=[1515632954,1939520811,m=3653,g=4096,p=4096,c=4096]={sauidp=U601264301252517927557; CoreID6=01421694673512525179481ci=90130510,90175093,90175119,90175106; DEFAULTFORMAT=specific; BUGLIST=5%3A11%3A12%3A36%3A39%3A63%3A77%3A80%3A100%3A106%3A109%3A111%3A114%3A119%3A122%3A125%3A127%3A138%3A142%3A152%3A153%3A154%3A155%3A156%3A157%3A158%3A169%3A178%3A180%3A182%3A183%3A186%3A188%3A190%3A194%3A198%3A199%3A200%3A202%3A206%3A209%3A211%3A212%3A213%3A217%3A219%3A220%3A233%3A236%3A242%3A243%3A249%3A255%3}{} _buffer=[1515632954,1939520811,m=3653,g=4096,p=4096,c=4096]={sauidp=U601264301252517927557; CoreID6=01421694673512525179481ci=90130510,90175093,90175119,90175106; DEFAULTFORMAT=specific; BUGLIST=5%3A11%3A12%3A36%3A39%3A63%3A77%3A80%3A100%3A106%3A109%3A111%3A114%3A119%3A122%3A125%3A127%3A138%3A142%3A152%3A153%3A154%3A155%3A156%3A157%3A158%3A169%3A178%3A180%3A182%3A183%3A186%3A188%3A190%3A194%3A198%3A199%3A200%3A202%3A206%3A209%3A211%3A212%3A213%3A217%3A219%3A220%3A233%3A236%3A242%3A243%3A249%3A255%3}{} 2009-10-13 16:20:28.800::WARN: handle failed java.io.IOException: FULL at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:274) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Thanks. Elaine
RE: Lucene Merge Threads
In case anyone is having the same problem, I finally got this working, using the nightly build link that Yonik sent around: http://people.apache.org/builds/lucene/solr/nightly/ Thanks, Gio. -Original Message- From: Giovanni Fernandez-Kincade Sent: Wednesday, October 14, 2009 2:10 PM To: Giovanni Fernandez-Kincade; solr-user@lucene.apache.org; noble.p...@gmail.com Subject: RE: Lucene Merge Threads Does anyone know the correct syntax to specify the maximum number of threads for the ConcurrentMergeScheduler? Also, is there any concrete way to know when the merge is actually complete (aside from profiling the machine)? Thanks, Gio. -Original Message- From: Giovanni Fernandez-Kincade Sent: Tuesday, October 13, 2009 7:59 PM To: Giovanni Fernandez-Kincade; 'solr-user@lucene.apache.org'; 'noble.p...@gmail.com' Subject: RE: Lucene Merge Threads I'm still getting the error after getting the latest from trunk and building it. This is what I added to the solrconfig.xml:

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>

Any other ideas? Thanks, Gio. 
SEVERE: org.apache.solr.common.SolrException: Error loading class '5'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:310)
    at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:325)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:81)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:178)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:123)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:172)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:400)
    at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:168)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: 5
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.$$YJP$$doPrivileged(Native Method)
    at java.security.AccessController.doPrivileged(Unknown Source)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.net.FactoryURLClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClassInternal(Unknown Source)
    at java.lang.Class.$$YJP$$forName0(Native Method)
    at java.lang.Class.forName0(Unknown Source)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:294)
    ... 28 more
-Original Message- From: Giovanni Fernandez-Kincade Sent: Tuesday, October 13, 2009 10:50 AM To: solr-user@lucene.apache.org; 'noble.p...@gmail.com' Subject: RE:
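The "Error loading class '5'" above is consistent with a Solr build that still expects the older mergeScheduler syntax, in which the scheduler class name is the element's text content; with a nested <int>, that text content includes the "5", which then gets handed to the class loader. A hedged sketch of the two forms (the attribute form is the one the later nightly builds accept):

```xml
<!-- older syntax: class name as element text; no nested parameters possible -->
<mergeScheduler>org.apache.lucene.index.ConcurrentMergeScheduler</mergeScheduler>

<!-- newer syntax: class attribute plus property injection -->
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">5</int>
</mergeScheduler>
```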
Re: how to get field contents out of Document object
On Wed, Oct 14, 2009 at 2:24 PM, Joe Calderon calderon@gmail.com wrote: hello *, sorry if this seems like a dumb question, I'm still fairly new to working with Lucene/Solr internals. Given a Document object, what is the proper way to fetch an integer value for a field called num_in_stock? It is both indexed and stored. FieldType controls translation back and forth between Fields and Strings/Objects. See FieldType.toObject() or FieldType.storedToReadable() -Yonik http://www.lucidimagination.com
Re: DataImportHandler problem: Feeding the XPathEntityProcessor with the FieldReaderDataSource
See SOLR-1511 2009/10/7 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com hi Lance. db.blob is the correct field name, so that is fine. You can probably open an issue and provide the testcase as a patch. That can help us track this better. On Wed, Oct 7, 2009 at 12:45 AM, Lance Norskog goks...@gmail.com wrote: A side note that might help: if I change the dataField from 'db.blob' to 'blob', this DIH stack emits no documents. On 10/5/09, Lance Norskog goks...@gmail.com wrote: I've added a unit test for the problem down below. It feeds document field data into the XPathEntityProcessor via the FieldReaderDataSource, and the XPath EP does not emit unpacked fields. Running this under the debugger, I can see the supplied StringReader, with the XML string, being piped into the XPath EP. But somehow the XPath EP does not pick it apart the right way. Here is the DIH configuration file separately:

<dataConfig>
  <dataSource type='FieldReaderDataSource' name='fc' />
  <dataSource type='MockDataSource' name='db' />
  <document>
    <entity name='db' query='select * from x' dataSource='db'>
      <field column='dbid' />
      <field column='tag' />
      <field column='blob' />
      <entity name='unpack' dataSource='fc' processor='XPathEntityProcessor' forEach='/names' dataField='db.blob'>
        <field column='name' xpath='/names/name' />
      </entity>
    </entity>
  </document>
</dataConfig>

Any ideas?
---
package org.apache.solr.handler.dataimport;

import static org.apache.solr.handler.dataimport.AbstractDataImportHandlerTest.createMap;

import junit.framework.TestCase;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.common.SolrInputField;
import org.apache.solr.handler.dataimport.TestDocBuilder.SolrWriterImpl;
import org.junit.Test;

/*
 * Demonstrate problem feeding XPathEntity from a FieldReaderDatasource
 */
public class TestFieldReaderXPath extends TestCase {
  static final String KISSINGER = "<names><name>Henry</name></names>";
  static final String[][][] DBDOCS = {
    {{"dbid", "1"}, {"blob", KISSINGER}},
  };

  /*
   * Receive a row from SQL and fetch a row from Solr - no value matching
   * stolen from TestDocBuilder
   */
  @Test
  public void testSolrEmbedded() throws Exception {
    try {
      DataImporter di = new DataImporter();
      di.loadDataConfig(dih_config_FR_into_XP);
      DataImporter.RequestParams rp = new DataImporter.RequestParams();
      rp.command = "full-import";
      rp.requestParams = new HashMap<String, Object>();
      DataConfig cfg = di.getConfig();
      DataConfig.Entity entity = cfg.document.entities.get(0);
      List<Map<String, Object>> l = new ArrayList<Map<String, Object>>();
      addDBDocuments(l);
      MockDataSource.setIterator("select * from x", l.iterator());
      entity.dataSrc = new MockDataSource();
      entity.isDocRoot = true;
      SolrWriterImpl swi = new SolrWriterImpl();
      di.runCmd(rp, swi);
      assertEquals(1, swi.docs.size());
      SolrInputDocument doc = swi.docs.get(0);
      SolrInputField field;
      field = doc.getField("dbid");
      assertEquals(field.getValue().toString(), "1");
      field = doc.getField("blob");
      assertEquals(field.getValue().toString(), KISSINGER);
      field = doc.getField("name");
      assertNotNull(field);
      assertEquals(field.getValue().toString(), "Henry");
    } finally {
      MockDataSource.clearCache();
    }
  }

  private void addDBDocuments(List<Map<String, Object>> l) {
    for (String[][] dbdoc : DBDOCS) {
      l.add(createMap(dbdoc[0][0], dbdoc[0][1],
          dbdoc[1][0], dbdoc[1][1]));
    }
  }

  String dih_config_FR_into_XP = "<dataConfig>\r\n"
      + "<dataSource type='FieldReaderDataSource' name='fc' />\r\n"
      + "<dataSource type='MockDataSource' name='db' />\r\n"
      + "<document>\r\n"
      + "<entity name='db' query='select * from x' dataSource='db'>\r\n"
      + "<field column='dbid' />\r\n"
      + "<field column='tag' />\r\n"
      + "<field column='blob' />\r\n"
      + "<entity name='unpack' dataSource='fc'
Re: Lucene Merge Threads
Gio, Also, is there any concrete way to know when the merge is actually complete (aside from profiling the machine)? This would be a great feature to add to the Solr web UI. The ability to monitor merges in progress and log how much time each used. -J
'Down' boosting shorter docs
Our index has some items in it which basically contain a title and a single-word body. If the user searches for a word in the title (especially if the title is itself only one word) then that doc will get scored quite highly, despite the fact that, in this case, it's not really relevant. I've tried something like qf=title^2.0 content^0.5 bf=num_pages but that disproportionately boosts long documents to the detriment of relevancy. bf=product(num_pages,0.05) has no effect, but bf=product(num_pages,0.06) has a bunch of long documents which don't seem to return any highlighted fields plus the short document with only the query in the title, which is progress in that it's almost exactly the opposite of what I want. Any suggestions? Am I going to need to reindex and add the length in bytes or characters of the document? Simon
Re: advice on failover setup
Don, Sorry, yes the features are under development and also hopefully the wikis as well. :) When they become available, well I can say personally I need the Katta integration working in the next few months. Jason Venner got it working over at his company. It might be good to describe your use case to see what is a good fit for you. -J On Wed, Oct 14, 2009 at 4:20 PM, Don Clore don.cl...@5to1.com wrote: I'm sorry, for clarification, is it the *wiki* pages that are under development, or the features (I'm guessing the latter)? If the latter (ZooKeeperIntegration and KattaIntegration are not available yet), is there any sort of guess as to when these features might become available? thanks, Don On Wed, Oct 14, 2009 at 2:13 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Dan, For automatic failover there are 2 wiki pages that may be helpful, however both are in the development stage. http://wiki.apache.org/solr/ZooKeeperIntegration http://wiki.apache.org/solr/KattaIntegration -J On Wed, Oct 14, 2009 at 12:48 PM, Katz, Dan dan.k...@fepoc.com wrote: Hi folks, I'm tasked with designing a failover architecture for our new Solr server. I've read the Replication section in the docs (http://wiki.apache.org/solr/SolrReplication) and I need some clarification/insight. My questions: 1. Is there such a thing as master/master replication? 2. If we have one master and one slave server, and the master goes down, does the slave automatically become the master? What's the process for bringing the server back up and getting the two back in sync? Is it a manual process always? 3. We're running Solr inside Tomcat on Windows currently. Any suggestions for a load balancer that will automatically switch to the alternate server if one goes down? Thanks in advance, -- Dan Katz Lead Web Developer FEP Operations Center(r) 202.203.2572 (Direct) dan.k...@fepoc.com
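For question 3, any HTTP load balancer with health checks can handle the automatic switch to the alternate server; a hedged HAProxy sketch (hostnames, ports, and the use of Solr's ping handler as the health check are assumptions, not a tested setup):

```
backend solr
    option httpchk GET /solr/admin/ping
    server solr-master master-host:8080 check
    server solr-slave  slave-host:8080  check backup
```

With "backup", traffic only goes to the slave when the master's health check fails; note this covers query failover only, not promoting the slave to master for indexing.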
Re: Right place to put my Tokenizer jars
You're better off putting extensions like these in solr-home/lib and letting Solr load them rather than putting them in a container classpath like Jetty's lib/ext. As you've seen, conflicts occur because of class loader visibility. Erik On Oct 14, 2009, at 7:28 PM, Teruhiko Kurosaka wrote: I have my custom Tokenizer and TokenizerFactory in a jar, and I've been putting it in example/lib/ext. and it's been working fine with Solr 1.3. This jar uses SLF4J as a logging API, and I had the SLF4J jars in the same place, example/lib/ext. Because Solr 1.4 uses SLF4J too and have it builtin, I thought I wouldn't need to have another set of the same jars, I removed them from example/lib/ext. Then, when my TokenizerFactory is run, I've got a NoClassDefFoundError error. This error can be fixed by putting another set of SLF4J jars in example/lib/ext, but I don't understand why. After all, my jar can access Lucene and Solr APIs whose jars resides elsewhere than example/lib/ext. Why only SLF4J jars must be duplicated and exist in example/lib/ext? Why SLF4J jars are special? Is this somethng to do with the fact that SLF4J jars are needed at the static initialization time? What is the correct place to put my Tokenizer(Filter) jars? -kuro
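To make the suggestion concrete, a sketch of the layout being described (directory names assumed from the default example setup):

```
solr-home/
  conf/
    schema.xml
    solrconfig.xml
  data/
  lib/          custom Tokenizer/TokenizerFactory jar plus its dependency jars
```

Jars in solr-home/lib are loaded by Solr's own resource loader, so plugin classes and their dependencies resolve in one place without touching the container's classpath.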
Re: Right place to put my Tokenizer jars
Hi Kurosaka-san, I think you got a kind of class loader problem. I usually put my plugin jars under the lib directory of solr home. http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins Koji Teruhiko Kurosaka wrote: I have my custom Tokenizer and TokenizerFactory in a jar, and I've been putting it in example/lib/ext. and it's been working fine with Solr 1.3. This jar uses SLF4J as a logging API, and I had the SLF4J jars in the same place, example/lib/ext. Because Solr 1.4 uses SLF4J too and have it builtin, I thought I wouldn't need to have another set of the same jars, I removed them from example/lib/ext. Then, when my TokenizerFactory is run, I've got a NoClassDefFoundError error. This error can be fixed by putting another set of SLF4J jars in example/lib/ext, but I don't understand why. After all, my jar can access Lucene and Solr APIs whose jars resides elsewhere than example/lib/ext. Why only SLF4J jars must be duplicated and exist in example/lib/ext? Why SLF4J jars are special? Is this somethng to do with the fact that SLF4J jars are needed at the static initialization time? What is the correct place to put my Tokenizer(Filter) jars? -kuro -- http://www.rondhuit.com/en/
RE: Right place to put my Tokenizer jars
Actually, I meant to say I have my Tokenizer jars in solr/lib. I have the jars that my Tokenizer jars depend on in lib/ext, as I wanted them to be loaded only once per container due to their internal description. Bad idea? -kuro From: Teruhiko Kurosaka Sent: Wednesday, October 14, 2009 4:28 PM To: solr-user@lucene.apache.org Subject: Right place to put my Tokenizer jars I have my custom Tokenizer and TokenizerFactory in a jar, and I've been putting it in example/lib/ext, and it's been working fine with Solr 1.3. This jar uses SLF4J as a logging API, and I had the SLF4J jars in the same place, example/lib/ext. Because Solr 1.4 uses SLF4J too and has it built in, I thought I wouldn't need another set of the same jars, so I removed them from example/lib/ext. Then, when my TokenizerFactory is run, I get a NoClassDefFoundError. This error can be fixed by putting another set of SLF4J jars in example/lib/ext, but I don't understand why. After all, my jar can access Lucene and Solr APIs whose jars reside elsewhere than example/lib/ext. Why must only the SLF4J jars be duplicated and exist in example/lib/ext? Why are the SLF4J jars special? Is this something to do with the fact that SLF4J jars are needed at static initialization time? What is the correct place to put my Tokenizer(Filter) jars? -kuro
Re: 'Down' boosting shorter docs
A multiplicative boost may work better than one added in: http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html -Yonik http://www.lucidimagination.com On Wed, Oct 14, 2009 at 7:21 PM, Simon Wistow si...@thegestalt.org wrote: Our index has some items in it which basically contain a title and a single-word body. If the user searches for a word in the title (especially if the title is itself only one word) then that doc will get scored quite highly, despite the fact that, in this case, it's not really relevant. I've tried something like qf=title^2.0 content^0.5 bf=num_pages but that disproportionately boosts long documents to the detriment of relevancy. bf=product(num_pages,0.05) has no effect, but bf=product(num_pages,0.06) has a bunch of long documents which don't seem to return any highlighted fields plus the short document with only the query in the title, which is progress in that it's almost exactly the opposite of what I want. Any suggestions? Am I going to need to reindex and add the length in bytes or characters of the document? Simon
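As a concrete sketch of the multiplicative form (the num_pages field is carried over from the earlier message; wrapping it in log() is an assumption, chosen so page count damps rather than dominates the score, and the sum(1, ...) keeps one-page documents from being multiplied by zero):

```
q={!boost b=sum(1,log(num_pages))}title:foo content:foo
```

Unlike bf, which adds the function value into the score, the boost parser multiplies it, so short and long documents keep their relative text relevancy and the function only scales it.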
Re: Adding callback url to data import handler...Is this possible?
I can understand the concern that you do not wish to write Java code. But a callback URL is a very specific requirement. We plan to extend javascript support to the EventListener callback. Would that help? On Wed, Oct 14, 2009 at 11:47 PM, Avlesh Singh avl...@gmail.com wrote: Hmmm ... I think this is a valid use case and it might be a good idea to support it in some way. I will post this thread on the dev mailing list to seek opinion. Cheers Avlesh On Wed, Oct 14, 2009 at 11:39 PM, William Pierce evalsi...@hotmail.com wrote: Thanks, Avlesh. Yes, I did take a look at the event listeners. As I mentioned, this would require us to write Java code. Our app(s) are entirely windows/asp.net/C#, so while we could add Java in a pinch, we'd prefer to stick to using SOLR via its convenient REST-style interfaces, which make no demands on our app environment. Thanks again for your suggestion! Cheers, Bill -- From: Avlesh Singh avl...@gmail.com Sent: Wednesday, October 14, 2009 10:59 AM To: solr-user@lucene.apache.org Subject: Re: Adding callback url to data import handler...Is this possible? Had a look at EventListeners in DIH? http://wiki.apache.org/solr/DataImportHandler#EventListeners Cheers Avlesh On Wed, Oct 14, 2009 at 11:21 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am pretty happy with DIH -- it seems to work very well for my situation. Thanks!!! The one issue I see has to do with the fact that I need to keep polling url/dataimport to check if the data import completed successfully. I need to know when/if the import is completed (successfully or otherwise) so that I can update appropriate structures in our app. What I would like is something like what the Google Checkout API offers -- a callback URL. That is, I should be able to pass along a URL to DIH. Once it has completed the import, it can invoke the provided URL. This provides a callback mechanism for those of us who don't have the liberty to change SOLR source code.
We can then do the needful upon receiving this callback. If this functionality is already provided in some form/fashion, I'd love to know. All in all, great functionality that has significantly helped me out! Cheers, - Bill -- - Noble Paul | Principal Engineer| AOL | http://aol.com
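Until something like a callback URL exists, the status URL has to be polled from the client side; a minimal sketch (the "status" field with "idle"/"busy" values is how the DIH status response reports state, and the URL below is an assumption for a default deployment):

```python
import json

def import_finished(status_body):
    """Given the body of a /dataimport?command=status&wt=json response,
    report whether the importer has returned to idle."""
    response = json.loads(status_body)
    return response.get("status") == "idle"

# Polling loop sketch (Python 2 era, matching the thread's timeframe):
#   import time, urllib2
#   url = "http://localhost:8983/solr/dataimport?command=status&wt=json"
#   while not import_finished(urllib2.urlopen(url).read()):
#       time.sleep(5)
```

Once import_finished returns True, the client can inspect the rest of the status response for success or failure counts and then update its own structures, which approximates the callback without changing Solr.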
storing multiple type of records (Parent - Child Relationship)
Hi All, I have a specific requirement of storing multiple types of records, but I don't know how to do it. First let me explain the requirement. I have a table called the user table, and a user can be mapped to multiple projects. The user table details are User Name, User Id, address, and other details. I have stored them in Solr, but now the mapping between user and project has to be stored. The project table has (project name, location, business unit, etc.). I can still go ahead and store the user as a single record with project details as individual fields, like UserId:user1 UserAddress: india ProjectNames: project1,project2 ProjectBU: retail,finance ProjectLocation:UK,US Here I will search in fields like UserId, ProjectBU, ProjectLocation, and I have made UserAddress and ProjectLocation facets. But is there a way where we can store user records separately and project records separately, and just give the link in Solr, like mentioned below, while still making it searchable and facetable? User Details = UserId:user1 UserAddress: india ProjectId:1,2 Project Details == ProjectId:1 ProjectNames: project1 ProjectBU: retail ProjectLocation:UK ProjectId:2 ProjectNames: project2 ProjectBU:finance ProjectLocation:US
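Since each Solr document is flat, the usual answer is the denormalized record sketched in the message, but with multivalued fields rather than comma-joined strings, so faceting on ProjectBU or ProjectLocation counts each value separately. A hedged sketch of one user document posted as update XML (field names taken from the message; declaring the project fields multiValued="true" in schema.xml is the assumption):

```xml
<add>
  <doc>
    <field name="UserId">user1</field>
    <field name="UserAddress">india</field>
    <!-- one value per project -->
    <field name="ProjectName">project1</field>
    <field name="ProjectName">project2</field>
    <field name="ProjectBU">retail</field>
    <field name="ProjectBU">finance</field>
    <field name="ProjectLocation">UK</field>
    <field name="ProjectLocation">US</field>
  </doc>
</add>
```

True parent/child linking, with separate user and project records joined at query time, is not something Solr's flat model supports directly, so queries that must cross the link generally need either this denormalization or two queries from the client.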
Re: (Solr 1.4 dev) Why solr.common.* packages are in solrj-*.jar ?
I wonder why the common classes are in the solrj JAR? Is the solrj JAR not just for the clients? The Solr server uses solrj for distributed search. This makes solrj the general way to talk to Solr (even from within Solr).
Re: Error when indexing XML files
Hi, Please find the schema file attached. Please let me know what I am doing wrong. Regards Chaitali --- On Wed, 10/14/09, Fergus McMenemie fer...@twig.me.uk wrote: From: Fergus McMenemie fer...@twig.me.uk Subject: Re: Error when indexing XML files To: solr-user@lucene.apache.org Date: Wednesday, October 14, 2009, 2:25 AM Hi, I am trying to index XML files using SolrJ. The original XML file contains nested elements. For example, the following is a snippet of the XML file: <entry> <name>SOMETHING</name> <facility>SOME_OTHER_THING</facility> </entry> I have added the elements name and facility in the schema.xml file to make these elements indexable. I have changed the XML document above to look like: <add> <doc> .. <field name="name">SOMETHING</field> .. </doc> </add> Can you send us the schema.xml file you created? I suspect that one of the fields should be multivalued. -- Fergus.

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->

<!-- This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home
 (i.e. ./solr/conf/schema.xml by default)
 or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->

<schema name="example" version="1.1">
  <!-- attribute "name" is the name of this schema and is only used for
       display purposes. Applications should change this to reflect the
       nature of the search collection.
       version="1.1" is Solr's version number for the schema syntax and
       semantics. It should not normally be changed by applications.
       1.0: multiValued attribute did not exist, all fields are multiValued by nature
       1.1: multiValued attribute introduced, false by default -->

  <types>
    <!-- field type definitions. The "name" attribute is just a label to be
         used by field definitions. The "class" attribute and any other
         attributes determine the real behavior of the fieldType.
         Class names starting with "solr" refer to java classes in the
         org.apache.solr.analysis package. -->

    <!-- The StrField type is not analyzed, but indexed/stored verbatim.
       - StrField and TextField support an optional compressThreshold which
         limits compression (if enabled in the derived fields) to values
         which exceed a certain size (in characters). -->
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <!-- boolean type: "true" or "false" -->
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <!-- The optional sortMissingLast and sortMissingFirst attributes are
         currently supported on types that are sorted internally as strings.
       - If sortMissingLast="true", then a sort on this field will cause
         documents without the field to come after documents with the field,
         regardless of the requested sort order (asc or desc).
       - If sortMissingFirst="true", then a sort on this field will cause
         documents without the field to come before documents with the field,
         regardless of the requested sort order.
       - If sortMissingLast="false" and sortMissingFirst="false" (the default),
         then default lucene sorting will be used which places docs without
         the field first in an ascending sort and last in a descending sort. -->

    <!-- numeric field types that store and index the text value verbatim
         (and hence don't support range queries, since the lexicographic
         ordering isn't equal to the numeric ordering) -->
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>

    <!-- Numeric field types that manipulate the value into a string value
         that isn't human-readable in its internal form, but with a
         lexicographic ordering the same as the numeric