Re: Decompound German Words
Dear Satish, did you find a decompounding dictionary for German?

Best regards,
Martin

--
View this message in context: http://lucene.472066.n3.nabble.com/Decompound-German-Words-tp3708194p3966013.html
Sent from the Solr - User mailing list archive at Nabble.com.
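For anyone landing on this thread: Solr ships a DictionaryCompoundWordTokenFilterFactory that decompounds tokens against a word list you supply yourself — no German dictionary is bundled, so the file name below is a placeholder for one you would have to provide. A minimal schema.xml sketch:

```xml
<fieldType name="text_de_decompound" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- german-words.txt: a plain one-word-per-line list you provide;
         compounds are split into any dictionary words they contain -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="german-words.txt"
            minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
            onlyLongestMatch="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a dictionary containing "wein" and "flasche", a token like "Weinflasche" would be indexed alongside its parts.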
Solritas in production
Hi,

We're currently evaluating Solr as a Sphinx replacement. Our site is a real estate search engine with 1,000,000+ pageviews a day. The development is almost done, and it seems to be working fine; however, some of my colleagues came up with the idea that we're using it wrong. We're using it as a service from PHP/Symfony. They think we should use Solritas as a frontend, so that site visitors use it directly, no PHP is involved, and it needs much less infrastructure. One of them said that even mobile.de is using it that way (I have found no evidence of this at all).

Do you think it is a good idea? Do you know of services using Solritas as a frontend on a public site? My personal opinion is that using Solritas in production would be a very bad idea for us, but I don't have much experience with Solr yet, and the Solritas documentation is far from detailed or up to date, so I don't really know what it is actually usable for.

Thanks,
Andras
Re: question about NRT(soft commit) and Transaction Log in trunk
This is a good question... I don't know much about how Solr's transaction log works, but, peeking in the code, I do see it fsync'ing (look in TransactionLog.java, in the finish method), but only if the SyncLevel is FSYNC. If the default is really flush, I don't see how the transaction log helps on recovery...? Should we change the default to FSYNC?

Mike McCandless
http://blog.mikemccandless.com

On Sat, Apr 28, 2012 at 7:11 AM, Li Li fancye...@gmail.com wrote:

Hi,

I checked out the trunk and played with its new soft commit feature. It's cool, but I've got a few questions about it. From reading some introductory articles and the wiki, plus a hasty read of the code, my understanding of its implementation is: for a normal (hard) commit, we flush everything to disk and commit it. The flush is not very time consuming because of the OS-level cache; the most time-consuming part is the sync in the commit process. A soft commit just flushes postings and pending deletions to disk and generates new segments. Solr can then use a new searcher to read the latest index, warm up, and register itself. If there is no hard commit and the JVM crashes, the new data may be lost.

If my understanding is correct, then why do we need the transaction log? I found that in DirectUpdateHandler2, every time a command is executed, TransactionLog records a line in the log. But the default sync level in RunUpdateProcessorFactory is flush, which means it will not sync the log file. Does this make sense? In database implementations, we usually write the log and modify the data in memory, because the log is smaller than the real data. If the system crashes, we can replay the unfinished log and make the data correct. Will Solr use its log like this? If so, why is it not synced?
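For context, the update log under discussion is enabled per update handler in solrconfig.xml. The sketch below assumes syncLevel is accepted as an updateLog parameter on trunk — worth verifying against the revision you are actually running:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
    <!-- syncLevel: none | flush | fsync.
         "flush" hands the bytes to the OS but does not force them to the
         platter, which is exactly the durability concern raised above. -->
    <str name="syncLevel">flush</str>
  </updateLog>
</updateHandler>
```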
Partition Question
Hi All,

We have an index of ~2,000,000,000 documents, and the query and facet times are too slow for us. Before using the shards solution for improving performance, we thought about using the multicore feature (our goal is to maximize performance on a single machine). Most of our queries will be limited by time, hence we want to partition the data by date/time. We want to partition the data because the index size is too big and doesn't fit into memory (80 GB).

1. Is multicore the best way to implement my requirement?

2. I noticed there are some LOAD / UNLOAD actions on a core - should I use these actions when managing my cores? If so, how can I LOAD a core that I have unloaded? For example: I have 7 partitions/cores, one for each day of the week. In most cases I will search for documents only on the last day's core. Once every 1 queries I need documents from all cores. Question: do I need to unload all of the old cores and then load them on demand (when I see I need data from those cores)?

3. If the answer to the last question is no, how do I ensure that only the cores I want are loaded into memory?

Thanks,
Yuval
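For what it's worth, the rare all-cores query can be done with Solr's distributed-search shards parameter instead of loading and unloading cores. A small sketch of building the parameter value — the host, port, and core names are made up for illustration:

```shell
# Sketch: build a "shards" parameter spanning several per-day cores.
# Host, port and core names (core-mon, core-tue, ...) are hypothetical.
HOST="localhost:8983/solr"
CORES="core-mon core-tue core-wed"

SHARDS=""
for c in $CORES; do
  # Append each core URL, comma-separated after the first entry.
  SHARDS="${SHARDS:+$SHARDS,}$HOST/$c"
done

echo "$SHARDS"
# A distributed search would then be issued against any one core, e.g.:
#   http://localhost:8983/solr/core-mon/select?q=*:*&shards=$SHARDS
```

Day-to-day queries would skip the shards parameter entirely and hit only the current day's core.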
Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!
Thank you very much for responding, Mr. Erickson. You may be right about the old-version index; I will reindex. However, we have 2 separate/disjoint master-slave setups... only one query node/slave has this issue. If it were really incompatible indexes, why isn't the other query server also throwing errors? That's what is throwing my debugging thought process off.

Thanks,
Ravi Kiran Bhaskar
Principal Software Engineer
Washington Post Digital
1150 15th Street NW, Washington, DC 20071

On Sat, May 5, 2012 at 12:53 PM, Erick Erickson erickerick...@gmail.com wrote:

The first thing I'd check is if, in the log, there is a replication happening immediately prior to the error. I confess I'm not entirely up on the version thing, but is it possible you're replicating an index that is built with some other version of Solr? That would at least explain your statement that it runs OK, but then fails sometime later.

Best,
Erick

On Fri, May 4, 2012 at 1:50 PM, Ravi Solr ravis...@gmail.com wrote:

Hello,

Recently we migrated our SOLR 3.6 server OS from Solaris to CentOS, and from then on we started seeing "Invalid version (expected 2, but 60)" errors on one of the query servers (oddly, one other query server seems fine). If we restart the server having the issue, everything is alright, but the next morning we get the same exception again. I made sure that all the client applications are using the SOLR 3.6 version. The Glassfish on which all the applications and SOLR are deployed uses Java 1.6.0_29. The only differences I could see:

1. The process indexing to the server having issues is using Java 1.6.0_31
2. The process indexing to the server that DOES NOT have issues is using Java 1.6.0_29

Could the Java minor version being greater than the SOLR instance's be the cause of this issue? Can anybody please help me debug this a bit more? What else can I look at to understand the underlying problem?
The stack trace is given below:

[#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
    at com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
    at com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
    at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
    at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
    at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
    at org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
    at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
    at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
    at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
    at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
    at
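One way to read the error message itself: SolrJ's javabin responses start with a version byte of 2, so "expected 2, but 60" means the first byte received was 60 — which is the ASCII code for '<'. That suggests the client got an XML or HTML page (quite possibly an error page, e.g. from replication being mid-flight) and tried to parse it as javabin. A quick check of the byte value:

```shell
# 60 is the ASCII code point of '<', the first byte of an XML/HTML page.
printf '<' | od -An -tu1 | tr -d ' '   # prints 60
```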
Re: Solritas in production
Hi,

Solritas (the Velocity Response Writer) is NOT intended for production use. The simple reason, apart from the fact that it is not of production-grade quality, is that it requires direct access to the Solr instance, as it is simply a response writer. You MUST use a separate frontend layer above Solr and never expose Solr directly to the world. So you should feel totally comfortable continuing to use Solr over HTTP from PHP!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6. mai 2012, at 14:02, András Bártházi wrote:

[original message quoted in full; trimmed]
Re: Partition Question
Hi,

First you need to investigate WHY faceting and querying are too slow. What exactly do you mean by slow? Can you please tell us more about your setup?

* How large are the documents, and how many fields?
* What kind of queries? How many hits? How many facets? Have you studied the debugQuery=true output?
* Do you use filter queries (fq) extensively?
* What data do you facet on? Many unique values per field? Text or ranges? What facet.method?
* What kind of hardware? RAM/CPU?
* How have you configured your JVM? How much memory? GC?

As you see, you will have to provide a lot more information on your use case and setup in order for us to judge the correct action to take. You might need to adjust your config, optimize your queries or caches, slim your schema, buy some more RAM, or an SSD :)

Normally, going multicore on one box will not necessarily help in itself, as there is overhead in sharding across multiple cores as well. However, it COULD be a solution, since you say that most of the time you only need to consider 1/7 of your data. I would perhaps consider one hot core for the last 24h, and one archive core for older data. You could then tune these differently regarding caches etc.

Can you get back with some more details?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6. mai 2012, at 17:07, Yuval Dotan wrote:

[original message quoted in full; trimmed]
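On the hot-core/archive-core idea: each core has its own solrconfig.xml, so the caches can be sized independently. A sketch for the hot core — the sizes here are purely illustrative, not recommendations:

```xml
<!-- solrconfig.xml for the "hot" last-24h core:
     larger caches with autowarming, since it serves most traffic -->
<query>
  <filterCache      class="solr.FastLRUCache" size="4096" initialSize="1024" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache"     size="2048" initialSize="512"  autowarmCount="128"/>
  <documentCache    class="solr.LRUCache"     size="8192" initialSize="2048" autowarmCount="0"/>
</query>
```

The archive core would use much smaller caches (and little or no autowarming), since it is queried rarely.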
Re: Solritas in production
Hi Jan,

I would answer András exactly the opposite :-) I would like to understand this, and to ask you something. Would you see any problem if he had an Apache httpd configured as a reverse proxy (no PHP in it) in front of Solr, just to restrict user access to only the Solritas URL? This way Solr would not be directly exposed, and he would not need to develop a PHP site/application. Maybe a Varnish layer would be even better, as he has 1,000,000+ pageviews a day. Again, no PHP in this scenario. What's your opinion about both solutions?

Thanks in advance,

Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786

On Sun, May 6, 2012 at 7:42 PM, Jan Høydahl jan@cominvent.com wrote:

[earlier messages quoted in full; trimmed]
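The reverse-proxy idea above would look roughly like this in httpd — a sketch only, assuming mod_proxy/mod_proxy_http are loaded, and with paths and ports made up. Note that, as pointed out elsewhere in the thread, this only hides the other Solr URLs; the query parameters of the exposed handler remain open to abuse:

```apache
# Expose only the Solritas handler; nothing else on the Solr instance
# is reachable from outside.
ProxyPass        /search http://localhost:8983/solr/itas
ProxyPassReverse /search http://localhost:8983/solr/itas

<Location /search>
    Order deny,allow
    Allow from all
</Location>
```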
Re: Solritas in production
Do not directly expose Solr to WWW traffic. It isn't designed for that. For example, the admin pages have no access controls. I can change my request parameters to request a million rows and put a huge load on your server. A few of those, and you are off the air. I can fetch your config, then send a command to DIH to do a full import. And so on.

wunder

On May 6, 2012, at 5:50 PM, Marcelo Carvalho Fernandes wrote:

[earlier messages quoted in full; trimmed]

--
Walter Underwood
wun...@wunderwood.org
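One partial mitigation for the parameter-tampering problem described above is pinning parameters with invariants on the request handler, which override anything the client puts in the URL. A sketch (handler name and values are illustrative):

```xml
<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
  </lst>
  <lst name="invariants">
    <!-- clients cannot override invariants, no matter what the URL says -->
    <int name="rows">20</int>
  </lst>
</requestHandler>
```

This caps the rows trick, but does nothing about the admin pages or other handlers — those still need a frontend or proxy in the way.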
Re: Solritas in production
On 6.5.2012 14:02, András Bártházi wrote:

We're currently evaluating Solr as a Sphinx replacement. The development is almost done, and it seems to be working fine.

Why do you want to replace Sphinx with Solr?
Re: Solritas in production
Hi,

Why do you want to replace Sphinx with Solr? (Is this on-topic here?)

Solr+Lucene has far more features, and seems to be more extensible as well:

- we had speed problems related to faceting (while I know it can be implemented in Sphinx, Solr's faceting looks promising)
- we had to do some workarounds to implement some other features on the site; the Solr versions of these are simpler
- and we miss a good solution for scalability and availability, like SolrCloud

Anyway, Sphinx works really well, but Solr seems to be better for us.

Bye,
Andras
Re: Solritas in production
Hi,

----- Original Message -----

On 6.5.2012 14:02, András Bártházi wrote:

We're currently evaluating Solr as a Sphinx replacement. The development is almost done, and it seems to be working fine.

Why do you want to replace Sphinx with Solr?

That's an interesting question. Until recently, I never saw any Sematext clients coming to us wanting to replace Sphinx. Then one company came to us with this request, and I jokingly said how we never see people using Sphinx. Then suddenly, in the last few months, we've had a number of clients who mentioned Sphinx in one context or another. Is something (not) happening with Sphinx?

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm