Re: SolrCloud setup - any advice?
Sorry, my bad. For SolrCloud soft commits are enabled (every 15 seconds). I do a hard commit from an external cron task via curl every 15 minutes. The version I'm using for the SolrCloud setup is 4.4.0. Document cache warm-up times are 0ms. Filter cache warm-up times are between 3 and 7 seconds. Query result cache warm-up times are between 0 and 2 seconds. I haven't tried disabling the caches; I'll give that a try and see what happens. This isn't a static index. We are indexing documents into it. We're keeping up with our normal update load, which is to make updates to a percentage of the documents (thousands, not hundreds).

On 19 September 2013 20:33, Shreejay Nair shreej...@gmail.com wrote: Hi Neil, Although you haven't mentioned it, just wanted to confirm - do you have soft commits enabled? Also, what's the version of Solr you are using for the SolrCloud setup? 4.0.0 had lots of memory and ZK related issues. What's the warmup time for your caches? Have you tried disabling the caches? Is this a static index, or are documents added continuously? The answers to these questions might help us pinpoint the issue...

On Thursday, September 19, 2013, Neil Prosser wrote: Apologies for the giant email. Hopefully it makes sense. We've been trying out SolrCloud to solve some scalability issues with our current setup and have run into problems. I'd like to describe our current setup, our queries and the sort of load we see, and am hoping someone might be able to spot the massive flaw in the way I've been trying to set things up.

We currently run Solr 4.0.0 in the old-style master/slave replication. We have five slaves, each running CentOS with 96GB of RAM, 24 cores and with 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but aren't slow either. Our GC parameters aren't particularly exciting, just -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.

Our index size ranges between 144GB and 200GB (when we optimise it back down, since we've had bad experiences with large cores). We've got just over 37M documents; some are smallish but most range between 1000-6000 bytes. We regularly update documents, so large portions of the index will be touched, leading to a maxDocs value of around 43M. Query load ranges between 400req/s and 800req/s across the five slaves throughout the day, increasing and decreasing gradually over a period of hours rather than bursting.

Most of our documents have upwards of twenty fields. We use different fields to store territory-variant values (we have around 30 territories) and also boost based on the values in some of these fields (integer ones). So an average query can do a range filter by two of the territory-variant fields, filter by a non-territory-variant field, facet by a field or two (which may be territory variant), bring back the values of 60 fields, boost query on field values of a non-territory-variant field, boost by values of two territory-variant fields, and run a dismax query on up to 20 fields (with boosts) with a phrase boost on those fields too. They're pretty big queries. We don't do any index-time boosting; we try to keep things dynamic so we can alter our boosts on-the-fly. Another common query is to list documents with a given set of IDs and select documents with a common reference and order them by one of their fields. Auto-commit every 30 minutes. Replication polls every 30 minutes.
Document cache:
* initialSize - 32768
* size - 32768
Filter cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192
Query result cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192

After a replicated core has finished downloading (probably while it's warming) we see requests which usually take around 100ms taking over 5s. GC logs show concurrent mode failure. I was wondering whether anyone can help with sizing the boxes required to split this index down into shards for use with SolrCloud, and roughly how much memory we should be assigning to the JVM. Everything I've read suggests that running with a 48GB heap is way too high, but every attempt I've made to reduce the cache sizes seems to wind up causing out-of-memory problems. Even dropping all cache sizes by 50% and reducing the heap by 50% caused problems. I've already tried using SolrCloud with 10 shards (around 3.7M documents per shard, each with one replica) and kept the cache sizes low:

Document cache:
* initialSize - 1024
* size - 1024
Filter cache:
* autowarmCount - 128
* initialSize - 512
* size - 512
Query result cache:
* autowarmCount - 32
* initialSize - 128
* size - 128

Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB memory) and four shards on two boxes and three on the rest, I still see concurrent mode failure. This looks like it's causing ZooKeeper to mark the node as down and things begin to struggle.
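[Editor's note: for readers following along, the commit and cache settings described above live in solrconfig.xml (plus an external cron job for the hard commit). A sketch, not the poster's actual config; the collection name and cache classes are assumptions, the intervals and sizes are taken from the thread:

  <!-- solrconfig.xml, inside <updateHandler>: soft commit every 15 seconds -->
  <autoSoftCommit>
    <maxTime>15000</maxTime>
  </autoSoftCommit>

  # crontab entry issuing the explicit hard commit every 15 minutes
  */15 * * * * curl -s "http://localhost:8983/solr/mycollection/update?commit=true"

  <!-- solrconfig.xml, inside <query>: the smaller SolrCloud cache sizes above -->
  <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="128" initialSize="128" autowarmCount="32"/>
  <documentCache class="solr.LRUCache" size="1024" initialSize="1024" autowarmCount="0"/>
]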
Spellchecking
Hi, I'd like to know if it is possible to get suggestions from only part of the index. For example, in an e-commerce site there are many types of products (book, DVD, CD...). If I search within books, I only want suggestions from book products, not CDs, but the spellcheck indexes are all together. Is it possible to divide the indexes, or to get suggestions for only one type? Thanks -- Gastone
Hash range to shard assignment
Hello folks, we would like to have control of where certain hash values or ranges are being located. The reason is that we want to shard per user but we know ahead that one or more specific users could grow way faster than others. Therefore we would like to locate them on separate shards (which may be on the same server initially and can be moved out later). So my question: can we control the hash-ranges and hash-range to shard assignment in SolrCloud ? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Hash range to shard assignment
This would need you to plug in your own router, which is not yet possible. But you can split that shard repeatedly and keep the number of users in that shard limited. On Fri, Sep 20, 2013 at 3:52 PM, lochri loc...@web.de wrote: Hello folks, we would like to have control of where certain hash values or ranges are being located. The reason is that we want to shard per user but we know ahead that one or more specific users could grow way faster than others. Therefore we would like to locate them on separate shards (which may be on the same server initially and can be moved out later). So my question: can we control the hash-ranges and hash-range to shard assignment in SolrCloud? Regards, Lochri -- View this message in context: http://lucene.472066.n3.nabble.com/Hash-range-to-shard-assignment-tp4091204.html Sent from the Solr - User mailing list archive at Nabble.com. -- - Noble Paul
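[Editor's note: the shard split Noble mentions is driven through the Collections API SPLITSHARD action (available from Solr 4.3 onwards). A sketch; collection and shard names here are placeholders:

  curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=usercollection&shard=shard1"

Each split divides the parent shard's hash range in two, so a hot range can be split repeatedly and the resulting sub-shards moved to their own hardware.]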
RE: Spellchecking
If you're using spellcheck.collate you can also set spellcheck.maxCollationTries to validate each collation against the index before suggesting it. This validation takes into account any fq parameters on your query, so if your original query has fq=Product:Book, then the collations returned will all be vetted by internally running the query with that filter applied. If for some reason your main query does not have fq=Product:Book, but you want it considered when collations are being built, you can include spellcheck.collateParam.fq=Product:Book. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate and following sections. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Gastone Penzo [mailto:gastone.pe...@gmail.com] Sent: Friday, September 20, 2013 4:00 AM To: solr-user@lucene.apache.org Subject: Spellchecking Hi, i'd like to know if is it possibile to have suggests only of a part of indexes. for example: an ecommerce: there are a lot of typologies of products (book, dvd, cd..) if i search inside books, i want only suggests of books products, not cds but the spellchecking indexs are all together. is it possibile to divided indexes or have suggests only of a typology? thanx -- Gastone
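[Editor's note: a hedged example of the kind of request James describes, assuming a handler named /spell and a field named Product (both names illustrative, following the Product:Book example in the post):

  http://localhost:8983/solr/spell?q=javva
      &fq=Product:Book
      &spellcheck=true
      &spellcheck.collate=true
      &spellcheck.maxCollationTries=5
      &spellcheck.collateParam.fq=Product:Book

With maxCollationTries set, each candidate collation is test-run against the index with the given fq applied, so only corrections that actually return book results are suggested.]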
Re: Will Solr work with a mapped drive?
Hi, Try the UNC path instead: http://wiki.apache.org/tomcat/FAQ/Windows#Q6 Regards, Aloke On 9/20/13, johnmu...@aol.com johnmu...@aol.com wrote: Hi, I'm having this same problem as described here: http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-windows Any one knows if this is a limitation of Solr or not? I searched the web, nothing came up. Thanks!!! -- MJ
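[Editor's note: wherever the mapped-drive letter appears (solr/home, instanceDir or dataDir), the UNC form Aloke refers to simply replaces the drive letter with the server and share name. Two illustrative placements; server and share names are placeholders:

  <!-- Tomcat context fragment -->
  <Environment name="solr/home" type="java.lang.String"
               value="\\fileserver\solrshare\solr" override="true"/>

  <!-- or in solr.xml -->
  <core name="collection1" instanceDir="\\fileserver\solrshare\solr\collection1"/>
]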
Re: check which file/document cause solr to work hard
you can always commit them one at a time to the ExtractingRequestHandler http://wiki.apache.org/solr/ExtractingRequestHandler Best, Erick On Tue, Sep 17, 2013 at 6:47 AM, Yossi Nachum nachum...@gmail.com wrote: Hi, I am trying to index my windows pc files with manifoldcf version 1.3 and solr version 4.4. Few minutes after I start the crawler job I see that tomcat process constantly consume 100% of one cpu (I have two cpu's). I check the thread dump in solr admin and saw that the following threads take the most cpu/user time http-8080-3 (32) - java.io.FileInputStream.readBytes(Native Method) - java.io.FileInputStream.read(FileInputStream.java:236) - java.io.BufferedInputStream.fill(BufferedInputStream.java:235) - java.io.BufferedInputStream.read1(BufferedInputStream.java:275) - java.io.BufferedInputStream.read(BufferedInputStream.java:334) - org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99) - java.io.FilterInputStream.read(FilterInputStream.java:133) - org.apache.tika.io.TailStream.read(TailStream.java:117) - org.apache.tika.io.TailStream.skip(TailStream.java:140) - org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283) - org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160) - org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193) - org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71) - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) - org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242) - org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120) - org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219) - org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) - org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) - org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241) - org.apache.solr.core.SolrCore.execute(SolrCore.java:1904) - org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659) - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362) - org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158) - org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) - org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) - org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) - org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) - org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) - org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) - org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) - org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) - org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:857) - org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) - org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) - java.lang.Thread.run(Thread.java:679) how can I check which file cause tika to work so hard? I don't see anything in the log files and I am stuck Thanks, Yossi
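[Editor's note: a minimal sketch of what Erick suggests, posting a single suspect file straight to the extracting handler so it can be timed in isolation; the URL, id and file path are placeholders:

  curl "http://localhost:8080/solr/collection1/update/extract?literal.id=test-doc-1&commit=true" \
       -F "myfile=@/path/to/suspect-file.mp3"

Timing these one by one should reveal which document keeps Tika busy.]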
Re: Solr node goes down while trying to index records
What happens if you bump up your ZooKeeper timeout? This has been an issue at times in the past. Best, Erick On Tue, Sep 17, 2013 at 1:48 PM, Furkan KAMACI furkankam...@gmail.com wrote: Could you give some information about your jetty.xml and give more info about your index rate and RAM usage of your machines? On Tuesday, 17 September 2013, neoman harira...@gmail.com wrote: Yes. The nodes go down while indexing. If we stop indexing, it does not go down. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-node-goes-down-while-trying-to-index-records-tp4090610p4090644.html Sent from the Solr - User mailing list archive at Nabble.com.
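[Editor's note: the timeout Erick refers to is zkClientTimeout. A sketch of bumping it; the 30-second value is just an example:

  # as a system property at startup (picked up via the ${zkClientTimeout:...} substitution)
  java -DzkClientTimeout=30000 -jar start.jar

  <!-- or in the <solrcloud> section of a discovery-style solr.xml -->
  <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
]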
Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
You're probably exceeding the size that your servlet container allows. This assumes you're using curl or some such. You can change it. How big is the document and how are you sending it to Solr? Best, Erick On Tue, Sep 17, 2013 at 2:24 PM, Furkan KAMACI furkankam...@gmail.com wrote: Currently I have over 50 million documents in my index and, as I mentioned before in another question, I have some problems while indexing (Jetty EOF exception). I know the problem may not be about index size, but I want to learn whether there is any limit on document size in Solr such that I could have problems if I exceed it. I am not talking about the theoretical limit. What is the maximum index size for folks, and what do they do to handle a heavy indexing rate with millions of documents? What tuning strategies do they use? PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr 4.2.1 for my SolrCloud.
Need help understanding the use cases behind core auto-discovery
Trying to add some information about core.properties and auto-discovery in Solr in Action and am at a loss for what to tell the reader is the purpose of this feature. Can anyone point me to any background information about core auto-discovery? I'm not interested in the technical implementation details. Mainly I'm trying to understand the motivation behind having this feature as it seems unnecessary with the Core Admin API. Best I can tell is it removes a manual step of firing off a call to the Core Admin API or loading a core from the Admin UI. If that's it and I'm overthinking it, then cool but was expecting more of an ah-ha moment with this feature ;-) Any insights you can share are appreciated. Thanks. Tim
Problem running EmbeddedSolr (spring data)
What is the cause of this stacktrace? I'm working with the following Solr Maven dependencies:

<solr-core-version>4.4.0</solr-core-version>
<spring-data-solr-version>1.0.0.RC1</spring-data-solr-version>

Stacktrace:

SEVERE: Exception sending context initialized event to listener instance of class org.springframework.web.context.ContextLoaderListener org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'solrServerFactoryBean' defined in class path resource [com/project/core/config/EmbeddedSolrContext.class]: Invocation of init method failed; nested exception is java.lang.NoSuchMethodError: org.apache.solr.core.CoreContainer.init(Ljava/lang/String;Ljava/io/File;)V at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1482) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:521) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458) at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295) at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223) at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292) at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:608) at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932) at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479) at org.springframework.web.context.ContextLoader.configureAndRefreshWebApplicationContext(ContextLoader.java:389) at org.springframework.web.context.ContextLoader.initWebApplicationContext(ContextLoader.java:294) at org.springframework.web.context.ContextLoaderListener.contextInitialized(ContextLoaderListener.java:112) at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:4887) at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5381) at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1559) at org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1549) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.lang.NoSuchMethodError: org.apache.solr.core.CoreContainer.init(Ljava/lang/String;Ljava/io/File;)V at org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.createPathConfiguredSolrServer(EmbeddedSolrServerFactory.java:96) at org.springframework.data.solr.server.support.EmbeddedSolrServerFactory.initSolrServer(EmbeddedSolrServerFactory.java:72) at org.springframework.data.solr.server.support.EmbeddedSolrServerFactoryBean.afterPropertiesSet(EmbeddedSolrServerFactoryBean.java:41) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1541) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1479) ... 22 more

// Config class
@Configuration
@EnableSolrRepositories("core.solr.repository")
@Profile("dev")
@PropertySource("classpath:solr.properties")
public class EmbeddedSolrContext {

    @Resource
    private Environment environment;

    @Bean
    public EmbeddedSolrServerFactoryBean solrServerFactoryBean() {
        EmbeddedSolrServerFactoryBean factory = new EmbeddedSolrServerFactoryBean();
        factory.setSolrHome(environment.getRequiredProperty("solr.solr.home"));
        return factory;
    }

    @Bean
    public SolrTemplate solrTemplate() throws Exception {
        return new SolrTemplate(solrServerFactoryBean().getObject());
    }
}

solr.properties:
solr.server.url=http://localhost:8983/solr/
solr.solr.home=classpath*:com/project/core/solr -- NOTE: points to an empty package inside the project
Re: Need help understanding the use cases behind core auto-discovery
On Fri, Sep 20, 2013 at 11:56 AM, Timothy Potter thelabd...@gmail.com wrote: Trying to add some information about core.properties and auto-discovery in Solr in Action and am at a loss for what to tell the reader is the purpose of this feature. IMO, it was more a removal of unnecessary central configuration. You previously had to list the core in solr.xml, and now you don't. Cores should be fully self-describing so that it should be easy to move them in the future just by moving the core directory (although that may not yet work...) -Yonik http://lucidworks.com Can anyone point me to any background information about core auto-discovery? I'm not interested in the technical implementation details. Mainly I'm trying to understand the motivation behind having this feature as it seems unnecessary with the Core Admin API. Best I can tell is it removes a manual step of firing off a call to the Core Admin API or loading a core from the Admin UI. If that's it and I'm overthinking it, then cool but was expecting more of an ah-ha moment with this feature ;-) Any insights you can share are appreciated. Thanks. Tim
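[Editor's note: for readers who haven't seen it, auto-discovery means Solr walks the solr home directory at startup and loads any directory containing a core.properties file. A minimal sketch; the core name is a placeholder:

  # collection1/core.properties
  name=collection1
  # optional extras, all with sensible defaults:
  # dataDir=data
  # loadOnStartup=true
  # transient=false

An empty core.properties file is also supposed to work, with the core name defaulting to the directory name, which is what makes a core directory self-describing and movable.]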
Re: Need help understanding the use cases behind core auto-discovery
Exactly the insight I was looking for! Thanks Yonik ;-) On Fri, Sep 20, 2013 at 10:37 AM, Yonik Seeley yo...@lucidworks.com wrote: On Fri, Sep 20, 2013 at 11:56 AM, Timothy Potter thelabd...@gmail.com wrote: Trying to add some information about core.properties and auto-discovery in Solr in Action and am at a loss for what to tell the reader is the purpose of this feature. IMO, it was more a removal of unnecessary central configuration. You previously had to list the core in solr.xml, and now you don't. Cores should be fully self-describing so that it should be easy to move them in the future just by moving the core directory (although that may not yet work...) -Yonik http://lucidworks.com Can anyone point me to any background information about core auto-discovery? I'm not interested in the technical implementation details. Mainly I'm trying to understand the motivation behind having this feature as it seems unnecessary with the Core Admin API. Best I can tell is it removes a manual step of firing off a call to the Core Admin API or loading a core from the Admin UI. If that's it and I'm overthinking it, then cool but was expecting more of an ah-ha moment with this feature ;-) Any insights you can share are appreciated. Thanks. Tim
Re: Migrating from Endeca
On 9/19/2013 5:50 AM, Gareth Poulton wrote: A customer wants us to move their entire enterprise platform - of which one of the many components is Oracle Endeca - to open source. However, customers being the way they are, they don't want to have to give up any of the features they currently use, the most prominent of which are user friendly web-based editors for non-technical people to be able to edit things like: - Schema - Dimensions (i.e. facets) - Dimension groups (not sure what these are) - Thesaurus - Stopwords - Report generation - Boosting individual records (i.e. sponsored links) - Relevance ranking settings - Process pipeline editor for, e.g. adding new languages -...all without touching any xml. I think Jack and Alexandre have pretty much covered what exists now for Solr without paying someone for features and support - not much. There is however some background work underway to bring features exactly like this to Solr. Except for the Schema REST API that exists right now, I don't think any of it has much priority. The priority is likely to increase in the future, but probably not fast enough for your needs. There is a strong desire among the top Solr developers to have Solr always be in SolrCloud mode in a future major version release -- which means it would use Zookeeper to store all config information, just like SolrCloud does now. When your config is in a separate network service instead of traditional config files, the ability to edit the config using API calls is very important. Creating a UI front-end that uses the API and doesn't require editing XML would be EXTREMELY nice. I'm pretty sure that this is the goal with the current work on the Schema REST API. If you have any idea how to bring these features to Solr, patches are always welcome! Some of the things in your list, particularly facets and grouping (which is what dimension groups might be equivalent to) are normally handled in client code. The application creates the parameters it needs and handles the response. With Solr they aren't normally configured on the server side. You could do so, by putting parameters in request handler definitions. Thanks, Shawn
Cause of NullPointer Exception? (Solr with Spring Data)
I am unsure about the cause of the following NullPointer Exception. Any Ideas? Thanks Exception in thread main org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'aDocumentService': Injection of autowired dependencies failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: com.project.core.solr.repository.DocumentRepository com.project.core.solr.service.impl.DocumentServiceImpl.DocRepo; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'DocumentRepository': FactoryBean threw exception on object creation; nested exception is java.lang.NullPointerException at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:288) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.populateBean(AbstractAutowireCapableBeanFactory.java:1116) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:519) at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:458) at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:295) at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:223) at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:292) at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:626) at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:932) at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:479) at org.springframework.context.annotation.AnnotationConfigApplicationContext.init(AnnotationConfigApplicationContext.java:73) at com.project.core.solr..DocumentTester.main(DocumentTester.java:18) Caused by: org.springframework.beans.factory.BeanCreationException: Could not autowire field: com.project.core.solr.repository.DocumentRepository com.project.core.solr.service.impl.DocumentServiceImpl.DocRepo; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'DocumentRepository': FactoryBean threw exception on object creation; nested exception is java.lang.NullPointerException at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:514) at org.springframework.beans.factory.annotation.InjectionMetadata.inject(InjectionMetadata.java:87) at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor.postProcessPropertyValues(AutowiredAnnotationBeanPostProcessor.java:285) ... 
12 more Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'DocumentRepository': FactoryBean threw exception on object creation; nested exception is java.lang.NullPointerException at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:149) at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:102) at org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1454) at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:306) at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:194) at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:910) at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:853) at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:768) at org.springframework.beans.factory.annotation.AutowiredAnnotationBeanPostProcessor$AutowiredFieldElement.inject(AutowiredAnnotationBeanPostProcessor.java:486) ... 14 more Caused by: java.lang.NullPointerException at org.springframework.data.solr.repository.support.MappingSolrEntityInformation.getIdAttribute(MappingSolrEntityInformation.java:68) at org.springframework.data.solr.repository.support.SimpleSolrRepository.init(SimpleSolrRepository.java:73) at
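[Editor's note: not a definitive diagnosis, but a NullPointerException inside MappingSolrEntityInformation.getIdAttribute() is commonly reported when the domain class backing the repository has no id property mapped. A sketch of a bean that does declare one; the class and field names are made up:

  import org.apache.solr.client.solrj.beans.Field;
  import org.springframework.data.annotation.Id;

  public class Document {

      @Id            // lets Spring Data Solr resolve the id attribute
      @Field("id")   // maps to the Solr uniqueKey field
      private String id;

      @Field("title")
      private String title;

      // getters and setters omitted for brevity
  }
]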
Re: SolrCloud setup - any advice?
On 9/19/2013 9:20 AM, Neil Prosser wrote: Apologies for the giant email. Hopefully it makes sense. Because of its size, I'm going to reply inline like this and I'm going to trim out portions of your original message. I hope that's not horribly confusing to you! Looking through my archive of the mailing list, I see that I have given you some of this information before. Our index size ranges between 144GB and 200GB (when we optimise it back down, since we've had bad experiences with large cores). We've got just over 37M documents some are smallish but most range between 1000-6000 bytes. We regularly update documents so large portions of the index will be touched leading to a maxDocs value of around 43M. Query load ranges between 400req/s to 800req/s across the five slaves throughout the day, increasing and decreasing gradually over a period of hours, rather than bursting. With indexes of that size and 96GB of RAM, you're starting to get into the size range where severe performance problems begin happening. Also, with no GC tuning other than turning on CMS (and a HUGE 48GB heap on top of that), you're going to run into extremely long GC pause times. Your query load is what I would call quite high, which will make those GC problems quite frequent. This is the problem I was running into with only an 8GB heap, with similar tuning where I just turned on CMS. When Solr disappears for 10+ seconds at a time for garbage collection, the load balancer will temporarily drop that server from the available pool. I'm aware that this is your old setup, so we'll put it aside for now so we can concentrate on your SolrCloud setup. Most of our documents have upwards of twenty fields. We use different fields to store territory variant (we have around 30 territories) values and also boost based on the values in some of these fields (integer ones). So an average query can do a range filter by two of the territory variant fields, filter by a non-territory variant field. Facet by a field or two (may be territory variant). Bring back the values of 60 fields. Boost query on field values of a non-territory variant field. Boost by values of two territory-variant fields. Dismax query on up to 20 fields (with boosts) and phrase boost on those fields too. They're pretty big queries. We don't do any index-time boosting. We try to keep things dynamic so we can alter our boosts on-the-fly. The nature of your main queries (and possibly your filters) is probably always going to be a little memory hungry, but it sounds like the facets are probably what's requiring such incredible amounts of heap RAM. Try putting a facet.method parameter into your request handler defaults and set it to enum. The default is fc which means fieldcache - it basically loads all the indexed terms for that field on the entire index into the field cache. Multiply that by the number of fields that you facet on (across all your queries), and it can be a real problem. Memory is always going to be required for quick facets, but it's generally better to let the OS handle it automatically with disk caching than to load it into the java heap. Your next paragraph (which I trimmed) talks about sorting, which is another thing that eats up java heap. The amount taken is based on the number of documents in the index, and a chunk is taken for every field that you use for sorting. See if you can reduce the number of fields you use for sorting. 
Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB memory) and four shards on two boxes and three on the rest I still see concurrent mode failure. This looks like it's causing ZooKeeper to mark the node as down and things begin to struggle. Is concurrent mode failure just something that will inevitably happen or is it avoidable by dropping the CMSInitiatingOccupancyFraction? I assume that concurrent mode failure is what gets logged preceding a full garbage collection. Aggressively tuning your GC will help immensely. The link below has what I am currently using. Someone on IRC was saying that they have a 48GB heap with similar settings and they never see huge pauses. These tuning parameters don't use fixed memory sizes, so it should work with any size max heap: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning Otis has mentioned G1. What I found when I used G1 was that it worked extremely well *almost* all of the time. The occasions for full garbage collections were a LOT less frequent, but when they happened, the pause was *even longer* than the untuned CMS. That caused big problems for me and my load balancer. Until someone can come up with some awesome G1 tuning parameters, I personally will continue to avoid it except for small-heap applications. G1 is an awesome idea. If it can be tuned, it will probably be better than a tuned CMS. Switching to facet.method=enum as outlined above will probably do the most for letting you decrease your max java heap.
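[Editor's note: two sketches of the changes Shawn describes. First, facet.method=enum applied as a request handler default in solrconfig.xml (the handler name is just the stock /select):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="facet.method">enum</str>
    </lst>
  </requestHandler>

Second, a CMS configuration in the spirit of the wiki page he links to; the flags and values below are representative only, the wiki page is the authoritative list:

  -Xms24g -Xmx24g
  -XX:+UseConcMarkSweepGC
  -XX:+UseParNewGC
  -XX:+CMSParallelRemarkEnabled
  -XX:+UseCMSInitiatingOccupancyOnly
  -XX:CMSInitiatingOccupancyFraction=70
  -XX:+CMSScavengeBeforeRemark
  -XX:MaxTenuringThreshold=8
  -XX:SurvivorRatio=4
]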
Re: JVM Crash using solr 4.4 on Centos
Thanks Michael, I thought I had the latest but it turned out to be from July 2011. Working Fine with the latest build :-) On Thu, Sep 19, 2013 at 7:29 PM, Michael Ryan mr...@moreover.com wrote: This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 (any build within the last two years or so should be fine). If that's not possible for you, you can add -XX:-UseLoopPredicate as a command line option to java to work around this. -Michael -Original Message- From: Oak McIlwain [mailto:oak.mcilw...@gmail.com] Sent: Thursday, September 19, 2013 10:10 PM To: solr-user@lucene.apache.org Subject: JVM Crash using solr 4.4 on Centos I have solr 4.4 running on tomcat 7 on my local development environment which is ubuntu based and it works fine (Querying, Posting Documents, Data Import etc.) I am trying to move into a staging environment which is Centos based (still using tomcat 7 and solr 4.4 however when attempting to post documents and do a data import from mysql through jdbc, after a few hundred documents, the tomcat server crashes and it logs: # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112 # # JRE version: 7.0-b147 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops) # Problematic frame: # J org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z I'm using Sun Java JDK 1.7.0 Anyone got any ideas I can pursue to resolve this?
java.lang.LinkageError when using custom filters in multiple cores
I have two cores favorite and user running in the same Tomcat instance. In each of these cores I have identical field types text_en, text_de, text_fr, and text_ja. These fields use some custom token filters I've written. Everything was going smoothly when I only had the favorite core. When I added the user core, I started getting java.lang.LinkageErrors being thrown when I start up Tomcat. The error always happens with one of the classes I've written, but it's unpredictable which class the classloader chokes on. Here's the really strange part. I comment out the text_* fields in the user core and the errors go away (makes sense). I add text_en back in, no error (OK). I add text_fr back in, no error (OK). I add text_de back in, and I get the error (ah ha!). I comment text_de out again, and I still get the same error (wtf?). I also put a break point at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424), and when I load everything one at a time, I don't get any errors. I'm running Tomcat 5.5.28, Java version 1.6.0_39 and Solr 4.2.0. I'm running this all within Eclipse 1.5.1 on a mac. I have not tested this on a production-like system yet. Here's an example stack trace. In this case it was one of my Japanese filters, but other times it will choke on my synonym filter, or my compound word filter. The specific class it fails on doesn't seem to be relevant. SEVERE: null:java.lang.LinkageError: loader (instance of org/apache/catalina/loader/WebappClassLoader): attempted duplicate class definition for name: com/shopstyle/solrx/KatakanaVuFilterFactory at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631) at java.lang.ClassLoader.defineClass(ClassLoader.java:615) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141) at java.net.URLClassLoader.defineClass(URLClassLoader.java:283) at java.net.URLClassLoader.access$000(URLClassLoader.java:58) at java.net.URLClassLoader$1.run(URLClassLoader.java:197) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:904) at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1353) at java.lang.ClassLoader.loadClass(ClassLoader.java:295) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:627) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:249) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:424) at org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:462) at org.apache.solr.util.plugin.AbstractPluginLoader.create(AbstractPluginLoader.java:89) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:392) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:86) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:373) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:121) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1018) at 
org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) at java.lang.Thread.run(Thread.java:680) - Hayden
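[Editor's note: one thing worth checking here, though not a confirmed fix: if the jar containing the custom filter factories sits on more than one classloader path (for example both in the webapp's WEB-INF/lib and in a per-core lib directory), loading the second core can trigger exactly this duplicate-definition LinkageError. Loading the plugin jar once from a shared location rules that out; a sketch using the legacy solr.xml sharedLib attribute, with core names taken from the post:

  <!-- solr.xml (legacy format): load plugin jars once from $SOLR_HOME/lib -->
  <solr persistent="true" sharedLib="lib">
    <cores adminPath="/admin/cores">
      <core name="favorite" instanceDir="favorite"/>
      <core name="user" instanceDir="user"/>
    </cores>
  </solr>
]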
Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
A, good to know Shawn... Erick On Fri, Sep 20, 2013 at 1:04 PM, Shawn Heisey s...@elyograg.org wrote: On 9/20/2013 12:34 PM, Erick Erickson wrote: You're probably exceeding the size that your servlet container allows. This assumes you're using curl or some such. You can change it. How big is the document and how are you sending it to Solr? The maximum form size is configurable in Solr, not sure whether that change went in for 4.1 or 4.2. Solr will override what the servlet container itself has configured. In the requestDispatcher section of solrconfig.xml, you can have a requestParsers tag. One of the attributes for that tag can be formdataUploadLimitInKB. The default value for that setting is 2048, for a maximum POST size of 2MB. This should be described in the example solrconfig.xml file. Thanks, Shawn
Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
On 9/20/2013 12:34 PM, Erick Erickson wrote: You're probably exceeding the size that your servlet container allows. This assumes you're using curl or some such. You can change it. How big is the document and how are you sending it to Solr? The maximum form size is configurable in Solr, not sure whether that change went in for 4.1 or 4.2. Solr will override what the servlet container itself has configured. In the requestDispatcher section of solrconfig.xml, you can have a requestParsers tag. One of the attributes for that tag can be formdataUploadLimitInKB. The default value for that setting is 2048, for a maximum POST size of 2MB. This should be described in the example solrconfig.xml file. Thanks, Shawn
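[Editor's note: a sketch of the setting Shawn describes, raising the POST/form limit in solrconfig.xml; the 10MB figure is just an example:

  <requestDispatcher handleSelect="false">
    <requestParsers enableRemoteStreaming="true"
                    multipartUploadLimitInKB="2048000"
                    formdataUploadLimitInKB="10240"/>
  </requestDispatcher>
]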
Getting term offsets from Solr
Hi, We're looking at implementing highlighting for some fields which may be too large to store in the index. As an alternative to using the Solr Highlighter (which needs fields to be stored), I was wondering if a) the offsets of terms are stored BY DEFAULT in the index (even if we're not using the TermVectorComponent) and if so, b) is there a way to get the offset information from Solr. Thanks, Nalini
Re: Getting term offsets from Solr
Set: termVectors=true termPositions=true termOffsets=true And use the fast vector highlighter. -- Jack Krupansky -Original Message- From: Nalini Kartha Sent: Friday, September 20, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: Getting term offsets from Solr Hi, We're looking at implementing highlighting for some fields which may be too large to store in the index. As an alternative to using the Solr Highlighter (which needs fields to be stored), I was wondering if a) the offsets of terms are stored BY DEFAULT in the index (even if we're not using the TermVectorComponent) and if so, b) is there a way to get the offset information from Solr. Thanks, Nalini
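[Editor's note: a sketch of what Jack suggests, first on the field definition and then at query time; the field name content is a placeholder, and note the fast vector highlighter still needs the field stored:

  <!-- schema.xml -->
  <field name="content" type="text_general" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>

  <!-- query time -->
  ...&hl=true&hl.fl=content&hl.useFastVectorHighlighter=true
]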
Re: Limits of Document Size at SolrCloud and Faced Problems with Large Size of Documents
You're probably exceeding the size that your servlet container allows. This assumes you're using curl or some such. You can change it. How big is the document and how are you sending it to Solr? Best, Erick On Tue, Sep 17, 2013 at 4:28 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, 50m docs across 18 servers with 48gb RAM ain't much. I doubt you are hitting any limits in Lucene or Solr. How heavy is your indexing rate? Otis Solr ElasticSearch Support http://sematext.com/ On Sep 17, 2013 5:25 PM, Furkan KAMACI furkankam...@gmail.com wrote: Currently I have over 50 million documents in my index and, as I mentioned before in another question, I have some problems while indexing (Jetty EOF exception). I know the problem may not be about index size, but I want to learn whether there is any limit on document size in Solr such that I could have problems if I exceed it. I am not talking about the theoretical limit. What is the maximum index size for folks, and what do they do to handle a heavy indexing rate with millions of documents? What tuning strategies do they use? PS: I have 18 machines, 9 shards, each machine has 48 GB RAM and I use Solr 4.2.1 for my SolrCloud.
Re: Getting term offsets from Solr
Thanks for the reply. We tried enabling these options but that's also causing too much index bloat so I was wondering if there's a way to get at the offset information more cheaply? Thanks, Nalini On Fri, Sep 20, 2013 at 4:41 PM, Jack Krupansky j...@basetechnology.comwrote: Set: termVectors=true termPositions=true termOffsets=true And use the fast vector highlighter. -- Jack Krupansky -Original Message- From: Nalini Kartha Sent: Friday, September 20, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: Getting term offsets from Solr Hi, We're looking at implementing highlighting for some fields which may be too large to store in the index. As an alternative to using the Solr Highlighter (which needs fields to be stored), I was wondering if a) the offsets of terms are stored BY DEFAULT in the index (even if we're not using the TermVectorComponent) and if so, b) is there a way to get the offset information from Solr. Thanks, Nalini
Re: Getting term offsets from Solr
I'm wondering if storing just the offset as a payload would be cheaper from storage perspective than enabling termOffsets, termVectors and termPositions? Maybe we could get the offset info to return with results from there then? Thanks, Nalini On Fri, Sep 20, 2013 at 5:02 PM, Nalini Kartha nalinikar...@gmail.comwrote: Thanks for the reply. We tried enabling these options but that's also causing too much index bloat so I was wondering if there's a way to get at the offset information more cheaply? Thanks, Nalini On Fri, Sep 20, 2013 at 4:41 PM, Jack Krupansky j...@basetechnology.comwrote: Set: termVectors=true termPositions=true termOffsets=true And use the fast vector highlighter. -- Jack Krupansky -Original Message- From: Nalini Kartha Sent: Friday, September 20, 2013 7:34 PM To: solr-user@lucene.apache.org Subject: Getting term offsets from Solr Hi, We're looking at implementing highlighting for some fields which may be too large to store in the index. As an alternative to using the Solr Highlighter (which needs fields to be stored), I was wondering if a) the offsets of terms are stored BY DEFAULT in the index (even if we're not using the TermVectorComponent) and if so, b) is there a way to get the offset information from Solr. Thanks, Nalini