Re: Facet with empty values displayed in output
q=country:[* TO *] will find all docs that have a value in a field. However, it seems you have a space, which *is* a value. I think Eric is right - track down that record and fix the data.

Upayavira

On Wed, Sep 18, 2013, at 09:23 AM, Prasi S wrote:

How to filter them in the query itself?

Thanks,
Prasi

On Wed, Sep 18, 2013 at 1:06 PM, Upayavira u...@odoko.co.uk wrote:

Filter them out in your query, or in your display code.

Upayavira

On Wed, Sep 18, 2013, at 06:36 AM, Prasi S wrote:

Hi,
I'm using Solr 4.4 for our search. When I query for a keyword, it returns empty-valued facets in the response:

  <lst name="facet_counts">
    <lst name="facet_queries"/>
    <lst name="facet_fields">
      <lst name="Country">
        <int name=" ">1</int>
        <int name="USA">1</int>
      </lst>
    </lst>
    <lst name="facet_dates"/>
    <lst name="facet_ranges"/>
  </lst>

I have also tried using the facet.missing parameter, but no change. How can we handle this?

Thanks,
Prasi
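If fixing the bad record isn't immediately possible, the blank bucket can also be dropped in display code, as suggested above. A minimal sketch, assuming the JSON response format where facet_fields comes back as a flat name/count list (the field name is the one from the thread):

```python
def facet_counts(flat_list):
    """Convert Solr's flat [name, count, name, count, ...] facet list
    into a dict, skipping empty or whitespace-only facet values."""
    pairs = zip(flat_list[0::2], flat_list[1::2])
    return {name: count for name, count in pairs if name and name.strip()}

# The Country facet from the thread: a lone space *is* a value.
country_facet = [" ", 1, "USA", 1]
print(facet_counts(country_facet))  # → {'USA': 1}
```

This hides the symptom in the UI; tracking down and fixing the document with the space is still the real cure.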
Re: Memory Using In Faceted Search (UnInvertedField's)
I ran some load tests and working memory usage was always about 10-11 GB (very slowly rising - I think that's the query cache being filled in). 6 GB was always the heap size, while 4-5 GB was reported as shareable memory. At first I was afraid that Solr would keep taking memory up to all available, but it looks like it stops somewhere after fieldValueCache is filled in.

Shawn, I had the swap file growing (up to 50-60%) and working while the load tests ran. Did you configure 'swappiness' on your Linux box (set it to 0 earlier, maybe)? If not, my Windows OS could be the cause of that difference.

I'm not sure if this is entirely an issue of shareable memory, some missing JVM configuration (I don't have anything special except -Xmx, -Xms and -XX:MaxPermSize=512M), or some Solr memory leak. I'd appreciate any thoughts on that.

Thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/Memory-Using-In-Faceted-Search-UnInvertedField-s-tp4090889p4091014.html
Sent from the Solr - User mailing list archive at Nabble.com.
Migrating from Endeca
Hi,

A customer wants us to move their entire enterprise platform - of which one of the many components is Oracle Endeca - to open source. However, customers being the way they are, they don't want to have to give up any of the features they currently use, the most prominent of which are user-friendly web-based editors for non-technical people to be able to edit things like:

- Schema
- Dimensions (i.e. facets)
- Dimension groups (not sure what these are)
- Thesaurus
- Stopwords
- Report generation
- Boosting individual records (i.e. sponsored links)
- Relevance ranking settings
- Process pipeline editor for, e.g., adding new languages

...all without touching any XML.

My question is: are there any Solr features, plugins, modules, third-party applications, or the like that will do this for us? Or will we have to develop all of the above from scratch?

thanks,
Gareth
solr atomic updates stored=true, and copyField limitation
Hello,

I'm using Solr 4.4. I have a Solr core with a schema defining a bunch of different fields, and among them a date field:

- date: indexed and stored // the date used at search time

In practice it's a TrieDateField, but I think that's not relevant to the concern. It also has a multi-valued, not required, string field named tags which contains, well, a list of tags for some of the documents.

So far, so good: everything works as expected and I'm glad. I'm able to perform partial (or atomic) updates on the tags field whenever it gets modified, and I love it.

Now I have a new source that also pushes updates to the same Solr core. Unfortunately, that source's incoming documents have their date in another field of the same type, named created_time instead of date:

- created_time: stored only // some documents come in with this field set

To be able to sort any document by time, I decided to ask Solr to copy the contents of the field created_time to the field named date:

  <copyField source="created_time" dest="date"/>

I updated my schema and reloaded my core and everything seemed fine. In fact, I did break something 8-) But I figured it out later...

Quoting http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations :

  all fields in your SchemaXml must be configured as stored="true" except for fields which are <copyField/> destinations -- which must be configured as stored="false"

However, at that time I was not aware of the limitation, and I was able to sort by time across all the documents in my Solr core.
I then decided to make sure that partial (or atomic) updates could still be performed, and then I was surprised:

* documents from the more recent source (having both a date and a created_time field) are updated fine; the date field is kept (the copyField directive is replayed, I guess)
* documents from the first source (having only the date field set) are however a little bit less lucky: the date gets lost in the process (it looks like the date field was overridden by the execution of the copyField directive with nothing in its source field)

I then became aware of the caveats and limitations of atomic updates, but now I want to understand why ;-)

So my question is: what differs concerning copyField behaviour between a normal (classic) and a partial (atomic) update? In practice, I don't understand why the targets of every copyField directive are *always* cleared during partial updates. Could the clearing of the destination field be performed only if one of the source fields of a copyField is present in the atomic update? Maybe we didn't want to do that because it would have put some complexity where it should not be (updates must be fast), but that's just an idea.

I have two ways to handle my problem:

1/ Create a stored="false" search_date field and have two copyField directives, one for the original date field and another one for the newer created_time field, and make the search application rely on the search_date field
2/ Since I have some control over the second source pushing documents, I can make sure that documents are pushed with the same date field, and work around the limitation by removing the copyField directive entirely.

Since it simplifies my Solr schema, I chose option #2.

Thank you very much for your attention
Tanguy
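For context on the observed behaviour: an atomic update only transmits the changed fields; Solr rebuilds the rest of the document from its stored fields and then re-runs every copyField directive, which is why destinations must be stored="false" (otherwise the stale stored copy is fed back in, or, as here, an absent source wipes the destination). A minimal sketch of what such an update request body looks like (the document id and tag value are made up, not the poster's data):

```python
import json

# Atomic update: only "tags" is sent. Solr fetches the stored fields of
# doc-42, applies this change, then re-executes all copyField directives
# on the reconstructed document before re-indexing it.
update = [{
    "id": "doc-42",                     # hypothetical unique key
    "tags": {"add": "freshly-tagged"},  # append to the multi-valued field
}]
payload = json.dumps(update)
print(payload)
```

Since the reconstructed document never contains a copyField destination's old value, a destination whose source is missing from the stored fields ends up empty, exactly as Tanguy observed.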
Re: Migrating from Endeca
Take a look at LucidWorks Enterprise. It has a graphical UI. But if you must meet all of the listed requirements and Lucid doesn't meet all of them, then... you will have to develop everything on your own. Or maybe Lucid might be interested in partnering with you to allow you to add extensions to their UI.

If you really are committed to a deep replacement of Endeca's UI, then rolling your own is probably the way to go. Then the question is whether you should open source that UI.

You can also consider extending the Solr Admin UI. It does not do most of your listed features, but having better integration with the Solr Admin UI is a good idea.

-- Jack Krupansky

-----Original Message-----
From: Gareth Poulton
Sent: Thursday, September 19, 2013 7:50 AM
To: solr-user@lucene.apache.org
Subject: Migrating from Endeca
How to highlight multiple words in document
Hi All,

I want to highlight multiple words in a document. E.g., if I search for Rework AND Build, then after opening the document returned by the search result it should highlight both words (Rework as well as Build) in that document.

Currently I am adding the word to highlight in the highlight field. In this example I am setting highlight = Rework AND Build. But it is considering this as a single word and highlighting only that exact occurrence in the document.

Thanks in advance.
- Bramha

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-highlight-multiple-words-in-document-tp4091021.html
Sent from the Solr - User mailing list archive at Nabble.com.
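One common way to get per-term highlighting is to let Solr's highlighter work from the parsed boolean query rather than a literal string: send the query as q (or hl.q) and enable highlighting, and each term gets marked up separately. A sketch of the request parameters (the "content" field name and the markup tags are illustrative, not from the question):

```python
from urllib.parse import urlencode

# Each clause of the boolean query is highlighted on its own, so
# "Rework AND Build" is not treated as one literal token.
params = {
    "q": "content:(Rework AND Build)",  # "content" is a hypothetical field
    "hl": "true",
    "hl.fl": "content",
    "hl.simple.pre": "<em>",
    "hl.simple.post": "</em>",
}
qs = urlencode(params)
print(qs)
```

If the document is opened outside of Solr (e.g. the raw file in a viewer), highlighting has to be done client-side on each term individually; Solr's hl response only covers snippets it returns.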
Solr 4.4 admin page shows loading
hi,

I have installed Solr 4.4 on Tomcat 7.0. The problem is I can't see the Solr admin page; it always shows "loading". I can't find any error in the Tomcat logs, and I can send search requests and get the results. What can I do? Please help me, thank you very much.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr4-4-admin-page-show-loading-tp4091039.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solrcloud - adding a node as a replica?
Thanks Furkan,
That's exactly what I was looking for.

On Wed, Sep 18, 2013 at 4:21 PM, Furkan KAMACI furkankam...@gmail.com wrote:

Are you looking for this:
http://lucene.472066.n3.nabble.com/SOLR-Cloud-Collection-Management-quesiotn-td4063305.html

On Wednesday, 18 September 2013, didier deshommes dfdes...@gmail.com wrote:

Hi,
How do I add a node as a replica to a SolrCloud cluster? Here is my situation: some time ago, I created several collections with replicationFactor=2. Now I need to add a new replica. I thought just starting a new node and re-using the same ZooKeeper instance would make it automatically a replica, but that isn't the case. Do I need to delete and re-create my collections with the right replicationFactor (3 in this case) again? I am using Solr 4.3.0.

Thanks,
didier
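For reference, on 4.x releases that predate the Collections API's ADDREPLICA action, the usual approach was to issue a Core Admin CREATE on the new node with collection and shard parameters; the new core then registers itself in ZooKeeper as an additional replica. A sketch of building that request (the host, core name and collection name are made up):

```python
from urllib.parse import urlencode

# Hypothetical new node and collection/shard names.
new_node = "http://newnode:8983/solr"
params = {
    "action": "CREATE",
    "name": "mycollection_shard1_replica3",  # local core name, by convention
    "collection": "mycollection",
    "shard": "shard1",
}
url = f"{new_node}/admin/cores?{urlencode(params)}"
print(url)
```

The replicationFactor given at collection-creation time is only used for the initial placement, so there is no need to delete and re-create the collections just to add a replica.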
Re: Solrcloud - adding a node as a replica?
Do not hesitate to ask questions if you have any problems with it.

2013/9/19 didier deshommes dfdes...@gmail.com:

Thanks Furkan,
That's exactly what I was looking for.
I can't open the admin page, it's always loading.
Hi,

I followed the tutorial to download Solr 4.4 and unzip it, and then I started Jetty. I can post data and search correctly, but when I try to open the admin page, it always shows "loading". Then I set up Solr on Tomcat 7.0, but it's the same. What's wrong? Please help, thanks.

--
View this message in context: http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051.html
Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud setup - any advice?
Apologies for the giant email. Hopefully it makes sense.

We've been trying out SolrCloud to solve some scalability issues with our current setup and have run into problems. I'd like to describe our current setup, our queries and the sort of load we see, and am hoping someone might be able to spot the massive flaw in the way I've been trying to set things up.

We currently run Solr 4.0.0 in the old-style master/slave replication. We have five slaves, each running CentOS with 96GB of RAM, 24 cores and with 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but aren't slow either. Our GC parameters aren't particularly exciting, just -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11.

Our index size ranges between 144GB and 200GB (when we optimise it back down, since we've had bad experiences with large cores). We've got just over 37M documents; some are smallish but most range between 1000-6000 bytes. We regularly update documents, so large portions of the index will be touched, leading to a maxDocs value of around 43M.

Query load ranges between 400req/s and 800req/s across the five slaves throughout the day, increasing and decreasing gradually over a period of hours rather than bursting.

Most of our documents have upwards of twenty fields. We use different fields to store territory-variant values (we have around 30 territories) and also boost based on the values in some of these fields (integer ones). So an average query can: range filter by two of the territory-variant fields; filter by a non-territory-variant field; facet by a field or two (may be territory-variant); bring back the values of 60 fields; boost query on field values of a non-territory-variant field; boost by values of two territory-variant fields; dismax query on up to 20 fields (with boosts) and phrase boost on those fields too. They're pretty big queries. We don't do any index-time boosting. We try to keep things dynamic so we can alter our boosts on-the-fly.
Another common query is to list documents with a given set of IDs, and to select documents with a common reference and order them by one of their fields.

Auto-commit every 30 minutes. Replication polls every 30 minutes.

Document cache:
* initialSize - 32768
* size - 32768

Filter cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192

Query result cache:
* autowarmCount - 128
* initialSize - 8192
* size - 8192

After a replicated core has finished downloading (probably while it's warming) we see requests which usually take around 100ms taking over 5s. GC logs show concurrent mode failure.

I was wondering whether anyone can help with sizing the boxes required to split this index down into shards for use with SolrCloud, and roughly how much memory we should be assigning to the JVM. Everything I've read suggests that running with a 48GB heap is way too high, but every attempt I've made to reduce the cache sizes seems to wind up causing out-of-memory problems. Even dropping all cache sizes by 50% and reducing the heap by 50% caused problems.

I've already tried using SolrCloud with 10 shards (around 3.7M documents per shard, each with one replica) and kept the cache sizes low:

Document cache:
* initialSize - 1024
* size - 1024

Filter cache:
* autowarmCount - 128
* initialSize - 512
* size - 512

Query result cache:
* autowarmCount - 32
* initialSize - 128
* size - 128

Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB memory) and four shards on two boxes and three on the rest, I still see concurrent mode failure. This looks like it's causing ZooKeeper to mark the node as down and things begin to struggle.

Is concurrent mode failure just something that will inevitably happen, or is it avoidable by dropping the CMSInitiatingOccupancyFraction?

If anyone has anything that might shove me in the right direction I'd be very grateful. I'm wondering whether our set-up will just never work and maybe we're expecting too much.

Many thanks,
Neil
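One thing worth checking with a configuration like the one above is how much heap the filterCache alone can pin: each entry can, in the worst case, hold a bitset of maxDoc bits. A back-of-the-envelope sketch, assuming the 43M maxDocs and the original filterCache size of 8192 (real entries are often smaller, since sparse sets are stored more compactly):

```python
max_docs = 43_000_000          # maxDocs reported above
filter_cache_size = 8192       # filterCache "size" from the original config

bytes_per_entry = max_docs / 8            # one bit per document
worst_case = bytes_per_entry * filter_cache_size

print(f"{bytes_per_entry / 1024**2:.1f} MiB per full-bitset entry")
print(f"{worst_case / 1024**3:.1f} GiB if the cache fills with full bitsets")
```

That worst case is on the order of the whole 48GB heap, which is one plausible reason why shrinking the heap without shrinking the filterCache (or vice versa) kept causing trouble.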
Problem with stopword
Hello everybody,

I have a problem with stopwords. I have an index with some stopwords, and when I search by one of them only, Solr doesn't select any documents. How can I fix this? I need all the documents.

Example:

*Stopwords*: hello, goodbye
*Query*: http://localhost:8893/solr/select?q=hello
*DebugQuery*: <str name="parsedquery_toString"/>
*Total Results*: 0

I tried this with edismax, but it only works if I make a call to Solr without q, not when q becomes empty because of stopwords:

http://localhost:8983/solr/select?q=&defType=edismax&q.alt=*:*

Thank you.

--
View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-stopword-tp4091064.html
Sent from the Solr - User mailing list archive at Nabble.com.
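One common workaround is to make the fallback decision on the client side: if the user's input consists only of stopwords (so the analyzer would reduce q to an empty query), omit q and rely on q.alt=*:* instead. A minimal sketch, assuming the client mirrors the index's stopword list (the list and parameter handling here are illustrative):

```python
STOPWORDS = {"hello", "goodbye"}  # mirror of the index's stopword list

def build_params(user_query):
    """Fall back to a match-all q.alt when every term is a stopword."""
    terms = [t for t in user_query.lower().split() if t not in STOPWORDS]
    if terms:
        return {"defType": "edismax", "q": " ".join(terms)}
    return {"defType": "edismax", "q.alt": "*:*"}

print(build_params("hello"))        # only stopwords -> match-all fallback
print(build_params("hello world"))  # a real term survives
```

The alternative, of course, is to remove the StopFilter from the query analyzer for that field, so a stopword query still matches documents containing it.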
Indexing several sub-fields in one solr field
Hello,

I'd like to index into Solr (4.4.0) documents that I previously annotated with GATE (7.1). I use Behemoth to be able to run my GATE application on a corpus of documents on Hadoop, and then Behemoth allows me to directly send my annotated documents to Solr. But my question is not about the Behemoth or Hadoop parts.

The annotations produced by my GATE application usually have several features (for example, annotation type Person has the following features: Person.title, Person.firstName, Person.lastName, Person.gender). Each of my documents may contain more than one Person annotation, which is why I would like to index all the features for one annotation in one field in Solr. How do I do that?

I thought I'd add the following lines in schema.xml:

<types>
  ...
  <fieldType name="person" class="solr.StrField" subSuffix="_person"/>
  ...
</types>
...
<fields>
  ...
  <field name="personinfo" type="person" indexed="true" stored="true" multiValued="true"/>
  <dynamicField name="*_person" type="text_general" indexed="true" stored="false"/>
  ...
</fields>

But as soon as I start my Solr instances and try to access Solr from my browser, I get an HTTP ERROR 500: Problem accessing /solr/.
Reason:

{msg=SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Plugin Initializing failure for [schema.xml] fieldType
  at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:860)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:287)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin Initializing failure for [schema.xml] fieldType
  at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:193)
  at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:467)
  at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:164)
  at org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
  at org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
  at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:268)
  at org.apache.solr.core.CoreContainer.create(CoreContainer.java:655)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:364)
  at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:356)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
  at java.util.concurrent.FutureTask.run(FutureTask.java:166)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
  at
decimal numeric queries are too slow in solr
Hi all,

I am using Solr 3.4 and the index size is around 250GB. The issue that I am facing is that queries which have a decimal number in them take a long time to execute. I am using the dismax query handler with *qf* (15 fields) and *pf* (4 fields) and a boost function on time. Also, I am using WordDelimiterFilterFactory with the following options (only mentioning options related to numbers):

*generateNumberParts=1*
*preserveOriginal=1*
*catenateNumbers=1*

Example query: "solr 3.4" takes about 20 seconds; "solr 3" takes less than 1 second.

I couldn't understand the reason for so much difference. I can understand that internally "3.4" will be translated into something like (3.4 3) (4 34) because of WordDelimiterFilterFactory, but still the difference is quite huge. On what factors does query execution time depend? Any help which helps me in knowing the reason will be appreciated.

Regards,
Karan
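For illustration, here is roughly what those three WordDelimiterFilter options do to a decimal token (a simplified sketch, not the actual Lucene implementation; token positions and ordering are ignored):

```python
def expand_decimal(token):
    """Approximate WordDelimiterFilter behaviour for a decimal number with
    generateNumberParts=1, catenateNumbers=1, preserveOriginal=1."""
    parts = token.split(".")
    terms = set(parts)          # generateNumberParts: "3", "4"
    terms.add("".join(parts))   # catenateNumbers:     "34"
    terms.add(token)            # preserveOriginal:    "3.4"
    return terms

print(sorted(expand_decimal("3.4")))  # one query token becomes four terms
```

Each extra term becomes an extra disjunct across all 15 qf fields (and feeds the 4 pf fields too), so the decimal query fans out into a much larger boolean query than the plain integer one, which is at least part of the gap.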
Question on ICUFoldingFilterFactory
Hello,

I was wondering if anybody who has experience with ICUFoldingFilterFactory can help out with the following issue. Thank you so much in advance.

Raj

--

Problem: when a document is created/updated, the value's casing is indexed properly. However, when it's queried, the value is returned in lowercase.

Example:

Document input: NBAE
Document value: NBAE
Query input: NBAE, nbae, Nbae... etc.
Query output: nbae

If I remove the ICUFoldingFilterFactory filter, the casing problem goes away, but then searches for nbae (lowercase) or Nbae (mixed case) return no values.

Field type:

<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="20" autoGeneratePhraseQueries="true">
  <analyzer>
    <filter class="solr.PatternReplaceFilterFactory" pattern="\s&\s" replacement="\sand\s"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[\p{Punct}\u00BF\u00A1]" replaceWith=""/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[\p{Cntrl}]" replacement=""/>
    <filter class="solr.ICUFoldingFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>

Let me know if that makes sense. I'm curious if solr.ICUFoldingFilterFactory has additional attributes that I can use to control the casing behavior but retain its other filtering properties (ASCIIFoldingFilter and ICUNormalizer2Filter).

Thanks!!!
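For what it's worth, ICU folding is essentially compatibility normalization plus case folding, so lowercasing of the indexed/queried terms is inherent to it; the usual fix is to return the *stored* value (which keeps its original case) rather than anything derived from the analyzed terms. A rough Python analogue of what the folding does (illustrative, not the actual ICU implementation):

```python
import unicodedata

def fold(token):
    """Rough analogue of ICU folding: compatibility-normalize, then case-fold."""
    return unicodedata.normalize("NFKC", token).casefold()

# All casings collapse to one folded form, so matching is case-insensitive...
assert fold("NBAE") == fold("nbae") == fold("Nbae")
# ...but anything read back from the analyzed terms (instead of the stored
# field) will be lowercase.
print(fold("NBAE"))  # → nbae
```

So if the query output is coming back lowercase, it suggests the application is echoing analyzed/faceted terms; fetching the stored field in the fl list should preserve "NBAE".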
RE: Solr 4.4.0: Plugin init failure for [schema.xml] analyzer/tokenizer
Ok, first off -- let's clear up some confusion...

1) except for needing to put the logging jars in your servlet container's top-level classpath, you should not be putting any jars that come from Solr or Lucene, or any jars for custom plugins you have written, in tomcat/lib

2) you should never manually add/remove any jars of any kind from Solr's WEB-INF/lib/ directory.

3) if you have custom plugins you want to load, you should create a *new* lib directory, put your custom plugin jars (and their dependencies) in that directory, and configure it (either with sharedLib in solr.xml, or with a lib directive in your solrconfig.xml file)

As for the situation you find yourself in... here are the big, gigantic (the size of football fields even) red flags that jump out at me as being the sort of thing that could cause all sorts of classloader nightmares with your setup...

: * Following are the jars placed in tomcat/lib dir:
...
: lucene-core.jar
: solr-core-1.3.0.jar
: solr-dataimporthandler-4.4.0.jar
: solr-dataimporthandler-extras-4.4.0.jar
: solr-solrj-4.4.0.jar
: lucene-analyzers-common-4.2.0.jar
...
: Jars in tomcat/webapps/ROOT/WEB-INF/lib/
...
: lucene-core-4.4.0.jar
: nps-solr-plugin-1.0-SNAPSHOT.jar
: solr-core-4.4.0.jar
: solr-dataimporthandler-4.4.0.jar
: lucene-analyzers-common-4.4.0.jar
: solr-solrj-4.4.0.jar
...

You clearly have two radically different versions of solr-core and lucene-core in your classpath, which could easily explain the problems of ClassCastExceptions related to the TokenizerFactory class -- because there are going to be two radically different versions of that class in the classpath, and who knows which one Java is trying to cast your custom impl to.
Separate from that: even if the multiple solr-dataimporthandler, lucene-analyzers-common, solr-solrj jars in each of those directories are the exact same binary files, when loaded into the hierarchical classloaders of a servlet container they produce different copies of the same Java classes -- so you can again have classloader problems where some execution paths use a leaf classloader to access ClassX while another thread might use a parent classloader to access ClassX -- these different class instances will have different static fields, and instances of these classes will (probably) not be .equals(), etc.

-Hoss
Re: SolrCloud setup - any advice?
Hi Neil,

Although you haven't mentioned it, I just wanted to confirm - do you have soft commits enabled? Also, what's the version of Solr you are using for the SolrCloud setup? 4.0.0 had lots of memory and ZK related issues.

What's the warm-up time for your caches? Have you tried disabling the caches? Is this a static index or are documents added continuously?

The answers to these questions might help us pinpoint the issue...

On Thursday, September 19, 2013, Neil Prosser wrote:
Will Solr work with a mapped drive?
Hi,

I'm having this same problem as described here:
http://stackoverflow.com/questions/17708163/absolute-paths-in-solr-xml-configuration-using-tomcat6-on-windows

Does anyone know if this is a limitation of Solr or not? I searched the web, nothing came up.

Thanks!!!
-- MJ
Re: Indexing several sub-fields in one solr field
There is no such fieldType attribute as subSuffix. Solr is just complaining about extraneous, junk attributes. Delete the crap.

-- Jack Krupansky

-----Original Message-----
From: jimmy nguyen
Sent: Thursday, September 19, 2013 12:43 PM
To: solr-user@lucene.apache.org
Subject: Indexing several sub-fields in one solr field
Re: Indexing several sub-fields in one solr field
Hello, thanks for the answer. Sorry, I actually meant attribute subFieldSuffix. So, in order to be able to index several features in one solr field, should I program a new Java class inheriting AbstractSubTypeFieldType ? Or is there another way to do it ? Thanks ! Jim On Thu, Sep 19, 2013 at 4:05 PM, Jack Krupansky j...@basetechnology.com wrote: There is no such fieldType attribute as subSuffix. Solr is just complaining about extraneous, junk attributes. Delete the crap. -- Jack Krupansky -Original Message- From: jimmy nguyen Sent: Thursday, September 19, 2013 12:43 PM To: solr-user@lucene.apache.org Subject: Indexing several sub-fields in one solr field Hello, I'd like to index into Solr (4.4.0) documents that I previously annotated with GATE (7.1). I use Behemoth to be able to run my GATE application on a corpus of documents on Hadoop, and then Behemoth allows me to directly send my annotated documents to solr. But my question is not about the Behemoth or Hadoop parts. The annotations produced by my GATE application usually have several features (for example, annotation type Person has the following features : Person.title, Person.firstName, Person.lastName, Person.gender). Each of my documents may contain more than one Person annotation, which is why I would like to index all the features for one annotation in one field in solr. How do I do that ? I thought I'd add the following lines in schema.xml : types ... fieldType name=person class=solr.StrField subSuffix=_person / ... /types ... fields ... field name=personinfo type=person indexed=true stored=true multiValued=true / dynamicField name=*_person type=text_general indexed=true stored=false / ... /fields But as soon as I start my solr instances and try to access solr from my browser, I get an HTTP ERROR 500 : Problem accessing /solr/. 
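A side note on the underlying goal: without writing a custom field type, one common workaround is to serialize all of an annotation's features into a single value of a plain multiValued StrField, one list entry per annotation. A minimal Python sketch of that idea (the packing format and feature names here are illustrative assumptions, not anything from the schema):

```python
def person_to_field_value(person):
    # Pack one Person annotation's features into a single string value,
    # so each entry of a multiValued field keeps its features together.
    return "|".join(f"{name}={person[name]}" for name in sorted(person))

doc = {
    "id": "doc1",
    "personinfo": [
        person_to_field_value({"title": "Dr", "firstName": "Ada",
                               "lastName": "Lovelace", "gender": "f"}),
    ],
}
```

Splitting the value back apart is then the display code's job; the trade-off is that individual features are no longer independently searchable.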
Re: SOLR-5250
: widget, BUT while researching for this message, I've learned about the : important difference between a text field and a string field in solr and : it appears that by default, the Drupal apachesolr module indexes text : fields as text and not strings. Now I just need to figure out how to : alter this process to suit my own needs. I'll update that d.org ticket : with my findings so hopefully that will prevent some other future, : confused developer from reaching out to the Apache Foundation : prematurely. John: glad to hear you were able to track down the root cause of your problem. Thanks for closing the loop, and good luck on finding a solution that works nicely with the drupal bridge you are using. Please feel free to follow up on this list with any additional questions you have on the solr side of things. -Hoss
Re: I can't open the admin page, it's always loading.
Could you paste your Jetty logs from when you try to open the admin page? On Thursday, 19 September 2013, Micheal Chao fisher030...@hotmail.com wrote: Hi, I followed the tutorial to download solr4.4 and unzip it, and then i started jetty. i can post data and search correctly, but when i try to open admin page, it always shows loading. and then i setup solr on tomcat 7.0, but it's the same. what's wrong? please help, thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problem with stopword
Firstly, you should read here: https://cwiki.apache.org/confluence/display/solr/Running+Your+Analyzer Secondly, when you write a query, stop words are filtered out of it if you use a stop word analyzer, so there will not be anything left to search on. On Thursday, 19 September 2013, mpcmarcos mpcmar...@gmail.com wrote: Hello everybody, I have a problem with stopwords. I have an index with some stopwords, and when I search by one of them only, solr doesn't select any document. How can I fix this? I need all the documents. Example: *Stopwords*: hello, goodbye *Query*: http://localhost:8893/solr/select?q=hello *DebugQuery*: str name=parsedquery_toString/ *Total Results*: 0 I tried doing this with edismax, but it only works if I make a call to solr without q, not when q is emptied by stopwords. http://localhost:8983/solr/select?q=defType=edismaxq.alt=*:* Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-stopword-tp4091064.html Sent from the Solr - User mailing list archive at Nabble.com.
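To illustrate the point: if every term in q is a stop word, query-time analysis leaves no tokens at all, so there is nothing to match. A rough Python simulation of the stop filter (not Solr's actual analyzer code):

```python
STOPWORDS = {"hello", "goodbye"}

def query_tokens(query):
    # Mimic a whitespace tokenizer followed by a StopFilter:
    # stop words are dropped from the token stream at query time.
    return [t for t in query.lower().split() if t not in STOPWORDS]

query_tokens("hello")        # all tokens removed -> matches nothing
query_tokens("hello world")  # only "world" survives
```

This is also why the edismax q.alt=*:* fallback mentioned above only helps when q is absent: once q produces an empty token stream rather than being missing, the fallback does not kick in.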
SOLR-5250
Greetings, This is a follow up to https://issues.apache.org/jira/browse/SOLR-5250 where I reported a possible issue with sorting content which contains hyphens. Hoss Man suggested that I likely have a misconfiguration on my field settings and that I send a message to this list. I am using the Drupal apachesolr module version 1.4 (Where I actually also posted an issue at https://drupal.org/node/2092363) with a hosted Acquia solr index. So the schema settings will reflect what is packaged with the apachesolr module in drupal-3.0-rc2-solr3. I wasn't initially familiar with how Drupal field types are mapped to Solr field types, and the field in question is using the Text field widget, BUT while researching for this message, I've learned about the important difference between a text field and a string field in solr and it appears that by default, the Drupal apachesolr module indexes text fields as text and not strings. Now I just need to figure out how to alter this process to suit my own needs. I'll update that d.org ticket with my findings so hopefully that will prevent some other future, confused developer from reaching out to the Apache Foundation prematurely. -- John P. Brandenburg Developer jbrandenb...@forumone.com www.forumone.com 703-894-4362 Forum One Communications Communicate • Collaborate • Change the World
Re: I can't open the admin page, it's always loading.
: Hi, I followed the tutorial to download solr4.4 and unzip it, and then i : started jetty. i can post data and search correctly, but when i try to open : admin page, it always shows loading. the admin UI is entirely rendered by client side javascript in your browser -- so the most important question we need to know is what OS and browser you are using to access the web UI. if your browser has a debug/error console available, it would also help to know if it mentions any errors/warnings. -Hoss
Re: SolrCloud setup - any advice?
Hi Neil, Consider using G1 instead. See http://blog.sematext.com/?s=g1 If that doesn't help, we can play with various JVM parameters. The latest version of SPM for Solr exposes information about sizes and utilization of JVM memory pools, which may help you understand which JVM params you need to change, how, and whether your changes are achieving the desired effect. Otis Solr ElasticSearch Support http://sematext.com/ On Sep 19, 2013 11:21 AM, Neil Prosser neil.pros...@gmail.com wrote: Apologies for the giant email. Hopefully it makes sense. We've been trying out SolrCloud to solve some scalability issues with our current setup and have run into problems. I'd like to describe our current setup, our queries and the sort of load we see and am hoping someone might be able to spot the massive flaw in the way I've been trying to set things up. We currently run Solr 4.0.0 in the old style Master/Slave replication. We have five slaves, each running Centos with 96GB of RAM, 24 cores and with 48GB assigned to the JVM heap. Disks aren't crazy fast (i.e. not SSDs) but aren't slow either. Our GC parameters aren't particularly exciting, just -XX:+UseConcMarkSweepGC. Java version is 1.7.0_11. Our index size ranges between 144GB and 200GB (when we optimise it back down, since we've had bad experiences with large cores). We've got just over 37M documents some are smallish but most range between 1000-6000 bytes. We regularly update documents so large portions of the index will be touched leading to a maxDocs value of around 43M. Query load ranges between 400req/s to 800req/s across the five slaves throughout the day, increasing and decreasing gradually over a period of hours, rather than bursting. Most of our documents have upwards of twenty fields. We use different fields to store territory variant (we have around 30 territories) values and also boost based on the values in some of these fields (integer ones). 
So an average query can do a range filter by two of the territory variant fields, filter by a non-territory variant field. Facet by a field or two (may be territory variant). Bring back the values of 60 fields. Boost query on field values of a non-territory variant field. Boost by values of two territory-variant fields. Dismax query on up to 20 fields (with boosts) and phrase boost on those fields too. They're pretty big queries. We don't do any index-time boosting. We try to keep things dynamic so we can alter our boosts on-the-fly. Another common query is to list documents with a given set of IDs and select documents with a common reference and order them by one of their fields. Auto-commit every 30 minutes. Replication polls every 30 minutes. Document cache: * initialSize - 32768 * size - 32768 Filter cache: * autowarmCount - 128 * initialSize - 8192 * size - 8192 Query result cache: * autowarmCount - 128 * initialSize - 8192 * size - 8192 After a replicated core has finished downloading (probably while it's warming) we see requests which usually take around 100ms taking over 5s. GC logs show concurrent mode failure. I was wondering whether anyone can help with sizing the boxes required to split this index down into shards for use with SolrCloud and roughly how much memory we should be assigning to the JVM. Everything I've read suggests that running with a 48GB heap is way too high but every attempt I've made to reduce the cache sizes seems to wind up causing out-of-memory problems. Even dropping all cache sizes by 50% and reducing the heap by 50% caused problems. 
I've already tried using SolrCloud 10 shards (around 3.7M documents per shard, each with one replica) and kept the cache sizes low: Document cache: * initialSize - 1024 * size - 1024 Filter cache: * autowarmCount - 128 * initialSize - 512 * size - 512 Query result cache: * autowarmCount - 32 * initialSize - 128 * size - 128 Even when running on six machines in AWS with SSDs, 24GB heap (out of 60GB memory) and four shards on two boxes and three on the rest I still see concurrent mode failure. This looks like it's causing ZooKeeper to mark the node as down and things begin to struggle. Is concurrent mode failure just something that will inevitably happen or is it avoidable by dropping the CMSInitiatingOccupancyFraction? If anyone has anything that might shove me in the right direction I'd be very grateful. I'm wondering whether our set-up will just never work and maybe we're expecting too much. Many thanks, Neil
Re: Unknown attribute id in add:allowDups
: I'm working with the Pecl package, with Solr 4.3.1. I have a doc defined in my ... : $client = new SolrClient($options); : $doc = new SolrInputDocument(); : $doc->addField('id', 12345); : $doc->addField('description', 'This is the content of the doc'); : $updateResponse = $client->addDocument($doc); : : When I do this, the doc is not added to the index, and I get the following : error in the logs in admin : : Unknown attribute id in add:allowDups id is a red herring here -- it's not referring to your id field, it's referring to the fact that an XML attribute node exists with an XML id that it doesn't recognize. or to put it another way: Pecl is generating an add xml element that contains an attribute like this: allowDups=false|true ...and solr doesn't know what to do with that. allowDups was an option that existed prior to 4.0, but is no longer supported (the overwrite attribute now takes its place) So my best guess is that the Pecl code you are using was designed for 3.x, and doesn't entirely work correctly with 4.x. the warning you are getting isn't fatal or anything -- it's just letting you know that the unknown attribute is being ignored -- but you may want to look into whether there is an updated Pecl library (for example: if you really wanted to use allowDups=true you should now be using overwrite=false and maybe the newer version of your client library will let you) I've updated some places in the ref guide and wiki where it wasn't obvious that allowDups is gone, gone, gone ... i'll also update that error message so it will be more clear starting in 4.6... https://issues.apache.org/jira/browse/SOLR-5257 -Hoss
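For reference, the mapping is just an inversion: allowDups=true corresponds to overwrite=false. A small Python sketch that builds a 4.x-style add element, assuming you construct the update XML yourself rather than relying on the Pecl client:

```python
import xml.etree.ElementTree as ET

def build_add_xml(fields, allow_dups=False):
    # Solr 4.x dropped allowDups; overwrite is its inverse.
    add = ET.Element("add", {"overwrite": str(not allow_dups).lower()})
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

xml = build_add_xml({"id": 12345,
                     "description": "This is the content of the doc"})
```

The default (allow_dups=False) emits overwrite="true", which matches Solr's own default behavior of replacing a document with the same uniqueKey.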
Re: solrcloud shards backup/restoration
Hi, Sorry for the late followup on this. Let me put in more details here. *The problem:* Cannot successfully restore the index backed up with '/replication?command=backup'. The backup was generated as *snapshot.mmdd* *My setup and steps:* 6 solrcloud instances 7 zookeeper instances Steps: 1. Take snapshot using *http://host1:8893/solr/replication?command=backup*, on one host only. move *snapshot.mmdd* to some reliable storage. 2. Stop all 6 solr instances, all 7 zk instances. 3. Delete ../collectionname/data/* on all solrcloud nodes. ie. deleting the index data completely. 4. Delete zookeeper/data/version*/* on all zookeeper nodes. 5. Copy back index from backup to one of the nodes. cp snapshot.mmdd/* ../collectionname/data/index/ 6. Restart all zk instances. Restart all solrcloud instances. *Outcome:* All solr instances are up. However, *num of docs = 0* for all nodes. Looking at the node where the index was restored, there is a new index.yymmddhhmmss directory being created and index.properties pointing to it. That explains why no documents are reported. How do I have solrcloud pick up data from the index directory on a restart ? Thanks in advance, Aditya On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Lets say we have 3 shards across 3 solrcloud servers. 1. I am assuming we should take a backup from each of the shard leaders to get a complete collection. do you think that will get the complete index ( not worrying about what is not hard committed at the time of backup ) ? 2. How do we go about restoring the index in a fresh solrcloud cluster ? From the structure of the snapshot I took, I did not see any replication.properties or index.properties which I normally see on healthy solrcloud cluster nodes. 
if I have the snapshot named snapshot.20130905 does the snapshot.20130905/* go into data/index ? Thanks Aditya On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote: Phone typing. The end should not say don't hard commit - it should say do a hard commit and take a snapshot. Mark Sent from my iPhone On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote: I don't know that it's too bad though - its always been the case that if you do a backup while indexing, it's just going to get up to the last hard commit. With SolrCloud that will still be the case. So just make sure you do a hard commit right before taking the backup - yes, it might miss a few docs in the tran log, but if you are taking a back up while indexing, you don't have great precision in any case - you will roughly get a snapshot for around that time - even without SolrCloud, if you are worried about precision and getting every update into that backup, you want to stop indexing and commit first. But if you just want a rough snapshot for around that time, in both cases you can still just don't hard commit and take a snapshot. Mark Sent from my iPhone On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: The replication handler's backup command was built for pre-SolrCloud. It takes a snapshot of the index but it is unaware of the transaction log which is a key component in SolrCloud. Hence unless you stop updates, commit your changes and then take a backup, you will likely miss some updates. That being said, I'm curious to see how peer sync behaves when you try to restore from a snapshot. When you say that you haven't been successful in restoring, what exactly is the behaviour you observed? On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Hello, I was looking for a good backup / recovery solution for the solrcloud indexes. 
I am more looking for restoring the indexes from the index snapshot, which can be taken using the replicationHandler's backup command. I am looking for something that works with solrcloud 4.3 eventually, but still relevant if you tested with a previous version. I haven't been successful in have the restored index replicate across the new replicas, after I restart all the nodes, with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it ? -- Regards, -Aditya Sakhuja -- Regards, Shalin Shekhar Mangar. -- Regards, -Aditya Sakhuja -- Regards, -Aditya Sakhuja
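On the symptom described earlier (a fresh index.timestamp directory appearing and index.properties pointing at it): one workaround people use is to copy the snapshot files into whichever directory data/index.properties currently names, rather than blindly into data/index. A hedged Python sketch of that idea — this relies on an assumption about the on-disk layout, and is not an officially supported restore procedure:

```python
import os
import shutil

def restore_snapshot(data_dir, snapshot_dir):
    # data/index.properties (a Java properties file), when present,
    # names the live index directory; fall back to "index" otherwise.
    index_name = "index"
    props = os.path.join(data_dir, "index.properties")
    if os.path.exists(props):
        with open(props) as f:
            for line in f:
                if line.startswith("index="):
                    index_name = line.split("=", 1)[1].strip()
    target = os.path.join(data_dir, index_name)
    os.makedirs(target, exist_ok=True)
    # Copy every snapshot file into the directory Solr will actually open.
    for name in os.listdir(snapshot_dir):
        shutil.copy2(os.path.join(snapshot_dir, name), target)
    return target
```

Do this with the Solr instance stopped, and verify the node's document count before letting it replicate to peers.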
[ANN] Lux Release 0.10.5
I'm pleased to announce the release of the XML search engine Lux, version 0.10.5. There has been a lot of progress made since our last announced release, which was 0.9.1. Some highlights: The app server now provides full access to HTTP request data and control of HTTP responses. We've implemented the excellent EXPath specification for this (http://expath.org/spec/webapp) with only a few gaps (eg. no binary file upload yet). Range comparisons (like [@title 'median']) are now rewritten by the optimizer to use the lux:key() function when a suitable index is available, and comparisons involving lux:key() are optimized using the Lucene index. And there have been numerous performance optimizations and bug fixes, detailed at http://issues.luxdb.org/ and in the release notes here: http://luxdb.org/RELEASE-0.10.html. Lots more information, including downloads, documentation and setup instructions, is available at http://luxdb.org/, source code is at http://github.com/msokolov/lux, and there is an email list: lu...@luxdb.org, archived at https://groups.google.com/forum/?fromgroups#!topic/luxdb. Finally, I'll be presenting Lux at Lucene/Solr Revolution in Dublin Nov. 6-7, so if you're anywhere nearby, I encourage you to come, and I look forward to seeing you there! -Mike Sokolov soko...@falutin.net
Re: I can't open the admin page, it's always loading.
You may have some over-eager Ad Blockers! Check the network panel of Firebug/Chrome console/whatever you have. See if some resources are not loaded. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, Sep 19, 2013 at 9:21 PM, Micheal Chao fisher030...@hotmail.com wrote: Hi, I followed the tutorial to download solr4.4 and unzip it, and then i started jetty. i can post data and search correctly, but when i try to open admin page, it always shows loading. and then i setup solr on tomcat 7.0, but it's the same. what's wrong? please help, thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Question on ICUFoldingFilterFactory
What do you mean by output? Are you looking at fields in returned documents? In which case you should see original stored field. Or are you - for example - looking at facet/group values which are using tokenized post-processed results? Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Sep 20, 2013 at 2:22 AM, Nemani, Raj raj.nem...@turner.com wrote: Hello, I was wondering if anybody who has experience with ICUFoldingFilterFactory can help out with the following issue. Thank you so much in advance. Raj -- Problem: When a document is created/updated, the value's casing is indexed properly. However, when it's queried, the value is returned in lowercase. Example: Document input: NBAE Document value: NBAE Query input: NBAE,nbae,Nbae...etc Query Output: nbae If I remove the ICUFoldingFilterFactory filter, the casing problem goes away, but I then searches for nbae (lowercase) or Nbae (mix case) return no values. Field Type: fieldType name=text_phrase class=solr.TextField positionIncrementGap=20 autoGeneratePhraseQueries=true analyzer filter class=solr.PatternReplaceFilterFactory pattern=\samp;\s replacement=\sand\s/ charFilter class=solr.PatternReplaceCharFilterFactory pattern=[\p{Punct}\u00BF\u00A1] replaceWith= / tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.TrimFilterFactory / filter class=solr.PatternReplaceFilterFactory pattern=[\p{Cntrl}] replacement=/ filter class=solr.ICUFoldingFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / /analyzer /fieldType Let me know if that makes sense. 
I'm curious if the solr.ICUFoldingFilterFactory has additional attributes that I can use to control the casing behavior but retain it's other filtering properties (ASCIIFoldingFilter, and ICUNormalizer2Filter) Thanks!!!
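On what the filter actually does: ICU folding includes case folding, so at index time NBAE becomes nbae in the indexed terms, while the stored value is untouched — which is why returned stored fields should still show the original casing, and why facet/group values show lowercase. A rough Python approximation of the folding step (not the real ICU algorithm):

```python
import unicodedata

def fold_approx(text):
    # Approximate ICU folding: case-fold, then strip combining marks
    # after NFKD normalization (i.e. accent removal).
    folded = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in folded if not unicodedata.combining(ch))

fold_approx("NBAE")  # the indexed term becomes lowercase
fold_approx("Café")  # accents are stripped as well
```

As far as I know there is no attribute on solr.ICUFoldingFilterFactory (in this Solr version) to keep case while still folding accents; if only accent folding is wanted, solr.ASCIIFoldingFilterFactory is the usual case-preserving substitute.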
Re: Memory Using In Faceted Search (UnInvertedField's)
On 9/19/2013 3:14 AM, Anton M wrote: Shawn, I had swap file growing (up to 50-60%) and working while load tests ran. Did you configure 'swapiness' on your Linux box (set it to 0 earlier, maybe)? If not, my Windows OS could be cause of that difference. The vm.swappiness sysctl setting is 1. I have used 0 as well. I don't want it to start swapping unless it *REALLY* needs to. The default of 60 is pretty aggressive. I'm not sure if that's completely an issue about shareable memory or some missing JVM configurations (I don't have anything special except -Xmx, -Xms and -XX:MaxPermSize=512M) or some Solr memory leak. I'd appreciate any thoughts on that. As I said before, I think that the memory reported as shareable is not actually allocated. It probably should be listed under virtual memory. Our app rarely does facets, and it typically sorts on one field, so I have absolutely no idea what's being measured in the 11g of shared memory for the solr process. I was present for a conversation between Lucene committers on IRC where they seemed to be discussing this issue, and it sounded like it is a side effect of using MMap in a particular way. It sounded like they didn't want to change the way its used, because it was the correct way of using it. The conversation went way over my head for the most part. Thanks, Shawn
Re: I can't open the admin page, it's always loading.
I'm using Windows 7 and IE8. I debugged the script, and it showed an error: var d3_formatPrefixes = [y,z,a,f,p,n,μ,m,,k,M,G,T,P,E,Z,Y].map(d3_formatPrefix); can't find object's method. So I changed my browser, and it works. thanks a lot. is this a bug of solr? -- View this message in context: http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html Sent from the Solr - User mailing list archive at Nabble.com.
JVM Crash using solr 4.4 on Centos
I have solr 4.4 running on tomcat 7 on my local development environment, which is ubuntu based, and it works fine (Querying, Posting Documents, Data Import etc.) I am trying to move into a staging environment which is Centos based (still using tomcat 7 and solr 4.4), however when attempting to post documents and do a data import from mysql through jdbc, after a few hundred documents, the tomcat server crashes and it logs: # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112 # # JRE version: 7.0-b147 # Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops) # Problematic frame: # J org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z I'm using Sun Java JDK 1.7.0 Anyone got any ideas I can pursue to resolve this?
Re: I can't open the admin page, it's always loading.
I think IE8 itself might be the bug! :-) Many popular libraries have dropped <=IE7 support completely and are phasing out IE8 as well. Looks like D3 - a visualization library used for some of the Admin stuff - is doing that as well. Though I thought Admin javascript loading was more robust than that. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Fri, Sep 20, 2013 at 8:48 AM, Micheal Chao fisher030...@hotmail.com wrote: I'm using Windows 7 and IE8. I debugged the script, and it showed an error: var d3_formatPrefixes = [y,z,a,f,p,n,μ,m,,k,M,G,T,P,E,Z,Y].map(d3_formatPrefix); can't find object's method. So I changed my browser, and it works. thanks a lot. is this a bug of solr? -- View this message in context: http://lucene.472066.n3.nabble.com/I-can-t-open-the-admin-page-it-s-always-loading-tp4091051p4091157.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: JVM Crash using solr 4.4 on Centos
This is a known bug in that JDK version. Upgrade to a newer version of JDK 7 (any build within the last two years or so should be fine). If that's not possible for you, you can add -XX:-UseLoopPredicate as a command line option to java to work around this. -Michael

-----Original Message----- From: Oak McIlwain [mailto:oak.mcilw...@gmail.com] Sent: Thursday, September 19, 2013 10:10 PM To: solr-user@lucene.apache.org Subject: JVM Crash using solr 4.4 on Centos

I have solr 4.4 running on tomcat 7 on my local development environment which is ubuntu based and it works fine (Querying, Posting Documents, Data Import etc.) I am trying to move into a staging environment which is Centos based (still using tomcat 7 and solr 4.4) however when attempting to post documents and do a data import from mysql through jdbc, after a few hundred documents, the tomcat server crashes and it logs:

# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x7fb4d8fe5e85, pid=10620, tid=140414656674112
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J org.apache.lucene.analysis.en.PorterStemFilter.incrementToken()Z

I'm using Sun Java JDK 1.7.0 Anyone got any ideas I can pursue to resolve this?
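For a Tomcat-hosted Solr like the one described above, the flag Michael suggests would typically go into CATALINA_OPTS. A minimal sketch, assuming the conventional Tomcat 7 setup where $CATALINA_HOME/bin/setenv.sh is sourced at startup (your init script may set JVM options elsewhere):

```shell
# Sketch of $CATALINA_HOME/bin/setenv.sh (hypothetical path).
# Appends the loop-predicate workaround to whatever options are
# already configured, then shows the resulting option string.
CATALINA_OPTS="$CATALINA_OPTS -XX:-UseLoopPredicate"
export CATALINA_OPTS
echo "$CATALINA_OPTS"
```

After restarting Tomcat, the flag should appear in the java command line visible via `ps`; that is an easy way to confirm it took effect.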
Re: solrcloud shards backup/restoration
How does one recover from an index corruption? That's what I am trying to eventually tackle here. Thanks Aditya

On Thursday, September 19, 2013, Aditya Sakhuja wrote: Hi, Sorry for the late followup on this. Let me put in more details here.

*The problem:* Cannot successfully restore the index backed up with '/replication?command=backup'. The backup was generated as *snapshot.yyyymmdd*.

*My setup and steps:*
6 solrcloud instances, 7 zookeeper instances.

Steps:
1. Take a snapshot using http://host1:8893/solr/replication?command=backup, on one host only. Move *snapshot.yyyymmdd* to some reliable storage.
2. Stop all 6 solr instances and all 7 zk instances.
3. Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete the index data completely.
4. Delete zookeeper/data/version*/* on all zookeeper nodes.
5. Copy the index back from backup to one of the nodes: cp snapshot.yyyymmdd/* ../collectionname/data/index/
6. Restart all zk instances, then restart all solrcloud instances.

*Outcome:*
All solr instances are up. However, *num of docs = 0* for all nodes. Looking at the node where the index was restored, there is a new index.yyyymmddhhmmss directory being created and index.properties pointing to it. That explains why no documents are reported. How do I have solrcloud pick up data from the restored index directory on a restart?

Thanks in advance, Aditya

On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Let's say we have 3 shards across 3 solrcloud servers. 1. I am assuming we should take a backup from each of the shard leaders to get a complete collection. Do you think that will get the complete index (not worrying about what is not hard committed at the time of backup)? 2. How do we go about restoring the index in a fresh solrcloud cluster?
From the structure of the snapshot I took, I did not see any replication.properties or index.properties, which I normally see on healthy solrcloud cluster nodes. If I have the snapshot named snapshot.20130905, does snapshot.20130905/* go into data/index? Thanks Aditya

On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote: Phone typing. The end should not say don't hard commit - it should say do a hard commit and take a snapshot. Mark Sent from my iPhone

On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote: I don't know that it's too bad though - it's always been the case that if you do a backup while indexing, it's just going to get up to the last hard commit. With SolrCloud that will still be the case. So just make sure you do a hard commit right before taking the backup - yes, it might miss a few docs in the tran log, but if you are taking a backup while indexing, you don't have great precision in any case - you will roughly get a snapshot for around that time - even without SolrCloud, if you are worried about precision and getting every update into that backup, you want to stop indexing and commit first. But if you just want a rough snapshot for around that time, in both cases you can still just don't hard commit and take a snapshot. Mark Sent from my iPhone

On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: The replication handler's backup command was built for pre-SolrCloud. It takes a snapshot of the index but it is unaware of the transaction log, which is a key component in SolrCloud. Hence unless you stop updates, commit your changes and then take a backup, you will likely miss some updates. That being said, I'm curious to see how peer sync behaves when you try to restore from a snapshot. When you say that you haven't been successful in restoring, what exactly is the behaviour you observed?
On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Hello, I was looking for a good backup / recovery solution for the solrcloud indexes. I am more interested in restoring the indexes from the index snapshot, which can be taken using the replicationHandler's backup command. I am looking for something that works with solrcloud 4.3 eventually, but it is still relevant if you tested with a previous version. I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes, with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it? -- Regards, -Aditya Sakhuja -- Regards, Shalin Shekhar Mangar.
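The backup/restore cycle discussed in this thread can be sketched as a script. This is a sketch only: host, core, and snapshot names are hypothetical, the curl steps are echoed rather than executed, and the file operations run against a scratch directory. The index.properties step reflects the behaviour Aditya observed above - a stale index.properties can point Solr at a fresh timestamped index.yyyymmddhhmmss directory instead of the restored data/index (assumption: Solr honors index.properties when present).

```shell
# Dry-run sketch of a single-node backup and restore.
set -e
DATA_DIR="$(mktemp -d)/collectionname/data"   # stand-in for the real core data dir
BACKUP_STORE="$(mktemp -d)"                   # stand-in for reliable storage
SNAPSHOT="snapshot.20130905"

run() { echo "WOULD RUN: $*"; }   # swap echo for real execution when ready

# 1. Hard commit, then trigger a backup on one node only.
run curl "http://host1:8893/solr/update?commit=true"
run curl "http://host1:8893/solr/replication?command=backup"

# Simulate the snapshot directory the backup command would have produced.
mkdir -p "$DATA_DIR/$SNAPSHOT"
touch "$DATA_DIR/$SNAPSHOT/segments_1"

# 2. Archive the snapshot to reliable storage.
cp -r "$DATA_DIR/$SNAPSHOT" "$BACKUP_STORE/"

# 3. Restore: wipe old index dirs, copy snapshot *contents* into index/.
rm -rf "$DATA_DIR"/index "$DATA_DIR"/index.2*
mkdir -p "$DATA_DIR/index"
cp -r "$BACKUP_STORE/$SNAPSHOT/." "$DATA_DIR/index/"

# 4. Remove any stale index.properties so Solr falls back to data/index
#    rather than a timestamped index.yyyymmddhhmmss directory.
rm -f "$DATA_DIR/index.properties"

ls "$DATA_DIR/index"
```

Per Mark's advice in the thread, step 1 matters: without a hard commit right before the backup, documents still in the transaction log will not be in the snapshot.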
Re: Migrating from Endeca
I think Hue ( http://cloudera.github.io/hue/ ), which Cloudera uses for Solr search among other things, has some of this UI customization. And it is open source, so it would make a much better base. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Thu, Sep 19, 2013 at 8:21 PM, Jack Krupansky j...@basetechnology.com wrote: Take a look at LucidWorks Enterprise. It has a graphical UI. But if you must meet all of the listed requirements and Lucid doesn't meet all of them, then... you will have to develop everything on your own. Or, maybe Lucid might be interested in partnering with you to allow you to add extensions to their UI. If you really are committed to a deep replacement of Endeca's UI, then rolling your own is probably the way to go. Then the question is whether you should open source that UI. You can also consider extending the Solr Admin UI. It does not do most of your listed features, but having better integration with the Solr Admin UI is a good idea. -- Jack Krupansky

-----Original Message----- From: Gareth Poulton Sent: Thursday, September 19, 2013 7:50 AM To: solr-user@lucene.apache.org Subject: Migrating from Endeca

Hi, A customer wants us to move their entire enterprise platform - of which one of the many components is Oracle Endeca - to open source. However, customers being the way they are, they don't want to have to give up any of the features they currently use, the most prominent of which are user-friendly web-based editors for non-technical people to edit things like:
- Schema
- Dimensions (i.e. facets)
- Dimension groups (not sure what these are)
- Thesaurus
- Stopwords
- Report generation
- Boosting individual records (i.e. sponsored links)
- Relevance ranking settings
- Process pipeline editor for, e.g., adding new languages
...all without touching any XML.

My question is: are there any Solr features, plugins, modules, third-party applications, or the like that will do this for us? Or will we have to develop all of the above from scratch? thanks, Gareth