Can we manipulate termfreq to count as 1 for multiple matches?
Hi All,

I am wondering if there is a way to cap the term frequency of a certain field at 1, even if there are multiple matches in that document. The use case is: let's say I have a document with 2 fields, Name and Description, with data like this:

Document_1
Name = Blue Jeans
Description = This jeans is very soft. Jeans is pretty nice.

Now, if I search for Jeans, it is found in 2 places in the Description field, so the term frequency for Description is 2. I want Solr to count the term frequency for Description as 1 even if Jeans is found multiple times in this field. For all other fields, I do want to get the term frequency as it is.

Is this doable in Solr with any of the functions? Any inputs are welcome.

Thanks
Saroj
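One query-time idea I have been toying with, though I am not sure it is right (it assumes Solr 4.x, where the termfreq() and map() function queries are available, and uses the field names from my example; the capped_tf pseudo-field alias is just something I made up): map(tf,1,N,1) would clamp any term frequency of 1 or more down to 1, so repeated matches in Description would contribute like a single match wherever the function is used (e.g. in a boost or a sort).

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CappedTfExample {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("description:jeans");
            // map(tf, 1, 1000000, 1) collapses every tf >= 1 to exactly 1
            q.set("fl", "name,description,score,capped_tf:map(termfreq(description,'jeans'),1,1000000,1)");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults());
        }
    }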
Re: can we configure spellcheck to be invoked after request processing?
James,

You are right. I was setting up the spell checker incorrectly. It works as you described: the spell checker is invoked after the query component and it does not stop Solr from executing the query. Thanks for correcting me.

Saroj

On Fri, Mar 1, 2013 at 7:30 AM, Dyer, James james.d...@ingramcontent.com wrote:

I'm a little confused here because if you are searching q=jeap OR denim, then you should be getting both documents back. Having spellcheck configured does not affect your search results at all. Having it in your request will sometimes result in spelling suggestions, usually if one or more terms you queried is not in the index. But if all of your query terms are optional then you need only have 1 term match anything to get results. You should get the same results regardless of whether or not you have spellcheck in the request.

While spellcheck does not affect your query results, the results do affect spellcheck. This is why you should put spellcheck in the last-components section of your request handler configuration. This ensures that the query is run before spellcheck.

James Dyer
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: roz dev [mailto:rozde...@gmail.com]
Sent: Thursday, February 28, 2013 6:33 PM
To: solr-user@lucene.apache.org
Subject: can we configure spellcheck to be invoked after request processing?

Hi All,

I may be asking a stupid question but please bear with me. Is it possible to configure spell check to be invoked after Solr has processed the original query?

My use case is: I am using DirectSpellChecker and have a document which has Denim as a term, and there is another document which has Jeap. I am issuing a search for Jean or Denim. I am finding that this Solr query is giving me ZERO results and suggesting Jeap as an alternative. I want Solr to try to run the query for Jean or Denim and, only if there are no results found, suggest Jeap as an alternative.

Is this doable in Solr? Any suggestions.

-Saroj
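For the archives, our client-side flow now looks roughly like this (a rough sketch; it assumes a request handler with the spellcheck component in last-components, per James's advice, and the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.response.SpellCheckResponse;

    public class SpellcheckFallback {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("jean OR denim");
            q.set("spellcheck", "true");          // runs after the query component
            q.set("spellcheck.collate", "true");
            QueryResponse rsp = server.query(q);
            if (rsp.getResults().getNumFound() > 0) {
                // use the real results; ignore any suggestions
            } else {
                SpellCheckResponse spell = rsp.getSpellCheckResponse();
                if (spell != null && !spell.isCorrectlySpelled()) {
                    System.out.println("Did you mean: " + spell.getCollatedResult());
                }
            }
        }
    }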
Re: How to re-read the config files in Solr, on a commit
Erick,

We have a requirement where a search admin can add or remove some synonyms and would want these changes to be reflected in search thereafter.

Yes, we looked at the RELOAD command and it seems to be suitable for that purpose. We have a master and slave setup so it should be OK to issue the reload command on the master. I expect that the slaves will pull the latest config files.

Is the reload operation very costly, in terms of time and CPU? We have a multicore setup and would need to issue a reload on multiple cores.

Thanks
Saroj

On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is.

What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command.

Best
Erick

On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:

Hi All,

I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have the firstSearcher event but I am not sure if it causes request handlers to reload themselves and read the conf files again.

For example, if I change the synonym file and Solr gets a commit, will it re-initialize request handlers and re-read the conf files? Or, are there some events which can be listened to?

Any inputs are welcome.

Thanks
Saroj
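For reference, this is how we plan to script the reload across cores from SolrJ (a sketch; the master URL and core names are placeholders for our real ones):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class ReloadCores {
        public static void main(String[] args) throws Exception {
            // Point at the Solr root (not a specific core) for core admin requests
            HttpSolrServer admin = new HttpSolrServer("http://master-host:8983/solr");
            for (String core : new String[] { "core0", "core1" }) {
                // Re-reads solrconfig.xml, schema.xml, synonyms.txt etc. for that core
                CoreAdminRequest.reloadCore(core, admin);
            }
        }
    }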
Re: How to re-read the config files in Solr, on a commit
Thanks Otis for pointing this out.

We may end up using search-time synonyms for single-word synonyms and index-time synonyms for multi-word synonyms.

-Saroj

On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:

Hi,

Note about modifying synonyms - you need to reindex, really, if using index-time synonyms. And if you're using search-time synonyms you have the multi-word synonym issue described on the Wiki.

Otis
--
Performance Monitoring - http://sematext.com/spm

On Nov 6, 2012 11:02 PM, roz dev rozde...@gmail.com wrote:

Erick,

We have a requirement where a search admin can add or remove some synonyms and would want these changes to be reflected in search thereafter.

Yes, we looked at the RELOAD command and it seems to be suitable for that purpose. We have a master and slave setup so it should be OK to issue the reload command on the master. I expect that the slaves will pull the latest config files.

Is the reload operation very costly, in terms of time and CPU? We have a multicore setup and would need to issue a reload on multiple cores.

Thanks
Saroj

On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Not that I know of. This would be extremely expensive in the usual case. Loading up configs, reconfiguring all the handlers etc. would add a huge amount of overhead to the commit operation, which is heavy enough as it is.

What's the use-case here? Changing your configs really often and reading them on commit sounds like a way to make for a very confusing application! But if you really need to re-read all this info on a running system, consider the core admin RELOAD command.

Best
Erick

On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:

Hi All,

I am keen to find out if Solr exposes any event listener or other hooks which can be used to re-read configuration files. I know that we have the firstSearcher event but I am not sure if it causes request handlers to reload themselves and read the conf files again.

For example, if I change the synonym file and Solr gets a commit, will it re-initialize request handlers and re-read the conf files? Or, are there some events which can be listened to?

Any inputs are welcome.

Thanks
Saroj
Re: How to change the boost of fields in edismax at runtime
Thanks Hoss. Yes, that approach would work, as I can change the query.

Is there a way to extend the edismax handler to read a config file at startup and then use some event like commit to instruct the edismax handler to re-read the config file? That way, I can ensure that my boost params live only in the Solr servers' config files and, if I need to change them, I would just change the file and wait for a commit to re-read it.

Any inputs?

-Saroj

On Thu, Nov 1, 2012 at 2:50 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

: Then, If I find that results are not of my liking then I would like to
: change the boost as following
:
: - Title - boosted to 2
: - Keyword - boosted to 10
:
: Is there any way to change this boost, at run-time, without having to
: restart solr with new boosts in edismax?

edismax field boosts (specified in the qf and pf params) can always be specified at runtime -- first and foremost they are query params. When you put them in your solrconfig.xml file, they just act as defaults (or invariants, or appends) of those query params.

-Hoss
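For the archives, passing the boosts per request from SolrJ looks like this (a sketch; the field names follow my earlier example and the URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class RuntimeBoosts {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("blue jeans");
            q.set("defType", "edismax");
            // Overrides any qf default declared in solrconfig.xml, for this request only
            q.set("qf", "title^2 keyword^10");
            QueryResponse rsp = server.query(q);
            System.out.println(rsp.getResults().getNumFound());
        }
    }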
Re: SolrJ - IOException
I have seen this happening. We retry and that works. Is your Solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi balaji.gan...@apollogrp.edu wrote:

Hi,

I am encountering this error randomly (under load) when posting to Solr using SolrJ. Has anyone encountered a similar error?

org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/profile
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
at

Thanks,
Balaji

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-IOException-tp4010026.html
Sent from the Solr - User mailing list archive at Nabble.com.
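The retry we use is essentially the following (a rough sketch; the retry count and backoff are our own policy, nothing Solr mandates):

    import java.io.IOException;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.common.SolrInputDocument;

    public class RetryingAdd {
        static void addWithRetry(SolrServer server, SolrInputDocument doc) throws Exception {
            int maxRetries = 3;  // arbitrary policy
            for (int attempt = 1; ; attempt++) {
                try {
                    server.add(doc);
                    return;
                } catch (SolrServerException e) {
                    // Only retry transient network trouble; rethrow anything else
                    if (attempt >= maxRetries || !(e.getRootCause() instanceof IOException)) {
                        throw e;
                    }
                    Thread.sleep(1000L * attempt);  // simple linear backoff
                }
            }
        }
    }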
IndexDocValues in Solr
Changing the subject line to make it easier to understand the topic of the message.

Is there any plan to expose IndexDocValues as part of Solr 4? Any thoughts?

-Saroj

On Thu, Aug 2, 2012 at 5:10 PM, roz dev rozde...@gmail.com wrote:

As we all know, FieldCache can be costly if we have lots of documents and lots of fields to sort on. I see that IndexDocValues are better at sorting and faceting, w.r.t. memory usage.

Is there any plan to use IndexDocValues in Solr for doing sorting and faceting? Will Solr 4 or 5 have IndexDocValues? Is there an easy way to use IndexDocValues in Solr even though it is not implemented yet?

-Saroj
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Thanks Robert for these inputs.

Since we do not really need the Snowball analyzer for this field, we will not use it for now. If this still does not address our issue, we will tweak the thread pool as per eks dev's suggestion - I am a bit hesitant to do this change yet, as we would be reducing the thread pool, which can adversely impact our throughput.

If the Snowball filter is being optimized for Solr 4 beta then it would be great for us. If you have already filed a JIRA for this then please let me know and I would like to follow it.

Thanks again
Saroj

On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote:

On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote:

Hi All

I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that when we are indexing lots of data with 16 concurrent threads, heap grows continuously. It remains high and ultimately most of the stuff ends up being moved to Old Gen. Eventually, Old Gen also fills up and we start getting into excessive GC problems.

Hi: I don't claim to know anything about how tomcat manages threads, but really you shouldnt have all these objects. In general snowball stemmers should be reused per-thread-per-field. But if you have a lot of fields*threads, especially if there really is high thread churn on tomcat, then this could be bad with snowball: see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841

I think it would be useful to see if you can tune tomcat's threadpool as he describes.

separately: Snowball stemmers are currently really ram-expensive for stupid reasons. each one creates a ton of Among objects, e.g. an EnglishStemmer today is about 8KB. I'll regenerate these and open a JIRA issue: as the snowball code generator in their svn was improved recently and each one now takes about 64 bytes instead (the Among's are static and reused). Still this wont really solve your problem, because the analysis chain could have other heavy parts in initialization, but it seems good to fix.

As a workaround until then you can also just use the good old PorterStemmer (PorterStemFilterFactory in solr). Its not exactly the same as using Snowball(English) but its pretty close and also much faster.

--
lucidimagination.com
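For anyone else reading: if I understand Robert's workaround, in bare Lucene 4.x terms it amounts to something like this (a sketch only; the whitespace tokenizer is just a stand-in for whatever chain the field really uses):

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.en.PorterStemFilter;
    import org.apache.lucene.util.Version;

    public class PorterAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
            // PorterStemFilter keeps no large per-instance tables, unlike the Snowball stemmers
            return new TokenStreamComponents(source, new PorterStemFilter(source));
        }
    }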
Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Hi All I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that when we are indexing lots of data with 16 concurrent threads, Heap grows continuously. It remains high and ultimately most of the stuff ends up being moved to Old Gen. Eventually, Old Gen also fills up and we start getting into excessive GC problem. I took a heap dump and found that most of the memory is consumed by CloseableThreadLocal which is holding a WeakHashMap of Threads and its state. Most of the old gen is full with ThreadLocal eating up 3GB of heap and heap dump shows that all such entries are using Snowball Filter. I looked into LUCENE-3841 and verified that my version of SOLR 4 has that code. So, I am wondering the reason for this memory leak - is it due to some other bug with Solr/Lucene? Here is a brief snapshot of HeapDump showing the problem Class Name | Shallow Heap | Retained Heap - *org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x300c3eb28 | 24 | 3,885,213,072* |- class class org.apache.solr.schema.IndexSchema$SolrIndexAnalyzer @ 0x2f9753340 |0 | 0 |- this$0 org.apache.solr.schema.IndexSchema @ 0x300bf4048 | 96 | 276,704 *|- reuseStrategy org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x300c3eb40 | 16 | 3,885,208,728* | |- class class org.apache.lucene.analysis.Analyzer$PerFieldReuseStrategy @ 0x2f98368c0 |0 | 0 | |- storedValue org.apache.lucene.util.CloseableThreadLocal @ 0x300c3eb50 | 24 | 3,885,208,712 | | |- class class org.apache.lucene.util.CloseableThreadLocal @ 0x2f9788918 |8 | 8 | | |- t java.lang.ThreadLocal @ 0x300c3eb68 | 16 |16 | | | '- class class java.lang.ThreadLocal @ 0x2f80f0868 System Class|8 |24 *| | |- hardRefs java.util.WeakHashMap @ 0x300c3eb78 | 48 | 3,885,208,656* | | | |- class class java.util.WeakHashMap @ 0x2f8476c00 System Class| 16 |16 | | | |- table java.util.WeakHashMap$Entry[16] @ 0x300c3eba8 | 80 | 2,200,016,960 | | | | |- class class java.util.WeakHashMap$Entry[] @ 0x2f84789e8 |0 | 0 | | | | |-* [7] java.util.WeakHashMap$Entry @ 0x306a24950 | 40 | 318,502,920* | | | | | |- class class java.util.WeakHashMap$Entry @ 0x2f84786f8 System Class|0 | 0 | | | | | |- queue java.lang.ref.ReferenceQueue @ 0x300c3ebf8 | 32 |48 | | | | | |- referent java.lang.Thread @ 0x30678c2c0 web-23 | 112 | 160 | | | | | |- value java.util.HashMap @ 0x30678cbb0 | 48 | 318,502,880 | | | | | | |- class class java.util.HashMap @ 0x2f80b9428 System Class | 24 |24 *| | | | | | |- table java.util.HashMap$Entry[32768] @ 0x3c07c6f58 | 131,088 | 318,502,832* | | | | | | | |- class class java.util.HashMap$Entry[] @ 0x2f80bd9c8 |0 | 0 | | | | | | | |- [10457] java.util.HashMap$Entry @ 0x30678cbe0 | 32 |40,864 | | | | | | | | |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class |0 | 0 | | | | | | | | |- key java.lang.String @ 0x30678cc00 prod_desc_keywd_en_CA | 32 |96 | | | | | | | | |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x30678cc60 | 24 |20,344 | | | | | | | | |- next java.util.HashMap$Entry @ 0x39a2c9100 | 32 |20,392 | | | | | | | | | |- class class java.util.HashMap$Entry @ 0x2f80bd400 System Class|0 | 0 | | | | | | | | | |- key java.lang.String @ 0x39a2c9120 3637994_fr_CA_cat_name_keywd| 32 | 104 | | | | | | | | | |- value org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x39a2c9188 | 24 |20,256 | | | | | | | | | | |- class class org.apache.solr.analysis.TokenizerChain$SolrTokenStreamComponents @ 0x2f97a69a0|0 | 0 | | | | | | | | | |
Re: solr/tomcat stops responding
You are referring to a very old thread. Did you take any heap dump and thread dump? They can help you get more insight.

-Saroj

On Tue, Jul 31, 2012 at 9:04 AM, Suneel pandey.sun...@gmail.com wrote:

Hello Kevin,

I am also facing the same problem. After a few hours or a few days my Solr server crashes. I tried to download the following patch but it is not accessible now. I am using the 3.1 version of Solr.

http://people.apache.org/~yonik/solr/current/solr.war

-----
Regards,
Suneel Pandey
Sr. Software Developer
--
View this message in context: http://lucene.472066.n3.nabble.com/solr-tomcat-stops-responding-tp474577p3998435.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: too many instances of org.tartarus.snowball.Among in the heap
:132) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None Agent Heartbeat - Thread t@5 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None Remove Metric Data Watch Heartbeat Heartbeat - Thread t@7 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None Configuration Watch Heartbeat Heartbeat - Thread t@6 java.lang.Thread.State: TIMED_WAITING at java.lang.Thread.sleep(Native Method) at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:670) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None Signal Dispatcher - Thread t@4 java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None Finalizer - Thread t@3 java.lang.Thread.State: WAITING at java.lang.Object.wait(Native Method) - waiting on 48c6254f (a java.lang.ref.ReferenceQueue$Lock) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) Locked ownable synchronizers: - None Reference Handler - Thread t@2 java.lang.Thread.State: WAITING at java.lang.Object.wait(Native Method) - waiting on 48bb8adc (a java.lang.ref.Reference$Lock) at java.lang.Object.wait(Object.java:485) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) Locked ownable synchronizers: - None main - Thread t@1 java.lang.Thread.State: RUNNABLE at java.net.PlainSocketImpl.socketAccept(Native Method) at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390) - locked 11dacd96 (a java.net.SocksSocketImpl) at java.net.ServerSocket.implAccept(ServerSocket.java:462) at com.wily.introscope.agent.probe.net.ManagedServerSocket.com_wily_accept14(ManagedServerSocket.java:362) at com.wily.introscope.agent.probe.net.ManagedServerSocket.accept(ManagedServerSocket.java:267) at org.apache.catalina.core.StandardServer.await(StandardServer.java:431) at org.apache.catalina.startup.Catalina.await(Catalina.java:676) at org.apache.catalina.startup.Catalina.start(Catalina.java:628) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289) at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414) Locked ownable synchronizers: - None On Fri, Jul 27, 2012 at 5:19 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: Try taking a couple of thread dumps and see where in the stack the snowball classes show up. That might give you a clue. Did you customize the parameters to the stemmer? If so, maybe it has problems with the file you gave it. Just some generic thoughts that might help. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. 
(Anonymous - via GTD book)

On Fri, Jul 27, 2012 at 3:53 AM, roz dev rozde...@gmail.com wrote:

Hi All

I am trying to find out the reason for very high memory use and ran JMAP -hist. It is showing that i have too many instances of org.tartarus.snowball.Among. Any ideas what is this for and why am I getting so many of them?

num   #instances   #bytes   Class description
--------------------------------------------------
*1:   46728110   1869124400   org.tartarus.snowball.Among*
2:    5244210    1840458960   byte[]
Re: too many instances of org.tartarus.snowball.Among in the heap
is it some kind of memory leak with Lucene's use of Snowball Stemmer? I tried to google for Snowball Stemmer but could not find any recent info about memory leak this old link does indicate some memory leak but it is from 2004 http://snowball.tartarus.org/archives/snowball-discuss/0631.html Any inputs are welcome -Saroj On Mon, Jul 30, 2012 at 4:39 PM, roz dev rozde...@gmail.com wrote: I did take couple of thread dumps and they seem to be fine Heap dump is huge - close to 15GB I am having hard time to analyze that heap dump 2012-07-30 16:07:32 Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.0-b09 mixed mode): RMI TCP Connection(33)-10.8.21.124 - Thread t@190 java.lang.Thread.State: RUNNABLE at sun.management.ThreadImpl.dumpThreads0(Native Method) at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.sun.jmx.mbeanserver.ConvertingMethod.invokeWithOpenReturn(ConvertingMethod.java:167) at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:96) at com.sun.jmx.mbeanserver.MXBeanIntrospector.invokeM2(MXBeanIntrospector.java:33) at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:208) at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:120) at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:262) at javax.management.StandardMBean.invoke(StandardMBean.java:391) at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:836) at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:761) at javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1427) at javax.management.remote.rmi.RMIConnectionImpl.access$200(RMIConnectionImpl.java:72) at javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1265) at javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1360) at javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:788) at sun.reflect.GeneratedMethodAccessor50.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:305) at sun.rmi.transport.Transport$1.run(Transport.java:159) at java.security.AccessController.doPrivileged(Native Method) at sun.rmi.transport.Transport.serviceCall(Transport.java:155) at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:535) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:790) at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:649) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - locked 49cbecf2 (a java.util.concurrent.locks.ReentrantLock$NonfairSync) JMX server connection timeout 189 - Thread t@189 java.lang.Thread.State: TIMED_WAITING at java.lang.Object.wait(Native Method) - waiting on b75fa27 (a [I) at 
com.sun.jmx.remote.internal.ServerCommunicatorAdmin$Timeout.run(ServerCommunicatorAdmin.java:150) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None web-77 - Thread t@186 java.lang.Thread.State: WAITING at sun.misc.Unsafe.park(Native Method) - parking to wait for 5ab03cb6 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:662) Locked ownable synchronizers: - None web-76 - Thread t@185 java.lang.Thread.State: WAITING at sun.misc.Unsafe.park(Native Method) - parking to wait for 5ab03cb6 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158
Re: leaks in solr
in my case, I see only 1 searcher, no field cache - still Old Gen is almost full at 22 GB Does it have to do with index or some other configuration -Saroj On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog goks...@gmail.com wrote: What does the Statistics page in the Solr admin say? There might be several searchers open: org.apache.solr.search.SolrIndexSearcher Each searcher holds open different generations of the index. If obsolete index files are held open, it may be old searchers. How big are the caches? How long does it take to autowarm them? On Thu, Jul 26, 2012 at 6:15 PM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Mark, We use solr 3.6.0 on freebsd 9. Over a period of time, it accumulates lots of space! On Thu, Jul 26, 2012 at 8:47 PM, roz dev rozde...@gmail.com wrote: Thanks Mark. We are never calling commit or optimize with openSearcher=false. As per logs, this is what is happening openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} -- But, We are going to use 4.0 Alpha and see if that helps. -Saroj On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller markrmil...@gmail.com wrote: I'd take a look at this issue: https://issues.apache.org/jira/browse/SOLR-3392 Fixed late April. On Jul 26, 2012, at 7:41 PM, roz dev rozde...@gmail.com wrote: it was from 4/11/12 -Saroj On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 26, 2012, at 3:18 PM, roz dev rozde...@gmail.com wrote: Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj Trunk from what date? - Mark - Mark Miller lucidimagination.com -- Lance Norskog goks...@gmail.com
too many instances of org.tartarus.snowball.Among in the heap
Hi All

I am trying to find out the reason for very high memory use and ran JMAP -hist. It is showing that i have too many instances of org.tartarus.snowball.Among. Any ideas what is this for and why am I getting so many of them?

num #instances #bytes Class description
--------------------------------------------------
*1: 46728110 1869124400 org.tartarus.snowball.Among*
2: 5244210 1840458960 byte[]
3: 526519495969839368 char[]
4: 10008928864769280 int[]
5: 10250527 410021080 java.util.LinkedHashMap$Entry
6: 4672811 268474232 org.tartarus.snowball.Among[]
*7: 8072312 258313984 java.util.HashMap$Entry*
8: 466514 246319392 org.apache.lucene.util.fst.FST$Arc[]
9: 1828542 237600432 java.util.HashMap$Entry[]
10: 3834312 153372480 java.util.TreeMap$Entry
11: 2684700 128865600 org.apache.lucene.util.fst.Builder$UnCompiledNode
12: 4712425 113098200 org.apache.lucene.util.BytesRef
13: 3484836 111514752 java.lang.String
14: 2636045 105441800 org.apache.lucene.index.FieldInfo
15: 1813561 101559416 java.util.LinkedHashMap
16: 6291619 100665904 java.lang.Integer
17: 2684700 85910400 org.apache.lucene.util.fst.Builder$Arc
18: 956998 84215824 org.apache.lucene.index.TermsHashPerField
19: 2892957 69430968 org.apache.lucene.util.AttributeSource$State
20: 2684700 64432800 org.apache.lucene.util.fst.Builder$Arc[]
21: 685595 60332360 org.apache.lucene.util.fst.FST
22: 933451 59210944 java.lang.Object[]
23: 957043 53594408 org.apache.lucene.util.BytesRefHash
24: 591463 42585336 org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
25: 424801 40780896 org.tartarus.snowball.ext.EnglishStemmer
26: 424801 40780896 org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter
27: 1549670 37192080 org.apache.lucene.index.Term
28: 849602 33984080 org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter$WordDelimiterConcatenation
29: 424801 27187264 org.apache.lucene.analysis.core.WhitespaceTokenizer
30: 478499 26795944 org.apache.lucene.index.FreqProxTermsWriterPerField
31: 535521 25705008 org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray
32: 219081 24537072 org.apache.lucene.codecs.BlockTreeTermsWriter$TermsWriter
33: 478499 22967952 org.apache.lucene.index.FieldInvertState
34: 956998 22967952 org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray
35: 478499 22967952 org.apache.lucene.index.TermVectorsConsumerPerField
36: 478499 22967952 org.apache.lucene.index.NormsConsumerPerField
37: 316582 22793904 org.apache.lucene.store.MMapDirectory$MMapIndexInput
38: 906708 21760992 org.apache.lucene.util.AttributeSource$State[]
39: 906708 21760992 org.apache.lucene.analysis.tokenattributes.OffsetAttributeImpl
40: 883588 21206112 java.util.ArrayList
41: 438192 21033216 org.apache.lucene.store.RAMOutputStream
42: 860601 20654424 java.lang.StringBuilder
43: 424801 20390448 org.apache.lucene.analysis.miscellaneous.WordDelimiterIterator
44: 424801 20390448 org.apache.lucene.analysis.core.StopFilter
45: 424801 20390448 org.apache.lucene.analysis.miscellaneous.KeywordMarkerFilter
46: 424801 20390448 org.apache.lucene.analysis.snowball.SnowballFilter
47: 839390 20145360 org.apache.lucene.index.DocumentsWriterDeleteQueue$TermNode

-Saroj
Re: leaks in solr
Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj On Thu, Jul 26, 2012 at 8:33 AM, Karthick Duraisamy Soundararaj karthick.soundara...@gmail.com wrote: Did you find any more clues? I have this problem in my machines as well.. On Fri, Jun 29, 2012 at 6:04 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Hi list, while monitoring my solr 3.6.1 installation I recognized an increase of memory usage in OldGen JVM heap on my slave. I decided to force Full GC from jvisualvm and send optimize to the already optimized slave index. Normally this helps because I have monitored this issue over the past. But not this time. The Full GC didn't free any memory. So I decided to take a heap dump and see what MemoryAnalyzer is showing. The heap dump is about 23 GB in size. 1.) Report Top consumers - Biggest Objects: Total: 12.3 GB org.apache.lucene.search.FieldCacheImpl : 8.1 GB class java.lang.ref.Finalizer : 2.1 GB org.apache.solr.util.ConcurrentLRUCache : 1.5 GB org.apache.lucene.index.ReadOnlySegmentReader : 622.5 MB ... As you can see, Finalizer has already reached 2.1 GB!!! * java.util.concurrent.ConcurrentHashMap$Segment[16] @ 0x37b056fd0 * segments java.util.concurrent.ConcurrentHashMap @ 0x39b02d268 * map org.apache.solr.util.ConcurrentLRUCache @ 0x398f33c30 * referent java.lang.ref.Finalizer @ 0x37affa810 * next java.lang.ref.Finalizer @ 0x37affa838 ... Seams to be org.apache.solr.util.ConcurrentLRUCache The attributes are: Type |Name | Value - boolean| isDestroyed | true - ref| cleanupThread| null ref| evictionListener | null --- long | oldestEntry | 0 -- int| acceptableWaterMark | 9500 -- ref| stats| org.apache.solr.util.ConcurrentLRUCache$Stats @ 0x37b074dc8 boolean| islive | true - boolean| newThreadForCleanup | false boolean| isCleaning | false ref| markAndSweepLock | java.util.concurrent.locks.ReentrantLock @ 0x39bf63978 - int| lowerWaterMark | 9000 - int| upperWaterMark | 1 - ref| map | java.util.concurrent.ConcurrentHashMap @ 0x39b02d268 -- 2.) While searching for open files and their references I noticed that there are references to index files which are already deleted from disk. E.g. recent index files are data/index/_2iqw.frq and data/index/_2iqx.frq. But I also see references to data/index/_2hid.frq which are quite old and are deleted way back from earlier replications. I have to analyze this a bit deeper. So far my report, I go on analyzing this huge heap dump. If you need any other info or even the heap dump, let me know. Regards Bernd
Re: leaks in solr
it was from 4/11/12 -Saroj On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 26, 2012, at 3:18 PM, roz dev rozde...@gmail.com wrote: Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj Trunk from what date? - Mark
Re: leaks in solr
Thanks Mark. We are never calling commit or optimize with openSearcher=false. As per logs, this is what is happening openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false} -- But, We are going to use 4.0 Alpha and see if that helps. -Saroj On Thu, Jul 26, 2012 at 5:12 PM, Mark Miller markrmil...@gmail.com wrote: I'd take a look at this issue: https://issues.apache.org/jira/browse/SOLR-3392 Fixed late April. On Jul 26, 2012, at 7:41 PM, roz dev rozde...@gmail.com wrote: it was from 4/11/12 -Saroj On Thu, Jul 26, 2012 at 4:21 PM, Mark Miller markrmil...@gmail.com wrote: On Jul 26, 2012, at 3:18 PM, roz dev rozde...@gmail.com wrote: Hi Guys I am also seeing this problem. I am using SOLR 4 from Trunk and seeing this issue repeat every day. Any inputs about how to resolve this would be great -Saroj Trunk from what date? - Mark - Mark Miller lucidimagination.com
Re: Issue with field collapsing in solr 4 while performing distributed search
I think that there is no way around doing custom logic in this case. If the indexing process knows that documents have to be grouped, then they had better be together.

-Saroj

On Mon, Jun 11, 2012 at 6:37 AM, Nitesh Nandy niteshna...@gmail.com wrote:

Martijn,

How do we add a custom algorithm for distributing documents in Solr Cloud? According to this discussion
http://lucene.472066.n3.nabble.com/SolrCloud-how-to-index-documents-into-a-specific-core-and-how-to-search-against-that-core-td3985262.html
Mark discourages users from using a custom distribution mechanism in Solr Cloud. Load balancing is not an issue for us at the moment. In that case, how should we implement a custom partitioning algorithm?

On Mon, Jun 11, 2012 at 6:23 PM, Martijn v Groningen martijn.v.gronin...@gmail.com wrote:

The ngroups returns the number of groups that have matched with the query. However, if you want ngroups to be correct in a distributed environment you need to put documents belonging to the same group into the same shard. Groups can't cross shard boundaries. I guess you need to do some manual document partitioning.

Martijn

On 11 June 2012 14:29, Nitesh Nandy niteshna...@gmail.com wrote:

Version: Solr 4.0 (svn build 30th may, 2012) with Solr Cloud (2 slices and 2 shards)

The setup was done as per the wiki: http://wiki.apache.org/solr/SolrCloud

We are doing distributed search. While querying, we use field collapsing with ngroups set as true as we need the number of search results. However, there is a difference between the number of results returned and the ngroups value returned.

Ex: http://localhost:8983/solr/select?q=message:blah%20AND%20userid:3&group=true&group.field=id&group.ngroups=true

The response XML looks like:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">46</int>
    <lst name="params">
      <str name="group.field">id</str>
      <str name="group.ngroups">true</str>
      <str name="group">true</str>
      <str name="q">messagebody:monit AND usergroupid:3</str>
    </lst>
  </lst>
  <lst name="grouped">
    <lst name="id">
      <int name="matches">10</int>
      <int name="ngroups">9</int>
      <arr name="groups">
        <lst>
          <str name="groupValue">320043</str>
          <result name="doclist" numFound="1" start="0"><doc>...</doc></result>
        </lst>
        <lst>
          <str name="groupValue">398807</str>
          <result name="doclist" numFound="5" start="0" maxScore="2.4154348">...</result>
        </lst>
        <lst>
          <str name="groupValue">346878</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
        <lst>
          <str name="groupValue">346880</str>
          <result name="doclist" numFound="2" start="0">...</result>
        </lst>
      </arr>
    </lst>
  </lst>
</response>

So you can see that the ngroups value returned is 9 and the actual number of groups returned is 4.

Why do we have this discrepancy in the ngroups, matches and actual number of groups? Is this an open issue?

Any kind of help is appreciated.

--
Regards,
Nitesh Nandy

--
Kind regards,

Martijn van Groningen

--
Regards,
Nitesh Nandy
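Something like this is what I had in mind - routing done entirely in the indexing client (a rough sketch; the shard URLs are placeholders and the hash-mod routing is just one possible scheme):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class GroupAwareIndexer {
        // One client per shard; all docs sharing a group key land on the same shard,
        // so groups never cross shard boundaries and ngroups adds up correctly.
        private final SolrServer[] shards = {
            new HttpSolrServer("http://host1:8983/solr"),
            new HttpSolrServer("http://host2:8983/solr")
        };

        public void index(SolrInputDocument doc) throws Exception {
            String groupKey = doc.getFieldValue("id").toString(); // the group.field used at query time
            int shard = (groupKey.hashCode() & 0x7fffffff) % shards.length;
            shards[shard].add(doc);
        }
    }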
Re: How to do custom sorting in Solr?
Hi All

I have an index which contains a catalog of products and categories, with Solr 4.0 from trunk. Data is organized like this:

Category: Books
  Sub Category: Programming
    Products:
      Product # 1, Price: Regular, Sort Order: 1
      Product # 2, Price: Markdown, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Regular, Sort Order: 4
      ...
      Product # 100, Price: Regular, Sort Order: 100
  Sub Category: Fiction
    Products:
      Product # 1, Price: Markdown, Sort Order: 1
      Product # 2, Price: Regular, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Markdown, Sort Order: 4
      ...
      Product # 70, Price: Regular, Sort Order: 70

I want to query Solr and sort these products within each of the sub-categories in such a way that products which are on markdown are at the bottom of the documents list, and other products which are on regular price are sorted as per their sort order in their sub-category.

Expected results are:

Category: Books
  Sub Category: Programming
    Products:
      Product # 1, Price: Regular, Sort Order: 1
      Product # 2, Price: Markdown, Sort Order: 101
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Regular, Sort Order: 4
      ...
      Product # 100, Price: Regular, Sort Order: 100
  Sub Category: Fiction
    Products:
      Product # 1, Price: Markdown, Sort Order: 71
      Product # 2, Price: Regular, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Markdown, Sort Order: 71
      ...
      Product # 70, Price: Regular, Sort Order: 70

My query is like this: q=*:*&fq=category:Books

What are the options to implement custom sorting and how do I do it?

- Define a custom function query?
- Define a custom comparator? Or,
- Define a custom collector?

Please let me know the best way to go about it and any pointers to customize Solr 4.

Thanks
Saroj
Re: How to do custom sorting in Solr?
Thanks Erick for your quick feedback.

When products are assigned to a category or sub-category they can be in any order, and the price type can be regular or markdown. So regular and markdown products are intermingled as per their assignment, but I want to sort them in such a way that all the products which are on markdown are at the bottom of the list.

I can use these multiple sorts, but I realize that they are costly in terms of heap used, as they are using FieldCache. I have an index with 2M docs and the docs are pretty big, so I don't want to use them unless there is no other option.

I am wondering if I can define a custom function query which can be like this:

- check if the product is on markdown
- if yes, then change its sort order field to be the max value in the given sub-category, say 99
- else, use the sort order of the product in the sub-category

I have been looking at existing function queries but do not have a good handle on how to make one of my own.

Another option could be to use a custom sort comparator, but I am not sure about the way it works.

Any thoughts?

-Saroj

On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote:

Skimming this, two options come to mind:

1 Simply apply primary, secondary, etc sorts. Something like: sort=subcategory asc,markdown_or_regular desc,sort_order asc

2 You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of returning some members of each of the top N groups in the result set, which makes it easier to get some of each group rather than having to analyze the whole list.

But your example is somewhat contradictory. You say products which are on markdown, are at the bottom of the documents list. But in your examples, products on markdown are intermingled.

Best
Erick

On Sun, Jun 10, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote:

Hi All

I have an index which contains a catalog of products and categories, with Solr 4.0 from trunk. Data is organized like this:

Category: Books
  Sub Category: Programming
    Products:
      Product # 1, Price: Regular, Sort Order: 1
      Product # 2, Price: Markdown, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Regular, Sort Order: 4
      ...
      Product # 100, Price: Regular, Sort Order: 100
  Sub Category: Fiction
    Products:
      Product # 1, Price: Markdown, Sort Order: 1
      Product # 2, Price: Regular, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Markdown, Sort Order: 4
      ...
      Product # 70, Price: Regular, Sort Order: 70

I want to query Solr and sort these products within each of the sub-categories in such a way that products which are on markdown are at the bottom of the documents list, and other products which are on regular price are sorted as per their sort order in their sub-category.

Expected results are:

Category: Books
  Sub Category: Programming
    Products:
      Product # 1, Price: Regular, Sort Order: 1
      Product # 2, Price: Markdown, Sort Order: 101
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Regular, Sort Order: 4
      ...
      Product # 100, Price: Regular, Sort Order: 100
  Sub Category: Fiction
    Products:
      Product # 1, Price: Markdown, Sort Order: 71
      Product # 2, Price: Regular, Sort Order: 2
      Product # 3, Price: Regular, Sort Order: 3
      Product # 4, Price: Markdown, Sort Order: 71
      ...
      Product # 70, Price: Regular, Sort Order: 70

My query is like this: q=*:*&fq=category:Books

What are the options to implement custom sorting and how do I do it?

- Define a custom function query?
- Define a custom comparator? Or,
- Define a custom collector?

Please let me know the best way to go about it and any pointers to customize Solr 4.

Thanks
Saroj
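PS: if the plain multi-level sort Erick suggests ends up being acceptable, I believe it is just this in SolrJ (a sketch; sub_category, is_markdown and sort_order are assumed field names, with is_markdown being 0 for regular and 1 for markdown so markdowns sink to the bottom):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class MarkdownLastSort {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrQuery q = new SolrQuery("*:*");
            q.addFilterQuery("category:Books");
            q.addSortField("sub_category", SolrQuery.ORDER.asc); // keep sub-categories together
            q.addSortField("is_markdown", SolrQuery.ORDER.asc);  // 0 = regular, 1 = markdown
            q.addSortField("sort_order", SolrQuery.ORDER.asc);   // merch order within each block
            System.out.println(server.query(q).getResults());
        }
    }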
Re: How to do custom sorting in Solr?
Yes, these documents have lots of unique values as the same product could be assigned to lots of other categories and that too, in a different sort order. We did some evaluation of heap usage and found that with kind of queries we generate, heap usage was going up to 24-26 GB. I could trace it to the fact that fieldCache is creating an array of 2M size for each of the sort fields. Since same products are mapped to multiple categories, we incur significant memory overhead. Therefore, any solve where memory consumption can be reduced is a good one for me. In fact, we have situations where same product is mapped to more than 1 sub-category in the same category like Books -- Programming - Java in a nutshell -- Sale (40% off) - Java in a nutshell So,another thought in my mind is to somehow use second pass collector to group books appropriately in Programming and Sale categories, with right sort order. But, i have no clue about that piece :( -Saroj On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson erickerick...@gmail.comwrote: 2M docs is actually pretty small. Sorting is sensitive to the number of _unique_ values in the sort fields, not necessarily the number of documents. And sorting only works on fields with a single value (i.e. it can't have more than one token after analysis). So for each field you're only talking 2M values at the vary maximum, assuming that the field in question has a unique value per document, which I doubt very much given your problem description. So with a corpus that size, I'd just try it'. Best Erick On Sun, Jun 10, 2012 at 7:12 PM, roz dev rozde...@gmail.com wrote: Thanks Erik for your quick feedback When Products are assigned to a category or Sub-Category then they can be in any order and price type can be regular or markdown. So, reg and markdown products are intermingled as per their assignment but I want to sort them in such a way that we ensure that all the products which are on markdown are at the bottom of the list. I can use these multiple sorts but I realize that they are costly in terms of heap used, as they are using FieldCache. I have an index with 2M docs and docs are pretty big. So, I don't want to use them unless there is no other option. I am wondering if I can define a custom function query which can be like this: - check if product is on the markdown - if yes then change its sort order field to be the max value in the given sub-category, say 99 - else, use the sort order of the product in the sub-category I have been looking at existing function queries but do not have a good handle on how to make one of my own. - Another option could be use a custom sort comparator but I am not sure about the way it works Any thoughts? -Saroj On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com wrote: Skimming this, I two options come to mind: 1 Simply apply primary, secondary, etc sorts. Something like sort=subcategory asc,markdown_or_regular desc,sort_order asc 2 You could also use grouping to arrange things in groups and sort within those groups. This has the advantage of returning some members of each of the top N groups in the result set, which makes it easier to get some of each group rather than having to analyze the whole list But your example is somewhat contradictory. 
You say products which are on markdown, are at the bottom of the documents list But in your examples, products on markdown are intermingled Best Erick On Sun, Jun 10, 2012 at 3:36 AM, roz dev rozde...@gmail.com wrote: Hi All I have an index which contains a Catalog of Products and Categories, with Solr 4.0 from trunk Data is organized like this: Category: Books Sub Category: Programming Products: Product # 1, Price: Regular Sort Order:1 Product # 2, Price: Markdown, Sort Order:2 Product # 3 Price: Regular, Sort Order:3 Product # 4 Price: Regular, Sort Order:4 . ... Product # 100 Price: Regular, Sort Order:100 Sub Category: Fiction Products: Product # 1, Price: Markdown, Sort Order:1 Product # 2, Price: Regular, Sort Order:2 Product # 3 Price: Regular, Sort Order:3 Product # 4 Price: Markdown, Sort Order:4 . ... Product # 70 Price: Regular, Sort Order:70 I want to query Solr and sort these products within each of the sub-category in a such a way that products which are on markdown, are at the bottom of the documents list and other products which are on regular price, are sorted as per their sort order in their sub-category. Expected Results are Category: Books Sub Category: Programming Products: Product # 1, Price: Regular Sort Order:1 Product # 2, Price: Markdown, Sort Order:101 Product
Is there any performance cost of using lots of OR in the solr query
Hi All,

I am working on an application which makes a few Solr calls to get the data. At a high level, we have a requirement like this:

- Make a first call to Solr to get the list of products which are children of a given category
- Make a 2nd Solr call to get product documents based on a list of product ids

The 2nd query will look like:

q=document_type:SKU&fq=product_id:(34 OR 45 OR 56 OR 77)

We can have close to 100 product ids in the fq. Is there a performance cost of doing these Solr calls which have lots of ORs?

As per slide # 41 of the presentation The Seven Deadly Sins of Solr, it is a bad idea to have these kinds of queries:

http://www.slideshare.net/lucenerevolution/hill-jay-7-sins-of-solrpdf

But it does not become clear why it is bad. Any inputs will be welcome.

Thanks
Saroj
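For context, this is roughly how we build that filter in SolrJ (a sketch). I added {!cache=false} (available since Solr 3.4, I believe) because my understanding is that each distinct fq string otherwise occupies one filterCache entry, so one-off ID lists can evict more reusable filters, and that the OR list is parsed into a BooleanQuery subject to the maxBooleanClauses limit (1024 by default) - please correct me if that is wrong:

    import java.util.Arrays;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrQuery;

    public class ProductIdFilter {
        public static SolrQuery build(List<Integer> ids) {
            SolrQuery q = new SolrQuery("document_type:SKU");
            // {!cache=false} keeps this one-off filter out of the filterCache
            StringBuilder fq = new StringBuilder("{!cache=false}product_id:(");
            for (int i = 0; i < ids.size(); i++) {
                if (i > 0) fq.append(" OR ");
                fq.append(ids.get(i));
            }
            q.addFilterQuery(fq.append(')').toString());
            return q;
        }

        public static void main(String[] args) {
            System.out.println(build(Arrays.asList(34, 45, 56, 77)));
        }
    }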
Solr Cloud, Commits and Master/Slave configuration
Hi All,

I am trying to understand the features of Solr Cloud, regarding commits and scaling.

- If I am using Solr Cloud then do I need to explicitly call commit (hard-commit)? Or is a soft commit okay, and Solr Cloud will do the job of writing to disk?
- Do we still need to use a master/slave setup to scale searching? If we have to use a master/slave setup, then do I need to issue a hard-commit to make my changes visible to slaves?
- If I were to use NRT with a master/slave setup with soft commit, will the slave be able to see changes made on the master with a soft commit?

Any inputs are welcome.

Thanks
-Saroj
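For reference, the 4.x SolrJ calls I am asking about look like this, if I read the API right (a sketch; the URL and the 10-second window are placeholders):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitOptions {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");

            // commitWithin: the doc becomes searchable within ~10s, no explicit commit call
            server.add(doc, 10000);

            // explicit soft commit: opens a new searcher without fsync-ing segments to disk
            server.commit(true, true, true); // waitFlush, waitSearcher, softCommit
        }
    }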
Re: hot deploy of newer version of solr schema in production
Thanks Jan for your inputs.

I am keen to know the way people keep running live sites while there is a breaking change which calls for complete re-indexing. We want to build a new index, with the new schema (it may take a couple of hours), without impacting the live e-commerce site.

Any thoughts are welcome.

Thanks
Saroj

On Tue, Jan 24, 2012 at 12:21 AM, Jan Høydahl jan@cominvent.com wrote:

Hi,

To be able to do a true hot deploy of a newer schema without reindexing, you must carefully see to it that none of your changes are breaking changes. So you should test the process on your development machine and make sure it works. Adding and deleting fields would work, but not changing the field-type or analysis of an existing field. Depending on from/to version, you may want to keep the old schema-version number. The process is:

1. Deploy the new schema, including all dependencies such as dictionaries
2. Do a RELOAD CORE http://wiki.apache.org/solr/CoreAdmin#RELOAD

My preference is to do a more thorough upgrade of the schema including new functionality and breaking changes, and then do a full reindex. The exception is if my index is huge and the reason for the Solr upgrade or schema change is to fix a bug, not to use new functionality.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24. jan. 2012, at 01:51, roz dev wrote:

Hi All,

I need the community's feedback about deploying newer versions of the Solr schema into production while the existing (older) schema is in use by applications.

How do people perform these things? What has been the learning of people about this?

Any thoughts are welcome.

Thanks
Saroj
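One pattern we are considering (a sketch, and I am not sure it is the best way): rebuild into a spare core while the live core keeps serving, then atomically SWAP the two cores via the CoreAdmin API. The core names below are placeholders:

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.common.params.CoreAdminParams.CoreAdminAction;

    public class SwapCores {
        public static void main(String[] args) throws Exception {
            // 1. Reindex everything into the "rebuild" core (hours, but "live" keeps serving)
            // 2. Swap the two cores; searches hit the fresh index from this point on
            HttpSolrServer admin = new HttpSolrServer("http://localhost:8983/solr");
            CoreAdminRequest swap = new CoreAdminRequest();
            swap.setAction(CoreAdminAction.SWAP);
            swap.setCoreName("live");
            swap.setOtherCoreName("rebuild");
            swap.process(admin);
        }
    }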
hot deploy of newer version of solr schema in production
Hi All, I need community's feedback about deploying newer versions of solr schema into production while existing (older) schema is in use by applications. How do people perform these things? What has been the learning of people about this. Any thoughts are welcome. Thanks Saroj
Index format difference between 4.0 and 3.4
Hi All,

We are using Solr 1.4.1 in production and are considering an upgrade to a newer version. It seems that Solr 3.x requires a complete rebuild of the index, as the format seems to have changed. Is the Solr 4.0 index file format compatible with the Solr 3.x format?

Please advise.

Thanks
Saroj
Re: Production Issue: SolrJ client throwing this error even though field type is not defined in schema
This issue disappeared when we reduced the number of documents which were being returned from Solr. It looks to be some issue with Tomcat or Solr returning truncated responses.

-Saroj

On Sun, Sep 25, 2011 at 9:21 AM, pulkitsing...@gmail.com wrote:

If I had to give a gentle nudge, I would ask you to validate your schema XML file. You can do so by looking for any w3c XML validator website and just copy pasting the text there to find out where it's malformed.

Sent from my iPhone

On Sep 24, 2011, at 2:01 PM, Erick Erickson erickerick...@gmail.com wrote:

You might want to review: http://wiki.apache.org/solr/UsingMailingLists

There's really not much to go on here.

Best
Erick

On Wed, Sep 21, 2011 at 12:13 PM, roz dev rozde...@gmail.com wrote:

Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".

Solr version is 1.4.1. The stack trace indicates that Solr is returning a malformed document.

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more
Re: Production Issue: SolrJ client throwing - Element type must be followed by either attribute specifications, or /.
Wanted to update the list with our finding. We reduced the number of documents which are being retrieved from Solr and this error did not appear again. It might be the case that, due to the high number of documents, Solr is returning incomplete documents.

-Saroj

On Wed, Sep 21, 2011 at 12:13 PM, roz dev rozde...@gmail.com wrote:

Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".

Solr version is 1.4.1. The stack trace indicates that Solr is returning a malformed document.

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more
Production Issue: SolrJ client throwing this error even though field type is not defined in schema
Hi All

We are getting this error in our Production Solr Setup.

Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".

Solr version is 1.4.1. The stack trace indicates that Solr is returning a malformed document.

Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
at com.gap.gid.search.impl.SearchServiceImpl.executeQuery(SearchServiceImpl.java:232)
... 15 more
Caused by: org.apache.solr.common.SolrException: parsing error
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:140)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:101)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:481)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
... 17 more
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[3,136974]
Message: Element type "t_sort" must be followed by either attribute specifications, ">" or "/>".
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:594)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readArray(XMLResponseParser.java:282)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocument(XMLResponseParser.java:410)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readDocuments(XMLResponseParser.java:360)
at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:241)
at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:125)
... 21 more
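An aside, in case it is relevant: as the trace shows, we are on the XML response parser. One workaround we may try (a hedged suggestion, not a root-cause fix; the URL is a placeholder) is switching SolrJ to the binary javabin format, which bypasses the XML parser entirely and is more compact for large responses:

    import org.apache.solr.client.solrj.impl.BinaryResponseParser;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class JavabinClient {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8080/solr");
            // javabin instead of XML; XMLResponseParser is the one failing in the trace above
            server.setParser(new BinaryResponseParser());
        }
    }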
cache invalidation in slaves
Hi All

Solr has different types of caches, such as filterCache, queryResultCache and documentCache. I know that if a commit is done, a new searcher is opened and new caches are built, and this makes sense.

What happens when commits are happening on the master and slaves are pulling the delta updates? Do slaves discard their caches and rebuild them every time a new delta index update is downloaded to the slave?

Thanks
Saroj
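For context, these caches are configured per core in solrconfig.xml; when a new searcher opens (for example after a replicated commit), autowarmCount controls how many entries from the old cache are used to seed the new one. A sketch with illustrative sizes, not tuned values:

<!-- solrconfig.xml (illustrative values) -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>
<!-- documentCache entries are keyed by internal doc IDs, which change
     between searchers, so it cannot be usefully autowarmed -->
<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>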
q and fq in solr 1.4.1
Hi All

I am sure that the q vs. fq question has been answered several times, but I still have a question which I would like to know the answer to. If we have a Solr query like this:

q=*&fq=field_1:XYZ&fq=field_2:ABC&sort=field_3+asc

how does SolrIndexSearcher execute the query in 1.4.1? Will it run the query against the whole index first (because q=*) and then filter the results against field_1 and field_2, or is that done in parallel?

And, if we ask for only 20 rows at a time, will Solr do the following:
1) get all the docs (because q is set to *) and sort them by field_3
2) then filter the results by field_1 and field_2

Or will it apply the sorting after doing the filter? Please let me know how Solr 1.4.1 works.

Thanks
Saroj
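For concreteness, the same request expressed through SolrJ (field names as in the question above; the URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");     // match-all main query
        query.addFilterQuery("field_1:XYZ");        // each fq is cached independently in filterCache
        query.addFilterQuery("field_2:ABC");
        query.setSortField("field_3", SolrQuery.ORDER.asc);
        query.setRows(20);                          // only the top 20 docs are returned
        QueryResponse rsp = server.query(query);
        System.out.println("numFound=" + rsp.getResults().getNumFound());
    }
}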
what is the default value of omitNorms and termVectors in solr schema
Hi

As per this document, http://wiki.apache.org/solr/FieldOptionsByUseCase, omitNorms and termVectors have to be explicitly specified in some cases. I am wondering what the default values of these settings are if the Solr schema definition does not state them.

Example:

<field name="ql_path" type="string" indexed="false" stored="true"/>

In the above case, will Solr create norms for this field, and a term vector as well? Any ideas?

Thanks
Saroj
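For comparison, a sketch of the same field with both options spelled out explicitly. Whether these values match the implicit defaults depends on the field type and Solr version (termVectors is generally off unless declared, and primitive types like string typically omit norms), so treat them as illustrative rather than authoritative:

<!-- schema.xml: the same field with the options stated explicitly -->
<field name="ql_path" type="string" indexed="false" stored="true"
       omitNorms="true" termVectors="false"/>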
Re: Does Solr flush to disk even before ramBufferSizeMB is hit?
Thanks Shawn. If Solr writes this info to disk as soon as possible (which is what I am seeing), then the ramBufferSizeMB setting seems misleading. Does anyone else have thoughts on this? -Saroj

On Mon, Aug 29, 2011 at 6:14 AM, Shawn Heisey s...@elyograg.org wrote:

On 8/28/2011 11:18 PM, roz dev wrote:

I notice that even though infoStream does not mention that data is being flushed to disk, new segment files were created on the server. The size of these files kept growing even though there was enough heap available and the 856MB RAM buffer was not even used.

With the caveat that I am not an expert and someone may correct me, I'll offer this: it's been my experience that Solr will write the files that constitute stored fields as soon as they are available, because that information is always the same and nothing will change in those files based on the next chunk of data.

Thanks,
Shawn
Does Solr flush to disk even before ramBufferSizeMB is hit?
Hi All,

I am trying to tune ramBufferSizeMB and the merge factor for my setup, so I enabled Lucene IndexWriter's infoStream logging and started monitoring the data folder where index files are created.

I started my test with the following:
Heap: 3GB
Solr 1.4.1, Index Size = 20 GB
ramBufferSizeMB=856
Merge Factor=25

I ran my test with 30 concurrent threads writing to Solr. My jobs delete 6 (approx) records by issuing a deleteByQuery command and then proceed to write data. Commit is done at the end of the writing process.

The results are a bit surprising to me and I need some help understanding them. I notice that even though infoStream does not mention that data is being flushed to disk, new segment files were created on the server. The size of these files kept growing even though there was enough heap available and the 856MB RAM buffer was not even used.

Is it the case that Lucene is flushing to disk even before ramBufferSizeMB is hit? If that is the case, then why is infoStream not logging this? As per infoStream, it is flushing at the end, but the files are created much before that.

Here is what infoStream is saying. Please note that it indicates that a new segment is being flushed at 12:58 AM, but files were created at 12:53 AM itself and they kept growing.

Aug 29, 2011 12:46:00 AM IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.NIOFSDirectory@/opt/gid/solr/ecom/data/index autoCommit=false mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@4552a64d mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@35242cc9 ramBufferSizeMB=856.0 maxBufferedDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=1 index=_3l:C2151995
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: now flush at close
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: flush: now pause all indexing threads
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: flush: segment=_3m docStoreSegment=_3m docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=true numDocs=60788 numBufDelTerms=60788
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: index before flush _3l:C2151995
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: flush postings as segment _3m numDocs=60788
Aug 29, 2011 12:57:35 AM IW 0 [web-1]: DW: closeDocStore: 2 files to flush to segment _3m numDocs=60788
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total now 9
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=182 total now 182
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=49 total now 49
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total now 16
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=145 total now 327
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=37 total now 86
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=9 total now 25
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=208 total now 535
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=52 total now 138
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=7 total now 32
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=136 total now 671
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=39 total now 177
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleIntBlocks count=3 total now 35
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleByteBlocks blockSize=32768 count=58 total now 729
Aug 29, 2011 12:57:40 AM IW 0 [web-1]: DW: DW.recycleCharBlocks count=16 total now 193
Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW: oldRAMSize=50469888 newFlushedSize=161169038 docs/MB=395.491 new/old=319.337%
Aug 29, 2011 12:57:41 AM IFD [web-1]: now checkpoint segments_1x [2 segments ; isCommit = false]
Aug 29, 2011 12:57:41 AM IW 0 [web-1]: DW: apply 60788 buffered deleted terms and 0 deleted docIDs and 1 deleted queries on 2 segments.
Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint segments_1x [2 segments ; isCommit = false]
Aug 29, 2011 12:57:42 AM IFD [web-1]: now checkpoint segments_1x [2 segments ; isCommit = false]
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: findMerges: 2 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: level 6.6799455 to 7.4299455: 1 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: LMP: level 5.1209826 to 5.8709826: 1 segments
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: index: _3l:C2151995 _3m:C60788
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: no more merges pending; now return
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: now merge
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: index: _3l:C2151995 _3m:C60788
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: CMS: no more merges pending; now return
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: now call final commit()
Aug 29, 2011 12:57:42 AM IW 0 [web-1]: startCommit(): start sizeInBytes=0
Aug 29, 2011 12:57:42 AM
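For reference, the knobs discussed in this thread live in the indexDefaults section of solrconfig.xml in Solr 1.4. A hedged sketch using the values from the test above; the infoStream file name is illustrative:

<indexDefaults>
  <ramBufferSizeMB>856</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <!-- when true, IndexWriter's debug output is written to the named file -->
  <infoStream file="INFOSTREAM.txt">true</infoStream>
</indexDefaults>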
SolrJ Question about Bad Request Root cause error
Hi All

We are using the SolrJ client (v1.4.1) to integrate with our Solr search server. We notice that whenever a SolrJ request does not match the Solr schema, we get a Bad Request exception, which makes sense:

org.apache.solr.common.SolrException: Bad Request

But the SolrJ client does not provide any clue about the reason the request is bad. Is there any way to get the root cause on the client side? Of course, the Solr server logs have enough info to know that the data is bad, but it would be great to have the same info in the exception generated by SolrJ.

Any thoughts? Is there any plan to add this in future releases?

Thanks,
Saroj
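A minimal sketch of what is visible client-side today, assuming SolrJ 1.4: the thrown SolrException exposes the HTTP status via code() plus a generic message, while the real root cause stays in the server log. The field name and URL below are hypothetical:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

public class BadRequestExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("nonexistent_field", "value"); // field not in schema -> HTTP 400
        try {
            server.add(doc);
        } catch (SolrException e) {
            // Only the status code and a generic message are available client-side;
            // the underlying reason is logged on the Solr server.
            System.err.println("HTTP status: " + e.code());
            System.err.println("Message: " + e.getMessage());
        }
    }
}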