ping query frequency
Hi, I'm wondering how frequently this query should be made. Currently it is done before each select request (some very old legacy). I googled a little and found that it is considered bad practice and has a performance impact. So the question is: should I completely remove it, or just do it once in some period of time? What is the best practice? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/ping-query-frequency-tp4044305.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: create cores dynamically
For me, it always creates the data dir next to the conf dir I specified (this should depend on your core configuration) and loads the core into Solr (and I think this is what it is supposed to do). On Mar 3, 2013 3:18 AM, adeelmahmood adeelmahm...@gmail.com wrote: I am not sure I understand how the dynamic core creation functionality is supposed to work. From what I have sort of figured out, I need to specify the instanceDir as the path to a directory which contains the conf files. So I have a directory as a template for configuration files, but when I use this path, Solr adds the data directory next to this template conf directory, which defeats the purpose. I was hoping that it would copy the template files into a new directory created for the core. Is that not how it's supposed to work? Any help is appreciated. Thanks Adeel -- View this message in context: http://lucene.472066.n3.nabble.com/create-cores-dynamically-tp4044279.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Returning to Solr 4.0 from 4.1
On Sat, Mar 2, 2013 at 9:32 PM, Upayavira u...@odoko.co.uk wrote: What I'm questioning is whether the issue you see in 4.1 has been resolved in Subversion. While I would not expect 4.0 to read a 4.1 index, the SVN branch/4.2 should be able to do so effortlessly. Upayavira I see, thanks. Actually, running a clean 4.1 with no previous index does not have the issues. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
advice on a backup service of solr collection
Hi, I want to write a component that will protect a Solr cluster from indexing damage. The component will take down one replica from each shard (the backup replicas) before indexing. Then the indexing will start. When the indexing is finished, some tests will be performed, and only if they pass will the backup replicas be brought back up. If some test fails, I will have to check what went wrong in the indexing process, and in any case I'll have the backup replicas with which I can bring the cluster back without losing any data. Any insights on this component? Any advice on the issue? Can I randomly take one of the replicas in each shard? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/advice-on-a-backup-service-of-solr-collection-tp4044350.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't execute query in solr 4.1 admin page
On 3/3/2013 12:15 AM, adfel70 wrote: I used to execute queries in the Solr admin page for testing purposes. I updated to Solr 4.1 and it seems this feature is not working. I can see that the query is always sent without any query parameter. This sounds like this bug: https://issues.apache.org/jira/browse/SOLR-4393 It looks like the patch for another bug fixes it: https://issues.apache.org/jira/browse/SOLR-4349 I use 4.2-SNAPSHOT and have not seen this problem on the latest Firefox. I was tempted by new features and never did use the released version. Thanks, Shawn
Re: can't execute query in solr 4.1 admin page
Hey, already documented and fixed for 4.2: https://issues.apache.org/jira/browse/SOLR-4349 Stefan On Sunday, March 3, 2013 at 8:15 AM, adfel70 wrote: Hi, I used to execute queries in the Solr admin page for testing purposes. I updated to Solr 4.1 and it seems this feature is not working. I can see that the query is always sent without any query parameter. Is this a known bug? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/cant-execute-query-in-solr-4-1-admin-page-tp4044297.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ping query frequency
On 3/3/2013 2:15 AM, adm1n wrote: I'm wondering how frequently this query should be made. Currently it is done before each select request (some very old legacy). I googled a little and found that it is considered bad practice and has a performance impact. So the question is: should I completely remove it, or just do it once in some period of time? Can you point me at the place where it says that it's bad practice to do frequent pings? I use the ping functionality in my haproxy load balancer that sits in front of Solr. It executes a ping request against all my Solr instances every five seconds. Most of the time, the ping request (which is distributed) finishes in single-digit milliseconds. If that is considered bad practice, I want to figure out why and submit issues to get the problem fixed. I can imagine that sending a ping before every query would be a bad idea, but I am hoping that the way I'm using it is OK. The only problem with ping requests that I have ever noticed was caused by long garbage collection pauses on my 8GB Solr heap. Those pauses caused the load balancer to incorrectly mark the active Solr instance(s) as down and send requests to a backup. Through experimentation with -XX memory tuning options, I have now eliminated the GC pause problem. For machines running Solr 4.2-SNAPSHOT, I have reduced the heap to 6GB; the 3.5.0 machines are still running with 8GB. Thanks, Shawn
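[Editor's note] A load-balancer setup along the lines Shawn describes could be sketched in haproxy configuration roughly like this. This is a hypothetical fragment, not his actual config: the backend name, server addresses, and ping path are made up, and the ping URL depends on how the handler is mapped in your solrconfig.xml.

```
# Hypothetical haproxy fragment: health-check each Solr instance via the
# ping handler every 5 seconds; 'fall 3' takes a server out of rotation
# after three consecutive failed checks.
backend solr_backend
    option httpchk GET /solr/admin/ping
    server solr1 192.168.1.11:8983 check inter 5000 fall 3
    server solr2 192.168.1.12:8983 check inter 5000 fall 3 backup
```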
Re: create cores dynamically
On 3/2/2013 8:18 PM, adeelmahmood wrote: I am not sure I understand how the dynamic core creation functionality is supposed to work. From what I have sort of figured out, I need to specify the instanceDir as the path to a directory which contains the conf files. So I have a directory as a template for configuration files, but when I use this path, Solr adds the data directory next to this template conf directory, which defeats the purpose. I was hoping that it would copy the template files into a new directory created for the core. Is that not how it's supposed to work? In solr.xml, instanceDir is relative to solr.solr.home, which defaults to solr, relative to the current working directory. The solr.solr.home directory is where solr.xml lives. Inside instanceDir, Solr looks in conf to find the config. The value for dataDir defaults to data, relative to instanceDir. I don't think instanceDir has a default value, but I could be wrong about that. Here's what I use, with solr.solr.home set to /index/solr4:

<core loadOnStartup="true" instanceDir="cores/ncmain/" transient="false" name="ncmain" dataDir="../../data/ncmain/"/>

This means that all config directories are under /index/solr4/cores and all index directories are under /index/solr4/data ... I can easily delete all the data without touching the configs. For another installation where I'm using SolrCloud, it is set up almost the same, but it's still separated because the active configs are in ZooKeeper, with a backup copy on the disk. Thanks, Shawn
Re: create cores dynamically
Well, this is all useful information, but I'm not sure it really answers my question. Let me rephrase what exactly I am trying to do. Let's say I start with core0, so this is how the directory structure looks:

solr.home/
  solr.xml
  core0/
    conf/
    data/

Now when I dynamically add core1, I want to end up with a structure like this:

solr.home/
  solr.xml
  core0/
    conf/
    data/
  core1/
    conf/
    data/

Is this possible with dynamic core creation -- to have a separate directory, with conf and data directories inside it, for each core separately? Thanks for the help -- View this message in context: http://lucene.472066.n3.nabble.com/create-cores-dynamically-tp4044279p4044389.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: create cores dynamically
On 3/3/2013 12:08 PM, adeelmahmood wrote: Well, this is all useful information, but I'm not sure it really answers my question. Let me rephrase what exactly I am trying to do. Let's say I start with core0, so this is how the directory structure looks:

solr.home/
  solr.xml
  core0/
    conf/
    data/

Now when I dynamically add core1, I want to end up with a structure like this:

solr.home/
  solr.xml
  core0/
    conf/
    data/
  core1/
    conf/
    data/

Is this possible with dynamic core creation -- to have a separate directory, with conf and data directories inside it, for each core separately? Yes, you can do this. You'll need to create the new core directory and its conf directory before creating the core within Solr; Solr doesn't do that part for you. Solr will automatically create the data directory, though. On my setup, I don't create cores dynamically. I have a central config directory and I create symlinks within the individual core config directories back to the central one. Thanks, Shawn
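[Editor's note] For illustration, the dynamic core creation discussed above happens through a CoreAdmin CREATE request. Below is a hedged sketch of building such a call; the host, port, and core name are assumptions, and per Shawn's answer the core1/conf directory with its config files must already exist before the request is sent.

```python
from urllib.parse import urlencode

# Hypothetical values: adjust host, port, and paths for your installation.
params = {
    "action": "CREATE",
    "name": "core1",
    "instanceDir": "core1",        # relative to solr.solr.home
    "config": "solrconfig.xml",    # looked up inside core1/conf
    "schema": "schema.xml",
    "dataDir": "data",             # Solr creates this directory itself
}
url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(url)
```

Fetching that URL (e.g. with curl or urllib) registers the core and creates its data directory next to the pre-existing conf directory.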
Making tika process mail attachments eludes me
I've been trying for a while to create an index of a mailbox. I downloaded solr-4.1.0.tgz, configured example/example-DIH/solr/mail/conf/data-config.xml, and emails are indexed, but the attachments elude me. The config says: Note - In order to index attachments, set processAttachement=true and drop Tika and its dependencies into the example-DIH/solr/mail/lib directory. I have tried dropping files from the contrib/extract/lib, but no luck. My friend Google seems unable to help me. Do I need to modify schema.xml or solrconfig.xml? I cannot see any trace of Tika, or any errors, in my logfile. Does a working example for indexing mails and attachments exist somewhere to download? -- Regards Leif Hetlesæther
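[Editor's note] For orientation, a hypothetical shape of the mail entity in example-DIH/solr/mail/conf/data-config.xml is sketched below. The account details are placeholders, and the attribute names are assumed to match the shipped example (note the attribute really is spelled processAttachement in the example's comment quoted above).

```xml
<!-- Hypothetical sketch; user, password, and host are placeholders. -->
<dataConfig>
  <document>
    <entity processor="MailEntityProcessor"
            user="user@example.com"
            password="secret"
            host="imap.example.com"
            protocol="imaps"
            processAttachement="true"/>
  </document>
</dataConfig>
```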
Re: ping query frequency
We too run a ping every 5 seconds, and I think the concurrent mark/sweep collector helps keep the LB from taking a box out of rotation due to long pauses. Either that, or I don't see large enough pauses for my LB to take it out (it'd have to fail 3 times in a row, or 15 seconds total, before it's gone). The ping query does execute an actual query, so of course you want to make this as simple as possible (i.e. q=primary_key:value) so that there's limited to no scanning of the index. I think our query does an id:0, which would always return 0 docs, but any stupid-simple query is fine so long as it hits the caches on subsequent hits. The goal, to me at least, is not that the ping query yields actual docs, but that it's a mechanism to remove a Solr server from rotation without having to log in to an ops-controlled device directly. I'd definitely remove the ping per request (wouldn't the fact that you are doing /select serve as the ping, and hence defeat the purpose of the ping query?) and definitely do the frequent ping as we are describing if you want to have your Solr boxes behind some load balancer. On Sun, Mar 3, 2013 at 8:21 AM, Shawn Heisey s...@elyograg.org wrote: On 3/3/2013 2:15 AM, adm1n wrote: I'm wondering how frequently this query should be made. Currently it is done before each select request (some very old legacy). I googled a little and found that it is considered bad practice and has a performance impact. So the question is: should I completely remove it, or just do it once in some period of time? Can you point me at the place where it says that it's bad practice to do frequent pings? I use the ping functionality in my haproxy load balancer that sits in front of Solr. It executes a ping request against all my Solr instances every five seconds. Most of the time, the ping request (which is distributed) finishes in single-digit milliseconds. If that is considered bad practice, I want to figure out why and submit issues to get the problem fixed.
I can imagine that sending a ping before every query would be a bad idea, but I am hoping that the way I'm using it is OK. The only problem with ping requests that I have ever noticed was caused by long garbage collection pauses on my 8GB Solr heap. Those pauses caused the load balancer to incorrectly mark the active Solr instance(s) as down and send requests to a backup. Through experimentation with -XX memory tuning options, I have now eliminated the GC pause problem. For machines running Solr 4.2-SNAPSHOT, I have reduced the heap to 6GB, the 3.5.0 machines are still running with 8GB. Thanks, Shawn
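[Editor's note] The stupid-simple ping query described above can be pinned down in solrconfig.xml; a minimal sketch follows, where the query value (id:0) mirrors the example in the message and is illustrative rather than prescriptive.

```xml
<!-- Minimal sketch of a ping handler with a cheap, invariant query. -->
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">id:0</str>
  </lst>
</requestHandler>
```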
Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Hi Mikhail, Thanks for the links, looks like interesting stuff. Sadly this project is stuck on 3.x for some very thorny reasons... Googling around, it looks like this might be strictly 4.x... On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Mark, AFAIK http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html is a convenient framework for such juggling. Please also be aware of the good starting point http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett mbenn...@ideaeng.com wrote: Scenario: You're submitting a block of text as a query. You're content to let Solr / Lucene handle query parsing and tokenization, etc. But you'd like ALL eventually-produced leaf nodes in the parse tree to have: * Boolean MUST (effectively a + prefix) * Fuzzy match of ~1 or ~2 In a simple application, and if there were no punctuation, you could preprocess the query, effectively: * split on whitespace * for t in tokens: t = "+" + t + "~2" But this is ugly, and even then I think things like stop words would be messed up: * OK in Solr: "the chair" (it can properly remove "the") * But with this: +the~2 +chair~2 (I'm not sure this would work) Sure, at the application level you could also remove the stop words in the "for t in tokens" loop, but then some other weird case would come up. Maybe one of the field's analyzers has some other token filter you forgot about, so you'd have to bring that logic forward as well. (Long story of why I'd want to do all this... and I know people think adding ~2 to all tokens will give bad results anyway; trying to fix inherited code that can't be scrapped, etc.) -- Mark Bennett / New Idea Engineering, Inc.
/ mbenn...@ideaeng.com Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513 -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
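[Editor's note] The two-step preprocessing Mark describes can be sketched in Python under the stated assumptions: the stop word list here is a made-up stand-in, and the real field's analysis chain (stemming, other token filters) is not reproduced, which is exactly the fragility he points out.

```python
# Naive app-level query preprocessing: split on whitespace, drop stop
# words, then mark each remaining term as required (+) and fuzzy (~2).
# The stop word list is a hypothetical stand-in for the field's analyzer.
def naive_fuzzify(query, stopwords=frozenset({"a", "an", "the"})):
    terms = [t for t in query.split() if t.lower() not in stopwords]
    return " ".join("+" + t + "~2" for t in terms)

print(naive_fuzzify("the chair"))          # +chair~2
print(naive_fuzzify("fuzzy chair match"))  # +fuzzy~2 +chair~2 +match~2
```

Punctuation attached to tokens (e.g. a trailing comma) would still end up inside the fuzzy term, which illustrates why this approach breaks down as soon as real analyzers are involved.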
Re: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for Solr?
Thanks Jack. On Thu, Feb 28, 2013 at 11:04 PM, Jack Krupansky j...@basetechnology.com wrote: The package Javadoc for Solr analysis is a good start: http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/analysis/package-tree.html Actually, this is representative of why I am writing my own utility. That package tree does not actually make it easy to see all the derivative classes, as they are hiding behind the multiple levels of abstraction. I am not saying it is terribly hard. Still, for a non-Java programmer who is just stepping out of Solr as a black box and trying to understand what can be plugged in, in various configurations, to improve their results, it is non-trivial the first couple of times. Especially since it is not just the class name that is important, but also which jar needs to be added to the library statement. My (preliminary) output for the CharFilters looks like this:

-CharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.util)
  HTMLStripCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
  MappingCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.charfilter)
  PersianCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.fa)
  JapaneseIterationMarkCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.1.0.jar/org.apache.lucene.analysis.ja)
  PatternReplaceCharFilterFactory (example/solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.1.0.jar/org.apache.lucene.analysis.pattern)
  LegacyHTMLStripCharFilterFactory (dist/solr-core-4.1.0.jar/org.apache.solr.analysis)
  MockCharFilterFactory (dist/solr-test-framework-4.1.0.jar/org.apache.solr.analysis)

And (part of)
the URP tree:

-UpdateRequestProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
  UIMAUpdateRequestProcessorFactory (dist/solr-uima-4.1.0.jar/org.apache.solr.uima.processor)
  -AbstractDefaultValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    DefaultValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    TimestampUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    UUIDUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
  CloneFieldUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
  DistributedUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
  -FieldMutatingUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    ConcatFieldUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    CountFieldValuesUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    FieldLengthUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
    -FieldValueSubsetUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
      FirstFieldValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)
      LastFieldValueUpdateProcessorFactory (dist/solr-core-4.1.0.jar/org.apache.solr.update.processor)

A - at the start marks an abstract class; I also use * (not shown here) for classes without an empty constructor (hence my original question). Especially the AbstractAnalysisFactory: http://lucene.apache.org/core/4_1_0/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html This is useful and confirms my 'empty-constructor' assumption. Also, look at the various factories in solrconfig.xml for other Solr extension points.
Including search components, spellcheckers, etc. Will do. I was just wondering if there was a semi-comprehensive list. But I can build it iteratively. Regards, Alex. -- Jack Krupansky -Original Message- From: Alexandre Rafalovitch Sent: Thursday, February 28, 2013 10:32 PM To: solr-user@lucene.apache.org Subject: What makes an Analyzer/Tokenizer/CharFilter/etc suitable for Solr? Hello, I want to have a unified reference of all the different processors one could use in Solr at its various extension points. I have written a small tool to extract all implementations of UpdateRequestProcessorFactory, Analyzer, CharFilterFactory, etc. (actually of any root class). However, I assume not all Lucene Analyzer derivatives can just be plugged into Solr. Is it fair to say that the class must: *)
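[Editor's note] A jar-scanning utility like the one Alexandre describes could start from something as simple as listing class entries in each jar. The sketch below only pattern-matches entry names; unlike his tool, it does not inspect the actual class hierarchy or check for empty constructors.

```python
import io
import zipfile

def list_factories(jar_bytes, suffix="Factory.class"):
    """Return fully-qualified names of classes whose name ends in 'Factory'.

    A .jar is just a zip archive, so class files can be enumerated
    with the standard zipfile module.
    """
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return sorted(
            name.replace("/", ".")[: -len(".class")]
            for name in jar.namelist()
            if name.endswith(suffix)
        )

# Build a tiny in-memory "jar" to demonstrate (entry contents don't matter
# here, only the entry names).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as demo:
    demo.writestr("org/apache/lucene/analysis/charfilter/MappingCharFilterFactory.class", b"")
    demo.writestr("org/apache/lucene/analysis/CharFilter.class", b"")

print(list_factories(buf.getvalue()))
# ['org.apache.lucene.analysis.charfilter.MappingCharFilterFactory']
```

Resolving which factories actually subclass CharFilterFactory (and whether they have an empty constructor) would additionally require loading the classes, which is why his tool is written in Java against the real class tree.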
Re: Backtick character in field values, and results
Thank you! This saved me a lot of trouble! On Thu, Feb 28, 2013 at 4:39 AM, Erick Erickson erickerick...@gmail.com wrote: ICUFoldingFilterFactory is folding the backtick (grave accent). See the admin/analysis page; it's a lifesaver in these situations! Best Erick On Fri, Feb 22, 2013 at 3:46 PM, Neelesh neele...@gmail.com wrote: With a text_unbroken field:

<fieldType name="text_unbroken" class="solr.TextField" omitNorms="true" omitTermFreqAndPositions="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

A query like field:Hello` matches both Hello and Hello`. This does not happen with something like +. That is, field:Hello+ does not match Hello, but only matches Hello+. Is there something special about backticks? Are there more such really special characters? Thanks! -neelesh
Re: Backtick character in field values, and results
On 2/28/2013 5:39 AM, Erick Erickson wrote: ICUFoldingFilterFactory is folding the backtick (grave accent). See admin/analysis page, it's a lifesaver in these situations! Is this the way it's supposed to behave? From what I could tell in my look at the analysis page, it is folding the backtick into nothing. It happens on both 3.5.0 and 4.2-SNAPSHOT. The filter doesn't seem to have this behavior with any of the regular punctuation that I have tested. I use the ICUFoldingFilterFactory in my schema for most of my fields. Thanks, Shawn
atomic updates fail with solrcloud, and real time get throwing NPE
Atomic updates are failing in SolrCloud unless the update is sent to the shard where the doc resides. Real time get is throwing an NPE when run without distrib=false. Tried with 4.1 and a 4.2 snapshot. Any ideas? Thanks. msj