Re: Sync failure after shard leader election when adding new replica.
Please, please, please do _not_ try to use core discovery to add new replicas by manually editing stuff.

bq: and my deployment tools create an empty core on newly provisioned machines

This is a really bad idea (as you have discovered). Basically, your deployment tools have to do everything right to get this to play nicely with SolrCloud. Your core names can't conflict. You have to spell all the parameters in core.properties right. Etc. There are endless places to go wrong. And this is all done for you (and tested with unit tests) via the Collections API.

Assuming that in your scenario you started machine2 before machine1, how would Solr have any clue that machine1 would _ever_ come back up? It'll do the best it can and try to elect a leader, but there's only one machine to choose from... and it's sorely out of date.

Absolutely use the Collections API to add replicas to running SolrCloud clusters. And adding a replica via the Collections API _will_ use core discovery, in that it'll cause a core.properties file to be written on the node in question, populate it with all the necessary parameters, initiate a sync from the (running) leader, put itself into the query rotation automatically when the sync is done, etc. All without you (1) having to figure all this out yourself or (2) taking the collection offline.

Best,
Erick

On Tue, May 26, 2015 at 2:46 PM, Michael Roberts mrobe...@tableau.com wrote:

Hi, I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, each with a single shard, and initially each shard has a single replica (so, basically, one machine). I am using core discovery, and my deployment tools create an empty core on newly provisioned machines.

The scenario that I am testing is: Machine 1 is running and writes are occurring from my application to Solr. At some point, I stop Machine 1 and reconfigure my application to add Machine 2. Both machines are then started.
What I would expect to happen at this point is that Machine 2 cannot become leader, because it is behind compared to Machine 1. Machine 2 would then restore from Machine 1. However, looking at the logs, I am seeing Machine 2 become elected leader and fail the PeerSync:

2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue.
2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync
2015-05-24 17:20:25.997 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.update.PeerSync - PeerSync: core=project url=http://10.32.132.64:11000/solr START replicas=[http://jchar-1:11000/solr/project/] nUpdates=100
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.update.PeerSync - PeerSync: core=project url=http://10.32.132.64:11000/solr DONE. We have no versions. sync failed.
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.32.132.64:11000/solr/project/ shard1

What is the expected behavior here? What's the best practice for adding a new replica? Should I have the SolrCloud running and do it via the Collections API, or can I continue to use core discovery? Thanks.
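For reference, Erick's advice above boils down to a single Collections API call. The sketch below only assembles the ADDREPLICA request URL in Python; the host, port, collection, and node names are illustrative, not taken from the thread. Sending the request to a live cluster is what triggers the core.properties write and the sync from the leader.

```python
from urllib.parse import urlencode

def addreplica_url(base_url, collection, shard, node=None):
    """Build a Collections API ADDREPLICA request URL.

    Solr handles the rest server-side: it writes core.properties on the
    target node, syncs the new core from the current leader, and puts the
    replica into the query rotation once the sync finishes.
    """
    params = {"action": "ADDREPLICA", "collection": collection, "shard": shard}
    if node is not None:
        # Pin the replica to a specific node; omit to let Solr choose one.
        params["node"] = node
    return "%s/admin/collections?%s" % (base_url.rstrip("/"), urlencode(params))

# Host, collection, and node names below are made up for illustration.
url = addreplica_url("http://machine1:11000/solr", "project", "shard1",
                     node="machine2:11000_solr")
print(url)
```

A GET to that URL (or the same call through SolrJ's CollectionAdminRequest) replaces the whole hand-edited-core.properties dance.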
Re: Solr relevancy score in percentage
Thank you everyone for your comments and recommendations. Will consider all these points in my implementation. Regards, Edwin

On 27 May 2015 at 05:15, Walter Underwood wun...@wunderwood.org wrote:

On May 26, 2015, at 7:10 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are.

That is the meaning of the score from a probabilistic model search engine. Solr is not a probabilistic engine; it is a vector space engine. The scores are fundamentally different. Treating the score as a probability of relevance will not work.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Re: Removing characters like '\n \n' from indexing
I'm using ExtractingRequestHandler to do the indexing. Do I have to implement the UpdateProcessor method at the ExtractingRequestHandler, or as a separate method? Regards, Edwin

On 26 May 2015 at 23:42, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

I think this is still on topic. Assuming we are using the extract update handler, I think the update processor approach still applies. But is it not possible to strip them directly with some extract request handler param?

2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com:

Neither - it removes the characters before indexing. The distinction is that if you remove them during indexing they will still appear in the stored field values even if they are removed from the indexed values, but by removing them before indexing, they will not appear in the stored field values. Again, the distinction is between indexed field values and stored field values. -- Jack Krupansky

On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

It is showing up in the search results. Just to confirm, does this UpdateProcessor method remove the characters during indexing, or only after indexing has been done? Regards, Edwin

On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote:

On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This causes the content to look ugly, and also takes up unnecessary bandwidth in the system.

Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these happen before fields are stored (e.g. RegexReplaceProcessorFactory). If you are concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory.

Upayavira

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
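The update-processor route Upayavira and Jack describe is, at heart, a regex substitution applied to the field value before it is stored. A quick standalone sketch of that substitution (plain Python, not Solr code; the exact pattern is an assumption about what you would configure):

```python
import re

def strip_leading_newlines(text):
    """Mimic what a regex-replace update processor would do to a field
    value before it is stored: drop the leading run of '\n' and spaces,
    then collapse any remaining whitespace runs to single spaces."""
    text = re.sub(r"^\s+", "", text)   # the leading '\n \n' noise
    return re.sub(r"\s+", " ", text)   # tidy the rest

print(strip_leading_newlines("\n \n \nActual content\nstarts here"))
# -> "Actual content starts here"
```

The second substitution is optional; if you only care about the leading noise, keep just the first one.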
Re: docValues: Can we apply synonym
Yes, it could be :) Anyway, thanks for helping. With Regards, Aman Tandon

On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

I should investigate that, as synonyms are usually an analysis-stage concern. A simple way is to replace the word with all its synonyms (including the original word), but simply using this kind of processor will change the token positions and offsets, modifying the actual content of the document. "I am from Bombay" will become "I am from Bombay Mumbai", which can be annoying. So a cleverer approach must be investigated.

2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com:

Okay, so how could I do it with UpdateProcessors? With Regards, Aman Tandon

On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

Mmm, this is different! Without any customisation, right now you could:
- use docValues to provide exact-value facets;
- then use a copy field, with the proper analysis, to search when a user clicks on a filter.

So you will see in your facets: Mumbai(3) Bombay(2). And when clicking you see 5 results. A little bit misleading for the users… On the other hand, if you want to apply the synonyms earlier, in the indexing pipeline (because docValues fields cannot be analysed), I think you should play with UpdateProcessors. Cheers

2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com:

We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city*. In city we have also added synonyms for cities like mumbai, bombay (these are Indian cities), so that results for mumbai are also eligible when somebody applies a filter of bombay on the search results. I need this functionality with a docValues-enabled field.

With Regards, Aman Tandon

On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote:

I checked in the documentation to be sure, but apparently: DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are:

- StrField and UUIDField.
  - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type.
  - If the field is multi-valued, Lucene will use the SORTED_SET type.
- Any Trie* numeric fields and EnumField.
  - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type.
  - If the field is multi-valued, Lucene will use the SORTED_SET type.

This means you should not analyse a field where docValues is enabled. Can you explain your use case? Why are you interested in synonyms at the docValues level? Cheers

2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk:

To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira

On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote:

Hi, We have some field *city* in which docValues is enabled. We need to add synonyms to that field, so how could we do it? With Regards, Aman Tandon

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?
William Blake - Songs of Experience -1794 England
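To make the copy-field arrangement Alessandro describes concrete, here is a minimal schema.xml sketch. The field and type names (`city_search`, `text_syn`) and the synonyms file are assumptions for illustration: `city` keeps docValues (unanalysed) for clean facets, while the analysed copy handles synonym matching at query time.

```xml
<!-- Exact values for faceting: docValues, no analysis -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>

<!-- Analysed copy used when a user clicks a facet filter -->
<field name="city_search" type="text_syn" indexed="true" stored="false"/>
<copyField source="city" dest="city_search"/>

<fieldType name="text_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With a synonyms.txt line such as `mumbai,bombay`, a filter query against `city_search` matches either spelling, while the `city` facet still shows the stored values verbatim (the misleading-counts caveat from the thread still applies).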
Re: Help/Guidance Needed : To reload kstem protword hash without full core reload
Thank you so much Ahmet :) With Regards, Aman Tandon

On Wed, May 27, 2015 at 1:29 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Aman, Start by creating a jira account and vote/watch that issue. Post on the issue to see if there is still interest in it. Declare that you will volunteer, and ask kindly for guidance. The creator of the issue or one of the watchers may respond. Try to digest the ideas discussed on the issue. Raise yours. Collaborate. Don't get discouraged if nobody responds; please remember that committers are busy people. If you have implemented something you want to share, upload a patch: https://wiki.apache.org/solr/HowToContribute Good luck, Ahmet

On Tuesday, May 26, 2015 7:47 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi Ahmet, Can you please guide me on how to contribute to this *issue*? I haven't done this before, so I need to know what I should learn and how I should start - what IDE, or whatever else you think a novice needs to know. I will be thankful to you :) With Regards, Aman Tandon

On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com wrote:

The link you provided is exactly what I want to do. Thanks Ahmet. With Regards, Aman Tandon

On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Aman, Changing protected words without reindexing makes little or no sense. Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory. Instead, I suggest you work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet

On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote:

Please help - or am I not being clear here? With Regards, Aman Tandon

On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote:

Hi,

*Problem Statement: *I want to reload a hash of protwords created by the kstem filter without reloading the whole index core.
*My Thought: *I am thinking of reloading the hash by passing a parameter like *r=1* to the analysis URL request (to somehow pass the parameter via the URL). And I am thinking that by changing IndexSchema.java I might be able to pass this parameter through my analyzer chain to KStemFilter, in which I would call the initializeDictionary function to build the protwords hash again from the file if *r=1*, instead of making a full core reload request.

Please guide me - I know the question might be stupid, but the thought came to my mind and I want to share it and ask for some suggestions here. Is it possible or not, and how can I achieve it? I will be thankful for guidance.

With Regards, Aman Tandon
Re: Removing characters like '\n \n' from indexing
I tried to follow the example here https://wiki.apache.org/solr/UpdateRequestProcessor, by putting the updateRequestProcessorChain in my solrconfig.xml But I'm getting the following error when I tried to reload the core. Caused by: org.apache.solr.common.SolrException: Error loading class 'solr.CustomUpdateRequestProcessorFactory' Is there anything I might have missed out? I'm using Solr 5.1. Regards, Edwin On 27 May 2015 at 10:13, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: I'm using ExtractingRequestHandler to do the indexing. Do I have to implement the UpdateProcessor method at the ExtractingRequestHandler or as a separate method? Regards, Edwin On 26 May 2015 at 23:42, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I think this is still in topic, Assuming we are using the Extract Update handler, I think the update processor approach still applies. But is it not possible to strip them directly with some extract request handler param? 2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com: Neither - it removes the characters before indexing. The distinction is that if you remove them during indexing they will still appear in the stored field values even if they are removed from the indexed values, but by removing them before indexing, they will not appear in the stored field values. Again, the distinction is between indexed field values and stored field values. -- Jack Krupansky On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: It is showing up in the search results. Just to confirm, does this UpdateProcessor method remove the characters during indexing or only after indexing has been done? Regards, Edwin On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote: On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove the special characters like \n during indexing of the rich text documents. 
I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This causes the content to look ugly, and also takes up unnecessary bandwidth in the system.

Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these happen before fields are stored (e.g. RegexReplaceProcessorFactory). If you are concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory.

Upayavira

--
Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England
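The "Error loading class 'solr.CustomUpdateRequestProcessorFactory'" happens because that class name in the wiki example is a placeholder for a factory you would write yourself; it does not ship with Solr. The stock RegexReplaceProcessorFactory needs no custom code. A solrconfig.xml sketch (the chain name, the field name `content`, and the pattern are assumptions for illustration):

```xml
<updateRequestProcessorChain name="strip-newlines">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Wire the chain into the extract handler so Tika-extracted text
     passes through it before being stored -->
<requestHandler name="/update/extract"
                class="solr.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <str name="update.chain">strip-newlines</str>
  </lst>
</requestHandler>
```

So the answer to the earlier question: the chain is configured separately in solrconfig.xml, then referenced from the ExtractingRequestHandler via `update.chain`, rather than implemented inside the handler.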
Re: Index optimize runs in background.
Our index has almost 100M documents running on a SolrCloud of 5 shards, and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend, and during the week there are no updates made to the index. Most of the queries that we run are pretty complex, with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, wildcards, boosts etc., and take many minutes to execute. A difference of 10-20% is also a big advantage for us.

We have been optimizing the index after indexing for years, and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing, so that we can save the many hours it takes to optimize such a huge index, but we find the optimized index works better for us.

Erick, I was indexing documents today and saw the optimize happening in the background.

On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote:

No results yet. I finished the test harness last night (not really a unit test - a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh.

On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote:

Hi, Erick, you mentioned a unit test for the optimize running in the background. Kindly share your findings, if any. Thanks, Modassar

On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote:

Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5-node cluster with solr-5.1.0, and it uses CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us.
Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
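For what it's worth, the SolrJ call under discussion maps onto a plain HTTP update request. The sketch below only builds that URL (the host and core name are made up); whether the server actually blocks until the optimize finishes before responding is exactly the behaviour being investigated in this thread, so this is a request sketch, not a fix.

```python
from urllib.parse import urlencode

def optimize_url(base_url, core, wait_searcher=True, max_segments=1):
    """HTTP form of the SolrJ optimize(waitFlush, waitSearcher,
    maxSegments) call: an update request carrying optimize parameters."""
    params = {
        "optimize": "true",
        "waitSearcher": "true" if wait_searcher else "false",
        "maxSegments": str(max_segments),
    }
    return "%s/%s/update?%s" % (base_url.rstrip("/"), core, urlencode(params))

# Host and core name are illustrative only.
url = optimize_url("http://localhost:8983/solr", "collection1")
print(url)
```

Issuing this with curl against a test core is an easy way to compare the blocking behaviour of 4.10 and 5.1 independently of the SolrJ client.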
Re: Solr relevancy score in percentage
Hi Edwin, Somehow, it is not recommended to display the relevancy score in percentage: https://wiki.apache.org/lucene-java/ScoresAsPercentages Ahmet On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Would like to check, does the new version of Solr allows this function of display the relevancy score in percentage? I understand from the older version that it is not able to, and the only way is to take the highest score and use that as 100%, and calculate other percentage from that number (For example if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%) Is there a better way to do this now? I'm using Solr 5.1 Regards, Edwin
Re: Setting system property
For my EmbeddedSolr mode I do

System.setProperty( "solr.allow.unsafe.resourceloading", "true" );

which works fine. For the remote mode, i.e. a Solr/jetty server, I put

SOLR_OPTS="$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true"

into solr.in.sh. Unfortunately this setting/option does not seem to be applied. When I try to do an xi:include in solrconfig.xml (or schema.xml) I am getting security exceptions... Any advice?

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Wednesday, 13 May 2015 16:57
To: solr-user@lucene.apache.org
Subject: Re: Setting system property

Clemens - For this particular property, it is only accessed as a system property directly, so it must be set on JVM startup and cannot be set any other way.

Erik

—
Erik Hatcher, Senior Solutions Architect
http://www.lucidworks.com

On May 13, 2015, at 3:49 AM, Clemens Wyss DEV clemens...@mysign.ch wrote:

I'd like to make use of solr.allow.unsafe.resourceloading=true. Is the command line -Dsolr.allow.unsafe.resourceloading=true the only way to inject/set this property, or can it be done (e.g.) in solr.xml? Thx Clemens
Re: Solr relevancy score in percentage
Hi Ahmet, Thank you for the link. That means it is not advisable for us to show anything related to the relevancy score, even though the default sorting of the results is by relevancy score? Showing the raw relevancy score does not make any sense to the user either, since they won't understand what it means. Regards, Edwin

On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Edwin, Somehow, it is not recommended to display the relevancy score as a percentage: https://wiki.apache.org/lucene-java/ScoresAsPercentages Ahmet

On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi, I would like to check: does the new version of Solr allow displaying the relevancy score as a percentage? I understand from the older version that it is not able to, and the only way is to take the highest score and use that as 100%, and calculate the other percentages from that number (for example, if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%). Is there a better way to do this now? I'm using Solr 5.1. Regards, Edwin
Re: Index of Hit in MultiValue fields
The result that Solr returns is the document, not anything beneath it, so no, you cannot do this directly. You could use highlighting, or you could parse the explain output (debug.explain.structured=true will help) to identify which field triggered the match.

Alternatively, you could use block joins. Make a parent doc and each of your colours a child doc; then you could return which doc matched. You could use the ExpandComponent to retrieve details of the parent doc (http://heliosearch.org/expand-block-join/). Dunno if any of that helps. Upayavira

On Tue, May 26, 2015, at 08:33 AM, Rodolfo Zitellini wrote:

Dear List, In my schema I have a couple of multivalued fields, and I would need to retrieve the index of the value that generated a match. For example, let's suppose I have a text field with three values: MyField: [0] Red [1] Blue [2] Green. Searching for Blue gets me the document, but I would also need the index (1) in the multivalued field. I tried using the highlighter, but it is a bit hackish to calculate the index from that. Is it possible without resorting to highlighting? Thanks! Rodolfo
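The "hackish" highlighting route can at least be kept simple on the client side. A sketch, assuming the field is stored and returned with the document, and that each value is short enough that the highlighter returns it whole with the default `<em>` markers:

```python
def match_index(values, highlighted):
    """Given the stored multivalued field and one highlight snippet,
    return the position of the value that matched, by stripping the
    default <em> markers the highlighter wraps around hits."""
    plain = highlighted.replace("<em>", "").replace("</em>", "")
    for i, value in enumerate(values):
        if value == plain:
            return i
    return -1  # snippet truncated or values differ; fall back to a scan

print(match_index(["Red", "Blue", "Green"], "<em>Blue</em>"))  # -> 1
```

For long values you would compare the de-tagged snippet as a substring of each stored value instead of testing equality.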
Re: YAJar
Why is your app tied that closely to Solr? I can understand if you are talking about SolrJ, but in normal usage you use a different application, in a different JVM from Solr. Upayavira

On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote:

I am stuck in yet another jarmageddon of Solr. This is a basic question: I noticed Solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern for overriding a jar version uploaded into jetty? I am using maven, and Solr is being started the old way:

java -jar start.jar -Dsolr.solr.home=... -Djetty.home=...

I tried to edit jetty's start.config (then ran java -DSTART=/my/dir/start.config -jar start.jar) but got nowhere... Any help would be much appreciated. Peyman
Re: Solr relevancy score in percentage
Correct. The relevancy score simply states that we think result #1 is more relevant than result #2. It doesn't say that #1 is relevant. The score doesn't have any validity across queries either, as, for example, a different number of query terms will cause the score to change. Upayavira On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote: Hi Arslan, Thank you for the link. That means we are not advisable to show anything that's related to the relevancy score, even though the default sorting of the result is by relevancy score? Since showing the raw relevancy score does not make any sense to the user since they won't understand what it means too. Regards, Edwin On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Edwin, Somehow, it is not recommended to display the relevancy score in percentage: https://wiki.apache.org/lucene-java/ScoresAsPercentages Ahmet On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Would like to check, does the new version of Solr allows this function of display the relevancy score in percentage? I understand from the older version that it is not able to, and the only way is to take the highest score and use that as 100%, and calculate other percentage from that number (For example if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%) Is there a better way to do this now? I'm using Solr 5.1 Regards, Edwin
Index of Hit in MultiValue fields
Dear List, In my schema I have a couple of multivalued fields, and I would need to retrieve the index of the value that generated a match. For example, let's suppose I have a text field with three values: MyField: [0] Red [1] Blue [2] Green. Searching for Blue gets me the document, but I would also need the index (1) in the multivalued field. I tried using the highlighter, but it is a bit hackish to calculate the index from that. Is it possible without resorting to highlighting? Thanks! Rodolfo
Re: Index optimize runs in background.
Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... 
so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
docValues: Can we apply synonym
Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon
Re: YAJar
i have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr? I can understand if you are talking about SolrJ, but normal usage you use a different application in a different JVM from Solr. Upayavira On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote: I am stuck in Yet Another Jarmagedon of SOLR. this is a basic question. i noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern to override a jar version uploaded into jetty? I am using maven, and solr is being started the old way java -jar start.jar -Dsolr.solr.home=... -Djetty.home=... I tried to edit jetty's start.config (then run java -DSTART=/my/dir/start.config -jar start.jar) but got no where... any help would be much appreciated Peyman
Re: Solr relevancy score in percentage
The question is more: why do you want your users to see the scores? If they want to affect ranking, what you want is the ability to run the same query with different boosting and see the difference (two result sets), then see if the new ordering is better or worse. What the actual/raw score is is irrelevant to that; what is important is the ordering. If you want to show how good your results are, then, as the link shows, that is very difficult to measure (and very subjective!).

On 26 May 2015 at 09:37, Upayavira u...@odoko.co.uk wrote:

Correct. The relevancy score simply states that we think result #1 is more relevant than result #2. It doesn't say that #1 is relevant. The score doesn't have any validity across queries either, as, for example, a different number of query terms will cause the score to change. Upayavira

On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote:

Hi Ahmet, Thank you for the link. That means it is not advisable for us to show anything related to the relevancy score, even though the default sorting of the results is by relevancy score? Showing the raw relevancy score does not make any sense to the user either, since they won't understand what it means. Regards, Edwin

On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote:

Hi Edwin, Somehow, it is not recommended to display the relevancy score as a percentage: https://wiki.apache.org/lucene-java/ScoresAsPercentages Ahmet

On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote:

Hi, I would like to check: does the new version of Solr allow displaying the relevancy score as a percentage? I understand from the older version that it is not able to, and the only way is to take the highest score and use that as 100%, and calculate the other percentages from that number (for example, if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%). Is there a better way to do this now? I'm using Solr 5.1. Regards, Edwin
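If, despite all the caveats above, you still normalise against the top score, the arithmetic from the thread looks like this. The numbers are only meaningful within a single result set:

```python
def scores_as_percentages(scores):
    """Scale each score against the maximum in this result set. The
    resulting percentages say nothing about absolute relevance and must
    never be compared across different queries."""
    if not scores:
        return []
    top = max(scores)
    return [round(100.0 * s / top, 1) for s in scores]

print(scores_as_percentages([10.0, 5.0, 2.5]))  # -> [100.0, 50.0, 25.0]
```

This is exactly the "max score as 100%" scheme from the original question; the ScoresAsPercentages wiki page explains why it should stay a display gimmick at most.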
Re: YAJar
I guess this is one reason why the whole WAR approach is being removed! Solr should be a black box that you talk to, and get responses from. What it depends on and how it is deployed should be irrelevant to you. If you want to override the version of guava that Solr uses, then you'd have to rebuild Solr (can be done with maven) and manually update the pom.xml to use guava 18.0, but why would you? You need to test Solr completely (in case any guava bugs affect Solr), deal with any build issues that arise (if guava changes any APIs), and cause yourself a world of pain, for what gain? On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote: I have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr? I can understand if you are talking about SolrJ, but in normal usage you use a different application in a different JVM from Solr. Upayavira On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote: I am stuck in Yet Another Jarmageddon of SOLR. This is a basic question. I noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern to override a jar version uploaded into jetty? I am using maven, and solr is being started the old way: java -jar start.jar -Dsolr.solr.home=... -Djetty.home=... I tried to edit jetty's start.config (then run java -DSTART=/my/dir/start.config -jar start.jar) but got nowhere... any help would be much appreciated Peyman
Re: docValues: Can we apply synonym
To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon
Re: Index optimize runs in background.
Modassar, Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged again. That merge will be costly, as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary. Upayavira On Mon, May 25, 2015, at 07:17 AM, Modassar Ather wrote: Thanks everybody for your replies. I have noticed the optimization running in the background every time I indexed. This is a 5 node cluster with solr-5.1.0 which uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits with openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sigh. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1).
My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: YAJar
No, not really. Creating your own components that extend Solr is quite acceptable - they can live in the Solr home lib directory outside of the war. But really, if you are coding within Solr, you need to use the libraries that Solr uses. Or... create a JIRA ticket and help to upgrade Solr to the next version of the library you need. Or explain here what you are trying to do, so folks can help you find another way to achieve the same. Upayavira On Tue, May 26, 2015, at 01:00 PM, Daniel Collins wrote: I guess this is one reason why the whole WAR approach is being removed! Solr should be a black box that you talk to, and get responses from. What it depends on and how it is deployed should be irrelevant to you. If you want to override the version of guava that Solr uses, then you'd have to rebuild Solr (can be done with maven) and manually update the pom.xml to use guava 18.0, but why would you? You need to test Solr completely (in case any guava bugs affect Solr), deal with any build issues that arise (if guava changes any APIs), and cause yourself a world of pain, for what gain? On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote: I have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr? I can understand if you are talking about SolrJ, but in normal usage you use a different application in a different JVM from Solr. Upayavira On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote: I am stuck in Yet Another Jarmageddon of SOLR. This is a basic question. I noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern to override a jar version uploaded into jetty? I am using maven, and solr is being started the old way: java -jar start.jar -Dsolr.solr.home=... -Djetty.home=... I tried to edit jetty's start.config (then run java -DSTART=/my/dir/start.config -jar start.jar) but got nowhere... any help would be much appreciated Peyman
Removing characters like '\n \n' from indexing
Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This makes the content look ugly, and also takes up unnecessary bandwidth in the system. Regards, Edwin
Re: Removing characters like '\n \n' from indexing
On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This makes the content look ugly, and also takes up unnecessary bandwidth in the system. Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these happen before fields are stored (e.g. RegexReplaceProcessorFactory). If you are concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory. Upayavira
Re: When is too many fields in qf is too many?
How you have tie is fine. Setting tie to 1 might give you reasonable results. You could still easily have scores that are just always an order of magnitude or two higher, but try it out! BTW anything you put in the URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structure and business needs, I will not be able to search against a single field (i.e. using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0; will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F1 F2 F3 F4 ... ... ...</str>
    <float name="tie">1.0</float>
    <str name="fl">_UNIQUE_FIELD_,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

Or must tie be passed as part of the URL? Thanks Steve On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Yeah, a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominate the summation. But something easy to try if you want to keep playing with dismax. -Doug On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, Your blog write-up on relevancy is very interesting, I didn't know this.
Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based-fields data into a single field using copyField. Thanks Steve On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner-takes-all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ I'm about to win the blasphemer merit badge, but ad-hoc all-field-like searching over many fields is actually a good use case for Elasticsearch's cross field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/ It wouldn't be hard (and would actually be a great feature for the project) to get the Lucene query associated with cross field search into Solr. You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java Hope that helps -Doug -- *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com Author: Relevant Search http://manning.com/turnbull from Manning Publications This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields and each field name averages 15 characters long, in effect the data passed via qf will be over 20K characters. Given the above, beside the fact that a search for apple translating to a 20K characters passing over the network, what else within Solr and Lucene I should be worried about if any? Will I hit some kind of a limit? Will each search now require more CPU cycles?
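Doug's point about dismax's winner-takes-all behaviour, and what tie changes, can be sketched numerically. This mirrors how Lucene's DisjunctionMaxQuery combines the per-field scores from qf (best field score plus tie times the rest); the numbers are illustrative, not from the thread:

```python
def dismax_score(field_scores, tie=0.0):
    """Sketch of DisjunctionMaxQuery scoring over the qf fields:
    the best-matching field dominates, and the other fields contribute
    in proportion to tie. tie=0.0 is pure winner-takes-all; tie=1.0
    effectively sums all the per-field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# A field scoring well above the others dominates regardless of tie,
# which is Doug's caveat about relying on tie=1.0.
print(dismax_score([2.0, 0.5, 0.5], tie=0.0))  # 2.0
print(dismax_score([2.0, 0.5, 0.5], tie=1.0))  # 3.0
```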
Re: docValues: Can we apply synonym
mmm this is different! Without any customisation, right now you could: - use docValues to provide exact-value facets - then use a copyField, with the proper analysis, to search when a user clicks on a filter! So you will see in your facets: Mumbai (3) Bombay (2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand, if you want to apply the synonyms beforehand, in the indexing pipeline (because docValues fields cannot be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city*. In city we have also added synonyms for cities like mumbai, bombay (these are Indian cities), so that a result for mumbai is also eligible when somebody applies a filter of bombay on the search results. I need this functionality to work with a docValues-enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the documentation to be sure, but apparently: DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are:
- StrField and UUIDField: if the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type; if the field is multi-valued, Lucene will use the SORTED_SET type.
- Any Trie* numeric field and EnumField: if the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type; if the field is multi-valued, Lucene will use the SORTED_SET type.
This means you should not analyse a field where docValues is enabled. Can you explain your use case? Why are you interested in synonyms at the DocValues level?
Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
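The copyField approach Alessandro describes could be sketched in schema.xml roughly as follows. The field and type names here are illustrative, not from the thread: the docValues facet field stays a plain string, and searching/filtering happens on an analyzed copy that applies the synonyms.

```xml
<!-- Facet on the exact string value, with docValues enabled. -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>

<!-- Search/filter on an analyzed copy that expands synonyms
     (mumbai <-> bombay via synonyms.txt). -->
<field name="city_search" type="text_city_syn" indexed="true" stored="false"/>
<copyField source="city" dest="city_search"/>

<fieldType name="text_city_syn" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

Faceting would then use facet.field=city while filters click through to fq=city_search:bombay, which is where the mismatch between facet counts and filtered results that Alessandro warns about comes from.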
Re: SolrCloud 4.8 - Transaction log size over 1GB
Thanks Erick for your willingness and patience. If I understood well, with autoCommit, at the first commit (soft or hard) with openSearcher=true all new documents will automatically be available for search. But when openSearcher=false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit . So it is not clear what this stable storage is, where it is, and when the new documents will be visible. Only when, at the very end of the indexing process, my code commits? Does it mean, let me say, that when openSearcher=false we have an implicit commit done by the SolrCloud autoCommit that is not visible to the world, and an explicit commit done by clients that is visible to the world? On Tue, May 26, 2015 at 2:55 AM, Erick Erickson erickerick...@gmail.com wrote: The design is that the latest successfully flushed tlog file is kept for peer sync in SolrCloud mode. When a replica comes up, there's a chance that it's not very many docs behind. So, if possible, some of the docs are taken from the leader's tlog and replayed to the follower that's just been started. If the follower is too far out of sync, a full old-style replication is done. So there will always be a tlog file (and occasionally more than one if they're very small) kept around, even on successful commit. It doesn't matter if you have leaders and replicas or not, that's still the process that's followed. Please re-read the link I sent earlier. There's absolutely no reason your tlog files have to be so big! Really, set your autoCommit to, say, 15 seconds and 10 docs and set openSearcher=false in your solrconfig.xml file, and the tlog files that are kept around will be much smaller and they'll be available for peer sync.
And if you really don't care about tlogs at all, just take this bit out of your solrconfig.xml:

<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:256}</int>
</updateLog>

Best, Erick On Mon, May 25, 2015 at 4:40 PM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi Erick, I have tried the indexing code I have a few times; this is the behaviour I have observed: When an indexing process starts, even if one or more tlog files exist, a new tlog file is created and all the new documents are stored there. When the indexing process ends and does a hard commit, older tlog files are removed but the new one (the latest) remains. As far as I can see, since my indexing process loads a few million documents every time, at the end of the process the latest tlog file persists with all these documents in it. So I have such big tlog files. Now the question is, why does the latest tlog file persist even if the code has done a hard commit? When a hard commit is done successfully, why should we keep the latest tlog file? On Mon, May 25, 2015 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote: OK, assuming you're not doing any commits at all until the very end, then the tlog contains all the docs for the _entire_ run. The article really doesn't care whether the commits come from the solrconfig.xml or a SolrJ client or curl. The tlog simply is not truncated until a hard commit happens, no matter where it comes from. So here's what I'd do: 1 set autoCommit in your solrconfig.xml with openSearcher=false for every minute. Then the problem will probably go away. or 2 periodically issue a hard commit (openSearcher=false) from the client. Of the two, I _strongly_ recommend 1 as it's more graceful when there are multiple clients. Best, Erick On Mon, May 25, 2015 at 4:45 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi Erick, thanks for your support. Reading the post I realised that my scenario does not apply the autoCommit configuration; right now we don't have autoCommit in our solrconfig.xml.
We need docs are searchable only after the indexing process, and all the documents are committed only at end of index process. Now I don't understand why tlog files are so big, given that we have an hard commit at end of every indexing. On Sun, May 24, 2015 at 5:49 PM, Erick Erickson erickerick...@gmail.com wrote: Vincenzo: Here's perhaps more than you want to know about hard commits, soft commits and transaction logs: http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Sun, May 24, 2015 at 12:04 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Thanks Shawn for your prompt support. Best regards, Vincenzo On Sun, May 24, 2015 at 6:45 AM, Shawn Heisey apa...@elyograg.org wrote: On 5/23/2015 9:41 PM, Vincenzo D'Amore wrote: Thanks Shawn, may be this is a silly question, but I looked
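For reference, the autoCommit Erick recommends for this scenario would look something like the following in the updateHandler section of solrconfig.xml (the 15-second interval is from his mail; treat the snippet as a sketch):

```xml
<autoCommit>
  <!-- Hard commit every 15 seconds: flushes index changes to stable
       storage and lets old tlog files be truncated/rotated. -->
  <maxTime>15000</maxTime>
  <!-- Do NOT open a new searcher: documents stay invisible to queries
       until the client issues its own commit at the end of indexing. -->
  <openSearcher>false</openSearcher>
</autoCommit>
```

This keeps Vincenzo's requirement intact (docs searchable only after the indexing process commits) while preventing the tlog from accumulating millions of documents between commits.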
Re: Removing characters like '\n \n' from indexing
Neither - it removes the characters before indexing. The distinction is that if you remove them during indexing they will still appear in the stored field values even if they are removed from the indexed values, but by removing them before indexing, they will not appear in the stored field values. Again, the distinction is between indexed field values and stored field values. -- Jack Krupansky On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: It is showing up in the search results. Just to confirm, does this UpdateProcessor method remove the characters during indexing or only after indexing has been done? Regards, Edwin On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote: On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This makes the content look ugly, and also takes up unnecessary bandwidth in the system. Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these happen before fields are stored (e.g. RegexReplaceProcessorFactory). If you are concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory. Upayavira
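The update-processor approach Upayavira and Jack describe could be sketched in solrconfig.xml along these lines (chain name, field name, and pattern are illustrative, not from the thread):

```xml
<!-- Collapse runs of newlines and surrounding whitespace into a single
     space before the document is processed, so the cleanup affects both
     the indexed AND the stored value of the field. -->
<updateRequestProcessorChain name="strip-newlines">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\s*\n\s*</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then selected per request with update.chain=strip-newlines, or set as a default on the update request handler.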
Re: Running Solr 5.1.0 as a Service on Windows
I am using NSSM to start zookeeper as a service on Windows (and for Solr too). In NSSM I configured it to just point to E:\zookeeper-3.4.6\bin\zkServer.cmd. As long as you can run that from the command line to validate that you have modified all of the zookeeper config files correctly, NSSM should have no problem starting up zookeeper. Will Miller Development Manager, eCommerce Services | Online Technology 462 Seventh Avenue, New York, NY, 10018 Office: 212.502.9323 | Cell: 317.653.0614 wmil...@fbbrands.com | www.fbbrands.com From: Upayavira u...@odoko.co.uk Sent: Monday, May 25, 2015 4:10 PM To: solr-user@lucene.apache.org Subject: Re: Running Solr 5.1.0 as a Service on Windows Zookeeper is just Java, so there's no reason why it can't be started on Windows. However, the startup scripts for Zookeeper on Windows are pathetic, so you are much more on your own than you are on Linux. There may be folks here who can answer your question (e.g. with Windows-specific startup scripts), or you might consider asking on the Zookeeper mailing lists directly: https://zookeeper.apache.org/lists.html Upayavira On Mon, May 25, 2015, at 10:34 AM, Zheng Lin Edwin Yeo wrote: I've managed to get Solr started as a Windows service after re-configuring the startup script, as I'd previously missed out some of the custom configurations there. However, I still couldn't get zookeeper to start the same way. Are we able to use NSSM to start up zookeeper as a Microsoft Windows service too? Regards, Edwin On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service? I've tried to follow the steps from this website http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which uses NSSM.
However, when I tried to start the service from the Component Services in the Windows Control Panel Administrative tools, I get the following message: Windows could not start the Solr5 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error. Is this the correct way to set it up, or is there other methods? Regards, Edwin
Re: Index of Hit in MultiValue fields
We had a similar problem: when searching we wanted to return the doc, and for the multi-valued field we wanted to show only the value that matched the search. This was used for an advanced auto-suggestion. As Upaya specified, highlighting was the good solution for us, managing in the UI only the unit of information coming from the highlighting. Cheers 2015-05-26 9:42 GMT+01:00 Upayavira u...@odoko.co.uk: The result that Solr returns is the document, not anything beneath it, so no, you cannot do this directly. You could use highlighting, or you could parse the output of explain (debug.explain.structured=true will help) to identify which field triggered the match. Alternatively, you could use block joins. Make a parent doc and each of your colours a child doc, then you could return which doc matched. You could use the ExpandComponent to retrieve details of the parent doc (http://heliosearch.org/expand-block-join/) Dunno if any of that helps. Upayavira On Tue, May 26, 2015, at 08:33 AM, Rodolfo Zitellini wrote: Dear List, In my schema I have a couple of multi-value fields and I would need to retrieve the index of which one generated a match. For example, let's suppose I have a text field like this with three values: MyField: [0] Red [1] Blue [2] Green Searching for Blue gets me the document, but I would also need the index (1) in the multi-value field. I tried using the highlighter but it is a bit hackish to then calculate the index. Is it possible without resorting to highlighting? Thanks! Rodolfo -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Solr 5.1 ignores SOLR_JAVA_MEM setting
Thx. When will 5.2 approximately be released? -----Original Message----- From: Timothy Potter [mailto:thelabd...@gmail.com] Sent: Tuesday, 26 May 2015 17:50 To: solr-user@lucene.apache.org Subject: Re: Solr 5.1 ignores SOLR_JAVA_MEM setting Yes, same bug. Fixed in 5.2 On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I also noticed that (see my post this morning) ... SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true ... is not taken into consideration (anymore). Same bug? -----Original Message----- From: Ere Maijala [mailto:ere.maij...@helsinki.fi] Sent: Wednesday, 15 April 2015 09:25 To: solr-user Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in bin/solr that overrides the SOLR_JAVA_MEM setting from solr.in.sh or the environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be circumvented by using the SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's not mentioned in solr.in.sh by default. --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland
Re: Help/Guidance Needed : To reload kstem protword hash without full core reload
Hi Ahmet, Can you please guide me on contributing to this *issue*? I haven't done this before, so I need to know what I should learn and how I should start: what IDE, or whatever else you think a novice needs to know. I will be thankful to you :) With Regards Aman Tandon On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com wrote: That link you provided is exactly what I want to do. Thanks Ahmet. With Regards Aman Tandon On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Aman, changing protected words without reindexing makes little or no sense. Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory. Instead, I suggest you work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, or am I not being clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload a hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking of reloading the hash by passing a parameter like *r=1* to the analysis url request (to somehow pass the parameter via url). And I am thinking that by changing IndexSchema.java I might be able to pass this parameter through my analyzer chain to KStemFilter, in which I will call the initializeDictionary function to rebuild the protwords hash from the file if *r=1*, instead of making a full core reload request. Please guide me, I know the question might be stupid; the thought came to my mind and I want to share it and ask for suggestions here. Is it possible or not, and how can I achieve it? I will be thankful for guidance. With Regards Aman Tandon
Re: Running Solr 5.1.0 as a Service on Windows
Hi Edwin, Are there changes you recommend to bin/solr.cmd to make it easier to work with NSSM? If so, please file a JIRA as I'd like to help make that process easier. Thanks. Tim On Mon, May 25, 2015 at 3:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: I've managed to get the Solr started as a Windows service after re-configuring the startup script, as I've previously missed out some of the custom configurations there. However, I still couldn't get the zookeeper to start the same way too. Are we able to use NSSM to start up zookeeper as a Microsoft Windows service too? Regards, Edwin On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service? i've tried to follow the steps from this website http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which uses NSSM. However, when I tried to start the service from the Component Services in the Windows Control Panel Administrative tools, I get the following message: Windows could not start the Solr5 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error. Is this the correct way to set it up, or is there other methods? Regards, Edwin
Re: docValues: Can we apply synonym
Okay. So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different! Without any customisation, right now you could: - use docValues to provide exact-value facets - then use a copyField, with the proper analysis, to search when a user clicks on a filter! So you will see in your facets: Mumbai (3) Bombay (2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand, if you want to apply the synonyms beforehand, in the indexing pipeline (because docValues fields cannot be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city*. In city we have also added synonyms for cities like mumbai, bombay (these are Indian cities), so that a result for mumbai is also eligible when somebody applies a filter of bombay on the search results. I need this functionality to work with a docValues-enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the documentation to be sure, but apparently: DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are:
- StrField and UUIDField: if the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type; if the field is multi-valued, Lucene will use the SORTED_SET type.
- Any Trie* numeric field and EnumField: if the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type; if the field is multi-valued, Lucene will use the SORTED_SET type.
This means you should not analyse a field where docValues is enabled.
Can you explain your use case? Why are you interested in synonyms at the DocValues level? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which docValues are enabled. We need to add synonyms to that field, so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Solr relevancy score in percentage
Honestly, the only case where a score in percentage could make sense is for More Like This. In that case Solr could provide the feature, since we know for certain that a 100% similarity score means a copy of the seed document. If I am right, because the MLT implementation does not take care of the identity score, we get weird scores there as well. Maybe that is the only place where I would prefer a percentage. Cheers 2015-05-26 16:23 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Currently I take the score that I get from Solr, divide it by the maxScore, and multiply it by 100 to get the percentage. All this is done in the code for the UI. The user will only see the percentage and will not know anything about the score. Since the score by itself is meaningless, I don't think I should display a score like 1.7 or 0.2 on the UI, which could further confuse the user and raise a lot more questions. Regards, Edwin On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote: On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are. But I suspect a problem is that the 1st record will always be 100%, regardless of what the score is, as the 1st record's score will always be equal to the maxScore. If you want to give your users *something* then simply display the score that you get from Solr. I recommend that you DON'T give them maxScore, because they will be tempted to make the percentage calculation themselves to try and find meaning where there is none. A clever user will be able to figure out maxScore for themselves simply by sorting on relevance and looking at the score on the top doc.
When you get questions about what the number means, and you *WILL* get those questions, you can tell them that the number itself is meaningless and what matters is how the scores within a single result compare to each other -- exactly what you have been told here. Thanks, Shawn -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Index optimize runs in background.
No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. 
Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important question is, why are you optimising? Generally it isn't recommended anymore as it reduces the natural distribution of documents amongst segments and makes future merges more costly. Upayavira
Re: Removing characters like '\n \n' from indexing
I think this is still on topic. Assuming we are using the extract update handler, I think the update processor approach still applies. But is it not possible to strip them directly with some extract request handler param? 2015-05-26 16:33 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com: Neither - it removes the characters before indexing. The distinction is that if you remove them during indexing they will still appear in the stored field values even if they are removed from the indexed values, but by removing them before indexing, they will not appear in the stored field values. Again, the distinction is between indexed field values and stored field values. -- Jack Krupansky On Tue, May 26, 2015 at 10:25 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: It is showing up in the search results. Just to confirm, does this UpdateProcessor method remove the characters during indexing or only after indexing has been done? Regards, Edwin On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote: On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of my indexed content of rich text documents, due to the spaces and empty lines in the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This makes the content look ugly, and also takes up unnecessary bandwidth in the system. Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these run before fields are stored (e.g. RegexReplaceProcessorFactory). If you are concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory.
Upayavira -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
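A minimal solrconfig.xml sketch of the update-processor approach discussed in this thread, using solr.RegexReplaceProcessorFactory to strip leading whitespace and newlines before the value is indexed or stored; the chain name and the field name `content` are illustrative:

```xml
<!-- Strips leading runs of whitespace/newlines from the extracted text
     before it reaches the index AND the stored value, so they never
     show up in search results. -->
<updateRequestProcessorChain name="strip-leading-whitespace">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">^\s+</str>
    <str name="replacement"></str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain then has to be attached to the update (or extract) request handler, e.g. with `<str name="update.chain">strip-leading-whitespace</str>` in that handler's defaults.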
Re: docValues: Can we apply synonym
I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: docValues: Can we apply synonym
I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. 
The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can your explain us your use case ? Why are you interested in synonyms DocValues level ? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, it contains the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have some field *city* in which the docValues are enabled. We need to add the synonym in that field so how could we do it? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
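The UpdateProcessor idea from this thread can be sketched as a solrconfig.xml fragment. solr.StatelessScriptUpdateProcessorFactory runs a script against each document before indexing, so it can append synonym values (e.g. add bombay whenever mumbai is present) to the multi-valued docValues field without any analysis chain being involved. The chain name and the script file are illustrative, not from the thread:

```xml
<updateRequestProcessorChain name="city-synonyms">
  <!-- city-synonyms.js (hypothetical) would look up each value of the
       multi-valued "city" field in a synonym map and add the expansions
       to the document before it is indexed -->
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">city-synonyms.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Note that appending both values keeps Alessandro's caveat in play: the facet counts for Mumbai and Bombay only merge if the script normalises both spellings to a single value instead.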
Re: Help/Guidance Needed : To reload kstem protword hash without full core reload
Hi Aman, Start with creating a JIRA account and vote/watch that issue. Post on the issue to see if there is still interest in this. Declare that you will volunteer and ask kindly for guidance. The creator of the issue or one of the watchers may respond. Try to digest the ideas discussed on the issue. Raise yours. Collaborate. Don't get discouraged if nobody responds; please remember that committers are busy people. If you have implemented something you want to share, upload a patch : https://wiki.apache.org/solr/HowToContribute Good luck, Ahmet On Tuesday, May 26, 2015 7:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Ahmet, Can you please guide me on contributing to this *issue*? I haven't done this before, so I need to know... what I should know and how I should start... what IDE or whatever you think a novice needs to know. I will be thankful to you :) With Regards Aman Tandon On Tue, May 19, 2015 at 8:10 PM, Aman Tandon amantandon...@gmail.com wrote: That link you provided is exactly what I want to do. Thanks Ahmet. With Regards Aman Tandon On Tue, May 19, 2015 at 5:06 PM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Aman, changing protected words without reindexing makes little or no sense. Regarding protected words, the trend is to use solr.KeywordMarkerFilterFactory. Instead I suggest you work on a more general issue: https://issues.apache.org/jira/browse/SOLR-1307 Ahmet On Tuesday, May 19, 2015 3:16 AM, Aman Tandon amantandon...@gmail.com wrote: Please help, or am I not being clear here? With Regards Aman Tandon On Mon, May 18, 2015 at 9:47 PM, Aman Tandon amantandon...@gmail.com wrote: Hi, *Problem Statement: *I want to reload a hash of protwords created by the kstem filter without reloading the whole index core. *My Thought: *I am thinking of reloading the hash by passing a parameter like *r=1 *to the analysis url request (to somehow pass the parameter via the url).
And I am thinking that by changing IndexSchema.java I might be able to pass this parameter through my analyzer chain to KStemFilter, in which I will call the initializeDictionary function to rebuild the protwords hash from the file if *r=1*, instead of making a full core reload request. Please guide me; I know the question might be stupid, but the thought came to my mind and I want to share it and ask for suggestions here. Is it possible or not, and how can I achieve it? I will be thankful for guidance. With Regards Aman Tandon
Re: No results for MoreLikeThis
Good call. I'd previously attempted to use one of my fields, however, and it didn't work. I then thought maybe broadening it to list anything could help. I'd tried using the interestingTerms parameter as well. Just for the sake of double checking before replying to your message, though, I changed fl once more to the field I was hoping to find items related to. I had a typo, though, and it worked. Instead of 'descript2' I used 'descript' and voila. 'descript' is the indexed field, descript2 is a copyField that uses a different analyzer (the one I'm actually using for querying). I guess it only takes non-copy (and maybe non-dynamic?) fields into account? Thanks for any more information on that field specific approach/issue! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote: I doubt mlt.fl=* will work. Provide it with specific field names that should be used for the comparison. Upayavira On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote: hi all, running a query like this, but am getting no results from the mlt handler: http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1 been googling around without any luck as of yet. i have the requestHandler added to solrconfig.xml: requestHandler name=/mlt class=solr.MoreLikeThisHandler / and confirm it is loaded in the Plugins/Stats area of the solr admin interface. i've tried adding minimum word length, term frequency, etc. per a post or two i ran across where people had similar issues resolved by doing so, but it didn't help any. i'm not getting any errors, what puzzle piece am i missing in my configuration or query building? thanks! - john
Re: Problem with numeric math types and the dataimport handler
On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote: Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I fixed in 4.10. Can you try a newer release? Looks like that didn't fix it. I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a new war, and when it was done, restarted Solr with that war. The solr-impl version in the dashboard is now 4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11 After some importing with DIH and a Solr restart, this is the most recent error in the log: WARN - 2015-05-26 14:28:09.289; org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: ERROR: [doc=usatphotos084190] Error adding field 'did'='java.math.BigInteger:1214221' msg=For input string: java.math.BigInteger:1214221 Looks like we'll need a new issue. I'm not in a position right now to try a newer Solr version than 4.9.1. Thanks, Shawn
RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0
I have added a patch which should fix the problem. https://issues.apache.org/jira/browse/SOLR-7559 Please review. Cheers, Jeroen -Original Message- From: Jeroen Steggink [mailto:jeroen.stegg...@contentstrategy.nl] Sent: dinsdag 26 mei 2015 21:45 To: solr-user@lucene.apache.org Subject: RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0 Hi Tim, I just ran into the exact same problem. I see you created a bug in JIRA. I will check what is causing this and try and fix it. https://issues.apache.org/jira/browse/SOLR-7559 Jeroen -Original Message- From: Tim H [mailto:th98...@gmail.com] Sent: maandag 18 mei 2015 17:28 To: solr-user@lucene.apache.org Subject: NPE when faceting with MLT Query from upgrade to Solr 5.1.0 Hi everyone, Recently I upgraded to Solr 5.1.0. When trying to generate facets using the More Like This handler, I now get a NullPointerException. I never got this exception while using Solr 4.10.0. Details are below:
Stack Trace:
at org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555)
at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284)
at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
Query:
qt=/mlt
q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7
mlt.mindf=5
mlt.mintf=1
mlt.minwl=3
mlt.boost=true
fq=storeid:546dcdcab54cf2d074e5a2f7
mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt
mlt.interestingTerms=details
fl=conceptid,score
sort=score desc
start=0
rows=2
facet=true
facet.field=tags
facet.field=locations
facet.mincount=1
facet.method=enum
facet.limit=-1
facet.sort=count
schema.xml (relevant parts):
<field name="tags" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="locations" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*_mlt" stored="true" indexed="true" type="text_general" termVectors="true" multiValued="true"/>
solrconfig.xml (relevant parts):
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler"></requestHandler>
Re: No results for MoreLikeThis
I doubt mlt.fl=* will work. Provide it with specific field names that should be used for the comparison. Upayavira On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote: hi all, running a query like this, but am getting no results from the mlt handler: http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1 been googling around without any luck as of yet. i have the requestHandler added to solrconfig.xml: requestHandler name=/mlt class=solr.MoreLikeThisHandler / and confirm it is loaded in the Plugins/Stats area of the solr admin interface. i've tried adding minimum word length, term frequency, etc. per a post or two i ran across where people had similar issues resolved by doing so, but it didn't help any. i'm not getting any errors, what puzzle piece am i missing in my configuration or query building? thanks! - john
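Following that advice, a sketch of an MLT handler configured with explicit fields instead of mlt.fl=*; the field names descript and mfgname2 are taken from the query in this thread and are assumed to be indexed (ideally with termVectors enabled):

```xml
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- compare documents on these specific fields, not "*" -->
    <str name="mlt.fl">descript,mfgname2</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
    <str name="mlt.minwl">1</str>
  </lst>
</requestHandler>
```

MLT reads term vectors when available and otherwise re-analyses the *stored* value, which is why a field that is indexed but not stored (and has no term vectors) returns nothing.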
Re: No results for MoreLikeThis
Just checked my schema.xml and think that the issue is resulting from the stored property being set false on descript2 and true on descript. -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, May 26, 2015 at 4:22 PM, John Blythe j...@curvolabs.com wrote: Good call. I'd previously attempted to use one of my fields, however, and it didn't work. I then thought maybe broadening it to list anything could help. I'd tried using the interestingTerms parameter as well. Just for the sake of double checking before replying to your message, though, I changed fl once more to the field I was hoping to find items related to. I had a typo, though, and it worked. Instead of 'descript2' I used 'descript' and voila. 'descript' is the indexed field, descript2 is a copyField that uses a different analyzer (the one I'm actually using for querying). I guess it only takes non-copy (and maybe non-dynamic?) fields into account? Thanks for any more information on that field specific approach/issue! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote: I doubt mlt.fl=* will work. Provide it with specific field names that should be used for the comparison. Upayavira On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote: hi all, running a query like this, but am getting no results from the mlt handler: http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme Corp+descript2%3A+(SCREW+3.5X50MM)start=0rows=1fl=*%2C+scorewt=jsonindent=truemlt=truemlt.fl=*mlt.mintf=1mlt.mindf=1mlt.minwl=1 been googling around without any luck as of yet. i have the requestHandler added to solrconfig.xml: requestHandler name=/mlt class=solr.MoreLikeThisHandler / and confirm it is loaded in the Plugins/Stats area of the solr admin interface. 
i've tried adding minimum word length, term frequency, etc. per a post or two i ran across where people had similar issues resolved by doing so, but it didn't help any. i'm not getting any errors, what puzzle piece am i missing in my configuration or query building? thanks! - john
Re: [solr 5.1] Looking for full text + collation search field
Hi Bjorn, Not 100% sure, but ICUFoldingFilter may suit you. It also removes diacritics. Ahmet On Thursday, May 21, 2015 3:20 PM, Björn Keil greifenschwi...@yahoo.de wrote: Thanks for the advice. I have tried the field type and it seems to do what it is supposed to in combination with a lower case filter. However, that raises another slight problem: German umlauts are supposed to be treated slightly differently for the purpose of searching than for sorting. For sorting a normal ICUCollationField with standard rules should suffice*; for the purpose of searching I cannot just replace an ü with a u, ü is supposed to equal ue, or, in terms of RuleBasedCollators, there is a secondary difference. The rules for the collator include: ue , ü ae , ä oe , ö ss , ß (again, that applies to searching *only*; for sorting the rule a , ä would apply, which is implied in the default rules.) I can of course program a filter that does these rudimentary replacements myself, at best after the lower case filter but before the ASCIIFoldingFilter; I am just wondering if there isn't some way to use collation keys for full text search. * even though Latin script and specifically German is my primary concern, I want some rudimentary support for all European languages, including ones that use Cyrillic and Greek script, special symbols in Icelandic that are not strictly Latin, and ligatures like Æ, which collation keys could easily provide. Ahmet Arslan iori...@yahoo.com.INVALID schrieb am 22:10 Mittwoch, 20.Mai 2015: Hi Bjorn, solr.ICUCollationField is useful for *sorting*, and you cannot sort on tokenized fields. Your example looks like diacritics-insensitive search. Please see : ASCIIFoldingFilterFactory Ahmet On Wednesday, May 20, 2015 2:53 PM, Björn Keil deeph...@web.de wrote: Hello, might anyone suggest a field type with which I may do both a full text search (i.e. there is an analyzer including a tokenizer) and apply a collation?
An example for what I want to do: There is a field composer for which I passed the value Dvořák, Antonín. I want the following queries to match: composer:(antonín dvořák) composer:dvorak composer:dvorak, antonin the latter case is possible using a solr.ICUCollationField, but that type does not support an Analyzer and consequently no tokenizer, thus, it is not helpful. Unlike former versions of solr there do not seem to be CollationKeyFilters which you may hang into the analyzer of a solr.TextField... so I am a bit at a loss how I get *both* a tokenizer and a collation at the same time. Thanks for help, Björn
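One way to get the ue/ü treatment inside a tokenized field, as a sketch: a solr.MappingCharFilterFactory applied before the tokenizer, with a mapping file (the name mapping-german.txt is illustrative) that expands umlauts the German way. Since char filters run before the lower-case filter, both letter cases need entries:

```xml
<!-- mapping-german.txt would contain lines such as:
       "ü" => "ue"    "Ü" => "Ue"
       "ä" => "ae"    "Ä" => "Ae"
       "ö" => "oe"    "Ö" => "Oe"
       "ß" => "ss"
-->
<fieldType name="text_de_search" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-german.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds remaining Latin diacritics (ICUFoldingFilterFactory is a broader alternative) -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

This approximates the secondary-difference behaviour of the collator rules for searching, while sorting stays on a separate ICUCollationField; queries go through the same analyzer, so dvorak, Dvořák and antonin all normalise to the same terms.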
Re: Solr relevancy score in percentage
On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are. But I suspect a problem is that the 1st record will always be 100%, regardless of what is the score, as the 1st record score will always be equals to the maxScore. If you want to give your users *something* then simply display the score that you get from Solr. I recommend that you DON'T give them maxScore, because they will be tempted to make the percentage calculation themselves to try and find meaning where there is none. A clever user will be able to figure out maxScore for themselves simply by sorting on relevance and looking at the score on the top doc. When you get questions about what the number means, and you *WILL* get those questions, you can tell them that the number itself is meaningless and what matters is how the scores within a single result compare to each other -- exactly what you have been told here. Thanks, Shawn
Different behavior (bug?) for RegExTransformer in Solr5
I'm experimenting with Solr 5 (5.1.0 1672403 - timpotter - 2015-04-09 10:37:54). In my custom DIH configuration, I use a RegexTransformer to load several columns, which may or may not be present. If present, the regexp matches and the data loads correctly in both Solr 4 and 5. If not present and the regexp fails, the column is empty in Solr 4, but in Solr 5 it contains the original string to be matched. In other words, in Solr 5.1.0, if the 'replaceWith' value is empty, the column appears to revert to the original string.
Example: Column 'data' contains: column1:xxx,column3:yyy
DIH regexp:
<field column="column1" regex="^.*column1:(.*?),.*$" replaceWith="$1" sourceColName="data"/>
<field column="column2" regex="^.*column2:(.*?),.*$" replaceWith="$1" sourceColName="data"/>
<field column="column3" regex="^.*column3:(.*?),.*$" replaceWith="$1" sourceColName="data"/>
Solr 4:
column1: xxx
column2:
column3: yyy
Solr 5:
column1: xxx
column2: column1:xxx,column3:yyy
column3: yyy
No results for MoreLikeThis
hi all, running a query like this, but am getting no results from the mlt handler: http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme Corp+descript2%3A+(SCREW+3.5X50MM)&start=0&rows=1&fl=*%2C+score&wt=json&indent=true&mlt=true&mlt.fl=*&mlt.mintf=1&mlt.mindf=1&mlt.minwl=1 been googling around without any luck as of yet. i have the requestHandler added to solrconfig.xml: <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> and confirm it is loaded in the Plugins/Stats area of the solr admin interface. i've tried adding minimum word length, term frequency, etc. per a post or two i ran across where people had similar issues resolved by doing so, but it didn't help any. i'm not getting any errors, what puzzle piece am i missing in my configuration or query building? thanks! - john
RE: NPE when faceting with MLT Query from upgrade to Solr 5.1.0
Hi Tim, I just ran into the exact same problem. I see you created a bug in JIRA. I will check what is causing this and try and fix it. https://issues.apache.org/jira/browse/SOLR-7559 Jeroen -Original Message- From: Tim H [mailto:th98...@gmail.com] Sent: maandag 18 mei 2015 17:28 To: solr-user@lucene.apache.org Subject: NPE when faceting with MLT Query from upgrade to Solr 5.1.0 Hi everyone, Recently I upgraded to solr 5.1.0. When trying to generate facets using the more like this handler, I now get a a NullPointerException. I never got this exception while using Solr 4.10.0 Details are below: Stack Trace: at org.apache.solr.request.SimpleFacets.getHeatmapCounts(SimpleFacets.java:1555) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:284) at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:233) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:745) Query: qt=/mlt q=id:545dbb57b54c2403f286050e546dcdcab54cf2d074e5a2f7 mlt.mindf=5 mlt.mintf=1 mlt.minwl=3 mlt.boost=true fq=storeid:546dcdcab54cf2d074e5a2f7 mlt.fl=overview_mlt,abstract_mlt,description_mlt,company_profile_mlt,bio_mlt mlt.interestingTerms=details fl=conceptid,score sort=score desc start=0 rows=2 facet=true facet.field=tags facet.field=locations facet.mincount=1 facet.method=enum facet.limit=-1 facet.sort=count Schema.xml (relevant parts): <field name="tags" type="string" indexed="true" stored="true" multiValued="true" /> <field name="locations" type="string" indexed="true" stored="true" multiValued="true" /> <dynamicField name="*_mlt" stored="true" indexed="true" type="text_general" termVectors="true" multiValued="true" /> solrconfig.xml (relevant parts): <requestHandler name="/mlt" class="solr.MoreLikeThisHandler"></requestHandler>
Re: When is too many fields in qf is too many?
Thanks Doug. I might have to take you up on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in the URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structure and business need, I will not be able to search against a single field (i.e. using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0; will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">20</int> <str name="defType">edismax</str> <str name="qf">F1 F2 F3 F4 ... ... ...</str> <float name="tie">1.0</float> <str name="fl">_UNIQUE_FIELD_,score</str> <str name="wt">xml</str> <str name="indent">true</str> </lst> </requestHandler> Or must tie be passed as part of the URL? Thanks Steve On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Yeah, a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominating the summation.
But something easy to try if you want to keep playing with dismax. -Doug On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, Your blog write-up on relevancy is very interesting, I didn't know this. Looks like I have to go back to the drawing board and figure out an alternative solution: somehow get those group-based-fields data into a single field using copyField. Thanks Steve On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner-takes-all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/ I'm about to win the blasphemer merit badge, but ad-hoc all-field-like searching over many fields is actually a good use case for Elasticsearch's cross-field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/ It wouldn't be hard (and actually a great feature for the project) to get the Lucene query associated with cross field search into Solr. You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java Hope that helps -Doug -- *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com Author: Relevant Search http://manning.com/turnbull from Manning Publications This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.
On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote: Hi everyone, My solution requires that users in group-A can only search against a set of fields-A and users in group-B can only search against a set of fields-B, etc. There can be several groups, as many as 100 even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields and each field name
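For reference, the winner-takes-all behavior Doug describes comes from Lucene's DisjunctionMaxQuery: per document, dismax combines the per-field scores as the best field score plus `tie` times the rest. A sketch with made-up field scores:

```python
def dismax_score(field_scores, tie):
    """Per-document dismax combination: best field score plus tie * the other field scores."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [3.0, 1.0, 2.0]  # hypothetical per-field scores for one document

print(dismax_score(scores, 0.0))  # 3.0 -- tie=0: pure winner-takes-all
print(dismax_score(scores, 1.0))  # 6.0 -- tie=1: a plain sum of all field scores
```

With tie=1.0 every matching field contributes fully, which is why it softens the winner-takes-all effect, but a field whose scores run an order of magnitude higher can still dominate the sum.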
Re: Solr 5.1 ignores SOLR_JAVA_MEM setting
Probably in the next week or so. The branch has been cut, the release is being put together/tested/finalized with the usual process. On Tue, May 26, 2015 at 9:37 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: Thx. When will 5.2 approximately be released? -----Original Message----- From: Timothy Potter [mailto:thelabd...@gmail.com] Sent: Tuesday, 26 May 2015 17:50 To: solr-user@lucene.apache.org Subject: Re: Solr 5.1 ignores SOLR_JAVA_MEM setting Yes, same bug. Fixed in 5.2. On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I also noticed (see my post this morning) that ... SOLR_OPTS="$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true" ... is not taken into consideration (anymore). Same bug? -----Original Message----- From: Ere Maijala [mailto:ere.maij...@helsinki.fi] Sent: Wednesday, 15 April 2015 09:25 To: solr-user Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in bin/solr that overrides the SOLR_JAVA_MEM setting from solr.in.sh or the environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be circumvented by using the SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's not mentioned in solr.in.sh by default. --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland
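For anyone hitting this on 5.1, the workaround is a one-line change in `solr.in.sh` (the heap size here is just the example from the thread; size it for your own machine):

```shell
# solr.in.sh -- on Solr 5.1 SOLR_JAVA_MEM is ignored by bin/solr (SOLR-7392),
# so set SOLR_HEAP instead; bin/solr uses it for both the min and max heap.
SOLR_HEAP="32g"
```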
Re: Problem with numeric math types and the dataimport handler
On 5/26/2015 2:37 PM, Shawn Heisey wrote: On 5/20/2015 12:06 AM, Shalin Shekhar Mangar wrote: Sounds similar to https://issues.apache.org/jira/browse/SOLR-6165 which I fixed in 4.10. Can you try a newer release? Looks like that didn't fix it. I applied the patch on SOLR-6165 to the lucene_solr_4_9_1 tag, built a new war, and when it was done, restarted Solr with that war. The solr-impl version in the dashboard is now 4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11 After some importing with DIH and a Solr restart, this is the most recent error in the log: WARN - 2015-05-26 14:28:09.289; org.apache.solr.update.UpdateLog$LogReplayer; REYPLAY_ERR: IOException reading log org.apache.solr.common.SolrException: ERROR: [doc=usatphotos084190] Error adding field 'did'='java.math.BigInteger:1214221' msg=For input string: java.math.BigInteger:1214221 Looks like we'll need a new issue. I'm not in a position right now to try a newer Solr version than 4.9.1. Given the way that I use Solr, this is honestly not really a major problem for me. Within five minutes or so after DIH is done, my transaction logs will only contain data indexed via SolrJ, so this problem will be gone. The reason I think it's worth fixing, assuming it's still a problem in 5.2: There are people that use DIH *exclusively* for indexing, and for those people, this could become a real problem, because tlog replay won't work. Thanks, Shawn
Re: SolrCloud 4.8 - Transaction log size over 1GB
right, autoCommit (in solrconfig.xml) will (1) close the current Lucene segments and open a new one and (2) close the tlog and start a new one. Those actions are independent of whether openSearcher=true or false. If (and only if) openSearcher=true, then the commits will be immediately visible to a query. So then it's up to you to issue either a soft commit (or a hard commit with openSearcher=true) at some point for the docs to be visible. bq: Does it mean, let me say, that when openSearcher=false we have implicit commit done by solrCloud autoCommit not visible to world and explicit commit done by clients visible to world? Exactly. Now, this all assumes that you want all your recent indexing to be visible at once. If you don't care whether documents become visible while you're indexing but before the whole thing is done, then: (1) set autoCommit with openSearcher=false to some fairly short interval, say 1 minute; (2) set autoSoftCommit to some longer interval (say 5 minutes). Now you don't have to do anything at all. Don't commit from the client. Just wait 5 minutes after the indexing is done before expecting to see _all_ the docs from your indexing run. Do note one quirk though. Let's claim you're doing autoCommits with openSearcher=false. If you restart Solr, then those changes _will_ become visible. Best, Erick On Tue, May 26, 2015 at 9:33 AM, Vincenzo D'Amore v.dam...@gmail.com wrote: Thanks Erick for your willingness and patience. If I understood well, when autoCommit with openSearcher=true, at the first commit (soft or hard) all new documents will be automatically available for search. But when openSearcher=false, the commit will flush recent index changes to stable storage, but does not cause a new searcher to be opened to make those changes visible https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit . So, it is not clear what this stable storage is, where it is, and when the new documents will be visible?
Only when at the very end of the indexing process my code commits? Does it mean, let me say, that when openSearcher=false we have an implicit commit done by the SolrCloud autoCommit, not visible to the world, and an explicit commit done by clients, visible to the world? On Tue, May 26, 2015 at 2:55 AM, Erick Erickson erickerick...@gmail.com wrote: The design is that the latest successfully flushed tlog file is kept for peer sync in SolrCloud mode. When a replica comes up, there's a chance that it's not very many docs behind. So, if possible, some of the docs are taken from the leader's tlog and replayed to the follower that's just been started. If the follower is too far out of sync, a full old-style replication is done. So there will always be a tlog file (and occasionally more than one if they're very small) kept around, even on successful commit. It doesn't matter if you have leaders and replicas or not, that's still the process that's followed. Please re-read the link I sent earlier. There's absolutely no reason your tlog files have to be so big! Really, set your autoCommit to, say, 15 seconds and 10 docs and set openSearcher=false in your solrconfig.xml file and the tlog file that's kept around will be much smaller, and it'll be available for peer sync. And if you really don't care about tlogs at all, just take this bit out of your solrconfig.xml: <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numVersionBuckets">${solr.ulog.numVersionBuckets:256}</int> </updateLog> Best, Erick On Mon, May 25, 2015 at 4:40 PM, Vincenzo D'Amore v.dam...@gmail.com wrote: Hi Erick, I have tried my indexing code a few times; this is the behaviour I have observed: When an indexing process starts, even if one or more tlog files exist, a new tlog file is created and all the new documents are stored there. When the indexing process ends and does a hard commit, the older tlog files are removed but the new one (the latest) remains.
As far as I can see, since my indexing process loads a few million documents every time, at the end of the process the latest tlog file persists with all these documents in it. So I have such big tlog files. Now the question is, why does the latest tlog file persist even if the code has done a hard commit? When a hard commit is done successfully, why should we keep the latest tlog file? On Mon, May 25, 2015 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote: OK, assuming you're not doing any commits at all until the very end, then the tlog contains all the docs for the _entire_ run. The article really doesn't care whether the commits come from the solrconfig.xml or a SolrJ client or curl. The tlog simply is not truncated until a hard commit happens, no matter where it comes from. So here's what I'd do: (1) set autoCommit in your solrconfig.xml with openSearcher=false for every minute. Then the problem will probably go away. Or (2) periodically issue a hard
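Erick's suggested setup, written out as a solrconfig.xml fragment (the intervals are the examples from his mail, expressed in milliseconds):

```
<!-- hard commit: truncate the tlog often, but don't open a new searcher -->
<autoCommit>
  <maxTime>60000</maxTime>            <!-- every minute -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: make recently indexed documents visible to queries -->
<autoSoftCommit>
  <maxTime>300000</maxTime>           <!-- every 5 minutes -->
</autoSoftCommit>
```

With this in place the client never needs to send explicit commits; the tlog is rolled every minute and documents become searchable within five.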
Re: No results for MoreLikeThis
If the source document is in your index (i.e. not passed in via stream.body) then the fields used will either need to be stored or have term vectors enabled. The latter is more performant. Upayavira On Tue, May 26, 2015, at 09:24 PM, John Blythe wrote: Just checked my schema.xml and think that the issue is resulting from the stored property being set false on descript2 and true on descript. -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, May 26, 2015 at 4:22 PM, John Blythe j...@curvolabs.com wrote: Good call. I'd previously attempted to use one of my fields, however, and it didn't work. I then thought maybe broadening it to list anything could help. I'd tried using the interestingTerms parameter as well. Just for the sake of double checking before replying to your message, though, I changed fl once more to the field I was hoping to find items related to. I had a typo, though, and it worked. Instead of 'descript2' I used 'descript' and voila. 'descript' is the indexed field, descript2 is a copyField that uses a different analyzer (the one I'm actually using for querying). I guess it only takes non-copy (and maybe non-dynamic?) fields into account? Thanks for any more information on that field specific approach/issue! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Tue, May 26, 2015 at 4:16 PM, Upayavira u...@odoko.co.uk wrote: I doubt mlt.fl=* will work. Provide it with specific field names that should be used for the comparison. 
Upayavira On Tue, May 26, 2015, at 08:17 PM, John Blythe wrote: hi all, running a query like this, but am getting no results from the mlt handler: http://localhost:8983/solr/parts/select?q=mfgname2%3A+Acme Corp+descript2%3A+(SCREW+3.5X50MM)&start=0&rows=1&fl=*%2C+score&wt=json&indent=true&mlt=true&mlt.fl=*&mlt.mintf=1&mlt.mindf=1&mlt.minwl=1 been googling around without any luck as of yet. i have the requestHandler added to solrconfig.xml: <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> and confirm it is loaded in the Plugins/Stats area of the solr admin interface. i've tried adding minimum word length, term frequency, etc. per a post or two i ran across where people had similar issues resolved by doing so, but it didn't help any. i'm not getting any errors, what puzzle piece am i missing in my configuration or query building? thanks! - john
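Putting Upayavira's two pointers together: each field named in `mlt.fl` must be stored or, better for performance, have term vectors enabled. A hypothetical schema.xml definition for the `descript` field from this thread (the field type is assumed):

```
<field name="descript" type="text_general" indexed="true" stored="true" termVectors="true"/>
```

Then query with an explicit field list, e.g. `mlt.fl=descript`, rather than `mlt.fl=*`.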
Re: Solr relevancy score in percentage
On May 26, 2015, at 7:10 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are. That is the meaning of the score from a probabilistic model search engine. Solr is not a probabilistic engine; it is a vector space engine. The scores are fundamentally different. Treating the score as a probability of relevance will not work. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)
Sync failure after shard leader election when adding new replica.
Hi, I have a SolrCloud setup, running 4.10.3. The setup consists of several cores, each with a single shard, and initially each shard has a single replica (so, basically, one machine). I am using core discovery, and my deployment tools create an empty core on newly provisioned machines. The scenario that I am testing is: Machine 1 is running and writes are occurring from my application to Solr. At some point, I stop Machine 1 and reconfigure my application to add Machine 2. Both machines are then started. What I would expect to happen at this point is that Machine 2 cannot become leader because it is behind compared to Machine 1. Machine 2 would then restore from Machine 1. However, looking at the logs, I am seeing Machine 2 become elected leader after a failed PeerSync: 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue. 2015-05-24 17:20:25.983 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync 2015-05-24 17:20:25.997 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.update.PeerSync - PeerSync: core=project url=http://10.32.132.64:11000/solr START replicas=[http://jchar-1:11000/solr/project/] nUpdates=100 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.update.PeerSync - PeerSync: core=project url=http://10.32.132.64:11000/solr DONE. We have no versions. sync failed. 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway 2015-05-24 17:20:25.999 -0700 (,,,) coreZkRegister-1-thread-4 : INFO org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.32.132.64:11000/solr/project/ shard1 What is the expected behavior here?
What’s the best practice for adding a new replica? Should I have the SolrCloud running and do it via the Collections API or can I continue to use core discovery? Thanks.
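For the record, the Collections API route is a single call against any live node (the collection and shard names are taken from the logs above; the node name is hypothetical and must match an entry in ZooKeeper's live_nodes):

```
curl "http://machine1:11000/solr/admin/collections?action=ADDREPLICA&collection=project&shard=shard1&node=machine2:11000_solr"
```

This writes the core.properties on the target node, syncs the new replica from the current leader, and adds it to the query rotation once it is caught up.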
Re: YAJar
Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the tests and compare. François On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com wrote: by dumping you mean recompiling solr with guava 18? On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte fschietteca...@gmail.com wrote: Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a while ago and it worked fine for me. François On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com wrote: i have minhash logic that uses a guava 18.0 method that is not in guava 14.0.1. This minhash logic is a separate maven project. I'm including it in my project via Maven. The code is being used as a search component on the set of results. The logic goes through the search results and deletes duplicates. here is the solrconfig.xml: <requestHandler name="/select" class="solr.SearchHandler" default="true"> <arr name="last-components"> <str>tvComponent</str> <str>terms</str> <str>minHashDedup</str> </arr> </requestHandler> <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits"> <str name="MAX_COMPARISONS">5</str> </searchComponent> The DedupSearchHits class is the one implementing the minhash (hence using guava 18). I start solr via the solr.in.sh script.
The error I am getting is: Caused by: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode; at com.xyz.incrementToken(MinHashTokenFilter.java:54) at com.xyz.MinHash.calculate(MinHash.java:131) at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89) at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297) at org.apache.solr.core.SolrCore.init(SolrCore.java:813) What is the best design to solve this problem?I understand the point of modularity but how can i include logic into solr that does result processing without loading that jar into solr? thank you On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com wrote: I guess this is one reason why the whole WAR approach is being removed! Solr should be a black-box that you talk to, and get responses from. What it depends on and how it is deployed, should be irrelevant to you. If you are wanting to override the version of guava that Solr uses, then you'd have to rebuild Solr (can be done with maven) and manually update the pom.xml to use guava 18.0, but why would you? You need to test Solr completely (in case any guava bugs affect Solr), deal with any build issues that arise (if guava changes any APIs), and cause yourself a world of pain, for what gain? On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote: i have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr? 
I can understand if you are talking about SolrJ, but in normal usage you use a different application in a different JVM from Solr. Upayavira On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote: I am stuck in Yet Another Jarmagedon of SOLR. this is a basic question. i noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern to override a jar version uploaded into jetty? I am using maven, and solr is being started the old way: java -jar start.jar -Dsolr.solr.home=... -Djetty.home=... I tried to edit jetty's start.config (then run java -DSTART=/my/dir/start.config -jar start.jar) but got nowhere... any help would be much appreciated Peyman
Re: YAJar
What I am suggesting is that you set up a standalone version of Solr with 14.0.1 and run some sort of test suite similar to what you would normally use Solr for in your app. Then replace the guava jar and re-run the tests. If all works well, and I suspect it will because it did for me, then you can use 18.0. Simple really. François On May 26, 2015, at 10:30 AM, Robust Links pey...@robustlinks.com wrote: i can't run 14.0.1. that is the problem. 14 does not have the interfaces i need On Tue, May 26, 2015 at 10:28 AM, François Schiettecatte fschietteca...@gmail.com wrote: Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the tests and compare. François On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com wrote: by dumping you mean recompiling solr with guava 18? On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte fschietteca...@gmail.com wrote: Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did a while ago and it worked fine for me. François On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com wrote: i have minhash logic that uses a guava 18.0 method that is not in guava 14.0.1. This minhash logic is a separate maven project. I'm including it in my project via Maven. The code is being used as a search component on the set of results. The logic goes through the search results and deletes duplicates. here is the solrconfig.xml: <requestHandler name="/select" class="solr.SearchHandler" default="true"> <arr name="last-components"> <str>tvComponent</str> <str>terms</str> <str>minHashDedup</str> </arr> </requestHandler> <searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits"> <str name="MAX_COMPARISONS">5</str> </searchComponent> The DedupSearchHits class is the one implementing the minhash (hence using guava 18). I start solr via the solr.in.sh script.
The error I am getting is: Caused by: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode; at com.xyz.incrementToken(MinHashTokenFilter.java:54) at com.xyz.MinHash.calculate(MinHash.java:131) at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89) at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74) at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305) at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338) at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297) at org.apache.solr.core.SolrCore.init(SolrCore.java:813) What is the best design to solve this problem?I understand the point of modularity but how can i include logic into solr that does result processing without loading that jar into solr? thank you On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com wrote: I guess this is one reason why the whole WAR approach is being removed! Solr should be a black-box that you talk to, and get responses from. What it depends on and how it is deployed, should be irrelevant to you. If you are wanting to override the version of guava that Solr uses, then you'd have to rebuild Solr (can be done with maven) and manually update the pom.xml to use guava 18.0, but why would you? You need to test Solr completely (in case any guava bugs affect Solr), deal with any build issues that arise (if guava changes any APIs), and cause yourself a world of pain, for what gain? On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote: i have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr? 
I can understand if you are talking about SolrJ, but in normal usage you use a different application in a different JVM from Solr. Upayavira On Tue, May 26, 2015, at 05:14 AM, Robust Links wrote: I am stuck in Yet Another Jarmagedon of SOLR. this is a basic question. i noticed solr 5.0 is using guava 14.0.1. My app needs guava 18.0. What is the pattern to override a jar version uploaded into jetty? I am using maven, and solr is being started the old way: java -jar start.jar -Dsolr.solr.home=... -Djetty.home=... I tried to edit jetty's start.config (then run java -DSTART=/my/dir/start.config -jar start.jar) but got nowhere... any help would be much appreciated Peyman
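A third option, since the minhash code is already a separate Maven project: shade and relocate Guava 18 inside that project's jar, so its Guava classes no longer collide with the Guava 14.0.1 that Solr loads. A sketch of the pom.xml fragment (the plugin version and the relocated package prefix are illustrative):

```
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- rewrite the plugin's Guava 18 references into a private package -->
            <pattern>com.google.common</pattern>
            <shadedPattern>com.xyz.shaded.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

The shaded jar then carries its own copy of the Guava classes under the relocated name, and the NoSuchMethodError goes away without touching Solr's own dependencies.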
Re: Solr relevancy score in percentage
We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are. But I suspect a problem is that the 1st record will always be 100%, regardless of what the score is, as the 1st record's score will always be equal to the maxScore. Regards, Edwin On 26 May 2015 at 19:36, Daniel Collins danwcoll...@gmail.com wrote: The question is more: why do you want your users to see the scores? If they want to affect ranking, what you want is the ability to run the same query with different boosting and see the difference (2 result sets), then see if the new ordering is better or worse. What the actual/raw score is is irrelevant to that; what is important is the ordering. If you want to show how good your results are, then as the link shows, that is very difficult to measure (and very subjective!) On 26 May 2015 at 09:37, Upayavira u...@odoko.co.uk wrote: Correct. The relevancy score simply states that we think result #1 is more relevant than result #2. It doesn't say that #1 is relevant. The score doesn't have any validity across queries either, as, for example, a different number of query terms will cause the score to change. Upayavira On Tue, May 26, 2015, at 08:57 AM, Zheng Lin Edwin Yeo wrote: Hi Arslan, Thank you for the link. That means it is not advisable to show anything that's related to the relevancy score, even though the default sorting of the result is by relevancy score? Since showing the raw relevancy score does not make any sense to the user, as they won't understand what it means either.
Regards, Edwin On 26 May 2015 at 14:16, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Edwin, it is not recommended to display the relevancy score as a percentage: https://wiki.apache.org/lucene-java/ScoresAsPercentages Ahmet On Tuesday, May 26, 2015 8:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, I would like to check: does the new version of Solr allow displaying the relevancy score as a percentage? I understand that the older version is not able to, and the only way is to take the highest score and use that as 100%, and calculate the other percentages from that number (for example, if the max score is 10 and the next result has a score of 5, you would do (5 / 10) * 100 = 50%). Is there a better way to do this now? I'm using Solr 5.1 Regards, Edwin
Re: YAJar
I'm not aware of a way you can do this, other than upgrading the Guava in Solr itself. Or rather, you'd need to create your own classloader and load your own instance of Guava using that rather than the default classloader. That's possible, but would be rather ugly and complex. I'd say research what's required to upgrade the Guava in Solr. Upayavira On Tue, May 26, 2015, at 03:11 PM, Robust Links wrote: i have a minhash logic that uses a guava 18.0 method that is not in guava 14.0.1. This minhash logic is a separate maven project. I'm including it in my project via maven. The code is being used as a search component on the set of results. The logic goes through the search results and deletes duplicates. Here is the solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>tvComponent</str>
    <str>terms</str>
    <str>minHashDedup</str>
  </arr>
</requestHandler>

<searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
  <str name="MAX_COMPARISONS">5</str>
</searchComponent>

The DedupSearchHits class is the one implementing the minhash (hence using guava 18). I start solr via the solr.in.sh script.
The error I am getting is:

Caused by: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
    at com.xyz.incrementToken(MinHashTokenFilter.java:54)
    at com.xyz.MinHash.calculate(MinHash.java:131)
    at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
    at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
    at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:813)

What is the best design to solve this problem? I understand the point of modularity, but how can I include logic in Solr that does result processing without loading that jar into Solr? thank you On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com wrote: I guess this is one reason why the whole WAR approach is being removed! Solr should be a black-box that you talk to, and get responses from. What it depends on and how it is deployed should be irrelevant to you. If you want to override the version of guava that Solr uses, then you'd have to rebuild Solr (can be done with maven) and manually update the pom.xml to use guava 18.0, but why would you? You need to test Solr completely (in case any guava bugs affect Solr), deal with any build issues that arise (if guava changes any APIs), and cause yourself a world of pain, for what gain? On 26 May 2015 at 11:29, Robust Links pey...@robustlinks.com wrote: i have custom search components. On Tue, May 26, 2015 at 4:34 AM, Upayavira u...@odoko.co.uk wrote: Why is your app tied that closely to Solr?
Re: YAJar
Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did it a while ago and it worked fine for me. François On May 26, 2015, at 10:11 AM, Robust Links pey...@robustlinks.com wrote: i have a minhash logic that uses a guava 18.0 method that is not in guava 14.0.1.
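One way out of this kind of dependency clash, not spelled out in the thread, is to shade the newer Guava inside the custom component's jar so it never collides with the Guava 14.0.1 on Solr's classpath. A hypothetical maven-shade-plugin sketch for the component's pom.xml (the plugin version and relocation package are assumptions):

```xml
<!-- Sketch only: relocate Guava classes inside the component jar so the
     custom code calls its own copy of Guava 18 instead of Solr's 14.0.1 -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>com.xyz.shaded.guava</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this, both Guava versions coexist under different package names, so neither Solr nor the search component needs rebuilding.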
Re: Removing characters like '\n \n' from indexing
It is showing up in the search results. Just to confirm, does this UpdateProcessor method remove the characters during indexing, or only after indexing has been done? Regards, Edwin On 26 May 2015 at 21:30, Upayavira u...@odoko.co.uk wrote: On Tue, May 26, 2015, at 02:20 PM, Zheng Lin Edwin Yeo wrote: Hi, Is there a way to remove special characters like \n during indexing of rich text documents? I have quite a lot of leading \n \n in front of the indexed content of rich text documents, due to the spaces and empty lines within the original documents, and it's causing the content to be flooded with '\n \n' at the start before the actual content comes in. This makes the content look ugly, and also takes up unnecessary bandwidth in the system. Where is this showing up? If it is in search results, you must use an UpdateProcessor, as these run before fields are stored (e.g. RegexReplaceProcessorFactory). If you are only concerned about facet results, then you can do it in an analysis chain, for example with a PatternReplaceFilterFactory. Upayavira
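To answer Edwin's question concretely: an update processor runs at index time, before the value is stored, so the stored (and returned) value is already cleaned. A minimal chain along the lines Upayavira suggests might look like this in solrconfig.xml (the chain name, field name, and regex are assumptions):

```xml
<updateRequestProcessorChain name="strip-whitespace">
  <!-- Collapse runs of whitespace (including the leading \n \n) to one space -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">content</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain must be wired into the update handler (e.g. update.chain=strip-whitespace), and already-indexed documents are not rewritten, so a reindex is needed for existing content.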
Re: Solr relevancy score in percentage
Currently I take the score that I get from Solr, divide it by the maxScore, and multiply it by 100 to get the percentage. All of this is done in the code for the UI. The user will only see the percentage and will not know anything about the score. Since the score by itself is meaningless, I don't think I should display a score like 1.7 or 0.2 on the UI, which could further confuse the user and raise a lot more questions. Regards, Edwin On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote: On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are. But I suspect a problem is that the 1st record will always be 100%, regardless of what the score is, as the 1st record's score will always be equal to the maxScore. If you want to give your users *something*, then simply display the score that you get from Solr. I recommend that you DON'T give them maxScore, because they will be tempted to make the percentage calculation themselves to try and find meaning where there is none. A clever user will be able to figure out maxScore for themselves simply by sorting on relevance and looking at the score on the top doc. When you get questions about what the number means, and you *WILL* get those questions, you can tell them that the number itself is meaningless and what matters is how the scores within a single result set compare to each other -- exactly what you have been told here. Thanks, Shawn
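The percentage calculation Edwin describes can be sketched against Solr's standard JSON response layout like this (the mock response below is illustrative):

```python
# Sketch: turn Solr scores into the percentages Edwin describes.
# Note the caveat from the thread: the top hit is always 100%.
def score_percentages(solr_json):
    body = solr_json["response"]
    max_score = body["maxScore"]
    return [round(doc["score"] / max_score * 100) for doc in body["docs"]]

mock = {"response": {"maxScore": 2.0,
                     "docs": [{"id": "1", "score": 2.0},
                              {"id": "2", "score": 0.5}]}}
print(score_percentages(mock))  # [100, 25]
```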
Re: YAJar
by dumping you mean recompiling solr with guava 18? On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte fschietteca...@gmail.com wrote: Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did it a while ago and it worked fine for me. François
Re: YAJar
I can't run 14.0.1, that is the problem. 14 does not have the interfaces I need. On Tue, May 26, 2015 at 10:28 AM, François Schiettecatte fschietteca...@gmail.com wrote: Run whatever tests you want with 14.0.1, replace it with 18.0, rerun the tests and compare. François On May 26, 2015, at 10:25 AM, Robust Links pey...@robustlinks.com wrote: by dumping you mean recompiling solr with guava 18? On Tue, May 26, 2015 at 10:22 AM, François Schiettecatte fschietteca...@gmail.com wrote: Have you tried dumping guava 14.0.1 and using 18.0 with Solr? I did it a while ago and it worked fine for me. François
Re: Index optimize runs in background.
On 5/26/2015 6:29 AM, Upayavira wrote: Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment, which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged again. That merge will be costly, as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary. In general, I concur with this advice about optimizing. Historically, optimize was done for increased performance. In older versions, an unoptimized index performed *MUCH* worse than an index with a single segment. This is no longer the case today, mostly because so many Lucene features work on a per-segment basis. A single segment does perform faster, but the difference is much smaller than it used to be. A full optimize on a large index requires a LOT of CPU and I/O resources -- while the optimize is underway, performance is not very good. There are, however, still times when running optimize is appropriate: 1) The index is mostly static, not receiving very frequent updates. 2) There is a large percentage of deleted documents in the index. With modern Lucene/Solr and these use cases, the reasons for optimizing are still performance-related, but the only time you should do an optimize is when the benefit outweighs the cost. For the 1) use case, the index will likely remain mostly-optimized for a long period of time after the optimize is done, so the resources required for the optimize are worth spending.
For the 2) use case, optimizing will reduce the size of the index significantly, so general performance gets better. That makes the cost worthwhile. Thanks, Shawn
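For completeness, an explicit optimize is just an update command posted to the update handler; for example (core name is a placeholder):

```xml
<!-- POST to /solr/<core>/update; maxSegments lets you stop short of a full
     single-segment optimize to reduce the merge cost discussed above -->
<optimize maxSegments="1"/>
```

The same effect can be had with the optimize=true request parameter on the update handler.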
AW: Solr 5.1 ignores SOLR_JAVA_MEM setting
I also noticed that (see my post this morning) ... SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true ... is not taken into consideration (anymore). Same bug? -Original message- From: Ere Maijala [mailto:ere.maij...@helsinki.fi] Sent: Wednesday, April 15, 2015 09:25 To: solr-user Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in bin/solr that overrides the SOLR_JAVA_MEM setting from solr.in.sh or the environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be circumvented by using the SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's not mentioned in solr.in.sh by default. --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland
Re: YAJar
i have a minhash logic that uses a guava 18.0 method that is not in guava 14.0.1. This minhash logic is a separate maven project. I'm including it in my project via maven. The code is being used as a search component on the set of results. The logic goes through the search results and deletes duplicates. Here is the solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler" default="true">
  <arr name="last-components">
    <str>tvComponent</str>
    <str>terms</str>
    <str>minHashDedup</str>
  </arr>
</requestHandler>

<searchComponent name="minHashDedup" class="com.xyz.DedupSearchHits">
  <str name="MAX_COMPARISONS">5</str>
</searchComponent>

The DedupSearchHits class is the one implementing the minhash (hence using guava 18). I start solr via the solr.in.sh script. The error I am getting is:

Caused by: java.lang.NoSuchMethodError: com.google.common.hash.HashFunction.hashUnencodedChars(Ljava/lang/CharSequence;)Lcom/google/common/hash/HashCode;
    at com.xyz.incrementToken(MinHashTokenFilter.java:54)
    at com.xyz.MinHash.calculate(MinHash.java:131)
    at com.xyz.Algorithms.minhash.MinHasher.compare(MinHasher.java:89)
    at com.xyz.Algorithms.minhash.DedupSearchHits.init(DedupSearchHits.java:74)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:619)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2311)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2305)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:2338)
    at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:1297)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:813)

What is the best design to solve this problem? I understand the point of modularity, but how can I include logic in Solr that does result processing without loading that jar into Solr? thank you On Tue, May 26, 2015 at 8:00 AM, Daniel Collins danwcoll...@gmail.com wrote: I guess this is one reason why the whole WAR approach is being removed! Solr should be a black-box that you talk to, and get responses from.
Re: Index optimize runs in background.
I completely agree with Upayavira and Shawn. Modassar, can you explain to us how often you index? Have you ever played with the mergeFactor? I hardly think you need to optimise at all. Simply tuning the mergeFactor should solve all your issues. I assume you were optimising only to have fast search, weren't you? Cheers 2015-05-26 16:07 GMT+01:00 Shawn Heisey apa...@elyograg.org: On 5/26/2015 6:29 AM, Upayavira wrote: Are you saying that the reason you are optimising is because you have been doing it for years? If this is the only reason, you should stop doing it immediately. The one scenario in which optimisation still makes some sense is when you reindex every night and optimise straight after. This will leave you with a single segment, which will search faster. However, if you are doing a lot of indexing, especially with deletes/updates, you will have merged your content into a single segment which will later need to be merged again. That merge will be costly, as it will involve copying the entire content of your large segment, which will impact performance. Before Solr 3.6, optimisation was necessary and recommended. At that point (or a little before) the TieredMergePolicy became the default, and this made optimisation generally unnecessary. In general, I concur with this advice about optimizing. Historically, optimize was done for increased performance. In older versions, an unoptimized index performed *MUCH* worse than an index with a single segment. This is no longer the case today, mostly because so many Lucene features work on a per-segment basis. A single segment does perform faster, but the difference is much smaller than it used to be. A full optimize on a large index requires a LOT of CPU and I/O resources -- while the optimize is underway, performance is not very good. There are, however, still times when running optimize is appropriate: 1) The index is mostly static, not receiving very frequent updates.
2) There is a large percentage of deleted documents in the index. With modern Lucene/Solr and these use cases, the reasons for optimizing are still performance-related, but the only time you should do an optimize is when the benefit outweighs the cost. For the 1) use case, the index will likely remain mostly-optimized for a long period of time after the optimize is done, so the resources required for the optimize are worth spending. For the 2) use case, optimizing will reduce the size of the index significantly, so general performance gets better. That makes the cost worthwhile. Thanks, Shawn -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
solr date functions and object creation
Hello, I have a weird SOLR problem with object creation from a date function query against a TrieDate field in my index called ds. This boost function bf=min(div(ms(NOW/HOUR,ds),60480),26) causes many millions of FunctionQuery objects to be created in memory. When I change it to bf=min(abs(div(ms(NOW/HOUR,ds),60480)),26) the extra objects aren't created; the only change is that I added abs(). I've checked that every document has the ds field populated, and the dates it contains are all in the past. Any ideas why? The extra memory usage has caused stability problems. Thanks
Re: docValues: Can we apply synonym
We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city*. For city we have also added synonyms for cities like mumbai, bombay (these are Indian cities), so that results for mumbai are also eligible when somebody applies a filter of bombay on the search results. I need this functionality to work with a docValues-enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the documentation to be sure, but apparently: DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the SORTED type. - If the field is multi-valued, Lucene will use the SORTED_SET type. - Any Trie* numeric fields and EnumField. - If the field is single-valued (i.e., multi-valued is false), Lucene will use the NUMERIC type. - If the field is multi-valued, Lucene will use the SORTED_SET type. This means you should not analyse a field where DocValues is enabled. Can you explain to us your use case? Why are you interested in synonyms at the DocValues level? Cheers 2015-05-26 13:32 GMT+01:00 Upayavira u...@odoko.co.uk: To my understanding, docValues are just an uninverted index. That is, they contain the terms that are generated at the end of an analysis chain. Therefore, you simply enable docValues and include the SynonymFilterFactory in your analysis. Is that enough, or are you struggling with some other issue? Upayavira On Tue, May 26, 2015, at 12:03 PM, Aman Tandon wrote: Hi, We have a field *city* in which docValues are enabled. We need to add synonyms to that field, so how could we do it?
With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
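One common workaround for Aman's case, offered as a sketch rather than a tested recipe, is to keep the docValues field unanalyzed for faceting and apply the synonyms on a separate analyzed copy used for filtering (field names, field type, and synonym file are assumptions):

```xml
<!-- Facet on the raw string field (docValues, no analysis) -->
<field name="city" type="string" indexed="true" stored="true" docValues="true"/>

<!-- Filter on an analyzed copy where bombay/mumbai are expanded -->
<field name="city_syn" type="city_synonyms" indexed="true" stored="false"/>
<copyField source="city" dest="city_syn"/>

<fieldType name="city_synonyms" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="city_synonyms.txt" expand="true"/>
  </analyzer>
</fieldType>
```

A filter such as fq=city_syn:bombay would then match documents whose city is mumbai, while facet.field=city keeps the docValues speed and memory benefits.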
Re: Solr 5.1 ignores SOLR_JAVA_MEM setting
Yes, same bug. Fixed in 5.2. On Tue, May 26, 2015 at 9:15 AM, Clemens Wyss DEV clemens...@mysign.ch wrote: I also noticed that (see my post this morning) ... SOLR_OPTS=$SOLR_OPTS -Dsolr.allow.unsafe.resourceloading=true ... is not taken into consideration (anymore). Same bug? -Original message- From: Ere Maijala [mailto:ere.maij...@helsinki.fi] Sent: Wednesday, April 15, 2015 09:25 To: solr-user Subject: Solr 5.1 ignores SOLR_JAVA_MEM setting Folks, just a quick heads-up that apparently Solr 5.1 introduced a change in bin/solr that overrides the SOLR_JAVA_MEM setting from solr.in.sh or the environment. I just filed https://issues.apache.org/jira/browse/SOLR-7392. The problem can be circumvented by using the SOLR_HEAP setting, e.g. SOLR_HEAP=32G, but it's not mentioned in solr.in.sh by default. --Ere -- Ere Maijala Kansalliskirjasto / The National Library of Finland
Re: Solr relevancy score in percentage
This is one of those things that is, IMO, strictly a feel-good thing that's sometimes insisted upon by the product manager, and all the information in the world about why this is really meaningless falls on deaf ears. If you simply have no choice (a position I've been in because it wasn't worth the argument), you can do the star thing. That is, display 5 stars for percentages between 80-100, 4 stars for 60-80, etc., and not display the percentages or raw scores at all. But as others have said, it really isn't providing any additional information, and IMO it is misleading the user... Best, Erick On Tue, May 26, 2015 at 8:31 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Honestly, the only case where the score as a percentage could make sense is for the More Like This. In that case Solr should provide that feature, as we know perfectly well that a 100% similarity score means a copy of the seed document. If I am right, because the MLT implementation does not take care of the identity score, we are getting weird scores there as well. Maybe that is the only place where I would prefer a percentage. Cheers 2015-05-26 16:23 GMT+01:00 Zheng Lin Edwin Yeo edwinye...@gmail.com: Currently I take the score that I get from Solr, divide it by the maxScore, and multiply it by 100 to get the percentage. All of this is done in the code for the UI. The user will only see the percentage and will not know anything about the score. Since the score by itself is meaningless, I don't think I should display a score like 1.7 or 0.2 on the UI, which could further confuse the user and raise a lot more questions. Regards, Edwin On 26 May 2015 at 23:07, Shawn Heisey apa...@elyograg.org wrote: On 5/26/2015 8:10 AM, Zheng Lin Edwin Yeo wrote: We want the user to see how relevant the result is with respect to the search query entered, and not how good the results are.
But I suspect a problem is that the 1st record will always be 100%, regardless of its score, because the 1st record's score will always equal the maxScore.

If you want to give your users *something*, then simply display the score that you get from Solr. I recommend that you DON'T give them maxScore, because they will be tempted to make the percentage calculation themselves to try and find meaning where there is none. A clever user will be able to figure out maxScore for themselves simply by sorting on relevance and looking at the score on the top doc. When you get questions about what the number means, and you *WILL* get those questions, you can tell them that the number itself is meaningless and what matters is how the scores within a single result set compare to each other -- exactly what you have been told here.

Thanks,
Shawn

--
Benedetti Alessandro
Visiting card: http://about.me/alessandro_benedetti

"Tyger, tyger burning bright / In the forests of the night, / What immortal hand or eye / Could frame thy fearful symmetry?"
William Blake - Songs of Experience - 1794 England
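If you do end up doing the star display Erick describes above, a minimal sketch of the bucketing follows; the 20%-wide buckets are just his example, and the function name is mine:

```python
def stars(score, max_score):
    """Map a raw Solr score to a 1-5 star bucket: 80-100% of the
    top score shows 5 stars, 60-80% shows 4 stars, and so on."""
    pct = 100.0 * score / max_score
    # clamp so the bottom bucket is 1 star and 100% stays at 5
    return max(1, min(5, int(pct // 20) + 1))

print(stars(1.7, 1.7))  # top document -> 5
print(stars(0.5, 1.7))  # ~29% of max -> 2
```

Note the limitation Edwin raises applies here too: the top document always lands in the 5-star bucket, however weak its absolute match is.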
Re: When is too many fields in qf too many?
Hi Doug,

I'm back to this topic. Unfortunately, due to my DB structure and business needs, I will not be able to search against a single field (i.e., using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0; will that, more or less, address this scoring issue?

Should tie=1.0 be set on the request handler like so:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="defType">edismax</str>
    <str name="qf">F1 F2 F3 F4 ... ... ...</str>
    <float name="tie">1.0</float>
    <str name="fl">_UNIQUE_FIELD_,score</str>
    <str name="wt">xml</str>
    <str name="indent">true</str>
  </lst>
</requestHandler>

Or must tie be passed as part of the URL?

Thanks,
Steve

On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

Yeah, a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another's, and thus dominate the summation. But it's something easy to try if you want to keep playing with dismax.

-Doug

On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote:

Hi Doug,

Your blog write-up on relevancy is very interesting; I didn't know this. Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based fields' data into a single field using copyField.

Thanks,
Steve

On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote:

Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a "winner takes all" point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization.
You can read more here: http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/

I'm about to win the blasphemer merit badge, but ad-hoc, all-field-like searching over many fields is actually a good use case for Elasticsearch's cross-field queries:

https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fields_queries.html
http://opensourceconnections.com/blog/2015/03/19/elasticsearch-cross-field-search-is-a-lie/

It wouldn't be hard (and would actually be a great feature for the project) to get the Lucene query associated with cross-field search into Solr. You could easily write a plugin to integrate it into a query parser: https://github.com/elastic/elasticsearch/blob/master/src/main/java/org/apache/lucene/queries/BlendedTermQuery.java

Hope that helps,
-Doug

--
*Doug Turnbull* | Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com
Author: Relevant Search http://manning.com/turnbull from Manning Publications

This e-mail and all contents, including attachments, is considered to be Company Confidential unless explicitly stated otherwise, regardless of whether attachments are marked as such.

On Wed, May 20, 2015 at 8:27 AM, Steven White swhite4...@gmail.com wrote:

Hi everyone,

My solution requires that users in group A can only search against a set of fields A, users in group B can only search against a set of fields B, etc. There can be several groups, as many as 100 or even more. To meet this need, I build my search by passing in the list of fields via qf. What goes into qf can be large: as many as 1500 fields, and each field name averages 15 characters, so the data passed via qf will be over 20K characters.

Given the above, besides the fact that a search for "apple" translates to 20K characters passing over the network, what else within Solr and Lucene should I be worried about, if anything?
Will I hit some kind of limit? Will each search now require more CPU cycles? Memory? Etc.

If the network traffic becomes an issue, my alternative solution is to create a /select handler for each group and list that group's fields under qf in the handler.

I have considered creating a pseudo-field for each group and then using copyField into that group field. During search, I could then point qf at that one field. Unfortunately, this is not ideal for my solution because the fields that go into each group change dynamically (at least once a month), and when they do change, I have to re-index everything (which I have to avoid) to keep that group field in sync.
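Steven's per-group-handler alternative would look roughly like this in solrconfig.xml; the handler name and field list here are invented for illustration. Using invariants rather than defaults also keeps clients from overriding the server-side field list:

```xml
<!-- solrconfig.xml: one handler per group, e.g. group A -->
<requestHandler name="/select-groupA" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">edismax</str>
    <!-- the group's ~1500 fields live server-side, off the URL -->
    <str name="qf">F1 F2 F3</str>
  </lst>
</requestHandler>
```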
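To make Doug's tie discussion concrete, here is a sketch of how (e)dismax combines per-field scores: the best-matching field's score plus tie times the remaining fields' scores, so tie=0.0 is winner-takes-all and tie=1.0 is a plain sum (the scores below are made-up numbers, not Solr output):

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way dismax does:
    best match + tie * (sum of the other matches)."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# one field dominates, two barely match (made-up numbers)
scores = [3.2, 0.4, 0.1]
print(dismax_score(scores, tie=0.0))  # only the best field counts
print(dismax_score(scores, tie=1.0))  # all field scores are summed
```

Even with tie=1.0, a field whose scores run an order of magnitude hotter than the others will still dominate the sum, which is the caveat Doug gives above.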