RE: Can I use boosting fields with edismax ?
Amit, it's important to note that dismax/edismax doesn't give you a weighted average of these field scores. Without the tie parameter, one field's score is likely always winning the dismax contest. Field scores are relative, so 5 could be an amazing score for, say, title while 500 is a terrible score for text. Dismax picks the field that yields the maximum score, so the worst text matches might be sorted higher than the best title match. Look at your debug output and use that, rather than your sense of relative field importance, to adjust qf. I wrote a blog post on this topic that you might find helpful: http://www.opensourceconnections.com/2013/07/02/getting-dissed-by-dismax-why-your-incorrect-assumptions-about-dismax-are-hurting-search-relevancy/

Sent from my Windows Phone

From: Amit Aggarwal Sent: 11/25/2013 6:31 AM To: solr-user@lucene.apache.org Subject: Re: Can I use boosting fields with edismax ? Ok Erick.. I will try, thanks.

On 25-Nov-2013 2:46 AM, Erick Erickson erickerick...@gmail.com wrote: This should work. Try adding debug=all to your URL, and examine the output both with and without your boosting. I believe you'll see the difference in the score calculations. From there it's a matter of adjusting the boosts to get the results you want. Best, Erick

On Sat, Nov 23, 2013 at 9:17 AM, Amit Aggarwal amit.aggarwa...@gmail.com wrote: Hello All, I am using defType=edismax. Will boosting work like this in solrconfig.xml?

<str name="qf">value_search^2.0 desc_search country_search^1.5 state_search^2.0 city_search^2.5 area_search^3.0</str>

I think it is not working. If so, what should I do?
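As a concrete illustration of using tie and the debug output together, here is a minimal SolrJ 4.x sketch. The core URL, query term and field boosts are placeholders taken from Amit's qf, not a recommendation:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class EdismaxTieDemo {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("delhi");
    q.set("defType", "edismax");
    q.set("qf", "value_search^2.0 desc_search country_search^1.5 city_search^2.5");
    q.set("tie", "0.1");          // let the non-winning fields contribute a little instead of being ignored
    q.set("debugQuery", "true");  // inspect the per-field scores before touching the boosts in qf
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getDebugMap().get("parsedquery"));
    System.out.println(rsp.getDebugMap().get("explain"));
  }
}

The explain section shows which field's score is actually winning for each document, which is what you adjust qf (and tie) against.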
Solution for MM ignored in edismax queries with operators ?
Hi, We found a possible solution for SOLR-2649 (https://issues.apache.org/jira/browse/SOLR-2649): MM ignored in edismax queries with operators. The details are here: https://issues.apache.org/jira/browse/SOLR-2649?focusedCommentId=13822482&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13822482 Any feedback is welcome. Best regards, Anca Kopetz, Kelkoo
In a function query, I can't get the ValueSource when extending ValueSourceParser
hi, I am working with solr4.1. When I don't parseValueSource, my function query works well. The code is like this: public class DateSourceParser extends ValueSourceParser { @Override public void init(NamedList namedList) { } @Override *public ValueSource parse(FunctionQParser fp) throws SyntaxError { return new DateFunction(); }* } When I want to use the ValueSource, like this: public class DateSourceParser extends ValueSourceParser { @Override public void init(NamedList namedList) { } @Override *public ValueSource parse(FunctionQParser fp) throws SyntaxError { ValueSource source = fp.parseValueSource(); return new DateFunction(source); }* } fp.parseValueSource() throws an error like this: ERROR [org.apache.solr.core.SolrCore] - org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Expected identifier at pos 12 str='dateDeboost()' at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:147) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269) at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:70) at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:173) at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:229) at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:274) at com.caucho.server.port.TcpConnection.run(TcpConnection.java:514) at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:527) at com.caucho.util.ThreadPool.run(ThreadPool.java:449) at java.lang.Thread.run(Thread.java:662) Caused by: org.apache.solr.search.SyntaxError: Expected identifier at pos 12 str='dateDeboost()' at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:747) at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:726) at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:345) at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223) at org.sling.solr.custom.DateSourceParser.parse(DateSourceParser.java:24) at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352) at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68) at org.apache.solr.search.QParser.getQuery(QParser.java:142) at org.apache.solr.search.BoostQParserPlugin$1.parse(BoostQParserPlugin.java:61) at org.apache.solr.search.QParser.getQuery(QParser.java:142) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:117) ... 13 more so, how to make fp.parseValueSource() work? Thanks!!! sling -- View this message in context: http://lucene.472066.n3.nabble.com/In-a-functon-query-I-can-t-get-the-ValueSource-when-extend-ValueSourceParser-tp4103026.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: distributed search is significantly slower than direct search
https://issues.apache.org/jira/browse/SOLR-5478 There it goes On Mon, Nov 18, 2013 at 5:44 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Sure, I am out of office till end of week. I reply after i upload the patch
RE: How To Use Multivalued Field Payload at Boosting?
Solr has no query parsers that support payloads. You would have to make your own query parser and also create a custom similarity implementing scorePayload for it to work. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Sunday 24th November 2013 19:07 To: solr-user@lucene.apache.org Subject: How To Use Multivalued Field Payload at Boosting? I have a multivalued field whose values have payloads. How can I use those payloads for boosting? (When a user searches for a keyword and a match happens on that multivalued field, its payload should be added to the overall score.) PS: I use Solr 4.5.1 as Cloud.
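To sketch what this involves (assumptions: Lucene/Solr 4.x, payloads indexed with the standard float encoding, and the field/term names invented purely for illustration), the similarity half could look roughly like this; the custom query parser would then emit payload-aware queries like the one in the trailing comment:

import org.apache.lucene.analysis.payloads.PayloadHelper;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.lucene.search.similarities.DefaultSimilarity;
import org.apache.lucene.util.BytesRef;

public class PayloadSimilarity extends DefaultSimilarity {
  @Override
  public float scorePayload(int doc, int start, int end, BytesRef payload) {
    if (payload == null) {
      return 1.0f;  // no payload, no extra boost
    }
    // assumes payloads were indexed as encoded floats (e.g. via DelimitedPayloadTokenFilter)
    return PayloadHelper.decodeFloat(payload.bytes, payload.offset);
  }
}

// The custom query parser would then build something like:
//   new PayloadTermQuery(new Term("tags", "solr"), new AveragePayloadFunction(), true);
// instead of a plain TermQuery, so scorePayload() actually gets called for matches.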
FYI real-time get handler is needed for Solr cloud recovery.
We just had an issue on our SolrCloud cluster and wanted to point this out to the list at large. The real-time /get handler is used by SolrCloud's sync/recovery mechanism, so *DO NOT* remove it from solrconfig.xml if you are using SolrCloud! We did (because we weren't using real-time get ourselves and we were trying to remove all the unnecessary stuff from solrconfig). What it means is that whenever a leadership change for a shard happens, ALL the replicas go into full recovery mode, since they can't determine whether they are in sync or not. There seem to be some getVersions messages, which are implemented in the RealTimeGetComponent, and since these are required for cloud recovery, shouldn't there be more emphasis on this being a required component (or make it part of the Core Admin Handler so it can't be configured away)? We have comments in the schema that the _version_ field is mandatory for SolrCloud; I think we at least need something similar for the /get handler. I'll log a JIRA for this, but sending here first.
synchronization between replicas
Hi, We are currently running tests on Solr to find as many problems in our Solr environment as possible, so we can be ready for these kinds of problems in production. We found an edge case and have a few questions about it. We have one collection with two shards, each shard with a replication factor of 2. We are sending docs to the index and everything is okay. Now the scenario:
1. Take one of the replicas of shard1 down (it doesn't matter which one).
2. Continue indexing documents (that's important for this scenario).
3. Take down the second replica of shard1 (now the shard is down and we cannot index anymore).
4. Bring the replica from step 1 back up (it's important that this replica comes up first).
5. Bring the replica from step 3 back up.
The regular synchronization flow is that the leader synchronizes the other replica, but I'm pretty sure this is a known issue. Is there a way to do a two-way synchronization, or do you have any other solution for me? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/syncronization-between-replicas-tp4103046.html Sent from the Solr - User mailing list archive at Nabble.com.
Setting solr.data.dir for SolrCloud instance
I found something strange while trying to create more than one collection in SolrCloud: I am running every instance with -Dsolr.data.dir=/data. If I look at the Core Admin section, I can see that I have one core and its dataDir is set to this fixed location. The problem is, if I create a new collection, another core is created - but with this fixed index location again. I was expecting that the path I set would serve as the BASE path for all cores that the node hosts. The current behaviour seems like a bug to me, because obviously one collection will see data that was not indexed to it. Is there a way to overcome this? I mean, change the default data dir location, but still be able to create more than one collection correctly? -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Setting solr.data.dir for SolrCloud instance
The first thing I'd do is not send an absolute path. What happens if you just sent -Dsolr.data.dir=data? (no '/')? We had this discussion a while ago when we were working on auto-discovery, and it turns out that there _are_ legitimate cases in which more than one core/collection can point to the same data dir. You have to very carefully control who writes to the core, and I wouldn't do it unless there was no choice, but some people find it useful. And, in general, I wouldn't mix and match the _core_ admin API with the _collections_ api unless you're very confident in what you are doing. Why isn't just letting the default data.dir location working for you? There are good reasons to make it explicit, mostly just checking that you're not over-thinking the problem. Usually they'll be located in a reasonable place. Best, Erick On Mon, Nov 25, 2013 at 8:12 AM, adfel70 adfe...@gmail.com wrote: I found something strange while trying to create more than one collection in SolrCloud: I am running every instance with -Dsolr.data.dir=/data If I look at Core Admin section, I can see that I have one core and its dataDir is set to this fixed location. Problem is, if I create a new collection, another core is created - but with this fixed index location again. I was expecting that the path I sent would serve as the BASE path for all cores the the node hosts. Current behaviour seems like a bug to me, because obviously one collection will see data that was not indexed to him. Is there a way to overcome this? I mean, change the default data dir location, but still be able to create more than one collection correctly? -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Parse eDisMax queries for keywords
Hi Jack, thanks for your reply. OK, in this case I agree that enriching the query in the application layer is a good idea. We are still a bit puzzled about what the enriched query should look like. I'll post here when we have found a solution. If somebody has suggestions, I'd be happy to hear them. Mirko

2013/11/21 Jack Krupansky j...@basetechnology.com The query parser does its own tokenization and parsing before your analyzer tokenizer and filters are called, assuring that only one whitespace-delimited token is analyzed at a time. You're probably best off having an application-layer preprocessor for the query that enriches the query in the manner that you're describing. Or, simply settle for a heuristic approach that may give you 70% of what you want using only existing Solr features on the server side. -- Jack Krupansky

-Original Message- From: Mirko Sent: Thursday, November 21, 2013 5:30 AM To: solr-user@lucene.apache.org Subject: Parse eDisMax queries for keywords

Hi, We would like to implement special handling for queries that contain certain keywords. Our particular use case: In the example query "Footitle season 1" we want to discover the keyword "season", get the subsequent number, and boost (or filter for) documents that match "1" on the field name=season. We have two fields in our schema:

<!-- title contains titles -->
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
<fieldType name="text" class="solr.TextField" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... -->
  </analyzer>
</fieldType>

<!-- season contains season numbers -->
<field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
  </analyzer>
</fieldType>

Our idea was to use a keyword tokenizer and a regex on the season field to extract the season number from the complete query. However, we use the ExtendedDisMax query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title season</str>
  </lst>
</requestHandler>

The problem is that eDisMax tokenizes the query, so that our season field receives the tokens [Foo, season, 1] without any order, instead of the complete query. How can we pass the complete query (untokenized) to the season field? We don't understand which tokenizer is used here and why our season field received tokens instead of the complete query. Or is there another approach to solve this use case with Solr? Thanks, Mirko
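Since no concrete suggestion has been posted yet, a rough sketch of the application-layer preprocessing Jack suggests (plain Java + SolrJ; the regex, field names and boost are assumptions based on Mirko's schema, not a tested recipe):

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.solr.client.solrj.SolrQuery;

public class SeasonQueryPreprocessor {
  private static final Pattern SEASON = Pattern.compile("(?i)season\\s*0*(\\d+)");

  public static SolrQuery build(String userQuery) {
    SolrQuery q = new SolrQuery();
    q.set("defType", "edismax");
    q.set("qf", "title");
    Matcher m = SEASON.matcher(userQuery);
    if (m.find()) {
      String season = m.group(1);
      // strip the keyword phrase from the free-text part, then boost (or filter) on the season field
      q.setQuery(userQuery.replace(m.group(0), "").trim());
      q.set("bq", "season:" + season + "^10");
      // or, to filter instead of boost: q.addFilterQuery("season:" + season);
    } else {
      q.setQuery(userQuery);
    }
    return q;
  }
}

For "Footitle season 1" this would send q=Footitle with bq=season:1^10, so the season field never has to see the whole untokenized query.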
Re: Suggester - how to return exact match?
Thanks! We solved this issue in the front-end now. I.e. we add the exact match to the list of suggestions there. Mirko 2013/11/22 Developer bbar...@gmail.com Might not be a perfect solution but you can use edgengram filter and copy all your field data to that field and use it for suggestion. fieldType name=text_autocomplete class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EdgeNGramFilterFactory minGramSize=1 maxGramSize=250 / /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType http://localhost:8983/solr/core1/select?q=name:iphone The above query will return iphone iphone5c iphone4g -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-how-to-return-exact-match-tp4102203p4102521.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.x : how to implement an update processor chain working for partial updates
In my solr schema I have the following fields defined:

<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<field name="all" type="text_general" indexed="true" stored="false" multiValued="true" termVectors="true"/>
<field name="eng" type="text_en" indexed="true" stored="false" multiValued="true" termVectors="true"/>
<field name="ita" type="text_it" indexed="true" stored="false" multiValued="true" termVectors="true"/>
<field name="fre" type="text_fr" indexed="true" stored="false" multiValued="true" termVectors="true"/>
...
<copyField source="content" dest="all"/>

To fill in the language-specific fields, I use a custom update processor chain, with a custom ConditionalCopyProcessor that copies the content field into the appropriate language field, depending on the document language (as explained in http://wiki.apache.org/solr/UpdateRequestProcessor). The problem is that this custom chain is applied to the document passed in the update request, so it works all right when inserting a new document or updating the whole document, where all fields are provided, but it does not when the passed document holds only the updated fields (and the language-specific fields are not stored). I would like to avoid setting the language-specific fields to stored=true, as the content field may hold big values. Is there a way to have Solr execute my ConditionalCopyProcessor on the actual updated doc (the one resulting from Solr retrieving all stored values and merging them with the update request values), and not on the request doc? Thanks a lot for your help. Paule -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-x-how-to-implement-an-update-processor-chain-working-for-partial-updates-tp4103071.html Sent from the Solr - User mailing list archive at Nabble.com.
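For reference, the copy itself (setting aside the partial-update problem above for a moment) is typically a small factory along these lines - a hedged sketch in which the "language" field and the language-to-field mapping are assumptions, since the original ConditionalCopyProcessor wasn't posted:

import java.io.IOException;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ConditionalCopyProcessorFactory extends UpdateRequestProcessorFactory {
  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req, SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        // hypothetical "language" field carrying the detected document language
        Object lang = doc.getFieldValue("language");
        String dest = "fr".equals(lang) ? "fre" : "it".equals(lang) ? "ita" : "eng";
        if (doc.getFieldValues("content") != null) {
          for (Object v : doc.getFieldValues("content")) {
            doc.addField(dest, v);
          }
        }
        super.processAdd(cmd);  // pass the document on down the chain
      }
    };
  }
}

The question above is really about where in the chain such a processor has to sit for the content field to be populated on a partial update.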
ConcurrentModificationException from XMLResponseWriter
Following exception is found in solr logs. We are using Solr 3.2. As the stack trace is not referring to any application classes, I couldn't figure out the piece of code that throws this exception. Is there any way to debug this issue? Is it related to the issue ConcurrentModificationException from BinaryResponseWriter Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log SEVERE: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391) at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644) at org.apache.solr.response.XMLWriter.writeVal(XMLWriter.java:591) at org.apache.solr.response.XMLWriter.writeResponse(XMLWriter.java:131) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:35) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:343) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:265) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869) at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664) at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527) at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684) at java.lang.Thread.run(Thread.java:662) Thanks
Trouble with manually routed collection after upgrade to 4.6
Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues. First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ: document.addField(_route_, shardId) After upgrading the servers to 4.6 I now get the following on every insert/delete when using either SolrJ 4.5.1 or 4.6: org.apache.solr.common.SolrException: No active slice servicing hash code 17b9dff6 in DocCollection In the clusterstate *none* of my shards have a range set (they're all null), but I thought this would be expected since I do routing myself. Did the upgrade change something here? I didn't see anything related to this in the upgrade notes. Thanks, Brett
RE: Multiple data/index.YYYYMMDD.... dirs == bug?
-Original message- From:Otis Gospodnetic otis.gospodne...@gmail.com Sent: Wednesday 20th November 2013 16:40 To: solr-user@lucene.apache.org Subject: Multiple data/index.YYYYMMDD dirs == bug?

Hi, When full index replication is happening via SnapPuller, a temporary timestamped index dir is created. Questions:

1) Under normal circumstances could more than 1 timestamped index directory ever be present?

No, except during replication.

2) Should there always be a .../data/index directory present?

No, the directory can also be a timestamped index.* directory. It is pointed to from index.properties.

I'm asking because I see the following situation on one SolrCloud node:

$ du -ms /home/solr/data/*
1188367  /home/solr/data/index.20131118152402344
709050   /home/solr/data/index.20131119210950598
1        /home/solr/data/index.properties
1        /home/solr/data/replication.properties
3053     /home/solr/data/tlog

Note: 1) there are 2 timestamped directories 2) there is no data/index directory

This is not good, but you can safely remove every index directory that is not referenced in index.properties - usually keep only the newest.

According to SnapPuller, the timestamped index dir is a temporary dir and should be removed after replication, unless maybe some error case is not being handled correctly and timestamped index dirs are leaking.

It can happen when Solr dies; they are not removed on startup.

Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: ConcurrentModificationException from XMLResponseWriter
On 11/25/2013 8:43 AM, Shyamsunder R Mutcha wrote: Following exception is found in solr logs. We are using Solr 3.2. As the stack trace is not referring to any application classes, I couldn't figure out the piece of code that throws this exception. Is there any way to debug this issue? Is it related to the issue ConcurrentModificationException from BinaryResponseWriter Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log SEVERE: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391) at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644)

The exception is coming from LinkedHashMap, a built-in Java object type. http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html The code that made the call that's failing is line 644 of this source code file: solr/core/src/java/org/apache/solr/response/XMLWriter.java I looked at the 3.2 source code. What's going on here is fairly normal - it's iterating through a Map and outputting the data contained there to the writer. The actual problem is occurring elsewhere; it's only showing up in XMLWriter due to the way LinkedHashMap objects work. Another thread has modified the Map while the iterator is being used. This is something you're not allowed to do with this object type, so it throws the exception. I can't find any existing Solr bugs, so the question is: Are you using any custom code with Solr? Perhaps something you downloaded or purchased, or something you wrote in-house? If so, then that code has some bugs. If this *is* a bug in Solr 3.x, it is highly unlikely that it will get fixed, at least in a 3.x version. If it still exists in version 4.x (which is unlikely), then it will get fixed there. Version 3.2 is two years old, and the entire 3.x branch is in maintenance mode, meaning that only EXTREMELY severe bugs will be fixed. Thanks, Shawn
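If the custom code turns out to be the culprit, the usual fix is to hand the response a private snapshot rather than a map another thread can still mutate. A hedged sketch (method and field names here are made up for illustration):

import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.solr.response.SolrQueryResponse;

public class FacetResponseHelper {
  // Never hand the response writer a map that another thread may still be mutating.
  public static void addFacets(SolrQueryResponse rsp, Map<String, Integer> liveCounts) {
    Map<String, Integer> snapshot;
    synchronized (liveCounts) {                      // guard against concurrent writers during the copy
      snapshot = new LinkedHashMap<String, Integer>(liveCounts);
    }
    rsp.add("facet_counts", snapshot);               // XMLWriter then iterates only the private copy
  }
}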
Re: Trouble with manually routed collection after upgrade to 4.6
Here's my clusterstate.json: https://gist.github.com/bretthoerner/a8120a8d89c93f773d70 On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner br...@bretthoerner.comwrote: Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues. First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ: document.addField(_route_, shardId) After upgrading the servers to 4.6 I now get the following on every insert/delete when using either SolrJ 4.5.1 or 4.6: org.apache.solr.common.SolrException: No active slice servicing hash code 17b9dff6 in DocCollection In the clusterstate *none* of my shards have a range set (they're all null), but I thought this would be expected since I do routing myself. Did the upgrade change something here? I didn't see anything related to this in the upgrade notes. Thanks, Brett
Re: How To Use Multivalued Field Payload at Boosting?
Is there any example for it? 2013/11/25 Markus Jelsma markus.jel...@openindex.io Solr has no query parsers that support payloads. You would have make your own query parser and also create a custom similarity implementing scorePayload for it to work. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Sunday 24th November 2013 19:07 To: solr-user@lucene.apache.org Subject: How To Use Multivalued Field Payload at Boosting? I have a multivalued field and they have payloads. How can I use that payloads at boosting? (When user searches for a keyword and if a match happens at that multivalued field its payload will be added it to the general score) PS: I use Solr 4.5.1 as Cloud.
Re: Trouble with manually routed collection after upgrade to 4.6
Think I got it. For some reason this was in my clusterstate.json after the upgrade (note that I was using 4.5.X just fine previously...): "router": { "name": "compositeId" }, I stopped all my nodes and manually edited this to be "implicit" (is there a tool for this? I've always done it manually), started the cluster up again and it's all good now.

On Mon, Nov 25, 2013 at 10:38 AM, Brett Hoerner br...@bretthoerner.com wrote: Here's my clusterstate.json: https://gist.github.com/bretthoerner/a8120a8d89c93f773d70

On Mon, Nov 25, 2013 at 10:18 AM, Brett Hoerner br...@bretthoerner.com wrote: Hi, I've been using a collection on Solr 4.5.X for a few weeks and just did an upgrade to 4.6 and am having some issues. First: this collection is, I guess, implicitly routed. I do this for every document insert using SolrJ: document.addField("_route_", shardId) After upgrading the servers to 4.6 I now get the following on every insert/delete when using either SolrJ 4.5.1 or 4.6: org.apache.solr.common.SolrException: No active slice servicing hash code 17b9dff6 in DocCollection In the clusterstate *none* of my shards have a range set (they're all null), but I thought this would be expected since I do routing myself. Did the upgrade change something here? I didn't see anything related to this in the upgrade notes. Thanks, Brett
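There doesn't seem to be a dedicated tool for this as of 4.6; besides hand-editing, the edit can be scripted with ZkCLI's putfile command or a few lines of SolrJ. A rough, untested sketch (ZooKeeper address and file handling are placeholders, commons-io is assumed on the classpath, and the nodes should be stopped first, as you did):

import java.io.File;
import org.apache.commons.io.FileUtils;
import org.apache.solr.common.cloud.SolrZkClient;

public class ClusterstateEditor {
  public static void main(String[] args) throws Exception {
    SolrZkClient zk = new SolrZkClient("localhost:2181", 30000);
    try {
      // pull the current state down to a local file
      File local = new File("clusterstate.json");
      FileUtils.writeByteArrayToFile(local, zk.getData("/clusterstate.json", null, null, true));
      // ... hand-edit "router":{"name":"compositeId"} to "implicit" in the local file ...
      // then push the edited file back
      zk.setData("/clusterstate.json", FileUtils.readFileToByteArray(local), true);
    } finally {
      zk.close();
    }
  }
}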
Re: Cloning shards = cloning collections
Hi, As a matter of fact, what about exposing a new Collection API CLONE command and having Solr simply copy all the needed shards and replicas at the FS level, would that work (or not because of different Directory implementations that may not all lend themselves to being simply copied)? Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, Nov 25, 2013 at 12:10 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, In http://search-lucene.com/m/O1O2r14sU811 Shalin wrote: The splitting process is nothing but the creation of a bitset with which a LiveDocsReader is created. These readers are then added to the a new index via IW.addIndexes(IndexReader[] readers) method. ... which makes me wonder couldn't the same mechanism be used to clone shards and thus allow us to clone/duplicate a whole collection? A handy feature, IMHO. Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
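For context, the Lucene-level copy such a CLONE command would boil down to is roughly the sketch below - essentially the same IW.addIndexes(...) call Shalin describes, just without the LiveDocsReader filtering. Paths are placeholders, the source shard is assumed quiescent, and none of the SolrCloud bookkeeping (ZK state, replicas, configs) is handled here:

import java.io.File;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class ShardCloner {
  public static void main(String[] args) throws Exception {
    Directory src = FSDirectory.open(new File("/path/to/source/data/index"));
    Directory dst = FSDirectory.open(new File("/path/to/clone/data/index"));
    DirectoryReader reader = DirectoryReader.open(src);
    IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
    IndexWriter writer = new IndexWriter(dst, cfg);
    writer.addIndexes(reader);   // copy all live docs from the source segments into the new index
    writer.close();
    reader.close();
  }
}

Whether a filesystem-level copy (as suggested above) or an addIndexes-based copy is the right primitive probably depends on the Directory implementation in use.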
Re: ConcurrentModificationException from XMLResponseWriter
Shawn, We have custom search handlers that uses in built components - result and facet to generate the results. I see that our facet generation is using the LinkedHashMap. I will revisit my code. Thanks for the advise!!! We are migrating to Solr4 soon :) Thanks On Monday, November 25, 2013 11:28 AM, Shawn Heisey s...@elyograg.org wrote: On 11/25/2013 8:43 AM, Shyamsunder R Mutcha wrote: Following exception is found in solr logs. We are using Solr 3.2. As the stack trace is not referring to any application classes, I couldn't figure out the piece of code that throws this exception. Is there any way to debug this issue? Is it related to the issue ConcurrentModificationException from BinaryResponseWriter Nov 25, 2013 7:10:56 AM org.apache.solr.common.SolrException log SEVERE: java.util.ConcurrentModificationException at java.util.LinkedHashMap$LinkedHashIterator.nextEntry(LinkedHashMap.java:373) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:392) at java.util.LinkedHashMap$EntryIterator.next(LinkedHashMap.java:391) at org.apache.solr.response.XMLWriter.writeMap(XMLWriter.java:644) The exception is coming from LinkedHashMap, a built-in Java object type. http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html The code that made the call that's failing is line 644 of this source code file: solr/core/src/java/org/apache/solr/response/XMLWriter.java I looked at the 3.2 source code. What's going on here is fairly normal - it's interating through a Map and outputting the data contained there to the writer. The actual problem is occurring elsewhere, it's only showing up in XMLWriter due to the way LinkedHashMap objects work. Another thread has modified the Map while the iterator is being used. This is something you're not allowed to do with this object type, so it throws the exception. I can't find any existing Solr bugs, so the question is: Are you using any custom code with Solr? Perhaps something you downloaded or purchased, or something you wrote in-house? If so, then that code has some bugs. If this *is* a bug in Solr 3.x, it is highly unlikely that it will get fixed, at least in a 3.x version. If it still exists in version 4.x (which is unlikely), then it will get fixed there. Version 3.2 is two years old, and the entire 3.x branch is in maintenance mode, meaning that only EXTREMELY severe bugs will be fixed. Thanks, Shawn
Revolution writeup
I just posted a writeup of the Lucene/Solr Revolution Dublin conference. I've been waiting for videos to become available, but I got impatient. Slides are there, mostly though. Sorry if I missed your talk -- I'm hoping to catch up when the videos are posted... http://blog.safariflow.com/2013/11/25/this-revolution-will-be-televised/ -Mike Sokolov
Re: Solr 4.x : how to implement an update processor chain working for partial updates
: : Is there a way to have solr execute my ConditionalCopyProcessor on the : actual updated doc (the one resulting from solr retrieving all stored values : and merging with update request values), and not on the request doc ? Partial updates, and loading the existing stored fields of a document that is being partially updated, happen in the DistributedUpdateProcessor as part of the leader logic (so that we can be confident we have the correct field values and _version_ info even if there are competing updates to the same document). If you configure your update processor to happen *after* the DistributedUpdateProcessor, then the document will be fully populated. Unfortunately, the down side is that your processor will be run redundantly on each replica, which can be annoying if it's a resource-intensive update processor or requires hitting an external resource. NOTE: even if you aren't using SolrCloud, you still get an implicit instance of DistributedUpdateProcessor precisely so that partial updates will work... https://wiki.apache.org/solr/UpdateRequestProcessor#Distributed_Updates -Hoss
Re: In a function query, I can't get the ValueSource when extending ValueSourceParser
I'm not sure i understand your question - largely because you've only provided a small sample of information aboutwhat you are doing, and not giving a full picture. what are you actually trying to accomplish? With your custom ValueSourceParser, what input are you sending to solr that generates that error? what does your DateFunction do? Best i can tell from the information provided, you've registered your DateSourceParser using the name 'dateDeboost' (just a guess, you never actaully said) and then you tried using it in a request in some way (boost function?) as 'dateDeboost()' (just guessing based on the error message) In which case this error is entirely expected, because your parse implementation says that you expect your function to be passed as input another vlaue source -- but when you called your function (in the input string 'dateDeboost()') you didn't specify any arguments at all - let alone an input argument that could be evaluated as a nested ValueSource. : Date: Mon, 25 Nov 2013 02:11:43 -0800 (PST) : From: sling sling...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: In a functon query, : I can't get the ValueSource when extend ValueSourceParser : : hi, : I am working with solr4.1. : When I don't parseValueSource, my function query works well. The code is : like this: : public class DateSourceParser extends ValueSourceParser { : @Override : public void init(NamedList namedList) { : } : @Override : *public ValueSource parse(FunctionQParser fp) throws SyntaxError { : return new DateFunction(); : }* : } : : When I want to use the ValueSource, like this: : public class DateSourceParser extends ValueSourceParser { : @Override : public void init(NamedList namedList) { : } : @Override : *public ValueSource parse(FunctionQParser fp) throws SyntaxError { : ValueSource source = fp.parseValueSource(); : return new DateFunction(source); : }* : } : : fp.parseValueSource() throws an error like this: : ERROR [org.apache.solr.core.SolrCore] - : org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: : Expected identifier at pos 12 str='dateDeboost()' : at : org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:147) : at : org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187) : at : org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) : at : org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448) : at : org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269) : at : com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:70) : at : com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:173) : at : com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:229) : at : com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:274) : at com.caucho.server.port.TcpConnection.run(TcpConnection.java:514) : at com.caucho.util.ThreadPool.runTasks(ThreadPool.java:527) : at com.caucho.util.ThreadPool.run(ThreadPool.java:449) : at java.lang.Thread.run(Thread.java:662) : Caused by: org.apache.solr.search.SyntaxError: Expected identifier at pos 12 : str='dateDeboost()' : at : org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:747) : at : org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:726) : at : 
org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:345) : at : org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:223) : at : org.sling.solr.custom.DateSourceParser.parse(DateSourceParser.java:24) : at : org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:352) : at : org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68) : at org.apache.solr.search.QParser.getQuery(QParser.java:142) : at : org.apache.solr.search.BoostQParserPlugin$1.parse(BoostQParserPlugin.java:61) : at org.apache.solr.search.QParser.getQuery(QParser.java:142) : at : org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:117) : ... 13 more : : : so, how to make fp.parseValueSource() work? : : Thanks!!! : : sling : : : : : : -- : View this message in context: http://lucene.472066.n3.nabble.com/In-a-functon-query-I-can-t-get-the-ValueSource-when-extend-ValueSourceParser-tp4103026.html : Sent from the Solr - User mailing list archive at Nabble.com. : -Hoss
Re: csv does not return custom fields (distance)
It's a known issue, support for returning pseudo-fields in the CSV response writer was never implemented. Need someone to spend some time working up a patch to add it... https://issues.apache.org/jira/browse/SOLR-5423 : Date: Wed, 20 Nov 2013 20:55:53 -0800 (PST) : From: GaneshSe ganeshmail...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: csv does not return custom fields (distance) : : I am using the spatial search feature in Solr (4.0). : : When I try to extract the CSV (using the wt=csv option) with the edismax : parser, I don't get all the fields in the CSV output as specified in the fl : parameter. Only the schema fields and the score are coming out in the CSV; the : custom fields like distance specified below do not come out in : the CSV file. But I am able to get them with the wt=xml option. : : q=+(Name:abcd)&sfield=location&rows=100&defType=edismax&pt=40.721587,-73.886938&q.op=OR&isShard=true&start=0&fl=*,score,dist:geodist()&wt=csv : : The above is not the complete query. : : I would like to have distance in the CSV output, any help please? : : -- : View this message in context: http://lucene.472066.n3.nabble.com/csv-does-not-return-custom-fields-distance-tp4102313.html : Sent from the Solr - User mailing list archive at Nabble.com. -Hoss
New to Solr - Need advice on clustering
Hi Solr users, I'm trying to set up Solr for search and indexing on the project I'm working on. My project is an e-commerce B2B solution. We are planning on setting up 2 frontend servers for the website, and I was planning on installing Solr on these servers. We are using Windows Server 2012 for the frontend servers. We are not expecting a huge load on the servers, so we expect these 2 servers to be adequate to handle both the website and the search index. I have been looking at SolrCloud and ZooKeeper. However, I have read that you need at least 3 ZooKeeper nodes in an ensemble, and I only have 2 servers. I need to handle the situation where one of the servers crashes, so I need both servers to have a Solr index. Do you have any advice on the best setup for my situation? Thank you for your help. Regards Anders Olsen
POLL: Solr vs. SolrCloud usage
Hi, It would be great to see what Solr people are using - Solr or SolrCloud: Vote == http://blog.sematext.com/2013/11/25/poll-solr-cloud-usage/ Here are a couple of old polls, if you are curious about this sort of stuff like I am: * http://blog.sematext.com/2013/02/25/poll-solr-cloud-or-not/ -- from 9 months ago * http://blog.sematext.com/2013/02/15/poll-which-solr-version-are-you-using/ Thanks, Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/
Re: building custom cache - using lucene docids
On Sun, Nov 24, 2013 at 8:31 AM, Erick Erickson erickerick...@gmail.comwrote: bq: Do i understand you correctly that when two segmets get merged, the docids (of the original segments) remain the same? The original segments are unchanged, segments are _never_ changed after they're closed. But they'll be thrown away. Say you have segment1 and segment2 that get merged into segment3. As soon as the last searcher that is looking at segment1 and segment2 is closed, those two segments will be deleted from your disk. But for any given doc, the docid in segment3 will very likely be different than it was in segment1 or 2. i'm trying to figure this out - i'll have to dig, i suppose. for example, if the docbase (the docid offset per searcher) was stored together with the index segment, that would be an indication of 'relative stability of docids' I think you're reading too much into LUCENE-2897. I'm pretty sure the segment in question is not available to you anyway before this rewrite is done, but freely admit I don't know much about it. i've done tests, committing and overwriting a document and saw (SOLR4.0) that docids are being recycled. I deleted 2 docs, then added a new document and guess what: the new document had the docid of the previously deleted document (but different fields). That was new to me, so I searched and found the LUCENE-2897 which seemed to explain that behaviour. You're probably going to get into the whole PerSegment family of operations, which is something I'm not all that familiar with so I'll leave explanations to others. Thank you, it is useful to get insights from various sides, roman On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Erick, Many thanks for the info. An additional question: Do i understand you correctly that when two segmets get merged, the docids (of the original segments) remain the same? (unless, perhaps in situation, they were merged using the last index segment which was opened for writing and where the docids could have suddenly changed in a commit just before the merge) Yes, you guessed right that I am putting my code into the custom cache - so it gets notified on index changes. I don't know yet how, but I think I can find the way to the current active, opened (last) index segment. Which is actively updated (as opposed to just being merged) -- so my definition of 'not last ones' is: where docids don't change. I'd be grateful if someone could spot any problem with such assumption. roman On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson erickerick...@gmail.com wrote: bq: But can I assume that docids in other segments (other than the last one) will be relatively stable? Kinda. Maybe. Maybe not. It depends on how you define other than the last one. The key is that the internal doc IDs may change when segments are merged. And old segments get merged. Doc IDs will _never_ change in a segment once it's closed (although as you note they may be marked as deleted). But that segment may be written to a new segment when merging and the internal ID for a given document in the new segment bears no relationship to internal ID in the old segment. BTW, I think you only really care when opening a new searchers. There is a UserCache (see solrconfig.xml) that gets notified when a new searcher is being opened to give it an opportunity to refresh itself, is that useful? As long as a searcher is open, it's guaranteed that nothing is changing. 
Hard commits with openSearcher=false don't open new searchers, which is why changes aren't visible until a softCommit or a hard commit with openSearcher=true despite the fact that the segments are closed. FWIW, Erick Best Erick On Sat, Nov 23, 2013 at 12:40 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, docids are 'ephemeral', but i'd still like to build a search cache with them (they allow for the fastest joins). i'm seeing docids keep changing with updates (especially, in the last index segment) - as per https://issues.apache.org/jira/browse/LUCENE-2897 That would be fine, because i could build the cache from diff (of index state) + reading the latest index segment in its entirety. But can I assume that docids in other segments (other than the last one) will be relatively stable? (ie. when an old doc is deleted, the docid is marked as removed; update doc = delete old create a new docid)? thanks roman
Re: building custom cache - using lucene docids
On Sun, Nov 24, 2013 at 10:44 AM, Jack Krupansky j...@basetechnology.comwrote: We should probably talk about internal Lucene document IDs and external or rebased Lucene document IDs. The internal document IDs are always per-segment and never, ever change for that closed segment. But... the application would not normally see these IDs. Usually the externally visible Lucene document IDs have been rebased to add the sum total count of documents (both existing and deleted) of all preceding segments to the document IDs of a given segment, producing a global (across the full index of all segments) Lucene document ID. So, if you have those three segments, with deleted documents in the first two segments, and then merge those first two segments, the externally-visible Lucene document IDs for the third segment will suddenly all be different, shifted lower by the number of deleted documents that were just merged away, even though nothing changed in the third segment itself. That's right, and I'm starting to think that if i keep the segment id and the original offset, i don't need to rebuild that part of the cache, because it has not been rebased (but I can always update the deleted docs). It seems simple so I'm suspecting to find a catch somewhere. but if it works, that could potentially speed up any cache building Do you have information where the docbase of the segment are stored? Or which java class I should start my exploration from? [it is somewhat sprawling complex, so I'm bit lost :)] Maybe these should be called local (to the segment) Lucene document IDs and global (across all segment) Lucene document IDs. Or, maybe internal vs. external is good enough. In short, it is completely safe to use and save Lucene document IDs, but only as long as no merging of segments is performed. Even one tiny merge and all subsequent saved document IDs are invalidated. Be careful with your merge policy - normally merges are happening in the background, automatically. my tests, as per previous email, showed that the last segment docid's are not that stable. I don't know if it matters that I used the RAMDirectory for the test, but the docids were being 'recycled' - the deleted docs were in the previous segment, then suddently their docids were inside newly added documents (so maybe solr/lucene is not counting deleted docs, if they are at the end of a segment...?) i don't know. i'll need to explore the index segments to understand what was going on there, thanks for any possible pointers roman -- Jack Krupansky -Original Message- From: Erick Erickson Sent: Sunday, November 24, 2013 8:31 AM To: solr-user@lucene.apache.org Subject: Re: building custom cache - using lucene docids bq: Do i understand you correctly that when two segmets get merged, the docids (of the original segments) remain the same? The original segments are unchanged, segments are _never_ changed after they're closed. But they'll be thrown away. Say you have segment1 and segment2 that get merged into segment3. As soon as the last searcher that is looking at segment1 and segment2 is closed, those two segments will be deleted from your disk. But for any given doc, the docid in segment3 will very likely be different than it was in segment1 or 2. I think you're reading too much into LUCENE-2897. I'm pretty sure the segment in question is not available to you anyway before this rewrite is done, but freely admit I don't know much about it. 
You're probably going to get into the whole PerSegment family of operations, which is something I'm not all that familiar with so I'll leave explanations to others. On Sat, Nov 23, 2013 at 8:22 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Erick, Many thanks for the info. An additional question: Do i understand you correctly that when two segmets get merged, the docids (of the original segments) remain the same? (unless, perhaps in situation, they were merged using the last index segment which was opened for writing and where the docids could have suddenly changed in a commit just before the merge) Yes, you guessed right that I am putting my code into the custom cache - so it gets notified on index changes. I don't know yet how, but I think I can find the way to the current active, opened (last) index segment. Which is actively updated (as opposed to just being merged) -- so my definition of 'not last ones' is: where docids don't change. I'd be grateful if someone could spot any problem with such assumption. roman On Sat, Nov 23, 2013 at 7:39 PM, Erick Erickson erickerick...@gmail.com wrote: bq: But can I assume that docids in other segments (other than the last one) will be relatively stable? Kinda. Maybe. Maybe not. It depends on how you define other than the last one. The key is that the internal doc IDs may change when segments are merged. And old segments get merged. Doc IDs will _never_ change
Re: building custom cache - using lucene docids
On Mon, Nov 25, 2013 at 12:54 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Roman, I don't fully understand your question. After segment is flushed it's never changed, hence segment-local docids are always the same. Due to merge segment can gone, its' docs become new ones in another segment. This is true for 'global' (Solr-style) docnums, which can flip after merge is happened in the middle of the segments' chain. As well you are saying about segmented cache I can propose you to look at CachingWrapperFilter and NoOpRegenerator as a pattern for such data structures. Thanks Mikhail, the CWF confirms that the idea of regenerating just part of the cache is doable. The CacheRegenerators, on the other hand, make no sense to me - and they are not given any 'signals', so they don't know if they are in the middle of some regeneration or not, and they should not keep a state (of previous index) - as they can be shared by threads that build the cache Best, roman On Sat, Nov 23, 2013 at 9:40 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, docids are 'ephemeral', but i'd still like to build a search cache with them (they allow for the fastest joins). i'm seeing docids keep changing with updates (especially, in the last index segment) - as per https://issues.apache.org/jira/browse/LUCENE-2897 That would be fine, because i could build the cache from diff (of index state) + reading the latest index segment in its entirety. But can I assume that docids in other segments (other than the last one) will be relatively stable? (ie. when an old doc is deleted, the docid is marked as removed; update doc = delete old create a new docid)? thanks roman -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: New to Solr - Need advice on clustering
On 26 November 2013 01:44, Anders Kåre Olsen a...@mail.dk wrote: Hi Solr-users I’m trying to setup Solr for search and indexing on the project I’m working on. My project is a e-commerce B2B solution. We are planning on setting up 2 frontend servers for the website, and I was planning on installing Solr on these servers. We are using Windows Server 2012 for the frontend servers. We are not expecting a huge load on the servers, so we expect these 2 servers to be adequate to handle both the website and search index. I have been looking at SolrCloud and ZooKeeper. Howver I have read that you need at least 3 ZooKeepers in an ensamble, and I only have 2 servers. I need to handle the situation where one of the servers crashes, so I need both servers to have a Solr index. [...] If you do not want to get into SolrCloud, a simpler solution might be a HTTP load balancer in front of the two Solr instances. Hardware load balancers are better, but more expensive. A software load balancer like haproxy should meet your needs. Regards, Gora
Re: Solr 4.x : how to implement an update processor chain working for partial updates
SOLR-5395 just out with 4.6 might have some relevance here (RunAlways marker interface for UpdateRequestProcessorFactory). Not sure how it affects partial updates though. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Tue, Nov 26, 2013 at 1:44 AM, Chris Hostetter hossman_luc...@fucit.orgwrote: : : Is there a way to have solr execute my ConditionalCopyProcessor on the : actual updated doc (the one resulting from solr retrieving all stored values : and merging with update request values), and not on the request doc ? Partial Updates, and loading the existing stored fields of a document that is being partially updated, happens in the DistributedUpdateProcessor as part of hte leader logic (so that we can be confident we have the correct field values and _version_ info even if there are competing updates to the same document) if you configure your update processor to happen *after* the DistributedUpdateProcessor, then the document will be fuly populated -- unfortunatly. the down side however is that your processorwill be run redundently on each replica, which can be anoying if it's a resource intensive update processor or requires hitting an external resource. NOTE: even if you aren't using SolrCloud, you still get an implicit instance of DistributedUpdateProcessor precisely so that partial updates will work... https://wiki.apache.org/solr/UpdateRequestProcessor#Distributed_Updates -Hoss
Re: In a function query, I can't get the ValueSource when extending ValueSourceParser
Thanks a lot for your reply, Chris. I was trying to sort the query result by the DateFunction, by passing q={!boost b=dateDeboost()}title:test to the /select request handler. My custom DateFunction was like this:

public class DateFunction extends FieldCacheSource {
  private static final long serialVersionUID = 6752223682280098130L;
  private static long now;
  public DateFunction(String field) {
    super(field);
    now = System.currentTimeMillis();
  }
  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    long[] times = cache.getLongs(readerContext.reader(), field, false);
    final float[] weights = new float[times.length];
    for (int i = 0; i < times.length; i++) {
      weights[i] = ScoreUtils.getNewsScoreFactor(now, times[i]);
    }
    return new FunctionValues() {
      @Override
      public float floatVal(int doc) {
        return weights[doc];
      }
    };
  }
}

It calculates the date weight for every document up front, even though only the documents actually being scored need it, so it runs slowly. When I look at the source code of the recip function in org.apache.solr.search.ValueSourceParser:

addParser("recip", new ValueSourceParser() {
  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource source = fp.parseValueSource();
    float m = fp.parseFloat();
    float a = fp.parseFloat();
    float b = fp.parseFloat();
    return new ReciprocalFloatFunction(source, m, a, b);
  }
});

and in ReciprocalFloatFunction, it gets the values like this:

@Override
public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
  final FunctionValues vals = source.getValues(context, readerContext);
  return new FloatDocValues(this) {
    @Override
    public float floatVal(int doc) {
      return a/(m*vals.floatVal(doc) + b);
    }
    @Override
    public String toString(int doc) {
      return Float.toString(a) + "/(" + m + "*float(" + vals.toString(doc) + ')' + '+' + b + ')';
    }
  };
}

So I think this is what I want: when calculating a doc's date weight, I shouldn't call cache.getLongs(...) myself; instead I should use source.getValues(...). Therefore I changed my code, but fp.parseValueSource() throws an error like this: org.apache.solr.search.SyntaxError: Expected identifier at pos 12 str='dateDeboost()' Do I describe it clearly this time? Thanks again! sling -- View this message in context: http://lucene.472066.n3.nabble.com/In-a-functon-query-I-can-t-get-the-ValueSource-when-extend-ValueSourceParser-tp4103026p4103207.html Sent from the Solr - User mailing list archive at Nabble.com.
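For what it's worth, a hedged sketch of how the parse/getValues pair could look once the function actually takes an argument, so that dateDeboost(ptime) parses (dateDeboost() with no argument will keep failing, as Chris explained). The ptime field and the ScoreUtils helper are taken from this thread; everything else is an assumption, including that ptime is a trie long/date field so longVal() returns milliseconds:

import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.FloatDocValues;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class DateSourceParser extends ValueSourceParser {
  @Override
  public void init(NamedList namedList) {}

  @Override
  public ValueSource parse(FunctionQParser fp) throws SyntaxError {
    ValueSource source = fp.parseValueSource();   // consumes the argument, e.g. the ptime field
    return new DateFunction(source);
  }
}

class DateFunction extends ValueSource {
  private final ValueSource source;
  private final long now = System.currentTimeMillis();

  DateFunction(ValueSource source) { this.source = source; }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext) throws IOException {
    final FunctionValues vals = source.getValues(context, readerContext);
    return new FloatDocValues(this) {
      @Override
      public float floatVal(int doc) {
        // computed lazily, only for the docs actually being scored
        return ScoreUtils.getNewsScoreFactor(now, vals.longVal(doc));
      }
    };
  }

  @Override public boolean equals(Object o) { return o instanceof DateFunction && ((DateFunction) o).source.equals(source); }
  @Override public int hashCode() { return source.hashCode(); }
  @Override public String description() { return "dateDeboost(" + source.description() + ")"; }
}

With this registered under the name dateDeboost, the query becomes q={!boost b=dateDeboost(ptime)}title:test.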
Please help me to understand debugQuery output
Hello All, Can any one help me in understanding debugQuery output like this. lst name=explain str 0.6276088 = (MATCH) sum of: 0.6276088 = (MATCH) max of: 0.18323982 = (MATCH) sum of: 0.18323982 = (MATCH) weight(state_search:a in 327) [DefaultSimilarity], result of: 0.18323982 = score(doc=327,freq=2.0 = termFreq=2.0 ), product of: 0.3188151 = queryWeight, product of: 3.2512918 = idf(docFreq=35, maxDocs=342) 0.098057985 = queryNorm 0.5747526 = fieldWeight in 327, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 3.2512918 = idf(docFreq=35, maxDocs=342) 0.125 = fieldNorm(doc=327) 0.2505932 = (MATCH) sum of: 0.2505932 = (MATCH) weight(country_search:a in 327) [DefaultSimilarity], result of: 0.2505932 = score(doc=327,freq=1.0 = termFreq=1.0 ), product of: 0.3135134 = queryWeight, product of: 3.1972246 = idf(docFreq=37, maxDocs=342) 0.098057985 = queryNorm 0.79930615 = fieldWeight in 327, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 3.1972246 = idf(docFreq=37, maxDocs=342) 0.25 = fieldNorm(doc=327) 0.25283098 = (MATCH) sum of: 0.25283098 = (MATCH) weight(area_search:a in 327) [DefaultSimilarity], result of: 0.25283098 = score(doc=327,freq=1.0 = termFreq=1.0 ), product of: 0.398 = queryWeight, product of: 4.06 = idf(docFreq=15, maxDocs=342) 0.098057985 = queryNorm 0.6347222 = fieldWeight in 327, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 4.06 = idf(docFreq=15, maxDocs=342) 0.15625 = fieldNorm(doc=327) 0.6276088 = (MATCH) sum of: 0.12957011 = (MATCH) weight(city_search:a in 327) [DefaultSimilarity], result of: 0.12957011 = score(doc=327,freq=1.0 = termFreq=1.0 ), product of: 0.3188151 = queryWeight, product of: 3.2512918 = idf(docFreq=35, maxDocs=342) 0.098057985 = queryNorm 0.40641147 = fieldWeight in 327, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 3.2512918 = idf(docFreq=35, maxDocs=342) 0.125 = fieldNorm(doc=327) 0.3638727 = (MATCH) weight(city_search:ab in 327) [DefaultSimilarity], result of: 0.3638727 = score(doc=327,freq=1.0 = termFreq=1.0 ), product of: 0.5342705 = queryWeight, product of: 5.4485164 = idf(docFreq=3, maxDocs=342) 0.098057985 = queryNorm 0.68106455 = fieldWeight in 327, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 5.4485164 = idf(docFreq=3, maxDocs=342) 0.125 = fieldNorm(doc=327) 0.13416591 = (MATCH) weight(city_search:b in 327) [DefaultSimilarity], result of: 0.13416591 = score(doc=327,freq=1.0 = termFreq=1.0 ), product of: 0.32441998 = queryWeight, product of: 3.3084502 = idf(docFreq=33, maxDocs=342) 0.098057985 = queryNorm 0.41355628 = fieldWeight in 327, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 3.3084502 = idf(docFreq=33, maxDocs=342) 0.125 = fieldNorm(doc=327) /str Any links where this explaination is explained ? Thanks -- Amit Aggarwal 8095552012
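A short annotation of that output may help: with DefaultSimilarity, each number is just the product of the factors indented under it, and the outer "max of" is the dismax part picking the best-scoring field. Working through the first clause of your own output:

0.5747526  (fieldWeight) = 1.4142135 (tf = sqrt(2)) * 3.2512918 (idf) * 0.125 (fieldNorm)
0.3188151  (queryWeight) = 3.2512918 (idf) * 0.098057985 (queryNorm)
0.18323982 (score of state_search:a) = 0.3188151 (queryWeight) * 0.5747526 (fieldWeight)

and the winning 0.6276088 is the sum of the three city_search clauses (0.12957011 + 0.3638727 + 0.13416591), which is why it becomes the document score. The TFIDFSimilarity javadoc describes what tf, idf, fieldNorm and queryNorm each measure.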
Re: a function query of time, frequency and score.
Thanks, Erick. What I want to do is customize the sort by date, time, and number. I want to know whether there is some formula to tackle this. Thanks again! sling On Fri, Nov 22, 2013 at 9:11 PM, Erick Erickson [via Lucene] ml-node+s472066n4102599...@n3.nabble.com wrote: Not quite sure what you're asking. The field() function query brings the value of a field into the score, something like: http://localhost:8983/solr/select?wt=json&fl=id%20score&q={!boost%20b=field(popularity)}ipod Best, Erick On Thu, Nov 21, 2013 at 10:43 PM, sling [hidden email] wrote: Hi, guys. I indexed 1000 documents, which have fields like title, ptime and frequency. The title is a text field, the ptime is a date field, and the frequency is an int field. The frequency field goes up and down: say, sometimes its value is 0, and sometimes its value is 999. Now, in my app, the query works well with a function query. The function query is implemented as the score multiplied by a decreasing date-weight array. However, I have no idea how to add the frequency to this formula... so could someone give me a clue? Thanks again! sling -- View this message in context: http://lucene.472066.n3.nabble.com/a-function-query-of-time-frequency-and-score-tp4102531.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://lucene.472066.n3.nabble.com/a-function-query-of-time-frequency-and-score-tp4102531p4103216.html Sent from the Solr - User mailing list archive at Nabble.com.
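One hedged way to fold frequency in, using only stock function queries (field names ptime and frequency are assumed, and the constants are just the usual examples from the Solr function-query documentation), is to multiply a reciprocal date decay by a dampened frequency term:

  http://localhost:8983/solr/select?q={!boost%20b=product(recip(ms(NOW/HOUR,ptime),3.16e-11,1,1),sum(1,log(sum(frequency,1))))}title:test

Here recip(ms(NOW/HOUR,ptime),3.16e-11,1,1) decays from 1 toward 0 as documents age, log(sum(frequency,1)) dampens a 0..999 frequency so it cannot swamp the date term, the outer sum(1,...) keeps documents with frequency 0 from being zeroed out, and product(...) combines the two before {!boost} multiplies the result into the relevancy score.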
Re: building custom cache - using lucene docids
OK, I've spent some time reading the solr/lucene4x classes, and this is my understanding (feel free to correct me ;-)) DirectoryReader holds the opened segments -- each segment has its own reader; the BaseCompositeReader (or extended classes thereof) stores the offsets per segment, e.g. [0, 5, 22] - meaning there are 2 segments, with 5 and 17 docs respectively. The segments are listed in the segments_N file, http://lucene.apache.org/core/3_0_3/fileformats.html#Segments File So theoretically, the order of segments could change when a merge happens - yet every SegmentReader is identified by a unique name, and this name doesn't change unless the segment itself changed (i.e. docs were deleted, or it got more docs) - so it is possible to rely on this name to know what has not changed. The name comes from SegmentInfo (check its toString method) -- SegmentInfo has an equals() method that considers readers with the same name and the same dir equal (which is useful to know - two readers, one with deletes, one without, are equal). Lucene's FieldCache itself is rather complex, but it shows there is a very clever mechanism (a few actually!) -- a class can register a listener that will be called whenever an index segment is being closed (this could be used to invalidate portions of a cache); the relevant classes are: SegmentReader.CoreClosedListener, IndexReader.ReaderClosedListener But Lucene is using this mechanism only to purge the cache - so effectively, every commit triggers a cache rebuild. This is the interesting bit: lots of work could be spared if segment data were reused (but admittedly, only sometimes - for data that was fully read into memory; for anything else, such as terms, the cache reads only some values and fetches the rest from the index - so Lucene must close the reader and rebuild the cache on every commit; but that is not my case, as I am going to copy values from an index and store them in memory...) The weird 'recycling' of docids I've observed can probably be explained by the fact that the index reader contains segments and near-real-time readers (but I'm not sure about this). To conclude: it is possible to build a cache that updates itself (with only the changes committed since the last build) - this will have an impact on how fast a new searcher is ready to serve requests. HTH somebody else too :) roman On Mon, Nov 25, 2013 at 7:54 PM, Roman Chyla roman.ch...@gmail.com wrote: On Mon, Nov 25, 2013 at 12:54 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Roman, I don't fully understand your question. After a segment is flushed it's never changed, hence segment-local docids are always the same. Due to a merge a segment can be gone, and its docs become new ones in another segment. This is true for 'global' (Solr-style) docnums, which can flip after a merge happens in the middle of the segments' chain. As for the segmented cache you mention, I can propose you look at CachingWrapperFilter and NoOpRegenerator as a pattern for such data structures. Thanks Mikhail, the CWF confirms that the idea of regenerating just part of the cache is doable. 
The CacheRegenerators, on the other hand, make no sense to me - they are not given any 'signals', so they don't know whether they are in the middle of some regeneration or not, and they should not keep state (of the previous index), as they can be shared by threads that build the cache. Best, roman On Sat, Nov 23, 2013 at 9:40 AM, Roman Chyla roman.ch...@gmail.com wrote: Hi, docids are 'ephemeral', but I'd still like to build a search cache with them (they allow for the fastest joins). I'm seeing docids keep changing with updates (especially in the last index segment) - as per https://issues.apache.org/jira/browse/LUCENE-2897 That would be fine, because I could build the cache from a diff (of index state) + reading the latest index segment in its entirety. But can I assume that docids in other segments (other than the last one) will be relatively stable? (i.e. when an old doc is deleted, the docid is marked as removed; update doc = delete old + create a new docid)? thanks roman -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
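To make the per-segment idea above concrete, here is a minimal sketch (not an existing Solr API, just the pattern CachingWrapperFilter uses): entries are keyed by the segment's core cache key, so unchanged segments keep their entries across commits and only new or merged segments get rebuilt. The Loader interface and the generic type T are placeholders for whatever per-segment structure is being cached.

  import java.io.IOException;
  import java.util.Collections;
  import java.util.Map;
  import java.util.WeakHashMap;

  import org.apache.lucene.index.AtomicReader;
  import org.apache.lucene.index.AtomicReaderContext;

  public class PerSegmentCache<T> {

    public interface Loader<V> {
      V load(AtomicReader reader) throws IOException;
    }

    // weak keys: entries disappear once the segment core is no longer referenced
    private final Map<Object, T> cache =
        Collections.synchronizedMap(new WeakHashMap<Object, T>());

    public T get(AtomicReaderContext context, Loader<T> loader) throws IOException {
      AtomicReader reader = context.reader();
      Object key = reader.getCoreCacheKey();   // stable while the segment core is unchanged
      T value = cache.get(key);
      if (value == null) {
        value = loader.load(reader);           // rebuilt only for new or merged segments
        cache.put(key, value);
      }
      return value;
    }
  }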
Re: New to Solr - Need advice on clustering
Hi Gora Thank you for your reply. We are planning on having a load balancer in front of our frontend servers. If I have two distinct Solr indexes, how will I keep them synchronized? I expect that one of the frontend servers will have the task of updating the product repository on the e-commerce site. This server will then update the local Solr index after the product update has finished. Is there an easy way that I can keep the two indexes synchronized without SolrCloud? Regards Anders -----Original Message----- From: Gora Mohanty Sent: Tuesday, November 26, 2013 2:37 AM To: solr-user@lucene.apache.org Subject: Re: New to Solr - Need advice on clustering On 26 November 2013 01:44, Anders Kåre Olsen a...@mail.dk wrote: Hi Solr-users I’m trying to set up Solr for search and indexing on the project I’m working on. My project is an e-commerce B2B solution. We are planning on setting up 2 frontend servers for the website, and I was planning on installing Solr on these servers. We are using Windows Server 2012 for the frontend servers. We are not expecting a huge load on the servers, so we expect these 2 servers to be adequate to handle both the website and the search index. I have been looking at SolrCloud and ZooKeeper. However, I have read that you need at least 3 ZooKeepers in an ensemble, and I only have 2 servers. I need to handle the situation where one of the servers crashes, so I need both servers to have a Solr index. [...] If you do not want to get into SolrCloud, a simpler solution might be an HTTP load balancer in front of the two Solr instances. Hardware load balancers are better, but more expensive. A software load balancer like haproxy should meet your needs. Regards, Gora
HttpSolrServer - Http Client Connection pooling issue
Hi, Hopefully I am mailing the correct list for this Solr issue; if not, please let me know. We are using Solr 4.3.1 and we are using HttpSolrServer for querying Solr. We are doing a load and stress test using JMeter, and we can see that after a certain number of requests Solr responds in a very unusual way: it gets stuck and responds only after some time. Upon checking the HTTP connections we realized that there are many open connections that are never closed. My questions are: 1. Is there a way to do HTTP connection pooling? Note that the HttpSolrServer instance is static. 2. Can I configure HTTP connections using the solrconfig file? Any pointers would be very helpful. -- Thanks, Gaurav
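A sketch of one way to handle this with SolrJ 4.3.x: create the HttpClient through HttpClientUtil so the connection pool sizes are explicit, hand it to a single shared HttpSolrServer (it is thread-safe), and reuse that instance for all queries. The pool sizes below are arbitrary examples.

  import org.apache.http.client.HttpClient;
  import org.apache.solr.client.solrj.impl.HttpClientUtil;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.common.params.ModifiableSolrParams;

  public class SolrClientFactory {

    public static HttpSolrServer create(String baseUrl) {
      ModifiableSolrParams params = new ModifiableSolrParams();
      params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 128);         // total pool size
      params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 32); // per-route limit
      HttpClient httpClient = HttpClientUtil.createClient(params);
      return new HttpSolrServer(baseUrl, httpClient);
    }
  }

HttpSolrServer also has setMaxTotalConnections()/setDefaultMaxConnectionsPerHost() setters that adjust the same limits on its default client. Either way, the client-side pool is configured in your application code rather than in solrconfig.xml.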
Re: Setting solr.data.dir for SolrCloud instance
Thanks for the reply, Erick. Actually, I didn't think this through. I just thought it would be a good idea to separate the data from the application code. I guess I'll leave it without setting the dataDir parameter and add a symlink. -- View this message in context: http://lucene.472066.n3.nabble.com/Setting-solr-data-dir-for-SolrCloud-instance-tp4103052p4103228.html Sent from the Solr - User mailing list archive at Nabble.com.
Storing solr results in excel
Hi, I am getting two field values from Excel and querying Solr to give the top 1 result. But I need to store the results in another Excel sheet. Can anyone help me with how to store Solr results in an Excel file using SolrJ? Regards, Kumar. -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Storing solr results in excel
wt=csv ? On Tue, Nov 26, 2013 at 11:09 AM, kumar pavan2...@gmail.com wrote: Hi, I am getting two field values from Excel and querying Solr to give the top 1 result. But I need to store the results in another Excel sheet. Can anyone help me with how to store Solr results in an Excel file using SolrJ? Regards, Kumar. -- View this message in context: http://lucene.472066.n3.nabble.com/Storing-solr-results-in-excel-tp4103237.html Sent from the Solr - User mailing list archive at Nabble.com. -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
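If the export has to happen from SolrJ rather than by simply downloading the wt=csv response, a rough sketch (the core URL, query, and field names field1/field2 are placeholders) is to iterate the results and write comma-separated lines to a file that Excel can open:

  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class ExportTopHitToCsv {

    public static void main(String[] args) throws IOException, SolrServerException {
      HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
      SolrQuery query = new SolrQuery("field1:foo AND field2:bar"); // built from the two Excel values
      query.setFields("field1", "field2");
      query.setRows(1); // top 1 result, as in the question

      QueryResponse rsp = solr.query(query);

      PrintWriter out = new PrintWriter(new FileWriter("results.csv"));
      try {
        out.println("field1,field2"); // header row
        for (SolrDocument doc : rsp.getResults()) {
          out.println(doc.getFieldValue("field1") + "," + doc.getFieldValue("field2"));
        }
      } finally {
        out.close();
      }
    }
  }

For a real .xlsx file you would swap the PrintWriter for a spreadsheet library such as Apache POI, but Excel opens a plain CSV file directly.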
Re: Please help me to understand debugQuery output
You might want to look at the Solr Relevancy FAQ for this: http://wiki.apache.org/solr/SolrRelevancyFAQ Also, it will be even better if you go to that page with the outcome you want in mind, like "I want to know why this document scored better than that one" or "Why did the document in my db not come up?" Amit Aggarwal wrote: Hello All, can anyone help me understand debugQuery output like this? 0.6276088 = (MATCH) sum of: [...] Any links where this explanation is documented? Thanks -- Amit Aggarwal 8095552012 -- View this message in context: http://lucene.472066.n3.nabble.com/Please-help-me-to-understand-debugQuery-output-tp4103210p4103241.html Sent from the Solr - User mailing list archive at Nabble.com.
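Reading one leaf of the explain output may also help. With DefaultSimilarity, every weight(...) line is queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm. Taking the numbers for the state_search:a clause from the question:

  queryWeight = 3.2512918 (idf) * 0.098057985 (queryNorm)            = 0.3188151
  fieldWeight = 1.4142135 (tf) * 3.2512918 (idf) * 0.125 (fieldNorm) = 0.5747526
  score       = 0.3188151 * 0.5747526                                = 0.18323982

The top-level 0.6276088 is then the "max of" the per-field sums (0.18323982, 0.2505932, 0.25283098, 0.6276088), which is how the dismax-style query combines the field scores.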
Re: New to Solr - Need advice on clustering
Anders, Take a look at Solr Replication. Essentially, you'll treat one as a master and one as a slave. Both master and slave can be used to serve traffic. If one of them goes down, the other can be used as the master for the interim. http://wiki.apache.org/solr/SolrReplication Sameer. -- http://measuredsearch.com On Mon, Nov 25, 2013 at 9:50 PM, Anders Kåre Olsen a...@mail.dk wrote: Hi Gora Thank you for your reply. We are planning on having a load balancer in front of our frontend servers. If I have two distinct Solr indexes, how will I keep them synchronized? Is there an easy way that I can keep the two indexes synchronized without SolrCloud? [...]
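For reference, the wiki page above boils down to a solrconfig.xml snippet along these lines (the host name, core name, and conf file list are placeholders); the server that receives product updates acts as the master, and the other polls it as a slave:

  <!-- on the master (the server that indexes product updates) -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">startup</str>
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
  </requestHandler>

  <!-- on the slave -->
  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="slave">
      <str name="masterUrl">http://server1:8983/solr/core1</str>
      <str name="pollInterval">00:00:60</str>
    </lst>
  </requestHandler>

If the master goes down, the slave keeps serving searches from its copy of the index and can be promoted for the interim, as Sameer notes.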