Is the term~ effect available as an eDisMax param or a TokenFilter?
Hello, I am trying to match names. In the UI, I can do it by writing name~ or name~2, but I can't expect users to do that, and I don't want to do pre-tokenization in the middleware to inject it. Also, only specific fields are names; people can also enter phone numbers, which I don't want to fuzz when matching their fields. I thought eDisMax allowed specifying this as part of 'fl' (fl=SURNAME~1 FIRSTNAME~1), but that does not seem to work. I know there are other parameters that do take that, but they all seem to be for phrase distance, not fuzziness. So, the question is: is the same algorithm (Levenshtein distance?) available in some other way, like a TokenFilter? I know there are other name-munging filters there (like Metaphone), but I was curious specifically about the equivalent one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
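For reference, the ~ operator's fuzziness is based on Levenshtein (edit) distance. A minimal Python sketch of the metric it approximates (Lucene's actual implementation uses Levenshtein automata for speed; this is just the textbook dynamic-programming version for illustration):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum number of single-character
    insertions, deletions, and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# A query like surname~1 would match candidates within edit distance 1:
print(levenshtein("smith", "smyth"))  # 1 (one substitution)
print(levenshtein("jon", "john"))     # 1 (one insertion)
```

So smyth and john would both match under ~1, while more distant variants would not.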
Re: Integrating solr with Hadoop
Thanks Eric, I will watch out for the MapReduce option. It would be helpful if I could get any links on setting up Hadoop with Solr. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-solr-with-Hadoop-tp4144715p4145157.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: NPE when using facets with the MLT handler.
Hi, I don't think this is ever going to work with the MLT handler; you should use the regular SearchHandler instead.

-Original message-
From: SafeJava T t...@safejava.com
Sent: Monday 30th June 2014 17:52
To: solr-user@lucene.apache.org
Subject: NPE when using facets with the MLT handler.

I am getting an NPE when using facets with the MLT handler. I googled for other NPE errors with facets, but this trace looked different from the ones I found. We are using Solr 4.9-SNAPSHOT. I have reduced the query to the most basic form I can:

q=id:XXX&mlt.fl=mlt_field&facet=true&facet.field=id

I changed it to facet on id, to ensure that the field was present in all results. Any ideas on how to work around this?

java.lang.NullPointerException
  at org.apache.solr.search.facet.SimpleFacets.addFacets(SimpleFacets.java:375)
  at org.apache.solr.handler.MoreLikeThisHandler.handleRequestBody(MoreLikeThisHandler.java:211)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1955)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:769)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
  at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
  at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
  at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
  at org.eclipse.jetty.server.Server.handle(Server.java:368)
  at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
  at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
  at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
  at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
  at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
  at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
  at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
  at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
  at java.lang.Thread.run(Thread.java:744)

Thanks, Tom
RE: Memory Leaks in solr 4.8.1
Hi, you can safely ignore this; it is shutting down anyway. Just don't reload the app a lot of times without actually restarting Tomcat.

-Original message-
From: Aman Tandon amantandon...@gmail.com
Sent: Wednesday 2nd July 2014 7:22
To: solr-user@lucene.apache.org
Subject: Memory Leaks in solr 4.8.1

Hi, when I shut down Solr I am getting this memory-leak error in the logs:

Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Please check. With Regards Aman Tandon
Re: Understanding fieldNorm differences between 3.6.1 and 4.9 solrs
Wow - so apparently I have terrible recall and should re-read the thread I started on the same topic when upgrading from 1.4 to 3.6, where I hit a very similar fieldNorm issue almost two years ago! =) http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201207.mbox/%3CCALyTvnpwZMj4zxPbK0abVpnyRJny=qauijdqmj7e3zgnv7u...@mail.gmail.com%3E In the meantime, I'm still happy to hear any new thoughts / suggestions on keeping similarity consistent across upgrades. Thanks again, Aaron On Tue, Jul 1, 2014 at 11:14 PM, Aaron Daubman daub...@gmail.com wrote: In trying to determine some subtle scoring differences (causing occasionally significant ordering differences) among search results, I wrote a parser to normalize debug.explain.structured JSON output. It appears that every score that differs comes down to a difference in fieldNorm, where the 3.6.1 Solr is using 0.109375 as the fieldNorm and the 4.9 Solr is using 0.125. [1] What would cause the different versions to use different field norms (and rather infrequently, as the majority of scores are identical, as desired)?
Thanks, Aaron

[1] Here's a snippet of the diff (of the output from my debug.explain.structured normalizer) for one such difference, with the 3.6.1 value on the left and the 4.9 value on the right:

06808040cd523a296abaf26025148c85:
  _value (product of:):             0.839616605  | 0.854748135
    _value (sum of:):               2.623802     | 2.67108801
      weight(t_style:alternative):  0.0644619693 | 0.0736708307
        queryWeight:                0.0629802298 | 0.0629802298
          idf(137871):              4.18500798   | 4.18500798
        fieldWeight:                1.02352709   | 1.1697453
          tf(freq=5):               2.23606799   | 2.23606799
          idf(137871):              4.18500798   | 4.18500798
          fieldNorm:                0.109375     | 0.125
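One possible source of such small fieldNorm shifts (an assumption, not a diagnosis from this thread): with the default 1/sqrt(fieldLength) length norm, Lucene stores norms quantized to a single byte with roughly a 3-bit mantissa, so a small difference in how many tokens the analyzer emits can move the stored value between adjacent representable steps such as 0.109375 and 0.125. A rough Python sketch of that quantization (a simplification of Lucene's SmallFloat encoding, not the exact code):

```python
import math

def length_norm(num_terms: int) -> float:
    """DefaultSimilarity-style length norm: 1 / sqrt(number of terms)."""
    return 1.0 / math.sqrt(num_terms)

def quantize(f: float) -> float:
    """Truncate to a float with a 3-bit mantissa, mimicking the precision
    loss of Lucene's byte-encoded norms (assumed behaviour, not exact)."""
    e = math.floor(math.log2(f))
    m = math.floor(f / 2 ** e * 8)  # 3-bit mantissa: m in 8..15
    return (m / 8.0) * 2 ** e

print(quantize(length_norm(64)))  # 0.125
print(quantize(length_norm(74)))  # 0.109375
```

Under this model, a field the analyzer tokenizes to 64 terms in one version and ~74 in another would show exactly the 0.125 vs 0.109375 difference reported above.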
Re: How to integrate nlp in solr
Aman, I feel focusing on the Question-Answering and Information-Extraction components of NLP should help you achieve what you are looking for. Go through the book *Taming Text* (http://www.manning.com/ingersoll/). Most of your queries should be answered there, including details on implementation and sample source code. To state it naively: NLP tools give you the power to extract or interpret knowledge from text, which you then store in the Lucene index in the form of fields, or store along with the terms using payloads. At query-processing time, you similarly gather additional knowledge from the query (using techniques like query expansion, relevance feedback, or ontologies) and simply map that knowledge to the knowledge gained from the text. It's an effort to move to semantic retrieval rather than simple term matching. Thanks, Parnab On Wed, Jul 2, 2014 at 6:29 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Alex, Thanks Alex, one more thing I want to ask: do we need to add extra fields for those entities, e.g. item (bags), color (blue), etc.? If somehow I manage to implement this NLP, I will definitely publish it on my blog :) With Regards Aman Tandon On Wed, Jul 2, 2014 at 10:34 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Not from me, no. I don't have any real examples for this ready. I suspect the path beyond the basics is VERY dependent on your data and your business requirements. I would start from thinking how YOU (as a human) would do that match. Where do the 'blue' and 'color' and 'college' and 'bags' come from? Then, figure out what Solr needs to know to look there. NLP is not magic, just advanced technology. You need to know where you are going to get there. Regards, Alex.
On Wed, Jul 2, 2014 at 11:35 AM, Aman Tandon amantandon...@gmail.com wrote: Any help here? With Regards Aman Tandon On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Alex, I was trying to learn from these tutorials: http://www.slideshare.net/teofili/natural-language-search-in-solr https://wiki.apache.org/solr/OpenNLP - this one explains things a bit, but no real demo is present. E.g., for the query "I want blue color college bags", how would NLP make it work and search? There is no brief explanation of that out there; I would be thankful if you could help me with this. With Regards Aman Tandon On Mon, Jun 30, 2014 at 6:38 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: On Sun, Jun 29, 2014 at 10:19 PM, Aman Tandon amantandon...@gmail.com wrote: the appropriate results What are those specifically? You need to be a bit more precise about what you are trying to achieve. Otherwise, there are too many NLP branches and too many approaches. Regards, Alex.
OCR - Saving multi-term position
Hello, Many of our indexed documents are scanned and OCR'ed documents. Unfortunately we were not able to improve the OCR quality much (less than 80% word accuracy) for various reasons, a fact which badly hurts retrieval quality. As we use an open-source OCR, we are thinking of expanding every scanned term into its most likely variations to get a higher level of confidence. Is there any analyser that supports this kind of need, or should I make up a syntax and analyser of my own, e.g. a payload-style syntax? The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4 Thanks, Manuel
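A sketch of the idea in Python, just to model the token stream: variants of the same scanned word share one position, the way a Lucene token filter would emit them with a position increment of 0. The variant lists here are invented for illustration:

```python
def tokens_with_variants(words):
    """Yield (term, position_increment) pairs: the first candidate for each
    scanned word advances the position by 1; alternatives stay at the same
    position with an increment of 0."""
    for variants in words:
        first, *rest = variants
        yield first, 1
        for alt in rest:
            yield alt, 0

# Hypothetical OCR output: each word with its suspected variants
ocr_output = [["The", "Tlne"], ["quick", "quiok"], ["brown", "browm"], ["fox"]]
stream = list(tokens_with_variants(ocr_output))
# The/Tlne land at position 1, quick/quiok at 2, brown/browm at 3, fox at 4,
# matching the The|1 Tlne|1 quick|2 quiok|2 ... layout above.
```

Because the variants occupy the same position, phrase and positional queries against any combination of variants still match.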
RE: Endeca to Solr Migration
We migrated a big application from Endeca (6.0, I think) several years ago. We were not using any of the business UI tools, but we found that Solr is a lot more flexible and performant than Endeca. But with more flexibility comes more you need to know. The hardest thing was migrating the Endeca dimensions to Solr facets. We had Endeca-API-specific dependencies throughout the application, even in the presentation layer. We ended up writing a bridge API that allowed us to keep our Endeca-specific code and translate the queries to Solr queries. We store a cross-reference between the N values from Endeca and key/value pairs to translate something like N=4000 to fq=Language:English. With Solr, there is more you need to do in your app that the backend doesn't manage for you. In the end, though, it lets you separate your concerns better. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: mrg81 [mailto:maya...@gmail.com] Sent: Saturday, June 28, 2014 1:11 PM To: solr-user@lucene.apache.org Subject: Endeca to Solr Migration Hello -- I wanted to get some details on an Endeca to Solr migration. I am interested in a few topics: 1. We would like to migrate the faceted navigation, boosting of individual records, and a few other items. 2. But the biggest question is about the UI [Experience Manager] - I have not found a tool that comes close to Experience Manager. I did read about Hue [in response to Gareth's question on migration], but it seems we would have to do a lot of customization to use it. Questions: 1. Is there a UI that we can use? Is it possible to un-hook the Experience Manager UI and point it to Solr? 2. How long does a typical migration take, assuming we have to migrate the faceted navigation and boosted records? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Endeca-to-Solr-Migration-tp4144582.html
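The N-value cross-reference described above can be as simple as a lookup table. A minimal sketch (the mapping values and the N=4000+4001 parameter form are invented for illustration):

```python
# Hypothetical cross-reference from Endeca dimension-value IDs to Solr fq clauses
N_TO_FQ = {
    "4000": "Language:English",
    "4001": "Language:Spanish",
}

def endeca_n_to_solr_fq(n_param: str) -> list:
    """Translate an Endeca-style N parameter (IDs joined with '+')
    into a list of Solr fq filter clauses, skipping unknown IDs."""
    return [N_TO_FQ[n] for n in n_param.split("+") if n in N_TO_FQ]

print(endeca_n_to_solr_fq("4000"))       # ['Language:English']
print(endeca_n_to_solr_fq("4000+4001"))  # ['Language:English', 'Language:Spanish']
```

In a real bridge API the table would be generated from the Endeca dimension export rather than hand-written.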
Re: OCR - Saving multi-term position
I don't have first-hand knowledge of how you implement that, but I bet a look at the WordDelimiterFilter would help you understand how to emit multiple terms with the same position pretty easily. I've heard of this bag-of-word-variants approach to indexing poor-quality OCR output before, for findability reasons, and I heard it works out OK. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: [...]
Customise score
Dear all, Could anybody suggest how to customize the score? I have data like this: {ID: '0001', Title: 'MacBookPro', Price: 400, Base_score: '121.2'} {ID: '0002', Title: 'MacBook', Price: 350, Base_score: '100.2'} {ID: '0003', Title: 'Laptop', Price: 300, Base_score: '155.7'} Notice that I have the ID field for uniqueKey. When I query q=MacBook&sort=score desc it returns results like this: {ID: '0002', Title: 'MacBook', Price: 350, Base_score: '100.2', score: 1.45} {ID: '0001', Title: 'MacBookPro', Price: 400, Base_score: '121.2', score: 1.11} But I want Solr to produce the score by also using my Base_score. The score should be something like this: - score = 100.2 + 1.45 = 101.65 - score = 121.2 + 1.11 = 122.31 Then the result should be: {ID: '0001', Title: 'MacBookPro', Price: 400, Base_score: '121.2', score: 122.31} {ID: '0002', Title: 'MacBook', Price: 350, Base_score: '100.2', score: 101.65} I'm not familiar with Java, so I can't write my own function as some people do. So, what is the easiest way to do this using existing Solr functions? Thank you very much, Chun. -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214.html
Re: Customise score
On 2 July 2014 20:32, rachun rachun.c...@gmail.com wrote: [...] You should use Solr's sum function query: http://wiki.apache.org/solr/FunctionQuery#sum q=MacBook&sort=sum(Base_score, score)+desc should do it. Regards, Gora
Re: Clubbing queries with different criteria together?
Thanks Ahmet, I tried multiple combinations and finally got it using the full query as a nested query. Is it fine to use a full query inside a nested query with filters via _query_, as below? http://localhost:8983/solr/collection1/select?q=text:sharepoint&wt=json&indent=true&AuthenticatedUserName=ljangra&_query_:select?q=text:sharepoint&wt=json&indent=true&fq:acls:(*) Is it still more performant than using two separate queries? Regards. -- View this message in context: http://lucene.472066.n3.nabble.com/Clubbing-queries-with-different-criterias-together-tp4143829p4145217.html
Re: Customise score
Gora, firstly I would like to thank you for your quick response. .../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true I tried that but it didn't work, and I got this error message: error:{ msg:Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score) desc', pos=15, code:400}} Best Regards, Chun -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145216.html
Re: OCR - Saving multi-term position
The problem here is that you wind up with a zillion unique terms in your index, which may lead to performance issues, but you probably already know that :). I've seen situations where running it through a dictionary helps. That is, does each term in the OCR match some dictionary? The problem there is that it then de-values terms that don't happen to be in the dictionary - names, for instance. But to answer your question: no, there really isn't a pre-built analysis chain that I know of that does this. The root issue is how to assign confidence - no clue for your specific domain. So payloads seem quite reasonable here. It happens there's a recent end-to-end example; see: http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ Best, Erick On Wed, Jul 2, 2014 at 7:58 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I don't have first-hand knowledge of how you implement that, but I bet a look at the WordDelimiterFilter would help you understand how to emit multiple terms with the same position pretty easily. I've heard of this bag-of-word-variants approach to indexing poor-quality OCR output before, for findability reasons, and I heard it works out OK. Michael Della Bitta On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: [...]
Re: OCR - Saving multi-term position
Thanks for your answers Erick and Michael. The term confidence level is an OCR output metric which tells, for every word, what the odds are that it's the actual scanned term. I want the OCR program to output all the suspected words that sum up to above ~90% confidence of being the actual term, instead of outputting a single word as the default behaviour. I'm happy to hear this approach has been used before; I will implement an analyser that indexes these terms at the same position to enable positional queries. Hope it works out well. In case it does, I will open a Jira ticket for it. If anyone else has had experience with this use case, I'd love to hear it. Manuel On Wed, Jul 2, 2014 at 7:28 PM, Erick Erickson erickerick...@gmail.com wrote: [...]
Re: Does Solr move documents between shards when the value of the shard key is updated?
So we do end up with two copies / versions of the same document (uniqueKey), one in each of the two shards. Is this a BUG or a FEATURE in Solr? I have a follow-up question: in case one were to attempt to delete the document, let's say using the CloudSolrServer deleteById() API, would that attempt to delete the document in both (or all) shards? How would Solr determine which shard / shards to run the delete against? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html
Re: Migration from Autonomy IDOL to SOLR
I know that this is an old thread, but I wanted to pass on some additional information, in blatant self-promotion. We've just completed an IDOL to Solr migration for our e-commerce site, with approximately 40 million items and anywhere between 200,000 and 300,000 searches per day. I am documenting some lessons learned and some product discriminators here: http://engineering2success.blogspot.com/ -- View this message in context: http://lucene.472066.n3.nabble.com/Migration-from-Autonomy-IDOL-to-SOLR-tp3255377p4145247.html
Re: Slow QTimes - 5 seconds for Small sized Collections
This issue was finally resolved. Adding an explicit host-to-IP-address mapping in the /etc/hosts file seemed to do the trick. The one strange thing is that before the hosts-file entry was made, we were unable to simulate the 5-second delay from the Linux shell by performing a simple nslookup of the host name. In any case, the issue now stands resolved - thanks to all. On the other discussion item, about the QTime in the SolrQueryResponse NOT matching the QTime in the Solr log, here is what I found: 1. If the query from CloudSolrServer hits the right node (i.e. one containing the shard with the desired dataset), then the QTimes match. 2. If the query from CloudSolrServer hits a node (NodeX) that does NOT contain our data, then Solr routes the request to the right node (NodeY) to fetch the data. In such situations, QTime is logged on both nodes that the query passes through - albeit with different values. The QTime logged on NodeX matches what we see in the SolrQueryResponse, and this time includes the time for inter-node communication between NodeX and NodeY. In essence this means that the QTime in the SolrQueryResponse is NOT always a representation of the query time alone - it can include time spent on inter-node communication. P.S. All of the above statements were made in the context of a sharding strategy that co-locates a single customer's documents in a single shard. Here is a short wishlist based on the experience of debugging this issue: 1. I wish SolrQueryResponse could contain a list of node names / shard-replica names that a request passed through while processing the query (when debug is turned ON). 2. I wish SolrQueryResponse could provide a breakup of QTime on each of the individual nodes / shard-replicas, instead of returning a single QTime value. -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-QTimes-5-seconds-for-Small-sized-Collections-tp4143681p4145251.html
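For reference, this is the kind of /etc/hosts entry meant above (the hostnames and addresses here are hypothetical):

```
# /etc/hosts - pin each SolrCloud node's hostname to its IP
# so inter-node requests don't pay a slow DNS lookup
10.0.0.10   solr-node1.example.com   solr-node1
10.0.0.11   solr-node2.example.com   solr-node2
```

Each node in the cluster would need entries for all the other nodes it talks to.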
Re: Customise score
Hi, Why did you use upper case? What happens when you use: sort=sum(... On Wednesday, July 2, 2014 6:23 PM, rachun rachun.c...@gmail.com wrote: Gora, firstly I would like to thank you for your quick response. .../select?q=MacBook&sort=SUM(base_score, score)+desc&wt=json&indent=true I tried that but it didn't work, and I got this error message: error:{ msg:Can't determine a Sort Order (asc or desc) in sort spec 'SUM(base_score, score) desc', pos=15, code:400}} Best Regards, Chun -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145216.html
Re: Customise score
I think the white space after the comma is the culprit. No white space is allowed in function queries that are embedded, such as in the sort parameter. -- Jack Krupansky -Original Message- From: Ahmet Arslan Sent: Wednesday, July 2, 2014 2:19 PM To: solr-user@lucene.apache.org Subject: Re: Customise score Hi, Why did you use upper case? What happens when you use: sort=sum(... On Wednesday, July 2, 2014 6:23 PM, rachun rachun.c...@gmail.com wrote: [...]
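A client-side sketch of building the corrected parameter with Python's stdlib, just to show the point: the function call itself contains no whitespace, and the space before the sort direction is URL-encoded for you. (Whether `score` is actually usable inside sum() here is a separate question; this only demonstrates the formatting.)

```python
from urllib.parse import urlencode

params = {
    "q": "MacBook",
    "sort": "sum(Base_score,score) desc",  # no space inside the function call
    "wt": "json",
}
query_string = urlencode(params)
print(query_string)
# q=MacBook&sort=sum%28Base_score%2Cscore%29+desc&wt=json
```

Building the URL with a proper encoder also avoids the stripped-ampersand problems seen in the hand-written URLs earlier in the thread.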
Re: Migration from Autonomy IDOL to SOLR
Thanks for posting this. -- Jack Krupansky -Original Message- From: wrdrvr Sent: Wednesday, July 2, 2014 1:47 PM To: solr-user@lucene.apache.org Subject: Re: Migration from Autonomy IDOL to SOLR [...]
Solr MapReduceIndexerTool GoLive to SolrCloud with index on local file system
Hi, When we run the Solr MapReduceIndexerTool (https://github.com/markrmiller/solr-map-reduce-example), it generates indexes on HDFS. The last stage is "Go Live", which merges the generated index into the live SolrCloud index. If the live SolrCloud writes its index to the local file system (rather than HDFS), Go Live gives an error like this:

2014-07-02 13:41:01,518 INFO org.apache.solr.hadoop.GoLive: Live merge hdfs://bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-0 into http://bdvs087.test.com:8983/solr
2014-07-02 13:41:01,796 ERROR org.apache.solr.hadoop.GoLive: Error sending live merge command
java.util.concurrent.ExecutionException: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index' does not exist
  at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:233)
  at java.util.concurrent.FutureTask.get(FutureTask.java:94)
  at org.apache.solr.hadoop.GoLive.goLive(GoLive.java:126)
  at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:867)
  at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:609)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:596)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
  at java.lang.reflect.Method.invoke(Method.java:611)
  at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:491)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:434)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
  at java.security.AccessController.doPrivileged(AccessController.java:310)
  at javax.security.auth.Subject.doAs(Subject.java:573)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1502)
  at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: directory '/opt/testdir/solr/node/hdfs:/bdvs086.test.com:9000/tmp/088-140618120223665-oozie-oozi-W/results/part-1/data/index' does not exist
  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
  at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:493)
  at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:100)
  at org.apache.solr.hadoop.GoLive$1.call(GoLive.java:89)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
  at java.util.concurrent.FutureTask.run(FutureTask.java:149)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:452)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:314)
  at java.util.concurrent.FutureTask.run(FutureTask.java:149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:897)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:919)
  at java.lang.Thread.run(Thread.java:738)

Is there any way to set up SolrCloud to write its index to the local file system, while still allowing the MapReduceIndexerTool's Go Live stage to merge an index generated on HDFS into SolrCloud? Thanks, Tom
Re: OCR - Saving multi-term position
Take a look at the synonym filter as well. I mean, basically that's exactly what you are doing - adding synonyms at each position. -- Jack Krupansky -Original Message- From: Manuel Le Normand Sent: Wednesday, July 2, 2014 12:57 PM To: solr-user@lucene.apache.org Subject: Re: OCR - Saving multi-term position Thanks for your answers Erick and Michael. The term confidence level is an OCR output metric which tells for every word what are the odds it's the actual scanned term. I wish the OCR prog to output all the suspected words that sum up to above ~90% of confidence it is the actual term instead of outputting a single word as the default behaviour. I'm happy to hear this approach was used before; I will implement an analyser that indexes these terms in the same position to enable positional queries. Hope it works out well. In case it does I will open up a Jira ticket for it. If anyone else has had experience with this use case I'd love hearing, Manuel On Wed, Jul 2, 2014 at 7:28 PM, Erick Erickson erickerick...@gmail.com wrote: Problem here is that you wind up with a zillion unique terms in your index, which may lead to performance issues, but you probably already know that :). I've seen situations where running it through a dictionary helps. That is, does each term in the OCR match some dictionary? Problem here is that it then de-values terms that don't happen to be in the dictionary, names for instance. But to answer your question: No, there really isn't a pre-built analysis chain that I know of that does this. Root issue is how to assign confidence? No clue for your specific domain. So payloads seem quite reasonable here. 
Happens there's a recent end-to-end example, see: http://searchhub.org/2014/06/13/end-to-end-payload-example-in-solr/ Best, Erick On Wed, Jul 2, 2014 at 7:58 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I don't have first hand knowledge of how you implement that, but I bet a look at the WordDelimiterFilter would help you understand how to emit multiple terms with the same positions pretty easily. I've heard of this bag of word variants approach to indexing poor-quality OCR output before for findability reasons and I heard it works out OK. Michael Della Bitta Applications Developer o: +1 646 532 3062 appinions inc. “The Science of Influence Marketing” 18 East 41st Street New York, NY 10017 t: @appinions https://twitter.com/Appinions | g+: plus.google.com/appinions https://plus.google.com/u/0/b/112002776285509593336/112002776285509593336/posts w: appinions.com http://www.appinions.com/ On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Hello, Many of our indexed documents are scanned and OCR'ed documents. Unfortunately we were not able to improve much the OCR quality (less than 80% word accuracy) for various reasons, a fact which badly hurts the retrieval quality. As we use an open-source OCR, we think of changing every scanned term output to it's main possible variations to get a higher level of confidence. Is there any analyser that supports this kind of need or should I make up a syntax and analyser of my own, i.e the payload syntax? The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4 Thanks, Manuel
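[Editor's note] For reference, the "term|number" payload syntax Manuel sketches above can be indexed with Solr's stock DelimitedPayloadTokenFilterFactory. A minimal, hypothetical field type follows (the field-type name is invented); note this only stores the number after the delimiter as a payload, while making OCR variants share a position, as Michael describes, would still need a custom filter that emits tokens with a position increment of 0:

```xml
<!-- Hypothetical field type for indexing "term|number" payload syntax.
     The value after the delimiter is stored as a per-token payload. -->
<fieldType name="ocr_payloads" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="integer"/>
  </analyzer>
</fieldType>
```

The encoder attribute controls how the payload bytes are written (float, integer, or identity); confidence scores would more naturally use encoder="float".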
Re: Customise score
Hi Ahmet, I also tried this .../select?q=MacBooksort=sum(base_score, score)+descwt=jsonindent=true I got the same error error:{ msg:Can't determine a Sort Order (asc or desc) in sort spec 'sum(base_score, score) desc', pos=15, code:400}} Best regards, Chun -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145320.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Customise score
Hi Jack, I tried as you suggested .../select?q=MacBooksort=sum(base_score,score)+descwt=jsonindent=true but it didn't work and I got this error message error:{ msg:sort param could not be parsed as a query, and is not a field that exists in the index: sum(base_score,score), code:400}} so, when I try something like this .../select?q=MacBooksort=sum(base_score,base_score)+descwt=jsonindent=true it works fine. How can I achieve this, any idea? Best Regards, Chun -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145322.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Customise score
You probably don't have a field named score. That said, the Solr error message is not very useful at all! If you want to reference the document score, I don't think there is a direct way to do it, but you can do it indirectly by using the query function: .../select?q=MacBooksort=sum(base_score,query($q,0))+descwt=jsonindent=true -- Jack Krupansky -Original Message- From: rachun Sent: Wednesday, July 2, 2014 7:44 PM To: solr-user@lucene.apache.org Subject: Re: Customise score Hi Jack, I tried as you suggested .../select?q=MacBooksort=sum(base_score,score)+descwt=jsonindent=true but it didn't work and I got this error message error:{ msg:sort param could not be parsed as a query, and is not a field that exists in the index: sum(base_score,score), code:400}} so, when I try something like this .../select?q=MacBooksort=sum(base_score,base_score)+descwt=jsonindent=true it works fine. How can I achieve this, any idea? Best Regards, Chun -- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145322.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: OCR - Saving multi-term position
Hi Manuel, I think OCR error correction is a well-known NLP task, and I'd thought in the past that it could be implemented using Lucene. This is a brief idea: 1. You have got a Lucene index. This existing index is made from correct (i.e. error-free) documents from the same domain as the OCR documents. 2. Tokenize the OCR text with ShingleTokenizer. From ShingleTokenizer, you'll get: the quiok tlne quick the quick : 3. Search those phrases in the existing index. I think either exact search (PhraseQuery) or FuzzyQuery could work. You should get the highest hit count when searching the quick among those phrases. Koji -- http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html (2014/07/02 7:19), Manuel Le Normand wrote: Hello, Many of our indexed documents are scanned and OCR'ed documents. Unfortunately we were not able to improve much the OCR quality (less than 80% word accuracy) for various reasons, a fact which badly hurts the retrieval quality. As we use an open-source OCR, we think of changing every scanned term output to it's main possible variations to get a higher level of confidence. Is there any analyser that supports this kind of need or should I make up a syntax and analyser of my own, i.e the payload syntax? The quick brown fox -- The|1 Tlne|1 quick|2 quiok|2 browm|3 brown|3 fox|4 Thanks, Manuel
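[Editor's note] Koji's step 3 can be illustrated with a toy sketch, shown here in Python for compactness. All names and the tiny corpus are made up; the dict lookup stands in for the PhraseQuery hit counts a real index would return. For each shingle, the candidate phrase with the highest hit count in the reference collection wins:

```python
# Sketch of Koji's idea: for each shingle built from OCR term variants,
# prefer the variant whose phrase occurs most often in a reference corpus
# (a stand-in for PhraseQuery hit counts against an error-free index).
from itertools import product

def best_shingle(variant_lists, corpus_phrases):
    """variant_lists: per-position OCR candidates, e.g. [["the", "tlne"], ["quick", "quiok"]].
    corpus_phrases: mapping of phrase -> occurrence count in clean text."""
    candidates = [" ".join(p) for p in product(*variant_lists)]
    # Highest corpus count wins; phrases never seen count as 0.
    return max(candidates, key=lambda c: corpus_phrases.get(c, 0))

corpus = {"the quick": 42, "quick brown": 17}
print(best_shingle([["the", "tlne"], ["quick", "quiok"]], corpus))  # -> the quick
```

A real implementation would replace the dict lookup with a PhraseQuery (or FuzzyQuery) count against the existing error-free index, as Koji describes.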
Re: Does Solr move documents between shards when the value of the shard key is updated ?
bq: Is this a BUG or a FEATURE in Solr How about just the way it works? You've changed the route key with the same unique key, taking control of the routing. When you change that routing, how is Solr to know where the _old_ document lived? It would have to, say, query the entire cluster for any doc that had the given uniqueKey and delete it, something that'd be horribly slow. As to your follow-up question, I'm not totally sure. I believe the delete is sent to all shards, but why don't you test to see? Best, Erick On Wed, Jul 2, 2014 at 10:22 AM, IJ jay...@gmail.com wrote: So - we do end up with two copies / versions of the same document (uniqueid) - one in each of the two shards - Is this a BUG or a FEATURE in Solr ? Have a follow up question - In case one were to attempt to delete the document -lets say usng the CloudSolrServer - deleteById() API - would that attempt to delete the document in both (or all) shards ? How would Solr determine which shard / shards to run the delete against ? -- View this message in context: http://lucene.472066.n3.nabble.com/Does-Solr-move-documents-between-shards-when-the-value-of-the-shard-key-is-updated-tp4145043p4145237.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Map Reduce Indexer Tool GoLive to SolrCloud with index on local file system
How would the MapReduceIndexerTool (MRIT for short) find the local disk to write from HDFS to for each shard? All it has is the information in the Solr configs, which are usually relative paths on the local Solr machines, relative to SOLR_HOME. Which could be different on each node (that would be screwy, but possible). Permissions would also be a royal pain to get right You _can_ forego the --go-live option and copy from the HDFS nodes to your local drive and then execute the mergeIndexes command, see: https://cwiki.apache.org/confluence/display/solr/Merging+Indexes Note that there is the MergeIndexTool, but there's also the Core Admin command. The sub-indexes are in a partition in HDFS and numbered sequentially. Best, Erick On Wed, Jul 2, 2014 at 3:23 PM, Tom Chen tomchen1...@gmail.com wrote: Hi, When we run Solr Map Reduce Indexer Tool ( https://github.com/markrmiller/solr-map-reduce-example), it generates indexes on HDFS The last stage is Go Live to merge the generated index to live SolrCloud index. 
If the live SolrCloud write index to local file system (rather than HDFS), the Go Live gives such error like this: [stack trace snipped; identical to the trace in the original message of this thread] Any way to setup SolrCloud to write index to local file system, while allowing the Solr MapReduceIndexerTool's GoLive to merge index generated on HDFS to the SolrCloud? Thanks, Tom
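[Editor's note] To make Erick's no-go-live route concrete: after copying a shard's index from HDFS to a local path, the Core Admin mergeindexes command can merge it into the live core. A rough sketch, where the namenode, output directory, core name, and local path are all invented placeholders to adjust for your cluster:

```
hadoop fs -copyToLocal hdfs://namenode:9000/outdir/results/part-00000/data/index /tmp/part-00000-index
curl 'http://localhost:8983/solr/admin/cores?action=mergeindexes&core=collection1&indexDir=/tmp/part-00000-index'
```

The indexDir parameter can be repeated to merge several sub-indexes in one call; see the Merging Indexes page Erick links for details.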
Re: CollapsingQParserPlugin throws Exception when useFilterForSortedQuery=true
Created the jira .. https://issues.apache.org/jira/browse/SOLR-6222 On 30 June 2014 23:53, Joel Bernstein joels...@gmail.com wrote: Sure, go ahead create the ticket. I think there is more we can here as well. I suspect we can get the CollapsingQParserPlugin to work with useFilterForSortedQuery=true if scoring is not needed for the collapse. I'll take a closer look at this. Joel Bernstein Search Engineer at Heliosearch On Mon, Jun 30, 2014 at 1:43 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi Joel, Thanks a lot for clarification .. An error message would indeed be a good thing .. Should I open a jira item for same ? On 28 June 2014 19:08, Joel Bernstein joels...@gmail.com wrote: OK, I see the problem. When you use useFilterForSortedQuery true /useFilterForSortedQuery Solr builds a docSet in a way that seems to be incompatible with the CollapsingQParserPlugin. With useFilterForSortedQuery true /useFilterForSortedQuery, Solr doesn't run the main query again when collecting the DocSet. The getDocSetScore() method is expecting the main query to present, because the CollapsingQParserPlugin may need the scores generated from the main query, to select the group head. I think trying to make useFilterForSortedQuery true /useFilterForSortedQuery compatible with CollapsingQParsePlugin is probably not possible. So, a nice error message would be a good thing. Joel Bernstein Search Engineer at Heliosearch On Tue, Jun 24, 2014 at 3:31 AM, Umesh Prasad umesh.i...@gmail.com wrote: Hi , Found another bug with CollapsignQParserPlugin. Not a critical one. 
It throws an exception when used with useFilterForSortedQuery true /useFilterForSortedQuery Patch attached (against 4.8.1 but reproducible in other branches also) 518 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s%7DdefType=edismaxbf=field%28test_ti%29} hits=2 status=0 QTime=99 4557 T11 C0 oasc.SolrCore.execute [collection1] webapp=null path=null params={q=*%3A*fq=%7B%21collapse+field%3Dgroup_s+nullPolicy%3Dexpand+min%3Dtest_tf%7DdefType=edismaxbf=field%28test_ti%29sort=} hits=4 status=0 QTime=15 4587 T11 C0 oasc.SolrException.log ERROR java.lang.UnsupportedOperationException: Query does not implement createWeight at org.apache.lucene.search.Query.createWeight(Query.java:80) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.getDocSetScore(SolrIndexSearcher.java:879) at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:902) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1381) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:478) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:461) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.util.TestHarness.query(TestHarness.java:295) at org.apache.solr.util.TestHarness.query(TestHarness.java:278) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:676) at org.apache.solr.SolrTestCaseJ4.assertQ(SolrTestCaseJ4.java:669) at org.apache.solr.search.TestCollapseQParserPlugin.testCollapseQueries(TestCollapseQParserPlugin.java:106) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at
RE: Memory Leaks in solr 4.8.1
We reload at interval of 6/7 days and restart may be in 15/18 days if the response becomes too slow On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi, you can safely ignore this, it is shutting down anyway. Just don't reload the app a lot of times without actually restarting Tomcat. -Original message- From:Aman Tandon amantandon...@gmail.com Sent: Wednesday 2nd July 2014 7:22 To: solr-user@lucene.apache.org Subject: Memory Leaks in solr 4.8.1 Hi, When i am shutting down the solr i am gettng the Memory Leaks error in logs. Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader checkThreadLocalMapForLeaks SEVERE: The web application [/solr] created a ThreadLocal with key of type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value [org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and a value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak. Please check. With Regards Aman Tandon
Re: How to integrate nlp in solr
Thanks pranab, I am unfamiliar with payloads, can you provide some info about payload and how they are helpful in nlp On Jul 2, 2014 7:41 PM, parnab kumar parnab.2...@gmail.com wrote: Aman, I feel focusing on Question-Answering and Information Extraction components of NLP should help you achieve what you are looking for. Go through this book *Taming Text * (http://www.manning.com/ingersoll/ ) . Most of your queries should be answered including details on implementation and sample source codes. To state naively : NLP tools gives you the power to extract or interpret knowledge from text, which you basically store in the lucene index in form of fields or store along with the terms using payloads. During query processing time, you similarly gather additional knowledge from the query (using techniques like query expansion, relevance feedback, or ontologies ) and simply map those knowledge with the knowledge gained from the text. Its an effort to move to semantic retrieval rather than simple term matching. Thanks, Parnab On Wed, Jul 2, 2014 at 6:29 AM, Aman Tandon amantandon...@gmail.com wrote: Hi Alex, Thanks alex, one more thing i want to ask that so do we need to add the extra fields for those entities, e.g. Item (bags), color (blue), etc. If some how i managed to implement this nlp then i will definitely publish it on my blog :) With Regards Aman Tandon On Wed, Jul 2, 2014 at 10:34 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Not from me, no. I don't have any real examples for this ready. I suspect the path beyond the basics is VERY dependent on your data and your business requirements. I would start from thinking how would YOU (as a human) do that match. Where does the 'blue' and 'color' and 'college' and 'bags' come from. Then, figuring out what is required for Solr to know to look there. NLP is not magic, just advanced technology. You need to know where you are going to get there. Regards, Alex. 
Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Wed, Jul 2, 2014 at 11:35 AM, Aman Tandon amantandon...@gmail.com wrote: Any help here With Regards Aman Tandon On Mon, Jun 30, 2014 at 11:00 PM, Aman Tandon amantandon...@gmail.com wrote: Hi Alex, I was try to get knowledge from these tutorials http://www.slideshare.net/teofili/natural-language-search-in-solr https://wiki.apache.org/solr/OpenNLP: this one is kinda bit explaining but the real demo is not present. e.g. query: I want blue color college bags, then how using nlp it will work and how it will search, there is no such brief explanation out there, i will be thankful to you if you can help me in this. With Regards Aman Tandon On Mon, Jun 30, 2014 at 6:38 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: On Sun, Jun 29, 2014 at 10:19 PM, Aman Tandon amantandon...@gmail.com wrote: the appropriate results What are those specifically? You need to be a bit more precise about what you are trying to achieve. Otherwise, there are too many NLP branches and too many approaches. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Streaming large updates with SolrJ
: Now that I think about it, though, is there a way to use the Update Xml : messages with something akin to the cloud solr server? I only see examples : posting to actual Solr instances, but we really need to be able to take : advantage of the zookeepers to send our updates to the appropriate servers. Part of your confusion may be that there are 2 different ways of leveraging the SolrServer APIs (either CloudSolrServer, or any other SolrServer implementation)... * syntactic sugar APIs like SolrServer.add(...) which require SolrInputDocuments * the lower-level methods like SolrRequest.process(solrServer) ...with the latter, you can subclass AbstractUpdateRequest and implement getContentStreams() to send whatever (lazily constructed) stream of bytes you want to Solr. Alternatively: you could consider subclassing SolrInputField with something that knows how to lazily fetch the data you want to stream across the wire, and then (unless i'm missing something?) you can still use the sugar APIs with SolrInputDocuments but only individual field values will need to exist in RAM at any one time (as the BinaryWriter or XmlWriter calls SolrInputField.getValues() on your custom class to stream over the wire) However: if you are using SolrCloud, none of this will help you work around the previously mentioned SOLR-6199, which affects how much RAM Solr needs to use on the server side when forwarding docs around to replicas. -Hoss http://www.lucidworks.com/
schema / config file names
Is it required for the schema.xml and solrconfig.xml to have those exact filenames? Can I alias schema.xml to foo.xml in some way, for example? Thanks.
Re: schema / config file names
: Is it required for the schema.xml and solrconfig.xml to have those exact : filenames? It's an extremely good idea ... but strictly speaking, no... https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE This smells like an XY Problem though ... please explain *why* you care what these file names are, and why you want them to be different? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss http://www.lucidworks.com/
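[Editor's note] Concretely, the CoreAdmin CREATE command linked above accepts config and schema parameters that point at alternative file names. A hedged example, where the host, core name, and file names are all invented:

```
http://localhost:8983/solr/admin/cores?action=CREATE&name=mycore&instanceDir=mycore&config=my_solrconfig.xml&schema=foo.xml
```

Both files are resolved relative to the core's conf directory, so "aliasing schema.xml to foo.xml" is just a matter of passing schema=foo.xml at core creation time.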
RE: Memory Leaks in solr 4.8.1
This is a long-standing issue in Solr that has some suggested fixes (see jira comments), but no one has been seriously affected by it enough for anyone to invest time in trying to improve it... https://issues.apache.org/jira/browse/SOLR-2357 In general, the fact that Solr is moving away from being a webapp, and towards being a stand alone java application, makes it even less likely that this will ever really affect anyone. : Date: Thu, 3 Jul 2014 07:37:03 +0530 : From: Aman Tandon amantandon...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: RE: Memory Leaks in solr 4.8.1 : : We reload at interval of 6/7 days and restart may be in 15/18 days if the : response becomes too slow : On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote: : : Hi, you can safely ignore this, it is shutting down anyway. Just don't : reload the app a lot of times without actually restarting Tomcat. : : -Original message- : From:Aman Tandon amantandon...@gmail.com : Sent: Wednesday 2nd July 2014 7:22 : To: solr-user@lucene.apache.org : Subject: Memory Leaks in solr 4.8.1 : : Hi, : : When i am shutting down the solr i am gettng the Memory Leaks error in : logs. : : Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader :checkThreadLocalMapForLeaks :SEVERE: The web application [/solr] created a ThreadLocal with key of : type :[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value :[org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and : a :value of type : [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] :(value : [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a]) :but failed to remove it when the web application was stopped. Threads : are :going to be renewed over time to try and avoid a probable memory leak. : : : Please check. : With Regards : Aman Tandon : : : -Hoss http://www.lucidworks.com/
Re: schema / config file names
That's good to know. I don't actually want to do it. I want to see just how much of Solr's schema and configuration can be reliably validated. The error messages I've been getting back for misconfigured setups are less than ideal at times. But it should be easy for me to validate certain things without talking to Solr at all, like the existence of the schema in ZK, that it's a valid XML file, etc. Is there an XSD or any kind of validation for the schema / solrconfig? There's an unresolved Jira issue in SOLR-1758 that seems promising but never got merged. Thanks. From: Chris Hostetter hossman_luc...@fucit.org To: solr-user@lucene.apache.org, Date: 07/02/2014 10:22 PM Subject:Re: schema / config file names : Is it required for the schema.xml and solrconfig.xml to have those exact : filenames? It's an extremelely good idea ... but strictly speaking no... https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE This smells like an XY Problem though ... please explain *why* you care what these file names are, and why you want them to be different? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss http://www.lucidworks.com/
Re: Memory Leaks in solr 4.8.1
Thanks chris, independent of servlet container is good. Eagerly waiting for solr 5 :) With Regards Aman Tandon On Thu, Jul 3, 2014 at 7:58 AM, Chris Hostetter hossman_luc...@fucit.org wrote: This is a long standing issue in solr, that has some suggested fixes (see jira comments), but no one has been seriously afected by it enough for anyone to invest time in trying to improve it... https://issues.apache.org/jira/browse/SOLR-2357 In general, the fact that Solr is moving away from being a webapp, and towards being a stand alone java application, makes it even less likeley that this will ever really affect anyone. : Date: Thu, 3 Jul 2014 07:37:03 +0530 : From: Aman Tandon amantandon...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: RE: Memory Leaks in solr 4.8.1 : : We reload at interval of 6/7 days and restart may be in 15/18 days if the : response becomes too slow : On Jul 2, 2014 7:09 PM, Markus Jelsma markus.jel...@openindex.io wrote: : : Hi, you can safely ignore this, it is shutting down anyway. Just don't : reload the app a lot of times without actually restarting Tomcat. : : -Original message- : From:Aman Tandon amantandon...@gmail.com : Sent: Wednesday 2nd July 2014 7:22 : To: solr-user@lucene.apache.org : Subject: Memory Leaks in solr 4.8.1 : : Hi, : : When i am shutting down the solr i am gettng the Memory Leaks error in : logs. : : Jul 02, 2014 10:49:10 AM org.apache.catalina.loader.WebappClassLoader :checkThreadLocalMapForLeaks :SEVERE: The web application [/solr] created a ThreadLocal with key of : type :[org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value :[org.apache.solr.schema.DateField$ThreadLocalDateFormat@1d987b2]) and : a :value of type : [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat] :(value : [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a ]) :but failed to remove it when the web application was stopped. 
Threads : are :going to be renewed over time to try and avoid a probable memory leak. : : : Please check. : With Regards : Aman Tandon : : : -Hoss http://www.lucidworks.com/
Re: Slow QTimes - 5 seconds for Small sized Collections
On 7/2/2014 11:55 AM, IJ wrote: Here is a short wishlist based on the experience in debugging this issue: 1. Wish SolrQueryResponse could contain a list of node names / shard-replica names that a request passed through for processing the query (when debug is turned ON) 2. Wish SolrQueryResponse could provide a breakup of QTime on each of the individual nodes / shard-replicas - instead of returning a single value of QTime If you have a new enough Solr version, you can include a shards.info parameter set to true, and you will get some information from the communication with each shard. I set this parameter to true in my request handler defaults. I have seen some per-shard info in the debug as well, but I do not know whether this is influenced by shards.info. It looks like this parameter was added in version 4.0. It probably has been enhanced in later releases. Naturally I would recommend that you run the latest release. https://issues.apache.org/jira/browse/SOLR-3134 Thanks, Shawn
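[Editor's note] Shawn's setup of enabling shards.info by default would look something like this in solrconfig.xml (the handler name here is just an example; any SearchHandler's defaults work the same way):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Include per-shard timing/hit info in distributed responses. -->
    <str name="shards.info">true</str>
  </lst>
</requestHandler>
```

The same effect is available per-request by appending shards.info=true to the query string.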
Re: schema / config file names
Chris, We have actually done that. Our requirement was basically have a single installation of Solr to assume different roles and each role had its own changes for optimisation done on solrconfig.xml and schema.xml When we start a role we basically adapt to file role_solrconfig.xml and role_schema.xml and then fire up cores for each of these role. Is there a better way to solve this issue? Thanks Tirthankar On 02-Jul-2014, at 10:22 pm, Chris Hostetter hossman_luc...@fucit.org wrote: : Is it required for the schema.xml and solrconfig.xml to have those exact : filenames? It's an extremelely good idea ... but strictly speaking no... https://cwiki.apache.org/confluence/display/solr/CoreAdminHandler+Parameters+and+Usage#CoreAdminHandlerParametersandUsage-CREATE This smells like an XY Problem though ... please explain *why* you care what these file names are, and why you want them to be different? https://people.apache.org/~hossman/#xyproblem XY Problem Your question appears to be an XY Problem ... that is: you are dealing with X, you are assuming Y will help you, and you are asking about Y without giving more details about the X so that we can understand the full issue. Perhaps the best solution doesn't involve Y at all? See Also: http://www.perlmonks.org/index.pl?node_id=542341 -Hoss http://www.lucidworks.com/ ***Legal Disclaimer*** This communication may contain confidential and privileged material for the sole use of the intended recipient. Any unauthorized review, use or distribution by others is strictly prohibited. If you have received the message by mistake, please advise the sender by reply email and delete the message. Thank you. **
Re: Customise score
Hi Jack,

Thank you very much for your solution, it works! I'm sorry I didn't make it clear at the beginning: by 'score' I meant the document score (which Solr produces at query time).

Thank you very much to all of you,
Chun.

-- View this message in context: http://lucene.472066.n3.nabble.com/Customise-score-tp4145214p4145359.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Out of Memory when I download 5 million records from sqlserver to solr
On 7/1/2014 4:57 AM, mskeerthi wrote:
> I have to load my 5 million records from SQL Server into one Solr index. I am getting the exception below after importing 1 million records. Is there a configuration change or some other way to import from SQL Server into Solr?
>
> Below is the exception I am getting in Solr:
> org.apache.solr.common.SolrException; auto commit error...: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit

JDBC has a bad habit of defaulting to a mode where it tries to load the entire SQL result set into RAM. Different JDBC drivers have different ways of dealing with this problem. For Microsoft SQL Server, here's a guide:

https://wiki.apache.org/solr/DataImportHandlerFaq#I.27m_using_DataImportHandler_with_MS_SQL_Server_database_with_sqljdbc_driver._DataImportHandler_is_going_out_of_memory._I_tried_adjustng_the_batchSize_values_but_they_don.27t_seem_to_make_any_difference._How_do_I_fix_this.3F

If you have trouble with that really long URL in your mail client, just visit the main FAQ page and click on the link for SQL Server:

https://wiki.apache.org/solr/DataImportHandlerFaq

Thanks,
Shawn
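From the FAQ Shawn points at, the usual fix for the sqljdbc driver is to make it stream rows rather than buffer the whole result set. A sketch of the relevant DataImportHandler data source definition (server name, database, and credentials are placeholders):

```xml
<!-- data-config.xml: responseBuffering=adaptive tells the Microsoft JDBC
     driver to stream rows instead of holding the full result set in RAM;
     selectMethod=cursor has a similar effect on older driver versions -->
<dataSource type="JdbcDataSource"
            driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
            url="jdbc:sqlserver://dbhost;databaseName=mydb;responseBuffering=adaptive;selectMethod=cursor"
            user="solr_user" password="secret"/>
```

Note that this setting lives in the JDBC URL, which is why adjusting DIH's batchSize alone does not help with this driver.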
Re: External File Field eating memory
Any replies?

On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:
> Hi Team,
>
> I have recently implemented EFF in Solr. There are about 1.5 lacs (unsorted) values in the external file. Since this implementation, the server has become slow and the Solr query time has also increased. Can anybody confirm whether these issues are because of the implementation? Is it that EFF eats up memory?
>
> Regards,
> Kamal Kishore
Re: External File Field eating memory
How would we know where the problem is? It's your custom implementation, your own documents (so we don't know field sizes, etc.), and your own metric (OK, an Indian metric, but lacs are fairly unknown outside of India).

Seriously though, have you tried using a memory profiler and running with/without your EFF implementation, or with just a dummy return result? Java 8 has the new Flight Recorder and other tools built in. That would tell you where the leak/usage might be. With this kind of question, you really need to dig deep yourself first. Have you tried a primitive EFF that does not load anything from the file? Is there still a performance impact? If not, then the issue is most likely in your code; maybe it does not shut down properly when the indexer is reloaded, or something similar.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Thu, Jul 3, 2014 at 12:23 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:
> Any replies?
>
> On Sat, Jun 28, 2014 at 5:34 PM, Kamal Kishore Aggarwal kkroyal@gmail.com wrote:
>> Hi Team, I have recently implemented EFF in Solr. There are about 1.5 lacs (unsorted) values in the external file. After this implementation, the server has become slow. The Solr query time has also increased. Can anybody confirm whether these issues are because of the implementation? Is it that EFF eats up memory?
>>
>> Regards, Kamal Kishore
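For comparison, if "EFF" here means the stock solr.ExternalFileField shipped with Solr rather than custom code, the setup looks roughly like this (field names and values are illustrative):

```xml
<!-- schema.xml: keyField ties each line of the external file to a document -->
<fieldType name="externalPopularity" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="externalPopularity" indexed="false" stored="false"/>
```

with a file named external_popularity in the index data directory containing one key=value line per document:

```
doc1=4.5
doc2=0.2
```

As far as I know, Solr parses the whole external file into an in-memory map the first time the field is used after a reload, so a large file does consume heap; that would be the first place to look for memory growth.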