RE: Document row in solr Result
Hi Eric,

If you want a query giving one customer its product's row at any given time, the easiest way is to filter on submission dates greater than this customer's and return the result count. If 500 products have an earlier submission date, your row number is 501.

Hope this helps,
Pierre

-----Original Message-----
From: Eric Grobler [mailto:impalah...@googlemail.com]
Sent: Monday, September 12, 2011 11:00
To: solr-user@lucene.apache.org
Subject: Re: Document row in solr Result

Hi Manish,

Thank you for your time.

For upselling reasons I want to inform the customer that: "Your product is on the last page of the search result. However, click here to put your product back on the first page..."

Here is an example: I have a phone with productid 635001 in the iphone category. When I sort this category by submissiondate, this product will be near the end of the result (on row 9863 in this example). At the moment I have to scan nearly 10000 rows in the client to determine the position of this product. Is there a more efficient way to find the position of a specific document in a result set without returning the full result?

q=category:iphone
fl=productid
sort=submissiondate desc
rows=10000

row   productid  submissiondate
1     656569     2011-09-12 08:12
2     656468     2011-09-12 08:03
3     656201     2011-09-11 23:41
...
9863  635001     2011-08-11 17:22
...
9922  634423     2011-08-10 21:51

Regards
Ericz

On Mon, Sep 12, 2011 at 9:38 AM, Manish Bafna <manish.bafna...@gmail.com> wrote:
You might not be able to find the row index. Can you post your query in detail? The kind of inputs and outputs you are expecting.

On Mon, Sep 12, 2011 at 2:01 PM, Eric Grobler <impalah...@googlemail.com> wrote:
Hi Manish,
Thanks for your reply - but how will that return me the row index of the original query?
Regards
Ericz

On Mon, Sep 12, 2011 at 9:24 AM, Manish Bafna <manish.bafna...@gmail.com> wrote:
fq - the filter query parameter searches within the results.
On Mon, Sep 12, 2011 at 1:49 PM, Eric Grobler <impalah...@googlemail.com> wrote:
Hi Solr experts,

If you have a site with products sorted by submission date, the product of a customer might be on page 1 on the first day, and then move down to page x as other customers submit newer entries.

To find the row of a product you can of course run the query and loop through the result until you find the specific productid, like:

q=category:myproducttype
fl=productid
sort=submissiondate desc
rows=10000

But is there perhaps a more efficient way to do this? Maybe a special syntax to search within the result?

Thanks
Ericz
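Pierre's counting trick can be sketched without a live Solr instance. The URL builder below assumes a hypothetical local Solr core; the in-memory function shows the same arithmetic: a product's rank under a descending submissiondate sort is the number of newer documents plus one, and rows=0 asks Solr to return only the count (numFound), fetching no documents.

```python
from urllib.parse import urlencode

def rank_query_url(base, category, product_date):
    """Build the request Pierre suggests: filter on submissiondate
    strictly greater than the product's own date and read numFound."""
    params = {
        "q": "category:%s" % category,
        "fq": "submissiondate:{%s TO *}" % product_date,
        "rows": 0,  # count only, no document transfer
    }
    return base + "/select?" + urlencode(params)

def rank_in_memory(dates, product_date):
    """Same idea on a plain list: rank = (# newer docs) + 1."""
    return sum(1 for d in dates if d > product_date) + 1

# hypothetical host and core
url = rank_query_url("http://localhost:8983/solr", "iphone",
                     "2011-08-11T17:22:00Z")
dates = ["2011-09-12", "2011-09-11", "2011-08-11", "2011-08-10"]
print(rank_in_memory(dates, "2011-08-11"))  # 2 newer docs -> rank 3
```

With this one request per product, numFound + 1 is the row number, no client-side scan of thousands of rows needed.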
RE: What is the different?
Hi,

Have you checked the queries using the debugQuery=true parameter? This could give some hints about what is searched in both cases.

Pierre

-----Original Message-----
From: cnyee [mailto:yeec...@gmail.com]
Sent: Friday, July 22, 2011 05:14
To: solr-user@lucene.apache.org
Subject: What is the different?

Hi,

I have two queries:
(1) q = (change management)
(2) q = (change management) AND domain_ids:(0^1.3 OR 1)

The purpose of (2) is to boost the records with domain_ids=0. In my database all records have domain_ids = 0 or 1, so domain_ids:(0 OR 1) will always return the full database.

Now my question is - query (2) returns 5000+ results, but query (1) returns 700+ results. Can somebody enlighten me on the reason behind such a vast difference in the number of results?

Many thanks in advance.
Yee

--
View this message in context: http://lucene.472066.n3.nabble.com/What-is-the-different-tp3190278p3190278.html
Sent from the Solr - User mailing list archive at Nabble.com.
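Pierre's debugQuery suggestion can be scripted. The sketch below (hypothetical host and core path) builds both request URLs with debugQuery=true; comparing the parsedquery entries of the two responses typically shows how the default OR operator assembled the clauses differently in each case.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # hypothetical host/core

def debug_url(q):
    """Build a request that returns the parsed query alongside the hits;
    the 'parsedquery' entry in the debug section is the thing to compare
    between the two q strings."""
    return BASE + "?" + urlencode({"q": q, "debugQuery": "true", "rows": 0})

url1 = debug_url("(change management)")
url2 = debug_url("(change management) AND domain_ids:(0^1.3 OR 1)")
print(url1)
print(url2)
```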
RE: commit time and lock
Solr still responds to search queries during a commit; only new indexation requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response times during commits (unless your server is much undersized).

Pierre

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Thursday, July 21, 2011 20:27
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Actually I am worried about the response time. I am committing around 500 docs every 5 minutes. As I know (correct me if I am wrong), at the time of committing the Solr server stops responding. My concern is how to minimize the response time so the user does not need to wait. Or is some other logic required for my case? Please suggest.

regards
jonty

On Tuesday, June 21, 2011, Erick Erickson <erickerick...@gmail.com> wrote:
What is it you want help with? You haven't told us what the problem you're trying to solve is. Are you asking how to speed up indexing? What have you tried? Have you looked at http://wiki.apache.org/solr/FAQ#Performance?

Best
Erick

On Tue, Jun 21, 2011 at 2:16 AM, Jonty Rhods <jonty.rh...@gmail.com> wrote:
I am using solrj to index the data. I have around 5000 docs indexed. As the lock at commit time makes the server stop responding, I was measuring the commit time:

    double starttemp = System.currentTimeMillis();
    server.add(docs);
    server.commit();
    System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp) / 1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However I am not sure about the lock time of the index: whether it starts at server.add(docs) or only at server.commit(). If I change the above to the following:

    server.add(docs);
    double starttemp = System.currentTimeMillis();
    server.commit();
    System.out.println("total time in commit = " + (System.currentTimeMillis() - starttemp) / 1000);

then the commit time becomes less than 1 second. I am not sure which one is right. Please help.

regards
Jonty
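Jonty's measuring question can be illustrated locally: timing add() and commit() separately shows where the seconds go, which is exactly the difference between his two snippets. The client below is a fake stand-in, not SolrJ; only the timing pattern is the point.

```python
import time

class FakeSolrServer:
    """Stand-in for a Solr client so the timing pattern can be shown
    without a live instance (the sleep durations are arbitrary)."""
    def add(self, docs):
        time.sleep(0.01)   # simulated network + buffering cost
    def commit(self):
        time.sleep(0.02)   # simulated flush + segment-merge cost

def timed(fn, *args):
    """Return the wall-clock seconds one call took."""
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start

server = FakeSolrServer()
docs = [{"id": i} for i in range(500)]
add_secs = timed(server.add, docs)    # measure add() alone
commit_secs = timed(server.commit)    # measure commit() alone
print("add: %.3fs, commit: %.3fs" % (add_secs, commit_secs))
```

Measuring each call on its own, as in Jonty's second snippet, is the right way to attribute the 9 seconds between transfer (add) and the commit itself.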
RE: commit time and lock
Solr will respond to searches during optimization, but commits will have to wait for the end of the optimization process. During optimization a new index is generated on disk by merging every single file of the current index into one big file, so your server will be busy, especially regarding disk access. This may alter your response time, and it has a very negative effect on index replication if you have a master/slave architecture.

I've read here that optimization is not always a requirement for an efficient index, due to some low-level changes in Lucene 3.x, so maybe you don't really need optimization. What version of Solr are you using? Maybe someone can point toward a relevant link about optimization other than the Solr wiki: http://wiki.apache.org/solr/SolrPerformanceFactors#Optimization_Considerations

Pierre

-----Original Message-----
From: Jonty Rhods [mailto:jonty.rh...@gmail.com]
Sent: Friday, July 22, 2011 12:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Thanks for the clarity. One more thing I want to know about optimization. Right now I am planning to optimize the server every 24 hours. Optimization also takes time (last time it took around 13 minutes), so I want to know:

1. When optimization is in process, will the Solr server respond or not?
2. If the server will not respond, how can optimization be made faster, or done another way, so our users will not have to wait for the optimization process to finish?

regards
Jonty

On Fri, Jul 22, 2011 at 2:44 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Solr still responds to search queries during a commit; only new indexation requests will have to wait (until the end of the commit?). So I don't think your users will experience increased response times during commits (unless your server is much undersized).

Pierre
RE: commit time and lock
Hi Marc,

I've read that in a thread titled "Weird optimize performance degradation", where Erick Erickson states that "Older versions of Lucene would search faster on an optimized index, but this is no longer necessary", and more recently in a thread you initiated a month ago, "Question about optimization".

I'll also be very interested if anyone has a more precise idea/data on the benefits and tradeoffs of optimize vs merge...

Pierre

-----Original Message-----
From: Marc SCHNEIDER [mailto:marc.schneide...@gmail.com]
Sent: Friday, July 22, 2011 15:45
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

Hello,

Pierre, can you tell us where you read that?

"I've read here that optimization is not always a requirement to have an efficient index, due to some low level changes in lucene 3.xx"

Marc.

On Fri, Jul 22, 2011 at 2:10 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
Solr will respond to searches during optimization, but commits will have to wait for the end of the optimization process.
RE: commit time and lock
Merging does not happen often enough to keep deleted documents to a low enough count? Maybe there's a need for partial optimization in Solr, meaning that a segment with too many deleted documents could be copied to a new file without the unnecessary data. That way, cleaning deleted data would be compatible with keeping replications light.

I'm worried by this idea of deleted documents influencing relevance scores; any pointer to how important this influence may be?

Pierre

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Friday, July 22, 2011 16:42
To: solr-user@lucene.apache.org
Subject: Re: commit time and lock

On 7/22/2011 8:23 AM, Pierre GOSSE wrote:
"I've read that in a thread titled 'Weird optimize performance degradation', where Erick Erickson states that 'Older versions of Lucene would search faster on an optimized index, but this is no longer necessary', and more recently in a thread you initiated a month ago, 'Question about optimization'. I'll also be very interested if anyone has a more precise idea/data on the benefits and tradeoffs of optimize vs merge..."

My most recent testing has been with Solr 3.2.0. I have noticed some speedup after optimizing an index, but the gain is not earth-shattering.

My index consists of 7 shards. One of them is small, and receives all new documents every two minutes. The others are large, and aside from deletes, are mostly static. Once a day, the oldest data is distributed from the small shard to its proper place in the other six shards. The small shard is optimized once an hour, and usually takes less than a minute. I optimize one large shard every day, so each one gets optimized once every six days. That optimize takes 10-15 minutes.

The only reason that I optimize is to remove deleted documents; whatever speedup I get is just icing on the cake. Deleted documents take up space and continue to influence the relevance scoring of queries, so I want to remove them.

Thanks,
Shawn
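The "partial optimization" Pierre wishes for roughly exists as two request parameters in Solr of this era, sketched here against a hypothetical update handler URL: optimize accepts maxSegments (stop merging at N segments instead of one big file), and a commit with expungeDeletes=true merges only segments that contain deletes.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/update"  # hypothetical core URL

def optimize_url(max_segments=1):
    """Optimize down to at most max_segments segments; values > 1 are a
    lighter, partial optimize that still removes many deleted docs."""
    return BASE + "?" + urlencode({"optimize": "true",
                                   "maxSegments": max_segments})

def expunge_url():
    """Commit with expungeDeletes=true: merge only segments holding
    deletes, close to the partial cleanup Pierre describes."""
    return BASE + "?" + urlencode({"commit": "true",
                                   "expungeDeletes": "true"})

print(optimize_url(4))
print(expunge_url())
```

Either option rewrites less data than a full optimize, which also keeps replication transfers smaller in a master/slave setup.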
RE: searching a subset of SOLR index
The limit will always be logical if you have all documents in the same index. But filters are very efficient when working with a subset of your index, especially if you reuse the same filter for many queries, since there is a cache. If your subsets are always the same subsets, maybe you could use shards. But we would need to know more about what you intend to do to point to an adequate solution.

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 11:10
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

Thanks. But does this range query just limit the universe logically, or does it have any mechanism to limit this physically as well? Do we leverage the time factor by using the range query?

Regards,
JAME VAALET

-----Original Message-----
From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant
Sent: Tuesday, July 05, 2011 2:26 PM
To: solr-user@lucene.apache.org
Subject: Re: searching a subset of SOLR index

Range query

On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet <jvaa...@capitaliq.com> wrote:
Hi,

Let's say I have got 10^10 documents in an index, with a unique document id assigned to each of them from 1 to 10^10. Now I want to search a particular query string in a subset of these documents, say document ids 100 to 1000. The question here is: will SOLR be able to search just in this set of documents rather than the entire index? If yes, what should the query be to limit the search to this subset?

Regards,
JAME VAALET
Software Developer
EXT: 8108
Capital IQ
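The cached-filter approach Pierre describes looks like this as a request, assuming a hypothetical host and an id field matching Jame's numbering. The fq clause is cached in Solr's filterCache independently of q, so reusing the same range across many queries is cheap even though the restriction is logical, not physical.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/select"  # hypothetical host/core

def subset_search_url(query, id_from, id_to):
    """Restrict the search to a document-id range with a filter query;
    the fq result set is cached and intersected with each q."""
    params = {
        "q": query,
        "fq": "id:[%d TO %d]" % (id_from, id_to),  # inclusive range
    }
    return BASE + "?" + urlencode(params)

url = subset_search_url("some query string", 100, 1000)
print(url)
```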
RE: searching a subset of SOLR index
From what you tell us, I guess a separate index for website docs would be best. If you fear that requests from the Windows service would cripple your web site performance, why not have a totally separate index on another server, and have your website documents indexed in both indexes?

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 13:14
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

I have got two applications:

1. Website
The website will enable any user to search the document repository; the set they search on is known as "website presentable".

2. Windows service
The Windows service will search all the documents in the repository for a fixed set of keywords and store the results in a database. This set is the universal set of documents in the repository, including the website-presentable ones.

The website is a high-priority app which should work smoothly without any interference, whereas the Windows service should run all day long, continuously, without a break, to save results from incoming docs. The problem here is that the website set is predefined, and I don't want the Windows service requests to SOLR to slow down website requests.

Suppose I segregate the website-presentable docs into a particular core and the rest of them into a different core - will that solve the problem? I have also read about multiple ports listening for requests from different apps; can this be used?

Regards,
JAME VAALET

-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
Sent: Tuesday, July 05, 2011 3:52 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

The limit will always be logical if you have all documents in the same index. But filters are very efficient when working with a subset of your index, especially if you reuse the same filter for many queries, since there is a cache. If your subsets are always the same subsets, maybe you could use shards.
RE: searching a subset of SOLR index
It is redundancy. You have to balance the cost of redundancy against the cost in performance of having your web index queried by your Windows service. If your Windows service is not too aggressive in its requests, go for shards.

Pierre

-----Original Message-----
From: Jame Vaalet [mailto:jvaa...@capitaliq.com]
Sent: Tuesday, July 5, 2011 15:05
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

But in case the website docs contribute around 50% of the entire docs, why recreate the indexes? Don't you think it's redundancy? Can two web apps (Solr instances) share a single index file and search on it without interfering with each other?

Regards,
JAME VAALET
Software Developer
EXT: 8108
Capital IQ

-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
Sent: Tuesday, July 05, 2011 5:12 PM
To: solr-user@lucene.apache.org
Subject: RE: searching a subset of SOLR index

From what you tell us, I guess a separate index for website docs would be best. If you fear that requests from the Windows service would cripple your web site performance, why not have a totally separate index on another server, and have your website documents indexed in both indexes?

Pierre
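If Jame goes the shards route, the request-side shape is roughly the following, with hypothetical host and core names: the website queries its own core directly, while the Windows service fans its request out over both cores through the shards parameter, so heavy batch traffic never touches the website core alone.

```python
from urllib.parse import urlencode

def sharded_search_url(query, shards):
    """Query several cores as one logical index: the receiving core
    forwards the request to every entry in `shards` and merges the
    results. Host and core names here are illustrative only."""
    params = {"q": query, "shards": ",".join(shards)}
    return "http://web-host:8983/solr/website/select?" + urlencode(params)

# website core alone for site traffic; both cores for the Windows service
site_url = sharded_search_url("keyword", ["web-host:8983/solr/website"])
service_url = sharded_search_url(
    "keyword",
    ["web-host:8983/solr/website", "batch-host:8983/solr/archive"],
)
print(service_url)
```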
RE: Multiple indexes
I think there are reasons to use separate indexes for each document type but do combined searches on these indexes (for example if you need separate TFs for each document type). I wonder if, in this precise case, it wouldn't be pertinent to have a single index with the various document types each having their own field set. Isn't TF calculated field by field?
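Pierre's closing question points the right way: term statistics in Lucene are kept per field, so document types stored in distinct fields of one index keep separate frequencies. The toy computation below is not Lucene's actual scoring formula; it only illustrates that a term's document frequency in one field ignores documents that use a different field.

```python
import math

# one index, two document types, each indexing into its own field
docs = [
    {"article_text": "solr index search"},
    {"article_text": "solr commit"},
    {"manual_text": "solr solr solr install"},
]

def doc_freq(field, term):
    """Document frequency counted only over the given field; docs of
    the other type contribute nothing."""
    return sum(1 for d in docs if term in d.get(field, "").split())

def idf(field, term):
    """Simplified IDF over docs carrying this field (illustrative
    formula, not Lucene's)."""
    n = sum(1 for d in docs if field in d)
    return math.log(n / (1 + doc_freq(field, term))) + 1

print(doc_freq("article_text", "solr"))  # 2: the manual doc doesn't count
print(doc_freq("manual_text", "solr"))   # 1
```

So a single index with one field per document type does keep the per-type term statistics separate, which is the property separate indexes were meant to provide.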
RE: Document match with no highlight
In your WordDelimiterFilter the parameters catenateNumbers and catenateWords are set to 1 at index time. These parameters add overlapping tokens, which could explain why you hit the bug described in the JIRA issue I mentioned.

As I understand WordDelimiterFilter, "0176 R3 1.5 TO" should be tokenized with a token "R3" overlapping "R" and "3", and "15" overlapping "1" and "5". These parameters are set to 0 for query, but having them set to 1 there would not correct your problem unless you searched for "R3 1.5".

I think you have to either:
- set these parameters to 0 at index time, but then your query won't match anymore,
- wait for the correction to be released in a new Solr version,
- use Solr trunk,
- or backport the modifications to the lucene-highlighter version you use.

I did a backport for Solr 1.4.1 since I won't move to 3.0 for some time, so please ask if you have questions about how to do this.

Pierre

-----Original Message-----
From: Phong Dais [mailto:phong.gd...@gmail.com]
Sent: Thursday, May 12, 2011 20:06
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,

I read the link provided and I'll need some time to digest what it is saying. Here's my "text" fieldtype.
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldtype>

Also, I figured out what value in DOC_TEXT causes this issue to occur. With a DOC_TEXT of (without the quotes): "0176 R3 1.5 TO"

Searching for "3 1 15" returns a match with an empty highlight. Searching for "3 1 15"~1 returns a match with a highlight.

Can anyone see anything that I'm missing?

Thanks,
P.

On Thu, May 12, 2011 at 12:27 PM, Pierre GOSSE <pierre.go...@arisem.com> wrote:
"Since you're using the standard text field, this should NOT be your case." Sorry for the missing NOT in the previous phrase. You should have the same issue given what you said, but still, it sounds very similar. Are you sure your fieldtype "text" has nothing special? A tokenizer or filter that could add some tokens in your indexed text but not in your query, like for example a WordDelimiter present at index time and not at query time?

Pierre
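Pierre's description of the index-time tokenization can be imitated to see the overlap. This is a simplified sketch of WordDelimiterFilter with generateWordParts/generateNumberParts=1 and catenateWords/catenateNumbers=1, not Lucene's implementation: the catenated token is emitted in addition to the split parts and overlaps them positionally.

```python
import re

def word_delimiter(token):
    """Return the sub-tokens a WordDelimiterFilter-like split would emit
    for one whitespace token: the letter/digit parts, plus a catenated
    form that overlaps them. Simplified imitation, not Lucene code."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    if len(parts) <= 1:
        return parts or [token]
    return parts + ["".join(parts)]  # last entry overlaps the parts

for tok in "0176 R3 1.5 TO".split():
    print(tok, "->", word_delimiter(tok))
# R3  -> ['R', '3', 'R3']
# 1.5 -> ['1', '5', '15']
```

These overlapping tokens ("R3" on top of "R" and "3", "15" on top of "1" and "5") are what trips the highlighter bug in LUCENE-3087 when a phrase query matches through the split parts.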
RE: Document match with no highlight
"In fact if I did "3 1 15"~1 I do get a snippet also."

Strange, I had a very similar problem, but with overlapping tokens. Since you're using the standard text field, this should be your case. Maybe you could have a look at this issue, since it sounds very familiar to me: https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

-----Original Message-----
From: Phong Dais [mailto:phong.gd...@gmail.com]
Sent: Thursday, May 12, 2011 17:26
To: solr-user@lucene.apache.org
Subject: Re: Document match with no highlight

Hi,

<field name="DOC_TEXT" type="text" indexed="true" stored="true"/>

The type "text" is the default one that came with the default Solr 1.4 install without any modifications. If I remove the quotes I do get snippets. In fact if I did "3 1 15"~1 I do get a snippet also.

Hope that helps.
P.

On Thu, May 12, 2011 at 9:09 AM, Ahmet Arslan <iori...@yahoo.com> wrote:

URL:
http://localhost:8983/solr/select?indent=on&version=2.2&q=DOC_TEXT%3A%223+1+15%22&fq=&start=0&rows=10&fl=DOC_TEXT%2Cscore&qt=standard&wt=standard&debugQuery=on&explainOther=&hl=on&hl.fl=DOC_TEXT&hl.maxAnalyzedChars=-1

XML:
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">19</int>
    <lst name="params">
      <str name="explainOther"/>
      <str name="indent">on</str>
      <str name="hl.fl">DOC_TEXT</str>
      <str name="wt">standard</str>
      <str name="hl.maxAnalyzedChars">-1</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="debugQuery">on</str>
      <str name="fl">DOC_TEXT,score</str>
      <str name="start">0</str>
      <str name="q">DOC_TEXT:"3 1 15"</str>
      <str name="qt">standard</str>
      <str name="fq"/>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0" maxScore="0.035959315">
    <doc>
      <float name="score">0.035959315</float>
      <arr name="DOC_TEXT"><str>...</str></arr>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="123456"/>
  </lst>
  <lst name="debug">
    <str name="rawquerystring">DOC_TEXT:"3 1 15"</str>
    <str name="querystring">DOC_TEXT:"3 1 15"</str>
    <str name="parsedquery">PhraseQuery(DOC_TEXT:"3 1 15")</str>
    <str name="parsedquery_toString">DOC_TEXT:"3 1 15"</str>
    <lst name="explain">
      <str name="123456">
        0.035959315 = fieldWeight(DOC_TEXT:"3 1 15" in 0), product of:
          1.0 = tf(phraseFreq=1.0)
          0.92055845 = idf(DOC_TEXT: 3=1 1=1 15=1)
          0.0390625 = fieldNorm(field=DOC_TEXT, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <arr name="filter_queries"><str/></arr>
    <arr name="parsed_filter_queries"/>
    <lst name="timing">...</lst>
  </lst>
</response>

Nothing looks suspicious. Can you provide two things more: the fieldType definition of "text" and the field definition of DOC_TEXT. Also, do you get a snippet from the same doc when you remove the quotes from your query?
RE: Document match with no highlight
> Since you're using the standard text field, this should NOT be your case.

Sorry for the missing NOT in the previous phrase. You shouldn't have the same issue given what you said, but still, it sounds very similar. Are you sure your fieldtype "text" has nothing special? A tokenizer or filter that could add some tokens to your indexed text but not to your query, like for example a WordDelimiter filter present at index time and not at query time?

Pierre

-----Original Message-----
From: Pierre GOSSE [mailto:pierre.go...@arisem.com]
Sent: Thursday, May 12, 2011 18:21
To: solr-user@lucene.apache.org
Subject: RE: Document match with no highlight

> In fact if I did "3 1 15"~1 I do get a snippet also.

Strange, I had a very similar problem, but with overlapping tokens. Since you're using the standard text field, this should be your case. Maybe you could have a look at this issue, since it sounds very familiar to me: https://issues.apache.org/jira/browse/LUCENE-3087

Pierre

[...]
RE: Allowing looser matches
For (a) I don't think anything exists today providing this mechanism. But (b) is a good description of the dismax handler with an mm parameter of 66%.

Pierre

-----Original Message-----
From: Mark Mandel [mailto:mark.man...@gmail.com]
Sent: Wednesday, April 13, 2011 10:04
To: solr-user@lucene.apache.org
Subject: Allowing looser matches

Not sure if the title explains it all, or if what I want is even possible, but figured I would ask.

Say I have a series of products I'm selling, and a search for "Blue Wool Rugs" comes in. This returns 0 results, as "Blue" and "Rugs" match terms that are indexed, but "Wool" does not.

Is there a way to configure my index/searchHandler to either:

(a) if no documents are returned, look for partial matches of the search (e.g. return results with "Blue rugs", in this case)
(b) add results to the overall search, but at a lower score, that have only *some* of the terms being searched in them (in this case, maybe 2/3)

Is that even possible?

Thanks,

Mark

--
E: mark.man...@gmail.com
T: http://www.twitter.com/neurotic
W: www.compoundtheory.com

cf.Objective(ANZ) - Nov 17, 18 - Melbourne Australia
http://www.cfobjective.com.au

Hands-on ColdFusion ORM Training
www.ColdFusionOrmTraining.com
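A sketch of what the dismax request for option (b) might look like, assuming a local Solr instance and made-up field names; `mm` (minimum-should-match) controls how many of the query terms must match, with Solr's own rounding rules deciding the exact count:

```python
from urllib.parse import urlencode

# Host and field names are illustrative; mm="66%" asks that roughly two of
# the three terms match, while docs containing all three score higher.
params = {
    "q": "Blue Wool Rugs",
    "defType": "dismax",       # use the DisMax query parser
    "qf": "name description",  # fields to search across (assumed names)
    "mm": "66%",               # minimum-should-match as a percentage
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With mm below 100%, a document matching only "Blue" and "Rugs" is still returned, just ranked below documents matching all three terms.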
RE: Highlighting Problem
Looks like the special chars are filtered at index time and not replaced by a space, which would keep the offsets of the terms correct. Can you paste here the definition of the fieldtype from your schema.xml?

Pierre

-----Original Message-----
From: pottw...@freenet.de [mailto:pottw...@freenet.de]
Sent: Monday, March 28, 2011 11:16
To: solr-user@lucene.apache.org
Subject: Highlighting Problem

Dear Solr specialists,

My data looks like this:

j]s(dh)fjk [hf]sjkadh asdj(kfh) [skdjfh aslkfjhalwe uigfrhj bsd bsdfga sjfg asdlfj.

If I want to query for the first word, the following queries must match:

j]s(dh)fjk
j]s(dhfjk
j]sdhfjk
jsdhfjk
dhf

So the matching should ignore some characters like ( ) [ ] and should match substrings. So far I have the following field definition in the schema.xml: [...] With this definition the matching works as planned. But not for highlighting: there the special characters seem to move the tags to wrong positions. For example, searching for "jsdhfjk" misses the last 3 letters of the word ( = 3 special characters stripped by PatternReplaceFilterFactory): j]s(dh)fjk

Solr has so many bells and whistles - what must I do to get correctly working highlighting?

Kind regards,
F.
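One way to attack the offset problem (an approach suggested here, not something stated in the thread): do the character stripping in a charFilter rather than a token filter. CharFilters run before the tokenizer and carry an offset-correction map, so highlight positions stay aligned with the stored text. The fieldType name, pattern, and tokenizer below are illustrative only; check that PatternReplaceCharFilterFactory is available in your Solr version:

```xml
<fieldType name="text_stripped" class="solr.TextField">
  <analyzer>
    <!-- Strip ( ) [ ] before tokenizing; the charFilter corrects offsets
         so highlighting tags land on the right characters -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="[\(\)\[\]]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The substring matching from the original schema (e.g. an NGram filter) would still be layered on top of this analyzer chain.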
RE: Using Solr 1.4.1 on most recent Tomcat 7.0.11
I do have the XML preamble <?xml version="1.0" encoding="UTF-8"?> in my config file in conf/Catalina/localhost, and Solr starts OK with Tomcat 7.0.8. Haven't tried with 7.0.11 yet.

I wonder why your exception points to line 4 column 6, however. Shouldn't it point to line 1 column 1? Do you have some blank lines at the start of your XML file, or some non-blank lines?

Pierre

-----Original Message-----
From: François Schiettecatte [mailto:fschietteca...@gmail.com]
Sent: Thursday, March 17, 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

Lewis

My update from Tomcat 7.0.8 to 7.0.11 went with no hitches. I checked my context file and it does not have the XML preamble yours has, specifically: '<?xml version="1.0" encoding="utf-8"?>'. Here is my context file:

<Context docBase="/home/omim/lib/java/apache-solr-4.0-2011-02-09_08-06-20.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/omim/index/" override="true" />
</Context>

Hope this helps.

Cheers

François

On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote:

Hello list,

Is anyone running Solr (in my case 1.4.1) on the above Tomcat dist? In the past I have been using guidance in accordance with http://wiki.apache.org/solr/SolrTomcat#Installing_Solr_instances_under_Tomcat but having upgraded from Tomcat 7.0.8 to 7.0.11 I am having problems, e.g.:

INFO: Deploying configuration descriptor wombra.xml

This is my context fragment from /home/lewis/Downloads/apache-tomcat-7.0.11/conf/Catalina/localhost:

16-Mar-2011 16:57:36 org.apache.tomcat.util.digester.Digester fatalError
SEVERE: Parse Fatal Error at line 4 column 6: The processing instruction target matching [xX][mM][lL] is not allowed.
org.xml.sax.SAXParseException: The processing instruction target matching [xX][mM][lL] is not allowed.
...
16-Mar-2011 16:57:36 org.apache.catalina.startup.HostConfig deployDescriptor
SEVERE: Error deploying configuration descriptor wombra.xml
org.xml.sax.SAXParseException: The processing instruction target matching [xX][mM][lL] is not allowed.
... some more ...

My configuration descriptor is as follows:

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/home/lewis/Downloads/wombra/wombra.war" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/home/lewis/Downloads/wombra" override="true"/>
</Context>

Preferably I would upload a WAR file, but I have been working well with the configuration I have been using up until now, therefore I didn't question change. I am unfamiliar with the above errors. Can anyone please point me in the right direction?

Thank you
Lewis

Glasgow Caledonian University is a registered Scottish charity, number SC021474

Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html

Winner: Times Higher Education's Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners.
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
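Pierre's question about blank lines can be checked outside Tomcat: an XML declaration is only legal at the very first character of the file, so leading blank lines make a SAX-style parser reject `<?xml ...?>` as an ordinary processing instruction named "xml", reported at whatever line it actually sits on. A small sketch with Python's stdlib parser (the context content is illustrative):

```python
import xml.etree.ElementTree as ET

good = '<?xml version="1.0" encoding="utf-8"?><Context docBase="app.war"/>'
bad = "\n\n\n" + good  # declaration now starts on line 4, as in Lewis's report

ET.fromstring(good)  # parses fine
try:
    ET.fromstring(bad)
except ET.ParseError as err:
    # expat reports the declaration as misplaced, pointing at line 4
    print(err)
```

So an error at "line 4 column 6" is exactly what three stray lines before the preamble would produce.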
RE: Concurrent updates/commits
> However, the Solr book, in the Commit, Optimise, Rollback section, reads: "if more than one Solr client were to submit modifications and commit them at similar times, it is possible for part of one client's set of changes to be committed before that client told Solr to commit", which suggests that requests are *not* serialised.

I read this as: if two clients submit modifications and commits every couple of minutes, it could happen that modifications of client1 get committed by client2's commit before client1 asks for a commit.

As far as I understand Solr commits, they are serialized by design. And committing too often could lead you into trouble if you have many warm-up queries (?).

Hope this helps,

Pierre

-----Original Message-----
From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com]
Sent: Wednesday, February 9, 2011 16:34
To: solr-user@lucene.apache.org
Subject: Concurrent updates/commits

Hello,

This topic has probably been covered before here, but we're still not very clear about how multiple commits work in Solr.

We currently have a requirement to make our domain objects searchable immediately after they get updated in the database by some user action. This could potentially cause multiple updates/commits to be fired at Solr, and we are trying to investigate how Solr handles those multiple requests.

This thread: http://search-lucene.com/m/0cab31f10Mh/concurrent+commitssubj=commit+concurrency+full+text+search suggests that Solr will handle all of the lower-level details and that "Before a *COMMIT* is done, a lock is obtained and it's released after the operation", which in my understanding means that Solr will serialise all update/commit requests?

However, the Solr book, in the Commit, Optimise, Rollback section, reads: "if more than one Solr client were to submit modifications and commit them at similar times, it is possible for part of one client's set of changes to be committed before that client told Solr to commit", which suggests that requests are *not* serialised.

Our questions are:
- Does Solr handle concurrent requests, or do we need to add synchronisation logic around our code?
- If Solr *does* handle concurrent requests, does it serialise each request or does it have some other strategy for processing them?

Thanks,
- Savvas
RE: Concurrent updates/commits
Well, Jonathan's explanations are much more accurate than mine. :)

I took the word "serialization" as meaning a kind of isolation between commits, which is not very smart. Sorry to have introduced more confusion in this.

Pierre

-----Original Message-----
From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com]
Sent: Wednesday, February 9, 2011 17:04
To: solr-user@lucene.apache.org
Subject: Re: Concurrent updates/commits

Hello,

Thanks very much for your quick replies.

So, according to Pierre, all updates will be immediately posted to Solr, but all commits will be serialised. But doesn't that contradict Jonathan's example, where you can end up with FIVE "new indexes" being warmed? If commits are serialised, then there can only ever be one Index Searcher being auto-warmed at a time, or have I got this wrong?

The reason we are investigating commit serialisation is that we want to know whether commit requests will be blocked until the previous ones finish.

Cheers,
- Savvas

On 9 February 2011 15:44, Pierre GOSSE pierre.go...@arisem.com wrote:

[...]
RE: Problem in faceting
Using a facet query like facet.query=+water +treatment +plant ... should give a count of 0 for documents not having all three terms. This could do the trick, if I understand how this parameter works.
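A sketch of how that facet.query could be sent, assuming a local Solr core and a hypothetical `text` field; the facet count comes back alongside the normal results without changing the main result set:

```python
from urllib.parse import urlencode

# Host, core, and field name are illustrative; the point is facet.query,
# which reports how many documents match the given query. The "+" prefix
# on each term makes all three terms required.
params = [
    ("q", "*:*"),
    ("rows", "0"),      # only the counts are wanted here
    ("facet", "true"),
    ("facet.query", "text:(+water +treatment +plant)"),
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

The response's facet_counts section would then hold one count for this query: the number of documents containing all three terms.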