Re: DIH import from MySQL results in garbage text for special chars
The output of SHOW VARIABLES goes like this. I have verified the hex values, and they are different in MySQL and Solr.

    | Variable_name            | Value                      |
    +--------------------------+----------------------------+
    | character_set_client     | latin1                     |
    | character_set_connection | latin1                     |
    | character_set_database   | latin1                     |
    | character_set_filesystem | binary                     |
    | character_set_results    | latin1                     |
    | character_set_server     | latin1                     |
    | character_set_system     | utf8                       |
    | character_sets_dir       | /usr/share/mysql/charsets/ |

*Pranav Prakash*
temet nosce

On Wed, Sep 26, 2012 at 6:45 PM, Gora Mohanty g...@mimirtech.com wrote:

On 21 September 2012 11:19, Pranav Prakash pra...@gmail.com wrote:

I am seeing the garbage text in the browser, in the Luke Index Toolbox, and everywhere else it is the same. My servlet container is the out-of-the-box Jetty. Many other special chars are getting indexed and stored properly; only a few characters cause pain.

Could you double-check the encoding on the MySQL side? What is the output of

    mysql> SHOW VARIABLES LIKE 'character\_set\_%';

Regards,
Gora
Re: DIH import from MySQL results in garbage text for special chars
Mr Prakash,

On 27 September 2012 02:06, Pranav Prakash pra...@gmail.com wrote:

    | character_set_client     | latin1 |
    | character_set_connection | latin1 |
    | character_set_database   | latin1 |
    | character_set_filesystem | binary |
    | character_set_results    | latin1 |
    | character_set_server     | latin1 |
    | character_set_system     | utf8   |

These should all be the same (presumably the system encoding).

-- H
-- Sent from my mobile device
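[Editor's aside: if the underlying data is UTF-8 but the session variables above are latin1, one common fix, an assumption here rather than something confirmed in this thread, is to force the JDBC driver to negotiate UTF-8 in the DIH data source. A minimal data-config sketch; the database name, credentials, and query are placeholders:]

    <dataConfig>
      <!-- useUnicode/characterEncoding make Connector/J talk UTF-8
           to the server instead of the latin1 session default above -->
      <dataSource type="JdbcDataSource"
                  driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://localhost/mydb?useUnicode=true&amp;characterEncoding=UTF-8"
                  user="dbuser" password="dbpass"/>
      <document>
        <entity name="doc" query="SELECT id, title FROM docs">
          <field column="id" name="id"/>
          <field column="title" name="title"/>
        </entity>
      </document>
    </dataConfig>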
Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?
Hi Ahmet,

Thanks for your reply :) I see that it does not come with the 4.0 release, because the given patches do not work with this version. Right?

Best regards
Vadim

2012/9/26 Ahmet Arslan iori...@yahoo.com:

Assume I have a simple query like this, with a wildcard and a tilde: japa* fukushima~10, instead of japan fukushima~10 OR japanese fukushima~10, etc. Do we have a solution in Solr 4.0 to work with this kind of query?

Vadim, two open JIRA issues:
https://issues.apache.org/jira/browse/SOLR-1604
https://issues.apache.org/jira/browse/LUCENE-1486
Re: Items disappearing from Solr index
#What is the field type for that field - string or text?

It is a string type. Thanks.

On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky j...@basetechnology.com wrote:

What is the field type for that field - string or text?

-- Jack Krupansky

-----Original Message----- From: Kissue Kissue Sent: Wednesday, September 26, 2012 1:43 PM To: solr-user@lucene.apache.org Subject: Re: Items disappearing from Solr index

# It is looking for documents with "Emory" in the specified field OR "Labs" in the default search field.

This does not seem to be the case. For instance, issuing a deleteByQuery for catalogueId "PEARL LINGUISTICS LTD" also deletes the contents of a catalogueId with the value "Ncl_MacNaughtonMcGregorCoaching_vf010811". Thanks.

On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky j...@basetechnology.com wrote:

It is looking for documents with "Emory" in the specified field OR "Labs" in the default search field.

-- Jack Krupansky

-----Original Message----- From: Kissue Kissue Sent: Wednesday, September 26, 2012 7:47 AM To: solr-user@lucene.apache.org Subject: Re: Items disappearing from Solr index

I have just solved this problem. We have a field called catalogueId. One possible value for this field could be "Emory Labs". I found out that when the following delete-by-query is sent to Solr:

    getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs")

[Notice that there are no quotes surrounding the catalogueId value, Emory Labs.]

For some reason this delete-by-query ends up deleting the contents of some other random catalogues too, which is why we are losing items from the index. When the query is changed to:

    getSolrServer().deleteByQuery(catalogueId + ":" + "\"Emory Labs\"")

then it starts to correctly delete only items in the Emory Labs catalogue.

So my first question is: what exactly does deleteByQuery do in the first query, without the quotes? How is it determining which catalogues to delete? Secondly, shouldn't the correct behaviour be to delete nothing at all in this case, since a search for the same catalogueId without the quotes simply returns no results? Thanks.

On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue kissue...@gmail.com wrote:

Hi Erick, thanks for your reply. Yes, I am using delete by query. I am currently logging the number of items to be deleted before handing off to Solr, and from the Solr logs I can see it deleted exactly that number. I will verify further. Thanks.

On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson erickerick...@gmail.com wrote:

How do you delete items? By ID or by query? My guess is that one of two things is happening: 1) your delete process is deleting too much data; 2) your index process isn't indexing what you think. I'd add some logging to the SolrJ program to see what it thinks it has deleted or added to the index and go from there.

Best
Erick

On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue kissue...@gmail.com wrote:

Hi, I am running Solr 3.5, using SolrJ and StreamingUpdateSolrServer to index and delete items from Solr. I basically index items from the DB into Solr every night. Existing items can be marked for deletion in the DB and a delete request sent to Solr to delete such items. My process runs as follows every night:

1. Check if items have been marked for deletion and delete them from Solr. I commit and optimize after the entire Solr deletion runs.
2. Index any new items to Solr. I commit and optimize after all the new items have been added.

Recently I started noticing that huge chunks of items that have not been marked for deletion are disappearing from the index. I checked the Solr logs, and the logs indicate that it is deleting exactly the number of items requested, but still a lot of other items disappear from the index from time to time. Any ideas what might be causing this or what I am doing wrong? Thanks.
Re: How can I create about 100000 independent indexes in Solr?
Hello Monton,

I wanted to make sure that you understood me well: I really don't know how well Solr scales as the number of fields increases. What I mean here is that the more distinct fields you index, the more memory you will need. So if in your schema you have something like 15 fields declared, then storing data for 100 distinct customers would generate 1500 fields in the index. I really don't know how well that would scale. The simplest solution is one core per customer, but the same issue (memory consumption) will arise at some point, I guess. There must be a cleverer way to do that...

-- Tanguy

2012/9/26 韦震宇 weizhe...@win-trust.com

Hi, Tanguy
I would do as your suggestion.
Best Regards!
Monton

----- Original Message ----- From: Tanguy Moal tanguy.m...@gmail.com To: solr-user@lucene.apache.org; t...@statsbiblioteket.dk Sent: Tuesday, September 25, 2012 11:05 PM Subject: Re: How can I create about 100000 independent indexes in Solr?

That is an interesting issue... I was wondering if relying on dynamic fields could be an option. Something like (field_name : field_type):

* customer : string
* *_field_a1 : type_a
* *_field_a2 : type_a
* *_field_b1 : type_b
* ...

Then prefix each field by the customer name, so for customer1, indexed documents are as follows:

* customer : customer1
* customer1_field_a1 : value for field_a1
* customer1_field_a2 : value for field_a2
* customer1_field_b1 : value for field_b1
* ...

And for customer2:

* customer : customer2
* customer2_field_a1 : value for field_a1
* customer2_field_a2 : value for field_a2
* customer2_field_b1 : value for field_b1
* ...

This solution is simple and helps isolate each customer's fields, so features like the suggester, spellcheck, and other things relying on frequencies would work (as if in a single core). I just don't know how well Solr scales if the number of fields increases... Then scaling could be achieved depending on the number of docs per customer and the number of customers per core (if the number of fields consumes resources). Could that help?

-- Tanguy

2012/9/25 Toke Eskildsen t...@statsbiblioteket.dk

On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
The company I'm working in has a website serving more than 100000 customers, and every customer should have its own search category. So I should create an independent index for every customer.

How many of the customers are active at any given time, and how large are the indexes? Depending on usage, you might be able to have a limited number of indexes open at any given time, opening new indexes on demand.
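[Editor's aside: as an illustration of the dynamic-field idea above, the schema side could declare one pattern per (suffix, type) pair. A sketch; type_a and type_b stand in for whatever real field types apply:]

    <field name="customer" type="string" indexed="true" stored="true"/>
    <!-- matches customer1_field_a1, customer2_field_a1, ... -->
    <dynamicField name="*_field_a1" type="type_a" indexed="true" stored="true"/>
    <dynamicField name="*_field_a2" type="type_a" indexed="true" stored="true"/>
    <dynamicField name="*_field_b1" type="type_b" indexed="true" stored="true"/>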
ExtractingRequestHandler causes Out of Memory Error
Hi guys,

I use ManifoldCF to crawl files on a Windows file server and index them into Solr using the ExtractingRequestHandler. Most of the documents are successfully indexed, but some fail and an Out of Memory Error occurs in Solr, so I need some advice.

Those failed files are not so big: a CSV file of 240MB and a text file of 170MB.

Here is the environment and machine spec:

Solr 3.6 (also Solr 4.0 Beta)
Tomcat 6.0
CentOS 5.6
java version 1.6.0_23
HDD 60GB
MEM 2GB
JVM Heap: -Xmx1024m -Xms1024m

I feel there is enough memory that Solr should be able to extract and index the file content.

Here is the Solr log:

    [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuilder.append(StringBuilder.java:189)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

Anyone have any ideas?

Regards,
Shigeki
Re: How to retrieve value from float field in custom request handler?
Thanks guys, I am able to retrieve all values now. But why doesn't the Solr Field class have a method to retrieve values for all data types, something like:

    Object obj = doc.getField("Field1");

Why is only the string value exposed in this Field class?

    doc.getField("Field1").stringValue()

Thanks,
ravi
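[Editor's aside: for anyone landing here, in a 3.x-era custom handler the stored value comes back through the Lucene Document as text, so one workaround, a sketch under that assumption with an illustrative field name, is to parse the string form yourself:]

    import org.apache.lucene.document.Document;

    public class FieldValues {
        // Stored numeric values round-trip as strings in Lucene 3.x,
        // so parse the string form of the stored field.
        static float floatValue(Document doc, String name) {
            return Float.parseFloat(doc.getFieldable(name).stringValue());
        }
    }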
Problem with Special Characters in SOLR Query
Hi,

I'm using the text_general fieldType for searching in Solr. While searching keywords along with special characters, I am not getting proper results and am getting errors. I used special characters like the ones below:

1) -
2) &
3) +

Query:

    solr?q=Healing - Live
    solr?q=Healing & Live
    solr?q=Healing ? Live

Error message:

    The request sent by the client was syntactically incorrect
    (org.apache.lucene.queryParser.ParseException: Cannot parse '(Healing \':
    Lexical error at line 1, column 8. Encountered: <EOF> after : "\Healing \\").

schema.xml:

    <fieldType name="text_generalold" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="title" type="text_general" indexed="true" stored="true"/>
    <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

    <defaultSearchField>text</defaultSearchField>
    <copyField source="title" dest="text"/>

Please advise me on this; thanks in advance.

AnilJayanti
Re: Problem with Special Characters in SOLR Query
Hi,

Just escape all Solr special chars, for example:

    solr?q=Healing \& Live

Regards,
Irshad

On Thu, Sep 27, 2012 at 3:55 PM, aniljayanti anil.jaya...@gmail.com wrote:

I'm using the text_general fieldType for searching in Solr. While searching keywords along with special characters, I am not getting proper results and am getting errors.
Re: Solr Replication and Autocommit
I'll echo Otis, nothing comes to mind... Unless you were indexing stuff to the _slaves_, which you should never do, now or in the past Erick On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona avor...@ea.com wrote: Hi, I remember having some issues with replication and autocommit previously. But now we are using Solr 3.6.1. Are there any known issues or any other reasons to avoid autocommit while using replication? I guess not, just want confirmation from someone confident and competent. -- Aleksey
Re: How can I create about 100000 independent indexes in Solr?
Hi, Tanguy

Oh, I understand now. I don't have the same issue as you: though there are many customers on our site, the fields they own are the same, so a few fixed fields are fine in my case.

Best Regards!
Monton

----- Original Message ----- From: Tanguy Moal tanguy.m...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, September 27, 2012 4:34 PM Subject: Re: How can I create about 100000 independent indexes in Solr?

Hello Monton, I wanted to make sure that you understood me well: I really don't know how well Solr scales as the number of fields increases. What I mean here is that the more distinct fields you index, the more memory you will need.
httpSolrServer and external load balancer
Hi,

We have the following Solr HTTP server bean:

    <bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer" id="solrserver">
      <constructor-arg value="urlToSlaveLoadBalancer"/>
      <property name="soTimeout" value="1000"/>
      <property name="connectionTimeout" value="1000"/>
      <property name="defaultMaxConnectionsPerHost" value="5"/>
      <property name="maxTotalConnections" value="20"/>
      <property name="allowCompression" value="true"/>
    </bean>

The issue we face is that the F5 balancer is returning a cookie which the client is hanging onto, resulting in the same slave being hit for all requests. One obvious solution is to configure the load balancer to be non-sticky; however, politically, a non-standard load balancer is timescale suicide. (It is an outsourced corporate thing.) I'm not keen to use the LB HTTP Solr server, as I don't want this to be a concern of the software and to have a list of servers etc. (although as a stopgap I may well have to).

My question is: can I configure the Solr server to ignore client state? We are on Solr 3.4.

Thanks in advance
lee c
Re: Items disappearing from Solr index
Wild shot in the dark... What happens if you switch from StreamingUpdateSolrServer to HttpSolrServer? What I'm wondering is if somehow you're getting a queueing problem. If you have multiple threads defined for SUSS, it might be possible (and I'm guessing) that the delete bit is getting sent after some of the adds. Frankly, I doubt this is the case, but this issue is so weird that I'm grasping at straws.

BTW, there's no reason to optimize twice. Actually, the new thinking is that optimizing usually isn't necessary anyway. But if you insist on optimizing, there's no reason to do it _both_ after the deletes and after the adds; just do it after the adds.

Best
Erick

On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue kissue...@gmail.com wrote:

It is a string type. Thanks.
Re: Problem with Special Characters in SOLR Query
Hi,

Thanks. I tried the query below and got results:

    q=Cheat \- Album Version

But I am getting an error with this one:

    q=Oot \& Aboot

Error message:

    message: org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
    Lexical error at line 1, column 6. Encountered: <EOF> after :
    description: The request sent by the client was syntactically incorrect
    (org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
    Lexical error at line 1, column 6. Encountered: <EOF> after : ).

anilJayanti
Re: httpSolrServer and external load balancer
What client state? Solr servers are stateless; they don't keep any information specific to particular clients, so this doesn't seem to be a problem.

What Solr _does_ do is cache things like fq clauses, but these are not user-specific. Which actually argues for going to the same slave, on the theory that requests from a user are more likely to have the same fq clauses. Consider faceting on shoes. The user clicks "mens" and you add an fq like fq=gender:mens. Then the user wants dress shoes, so you submit another query, fq=gender:mens&fq=style:dress. The first fq clause has already been calculated and cached, so it doesn't have to be re-calculated for the second query...

But stickiness is usually the way Solr is used, so this seems like a red herring.

FWIW,
Erick

On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll lee.a.carr...@googlemail.com wrote:

The issue we face is that the F5 balancer is returning a cookie which the client is hanging onto, resulting in the same slave being hit for all requests. My question is: can I configure the Solr server to ignore client state? We are on Solr 3.4.
Re: Problem with Special Characters in SOLR Query
On Thu, 2012-09-27 at 13:49 +0200, aniljayanti wrote:

But getting error with: q=Oot \& Aboot
Error message: org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical error at line 1, column 6. Encountered: <EOF> after :

It seems like you are sending the query by performing a REST call. You need to URL-escape those, because & in a URL is a delimiter for arguments. Instead of

    http://localhost:8983/solr/collection1/select/?q=Oot \& Aboot

you need to send

    http://localhost:8983/solr/collection1/select/?q=Oot%20%5C%26%20Aboot
Re: Problem with Special Characters in SOLR Query
Right, you're conflating two separate issues.

1) URL escaping. The & is a special character in the URL, entirely separate from Solr. Try using %26 rather than \&.

2) Query parsing. Once the string gets through the URL and servlet container, it's in query-parsing land, where the escaping of _query_ special characters like '-' counts.

3) And just to confuse matters a LOT, when you're looking at URLs, a space is translated to '+'. So when you look in your log file, you'll see the query q=me myself reported as q=me+myself, which has nothing to do with the Lucene MUST (+) operator.

Best
Erick

On Thu, Sep 27, 2012 at 7:49 AM, aniljayanti anil.jaya...@gmail.com wrote:

I tried q=Cheat \- Album Version and got results, but I am getting an error with q=Oot \& Aboot.
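[Editor's aside: to keep the two layers apart in client code, one approach is to escape the Solr query syntax first and then percent-encode the whole parameter value. A minimal SolrJ-side sketch; the URL and input are illustrative, and escapeQueryChars ships in solr-solrj:]

    import java.net.URLEncoder;
    import org.apache.solr.client.solrj.util.ClientUtils;

    public class QueryEscapeDemo {
        public static void main(String[] args) throws Exception {
            String userInput = "Oot & Aboot";               // raw user text
            // 1) escape Lucene/Solr query syntax chars (&, -, +, :, ...)
            String solrEscaped = ClientUtils.escapeQueryChars(userInput);
            // 2) percent-encode for use as a URL parameter value (& -> %26)
            String urlReady = URLEncoder.encode(solrEscaped, "UTF-8");
            System.out.println("http://localhost:8983/solr/select?q=" + urlReady);
        }
    }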
Re: httpSolrServer and external load balancer
Hi Erick,

The load balancer in front of the Solr servers is dropping the cookie, not the Solr servers themselves. Are you saying the HTTP connection manager the client builds will ignore this state? It looks like it does not: the client appears to be passing the cookie back to the load balancer. Basically, I want to configure the clients not to pass cookies. Does that make sense?

On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com wrote:

What client state? Solr servers are stateless; they don't keep any information specific to particular clients, so this doesn't seem to be a problem.
Re: httpSolrServer and external load balancer
But again, why do you want to do this? I really think you don't.

I'm assuming that when you say "resulting in the same slave being hit for all requests" you mean all requests _from the same client_. If that's not what's happening, then disregard my maundering, because when it comes to setting up LBs, I'm clueless. But I can say that many installations have LBs set up with sticky sessions on a per-client basis.

Consider another scenario: replication. If you have 2 slaves, each with a polling interval of 5 minutes, note that they are not coordinated. So slave 1 can poll at 14:00:00 and slave 2 at 14:01:00. Say there's been a commit at 14:00:30. Requests to slave 2 will have a different view of the index than slave 1, so if your user resends the exact same request, they may see different results. I could submit the request 5 times in a row and the results would not only be different each time, they would flip-flop back and forth.

I wouldn't do this unless and until you have a demonstrated need.

Best
Erick

On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll lee.a.carr...@googlemail.com wrote:

The load balancer in front of the Solr servers is dropping the cookie, not the Solr servers themselves. Basically, I want to configure the clients not to pass cookies.
Re: Items disappearing from Solr index
Actually, this problem occurs even when I am doing just deletes. I tested by sending only one delete query for a single catalogue and had the same problem. I always optimize once. I changed to the syntax you suggested ({!term f=catalogueId}Emory Labs) and it works like a charm. Thanks for the pointer; it saved me from another issue that could have occurred at some point. Thanks.

On Thu, Sep 27, 2012 at 12:30 PM, Erick Erickson erickerick...@gmail.com wrote:

Wild shot in the dark... What happens if you switch from StreamingUpdateSolrServer to HttpSolrServer? What I'm wondering is if somehow you're getting a queueing problem.
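[Editor's aside: to summarize the two safe forms from this thread in SolrJ. The server URL and field value are illustrative, and HttpSolrServer follows Erick's suggestion; on Solr 3.5 the CommonsHttpSolrServer equivalent applies:]

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class SafeDelete {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
            String value = "Emory Labs";
            // quoted phrase: the standard parser sees one phrase, not two clauses
            server.deleteByQuery("catalogueId:\"" + value + "\"");
            // term query parser: the value is matched as a single literal term,
            // with no query-syntax interpretation at all
            server.deleteByQuery("{!term f=catalogueId}" + value);
            server.commit();
        }
    }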
Query filtering
Hello,

I'm doing this query to return the top 10 facets within a given context, specified via the fq parameter:

    http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10

Now I need to search for a term inside the context AND the previously identified top 10 facet values. Is there a way to do this with a single query?

Thank you in advance,
S
Regarding delta-import and full-import
Hi All,

Can anyone point me to a few blogs that explain both imports in a bit more detail and with examples?

Thanks,
Darshan
Re: Regarding delta-import and full-import
(12/09/27 22:45), darshan wrote: Hi All, Can anyone refer me few number blogs that explains both imports in little bit more detail and with examples. Thanks, Darshan Asking Google, I got: http://www.arunchinnachamy.com/apache-solr-mysql-data-import/ http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx http://pooteeweet.org/blog/1827 : koji -- http://soleami.com/blog/starting-lab-work.html
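[Editor's aside: for quick orientation, a generic DIH sketch, not taken from those posts; table and column names are illustrative. A full-import runs the entity's query from scratch, while a delta-import uses deltaQuery to find primary keys changed since ${dataimporter.last_index_time} and deltaImportQuery to re-fetch each one:]

    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item
                              WHERE id = '${dih.delta.id}'"/>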
Re: httpSolrServer and external load balancer
Hi Erick,

Our application has one CommonsHttpSolrServer for each Solr core used by our web app. While we have many web-app clients, Solr only has one client: our application. Does that make sense? This is why sticky load balancing is an issue for us.

I cannot see anywhere that state is being handled in the CommonsHttpSolrServer impl. It looks like the state is not being passed by the client, or am I missing something?

Cheers
Lee c

On 27 September 2012 14:00, Erick Erickson erickerick...@gmail.com wrote:

But again, why do you want to do this? I really think you don't.
Re: Query filtering
I think one way to do this is to issue another query and set a bunch of filter queries to restrict interesting_facet to just those ten values returned in the first query:

    fq=interesting_facet:1 OR interesting_facet:2 (etc.)&q=context:whatever

Does that help?
Amit

On Thu, Sep 27, 2012 at 6:33 AM, Finotti Simone tech...@yoox.com wrote:

I'm doing this query to return the top 10 facets within a given context, specified via the fq parameter. Now I need to search for a term inside the context AND the previously identified top 10 facet values. Is there a way to do this with a single query?
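[Editor's aside: spelled out, the follow-up request might look like the sketch below; the facet values are placeholders taken from the first response, and the first fq is the same context filter used before:]

    http://solr/core/select?q=term
      &fq=(...)
      &fq=interesting_facet:("value1" OR "value2" OR "value3")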
Re: Solr Replication and Autocommit
Thank both of you for the responses! -- Aleksey On 12-09-27 03:51 AM, Erick Erickson wrote: I'll echo Otis, nothing comes to mind... Unless you were indexing stuff to the _slaves_, which you should never do, now or in the past Erick On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona avor...@ea.com wrote: Hi, I remember having some issues with replication and autocommit previously. But now we are using Solr 3.6.1. Are there any known issues or any other reasons to avoid autocommit while using replication? I guess not, just want confirmation from someone confident and competent. -- Aleksey
RE: SolrJ - IOException
Thanks for your reply. SOLR Server is not stalled. Just the add fails with this exception.

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering │ Apollo Group, Inc.
1225 W. Washington St. | AZ23 | Tempe, AZ 85281
Phone: 602.713.2417 | Email: balaji.gan...@apollogrp.edu
Go Green. Don't Print. Moreover, soft copies can be indexed by algorithms.

From: roz dev [via Lucene]
Sent: Monday, September 24, 2012 5:46 PM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

I have seen this happening. We retry and that works. Is your solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi wrote:

Hi, I am encountering this error randomly (under load) when posting to Solr using SolrJ. Has anyone encountered a similar error?

    org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/profile
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
        at

Thanks, Balaji
RE: SolrJ - IOException
Here is the stack trace:

    org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server:
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107)
        at org.apache.solr.handler.dataimport.thread.task.SolrUploadTask.upload(SolrUploadTask.java:31)
        at org.apache.solr.handler.dataimport.thread.SolrUploader.run(SolrUploader.java:31)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
    Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
        at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
        at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
        at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
        at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
        at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
        at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
        ... 9 more

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering │ Apollo Group, Inc.
1225 W. Washington St. | AZ23 | Tempe, AZ 85281
Phone: 602.713.2417 | Email: balaji.gan...@apollogrp.edu
Go Green. Don't Print. Moreover, soft copies can be indexed by algorithms.

From: Toke Eskildsen [via Lucene]
Sent: Tuesday, September 25, 2012 12:19 AM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

On Tue, 2012-09-25 at 01:50 +0200, balaji.gandhi wrote:

I am encountering this error randomly (under load) when posting to Solr using SolrJ. Has anyone encountered a similar error? org.apache.solr.client.solrj.SolrServerException: IOException occured when talking to server at: http://localhost:8080/solr/profile at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414) [...]

This looks suspiciously like a potential bug in the HTTP keep-alive flow that we encountered some weeks ago. I am guessing that you are issuing more than 100 separate updates/second. Could you please provide the full stack trace?
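[Editor's aside: since the cause here is the server side silently dropping an idle keep-alive connection, a simple client-side retry, as suggested earlier in the thread, usually papers over it. A minimal sketch; the retry count, document, and server URL are illustrative:]

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class RetryingAdd {
        public static void main(String[] args) throws Exception {
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr/profile");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            int attempts = 0;
            while (true) {
                try {
                    server.add(doc);  // sporadically fails with the wrapped NoHttpResponseException
                    break;
                } catch (SolrServerException e) {
                    if (++attempts >= 3) throw e;  // give up after 3 tries
                }
            }
        }
    }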
How to run Solr Cloud using Tomcat?
I've gone through the guide on running Solr Cloud using Jetty but it's not practical to use JAVA_OPTS etc on real cloud deployments. I don't see how to extend these instructions to running on Tomcat. Has anyone run Solr Cloud under Tomcat successfully? Did they document how? Thanks Roy
RE: How to run Solr Cloud using Tomcat?
Hi - on Debian systems there's a /etc/default/tomcat properties file you can use to set your flags. -Original message- From:Benjamin, Roy rbenja...@ebay.com Sent: Thu 27-Sep-2012 19:57 To: solr-user@lucene.apache.org Subject: How to run Solr Cloud using Tomcat? I've gone through the guide on running Solr Cloud using Jetty but it's not practical to use JAVA_OPTS etc on real cloud deployments. I don't see how to extend these instructions to running on Tomcat. Has anyone run Solr Cloud under Tomcat successfully? Did they document how? Thanks Roy
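[Editor's aside: on non-Debian setups, the standard equivalent is CATALINA_HOME/bin/setenv.sh, which catalina.sh sources at startup. A sketch, assuming an external ZooKeeper ensemble; the hosts and shard count are placeholders:]

    # $CATALINA_HOME/bin/setenv.sh
    export JAVA_OPTS="$JAVA_OPTS \
      -DzkHost=zk1:2181,zk2:2181,zk3:2181 \
      -DnumShards=2 \
      -Dbootstrap_conf=true"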
Re: How to run Solr Cloud using Tomcat?
Hi Roy,

Yep, it works with Tomcat 6 and an external ZooKeeper. I will publish a blog post about it tomorrow on sentric.ch. My blog post is ready, but I had no time to publish it in the last couple of days :)

Best regards
Vadim

2012/9/27 Markus Jelsma markus.jel...@openindex.io:

Hi - on Debian systems there's a /etc/default/tomcat properties file you can use to set your flags.
Re: httpSolrServer and external load balancer
Ahh, I finally think I get it. I was missing the connection being the CommonsHttpSolrServer. That's the thing that's locking on to a particular slave I'm afraid I'm not up enough on the internals here to be much help, so I'll have to defer Erick. On Thu, Sep 27, 2012 at 10:20 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi Erick Our application has one CommonsHttpSolrServer for each solr core used by our web app. Whilst we have many web app clients solr only has 1 client, our application. Does that make sense. This is why sticky load balancing is an issue for us. I cannot see any where the state is being handled in the CommonsHttpSolrServer impl ? It looks like the state is not being passed by the client or am i missing something? Cheers Lee c On 27 September 2012 14:00, Erick Erickson erickerick...@gmail.com wrote: But again, why do you want to do this? I really think you don't. I'm assuming that when you say this: ...resulting in the same slave being hit for all requests. you mean all requests _from the same client_. If that's not what's happening, then disregard my maundering because when it comes to setting up LBs, I'm clueless. But I can say that many installations have LBs set up with sticky sessions on a per-client basis.. Consider another scenario; replication. If you have 2 slaves, each with a polling interval of 5 minutes note that they are not coordinated. So slave 1 can poll at 14:00:00. Slave 2 at 14:01:00. Say there's been a commit at 14:00:30. Requests to slave 2 will have a different view of the index than slave 1, so if your user resends the exact same request, they may see different results. I could submit the request 5 times in a row and the results would not only be different each time, they would flip-flop back and forth. I wouldn't do this unless and until you have a demonstrated need. Best Erick On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi Erick, the load balancer in front of the solr servers is dropping the cookie not the solr server themselves. are you saying the clients http connection manager builds will ignore this state ? it looks like they do not. It looks like the client is passing the cookie back to the load balancer I want to configure the clients not to pass cookies basically. Does that make sense ? On 27 September 2012 12:54, Erick Erickson erickerick...@gmail.com wrote: What client state? Solr servers are stateless, they don't keep any information specific to particular clients so this doesn't seem to be a problem. What Solr _does_ do is cache things like fq clauses, but these are not user-specific. Which actually argues for going to the same slave on the theory that requests from a user are more likely to have the same fq clauses. Consider faceting on shoes. The user clicks mens and you add an fq like fq=gender:mens. Then the user wants dress shoes so you submit another query fq=gender:mensfq=style:dress. The first fq clause has already been calculated and cached so doesn't have to be re-calculated for the second query... But the stickiness is usually the way Solr is used, so this seems like a red herring. 
FWIW, Erick On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll lee.a.carr...@googlemail.com wrote: Hi, We have the following Solr HTTP server bean:

<bean class="org.apache.solr.client.solrj.impl.CommonsHttpSolrServer" id="solrserver">
  <constructor-arg value="urlToSlaveLoadBalancer" />
  <property name="soTimeout" value="1000" />
  <property name="connectionTimeout" value="1000" />
  <property name="defaultMaxConnectionsPerHost" value="5" />
  <property name="maxTotalConnections" value="20" />
  <property name="allowCompression" value="true" />
</bean>

The issue we face is that the F5 balancer is returning a cookie which the client is hanging onto, resulting in the same slave being hit for all requests. One obvious solution is to configure the load balancer to be non-sticky; however, politically, a non-standard load balancer is timescale suicide (it is an outsourced corporate thing). I'm not keen to use the LB HTTP Solr server, as I don't want this to be a concern of the software, with a list of servers etc. (although as a stop gap I may well have to). My question is: can I configure the Solr server to ignore client state? We are on Solr 3.4. Thanks in advance lee c
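As a hedged sketch of one possible fix for Lee's question (not from the thread itself): CommonsHttpSolrServer accepts a pre-built Commons HttpClient 3.x instance, and that client can be told to ignore cookies entirely, so the F5's sticky-session cookie is never retained or resent. The URL placeholder is the one from the bean definition above; everything else is an assumption, not a tested configuration.

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

// Sketch: hand SolrJ an HttpClient that never stores or sends cookies,
// so the load balancer's sticky cookie is simply dropped on the floor.
public class CookielessSolrClient {
    public static CommonsHttpSolrServer build() throws Exception {
        MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
        mgr.getParams().setDefaultMaxConnectionsPerHost(5);  // values from the bean above
        mgr.getParams().setMaxTotalConnections(20);

        HttpClient httpClient = new HttpClient(mgr);
        // Never retain or resend cookies on this client.
        httpClient.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);

        // "urlToSlaveLoadBalancer" is the placeholder from the Spring bean definition.
        return new CommonsHttpSolrServer("urlToSlaveLoadBalancer", httpClient);
    }
}

The same HttpClient could equally be defined as another Spring bean and injected via the two-argument constructor; either way it is the client's cookie policy, not anything in Solr, that breaks the stickiness.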
Re: Change config to use port 8080 instead of port 8983
I just tried this with Tomcat and the props work for me. Did you wipe out your zoo_data before starting with the additional system properties? Here's how I ran it: JAVA_OPTS="-DzkRun -DnumShards=1 -Djetty.port=8080 -Dbootstrap_conf=true -Dhost=127.0.0.1" bin/catalina.sh run -- Sami Siren On Thu, Sep 27, 2012 at 9:47 PM, JesseBuesking jessebuesk...@gmail.com wrote: I've set the JAVA_OPTS you mentioned (-Djetty.port and -Dhost), but ZooKeeper still says that the node runs on port 8983 (clusterstate.json is the same). Would you happen to have any other suggestions that I could try?
Re: 4.0.snapshot to 4.0.beta index migration
Thanks, that's what we decided to do too.
Can SOLR Index UTF-16 Text
Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing UTF-8 files. Recently, however, we noticed that it has issues with indexing certain text files, e.g. UTF-16 files. See attachment for an example (tarred+zipped): tesla-utf16.txt http://lucene.472066.n3.nabble.com/file/n4010834/tesla-utf16.txt Looking at the text terms, I see 35 terms, i.e. (1,2,3,...,9,0,a,b,c,...,z)!! A UTF-8 version of this file indexes fine. Here's what the index analyzer looks like. Are UTF-16 text files supported? Any thoughts? Thanks
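No answer to this one appears in the archive. As a hedged aside, one client-side workaround is to transcode the file to UTF-8 before indexing, since the poster reports the UTF-8 version indexes fine. A minimal sketch; the input file name is the one from the message, the output name is assumed:

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: read a UTF-16 file (the UTF-16 charset honors a BOM if present)
// and rewrite it as UTF-8 so the indexing pipeline sees the encoding it expects.
public class Utf16ToUtf8 {
    public static void main(String[] args) throws Exception {
        Path in = Paths.get("tesla-utf16.txt");   // file from the message
        Path out = Paths.get("tesla-utf8.txt");   // assumed output name
        String text = new String(Files.readAllBytes(in), StandardCharsets.UTF_16);
        Files.write(out, text.getBytes(StandardCharsets.UTF_8));
    }
}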
Re: ExtractingRequestHandler causes Out of Memory Error
These are very large files and this is not enough memory. Do you upload these as files? If the CSV file is one document per line, you can split it up. Unix has a 'split' command which does this very nicely. - Original Message -
| From: Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp
| To: solr-user@lucene.apache.org
| Sent: Thursday, September 27, 2012 2:22:06 AM
| Subject: ExtractingRequestHandler causes Out of Memory Error
|
| Hi guys,
|
| I use Manifold CF to crawl files in a Windows file server and index them to Solr using the ExtractingRequestHandler.
| Most of the documents are successfully indexed, but some fail and an Out of Memory Error occurs in Solr, so I need some advice.
|
| The failed files are not that big: a CSV file of 240MB and a text file of 170MB.
|
| Here is the environment and machine spec:
| Solr 3.6 (also Solr 4.0 Beta)
| Tomcat 6.0
| CentOS 5.6
| java version 1.6.0_23
| HDD 60GB
| MEM 2GB
| JVM heap: -Xmx1024m -Xms1024m
|
| I feel there is enough memory that Solr should be able to extract and index the file content.
|
| Here is the Solr log:
| --
| [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
|     at java.util.Arrays.copyOf(Arrays.java:2882)
|     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
|     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
|     at java.lang.StringBuilder.append(StringBuilder.java:189)
|     at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
|     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
|     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
|     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
|     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
|     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
|     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
|     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
|     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
|     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
|     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
|     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
|     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
|     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
|     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
|     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
|     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
|     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
|     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
|     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
|     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
|     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
|     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
|     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
|     at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
|     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
|     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
|     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
| -
|
| Anyone has any ideas?
|
| Regards,
|
| Shigeki
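As a hedged aside for anyone on a platform without the Unix 'split' command: the same one-document-per-line chunking can be done in a few lines of Java. The file names and chunk size below are assumptions.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.PrintWriter;

// Sketch: split a large one-document-per-line CSV into N-line chunks,
// so each upload stays well under the servlet container's heap headroom.
public class CsvSplitter {
    public static void main(String[] args) throws Exception {
        int linesPerChunk = 50000;  // assumed chunk size; tune to your heap
        BufferedReader in = new BufferedReader(new FileReader("big.csv"));
        String line;
        int lineNo = 0, chunkNo = 0;
        PrintWriter out = null;
        while ((line = in.readLine()) != null) {
            if (lineNo % linesPerChunk == 0) {
                if (out != null) out.close();
                out = new PrintWriter(new FileWriter("chunk-" + (chunkNo++) + ".csv"));
            }
            out.println(line);
            lineNo++;
        }
        if (out != null) out.close();
        in.close();
    }
}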
Re: ExtractingRequestHandler causes Out of Memory Error
Please try to increase -Xmx and see how much RAM you need for it to succeed. I believe it is simply a case where this particular file needs double memory (480MB) to parse and you have only allocated 1GB (which is not particularly much). Perhaps the code could be optimized to avoid the Arrays.copyOf() call. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 27 Sep 2012 at 11:22, Shigeki Kobayashi shigeki.kobayas...@g.softbank.co.jp wrote: [original message and stack trace quoted in full; snipped, as it appears verbatim in the previous message]
Filter query not null or in list
Hi everyone, I have a Group field which restricts the permissions for each user. A user can belong to multiple groups. A document can belong to only one Group, i.e. the field is non-multivalued. There are some documents which are unrestricted, hence their group id is null. How can I build the filter for a given user so that it includes results from both Group=NULL and Group=(X or Y or Z)? I tried something like this, but it doesn't work: -Group:[* TO *] OR Group:(X OR Y OR Z) Note that Group is a UUID field. Is it possible to assign a default UUID value? Any help is much appreciated. Thanks Kiran
Re: Filter query not null or in list
Add a *:* before the negative query: (*:* -Group:[* TO *]) OR Group:(X OR Y OR Z) -- Jack Krupansky -Original Message- From: Kiran J Sent: Thursday, September 27, 2012 8:07 PM To: solr-user@lucene.apache.org Subject: Filter query not null or in list [original question quoted in full; snipped, as it appears verbatim above]
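As a hedged SolrJ illustration of the fix above (the filter string is Jack's; X, Y, and Z stand in for real group UUIDs, and the server URL is an assumption):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: match documents with no Group at all OR a Group in the user's list.
public class GroupFilterExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");  // assumed URL
        SolrQuery query = new SolrQuery("*:*");
        // "(*:* -Group:[* TO *])" selects docs where Group is missing; the pure
        // negative clause alone has nothing to subtract from, hence the *:*.
        query.addFilterQuery("(*:* -Group:[* TO *]) OR Group:(X OR Y OR Z)");
        QueryResponse rsp = server.query(query);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

Using fq rather than the main query also lets Solr cache the permission filter independently of whatever the user searched for.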
RE: File content indexing
Hi Erik, I really meant to send this message earlier. I read the code and tested; your suggestion solved my problem. Really appreciate it! Thanks very much for the help, Lisheng -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Tuesday, September 18, 2012 5:04 PM To: solr-user@lucene.apache.org Subject: Re: File content indexing Solr Cell can already do this. See the stream.file parameter and content stream info on the wiki. Erik On Sep 18, 2012, at 19:56, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Hi, Sorry I just sent out an unfinished message! Reading about Solr Cell: we index a file by first uploading it through HTTP to Solr, and in my experience it is rather expensive to pass a big file through HTTP. If the file is local, maybe the better way is to pass the file path to Solr so that Solr can use the java.io API to get the file content; maybe this can be much faster? I am thinking of changing Solr a little to do this. Do you think this is a sensible thing to do (I know how to do it, but am not sure it can improve performance significantly)? Thanks very much for the help, Lisheng
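As a hedged sketch of the stream.file approach Erik points to, in SolrJ terms: the request carries only parameters, and Solr reads the file from its own local disk. The path, document id, and handler name below are assumptions, and enableRemoteStreaming must be switched on in solrconfig.xml's requestParsers for stream.file to be honored.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// Sketch: ask Solr Cell to read a local file itself via stream.file,
// instead of uploading the file body over HTTP.
public class StreamFileExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("stream.file", "/data/docs/report.pdf");  // path on the *server*
        params.set("literal.id", "doc1");                    // assumed unique key
        params.set("commit", "true");
        QueryRequest req = new QueryRequest(params);
        req.setPath("/update/extract");                      // ExtractingRequestHandler
        server.request(req);
    }
}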
Re: ExtractingRequestHandler causes Out of Memory Error
Hi Jan. Thank you very much for your advice. So I understand Solr needs more memory to parse the files: to parse a file of size x, it needs double the memory (2x). Then how large should the heap allocation be? 8x? 16x? Regards, Shigeki 2012/9/28 Jan Høydahl jan@cominvent.com: [Jan's reply and the original message with its stack trace are quoted here in full; snipped, as both appear verbatim above]
Re: Getting the distribution information of scores from query
Thanks! That did the trick! Although it required some more work at the component level to generate the same query key as the index searcher; otherwise, when you go to fetch scores for a cached query result, you get a lot of NPEs, since the stats are computed at the collector level, which never runs when a cache hit bypasses the Lucene level. I'll write up what I did and probably try to open-source the work for others to see. The stuff with PostFiltering is nice but needs some examples and documentation; hopefully mine will help the cause. Thanks again Amit On Wed, Sep 26, 2012 at 5:13 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: I suggest creating a component and putting it after QueryComponent. In prepare() it should add its own PostFilter to the request's list of filters; your post filter will be able to inject its own DelegatingCollector, and then you can just add the collected histogram to the result named list. http://searchhub.org/dev/2012/02/10/advanced-filter-caching-in-solr/ On Tue, Sep 25, 2012 at 10:03 PM, Amit Nithian anith...@gmail.com wrote: We have a federated search product that issues multiple parallel queries to Solr cores, fetches the results, and blends them. The approach we were investigating was taking the scores, normalizing them based on some distribution (a normal distribution seems reasonable), and using that z-score to blend the results (else you'll be blending scores on different scales). To accomplish this, I was looking to get the distribution of the scores for the query, as an analog to the stats component, but the only way I can see to accomplish this is to create a custom collector that would accumulate and store this information (mean, std-dev, etc.), since the stats component only operates on indexed fields. Is there an easy way to tell Solr to use a custom collector without having to modify the SolrIndexSearcher class? Or maybe is there an alternative way to get this information? Thanks Amit -- Sincerely yours Mikhail Khludnev Tech Lead Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
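For readers landing here later, a minimal sketch of the DelegatingCollector half of Mikhail's suggestion, assuming Solr 4.0's org.apache.solr.search.DelegatingCollector API. The class name is made up, the SearchComponent and PostFilter plumbing is omitted, and it tracks mean/std-dev via Welford's online algorithm rather than a full histogram:

import java.io.IOException;
import org.apache.lucene.search.Scorer;
import org.apache.solr.search.DelegatingCollector;

// Sketch: accumulate score statistics (count, mean, standard deviation)
// while passing every hit through to the normal result collector.
// A PostFilter's getFilterCollector() would return an instance of this.
public class ScoreStatsCollector extends DelegatingCollector {
    private Scorer scorer;
    private long count;
    private double mean;
    private double m2;  // running sum of squared deviations (Welford)

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        this.scorer = scorer;
        super.setScorer(scorer);  // keep the delegate chain wired up
    }

    @Override
    public void collect(int doc) throws IOException {
        double score = scorer.score();
        count++;
        double delta = score - mean;
        mean += delta / count;
        m2 += delta * (score - mean);
        super.collect(doc);  // normal collection continues untouched
    }

    public long getCount() { return count; }
    public double getMean() { return mean; }
    public double getStdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : 0.0;
    }
}

The wrapping component would then read these getters after the search and add them to the response named list, roughly rb.rsp.add("scoreStats", ...) in its process() method; the cached-result NPE Amit mentions is exactly what happens when that read runs but the collector never did.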