Re: search filter
Looks like I am getting the exception below:

May 22, 2013 10:52:11 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NumberFormatException: For input string: "[3 TO 9] OR salary:0"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:438)
    at java.lang.Long.parseLong(Long.java:478)

Regards
kamal

On Thu, May 23, 2013 at 11:19 AM, Kamal Palei palei.ka...@gmail.com wrote:

Hi Rafał Kuć,
I tried fq=Salary:[5+TO+10]+OR+Salary:0 as well as fq=Salary:[5 TO 10] OR Salary:0; in both cases I retrieved 0 results. I use Drupal along with Solr; my code looks as below:

    if ($include_0_salary == 1) {
        $conditions['fq'][0] = 'salary:[' . $min_ctc . '+TO+' . $max_ctc . ']+OR+salary:0';
    } else {
        $conditions['fq'][0] = 'salary:[' . $min_ctc . ' TO ' . $max_ctc . ']';
    }
    $conditions['fq'][1] = 'experience:[' . $min_exp . ' TO ' . $max_exp . ']';
    $results = apachesolr_search_search_execute($keys, $conditions);

It looks like when $include_0_salary is false, I get results as expected. If $include_0_salary is true, I get 0 results, which means that for me

    $conditions['fq'][0] = 'salary:[5 TO 10] OR salary:0'

did not work. Can somebody help me with what I am doing wrong here?

Best regards
kamal

On Wed, May 22, 2013 at 7:00 PM, Rafał Kuć r@solr.pl wrote:

Hello!
You can try sending a filter like this: fq=Salary:[5+TO+10]+OR+Salary:0. It should work.

--
Regards,
Rafał Kuć
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

Dear All,
Can I write a search filter for a field having a value in a range or a specific value? Say I want a filter like: select profiles with salary 5 to 10, or salary 0. So I expect profiles having salary 0, 5, 6, 7, 8, 9, or 10. It should be possible; can somebody help me with the syntax of the 'fq' filter, please?

Best Regards
kamal
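Editor's note: one likely culprit (an assumption, not confirmed in the thread) is the literal '+' characters embedded in the fq value. '+' stands for a space only in URL encoding, so it belongs to the transport layer; the parameter value handed to the client library should contain literal spaces. A minimal sketch of building the value that way (FqBuilder and salaryFq are hypothetical names, not part of Drupal's apachesolr module):

```java
// Sketch: build the fq value with literal spaces; the HTTP layer will
// URL-encode it later. Parentheses around the OR avoid default-operator
// surprises when the clause is combined with others.
public class FqBuilder {
    public static String salaryFq(long min, long max, boolean includeZero) {
        String range = "salary:[" + min + " TO " + max + "]";
        // '+' must NOT appear here; encoding happens when the request is sent.
        return includeZero ? "(" + range + " OR salary:0)" : range;
    }

    public static void main(String[] args) {
        System.out.println(salaryFq(5, 10, true)); // (salary:[5 TO 10] OR salary:0)
    }
}
```

The NumberFormatException also suggests the whole string reached a numeric field parser unparsed, which is consistent with the range syntax being mangled before Solr saw it.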
OPENNLP current patch compiling problem for 4.x branch
Hi, I checked out from http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch LUCENE-2899-current.patch. The patch applied OK, but when I ran 'ant compile' I got the following error:

==
    [javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
    [javac]     super(Version.LUCENE_44, input);
    [javac]                  ^
    [javac]   symbol:   variable LUCENE_44
    [javac]   location: class Version
    [javac] 1 error
==

It compiled on trunk without problems. Is this patch supposed to work for 4.x?

Regards,
Patrick
Re: Solr french search optimisation
Hello again, could anyone help me, please?

David

Le 22/05/2013 18:09, It-forum a écrit :

Hello to all,

I'm trying to set up Solr 4.2 to index and search French content. I defined a special fieldType for French content:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Unfortunately, this field does not behave as I wish. I'd like to be able to get results from misspelled words. I.e. I wish to get the same result typing "Pompe à chaleur" as typing "pomppe a chaler", or with "solère" and "solaire". I cannot find the right way to create a fieldType to reach this aim.

Thanks in advance for your help; do not hesitate to ask for more information if needed.

Regards
David
Re: Solr french search optimisation
Hello, I think you're confusing three different things:

1) schema and field definitions are about precision/recall: treating a field differently means different search results and result ranking
2) the "pomppe a chaler" problem is more a spellchecking problem: http://wiki.apache.org/solr/SpellCheckComponent
3) "solère" vs "solaire" is a phonetic search problem: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory

Hope this helps a little,
cristian

2013/5/23 It-forum it-fo...@meseo.fr

Hello again, could anyone help me, please?

David

Le 22/05/2013 18:09, It-forum a écrit :

Hello to all, I'm trying to set up Solr 4.2 to index and search French content. I defined a special fieldType for French content: [fieldType definition snipped; see the original message]. Unfortunately, this field does not behave as I wish. I'd like to be able to get results from misspelled words. I.e. I wish to get the same result typing "Pompe à chaleur" as typing "pomppe a chaler", or with "solère" and "solaire". I cannot find the right way to create a fieldType to reach this aim. Thanks in advance for your help; do not hesitate to ask for more information if needed. Regards David
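Editor's note: for the phonetic case (point 3), a companion field analyzed with a phonetic encoder is the usual approach. A sketch, with the caveat that the field/type names below are made up and that DoubleMetaphone is tuned for English, so its behavior on French terms needs testing:

```xml
<!-- Hypothetical companion field type for phonetic matching. -->
<fieldType name="text_fr_phonetic" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="false"/>
  </analyzer>
</fieldType>

<field name="content_phonetic" type="text_fr_phonetic" indexed="true" stored="false"/>
<copyField source="content" dest="content_phonetic"/>
```

Queries would then search content_phonetic alongside the stemmed field (e.g. via qf in dismax/edismax), typically with a lower boost so exact matches rank first.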
Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
Consider the following: Solr 4.3, a 2-node test cluster, each node a leader. During indexing (or immediately after it, before the hard commit) I shut down one of them and restart it later. The tlog is about 200MB in size. I see recurring 'Reordered DBQs detected' in the log; it seems like an endless loop because THE VERY SAME update query appears thousands of times, and it has been running for a long time now. In the meanwhile, the node is inaccessible (obviously), but in the Zk state it appears as active, NOT in recovery mode or down. It seems that this is caused by a recent change in ZkController which adds recovery logic into the 'register' routine. Regards, Alexey -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to query docs with an indexed polygon field in java?
Hi Kevenz,

kevenz wrote:
... String sql = "indexType:219 AND geo:\"Contains(POINT(114.078327401257,22.5424866754136))\""; ... Then I got an error: java.lang.IllegalArgumentException: missing parens: Contains. Is there any suggestion?

First of all, if your query shape is a point, then use Intersects, which is semantically equivalent but works much faster. One error in your query is that your quotes look messed up. Another is that you used a comma to separate the X and Y when you should use a space (because you are using WKT syntax via POINT). Try this:

indexType:219 AND geo:"Contains(POINT(114.078327401257 22.5424866754136))"

This will also work using "lat comma lon" non-WKT syntax:

indexType:219 AND geo:"Contains(22.5424866754136, 114.078327401257)"

Disclaimer: I didn't run these, I just typed them into the email.

~ David

-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-query-docs-with-an-indexed-polygon-field-in-java-tp4065512p4065550.html
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
A small correction: it's not an endless loop, but painfully slow processing, which includes running a delete query and then an insertion. Each document from the tlog takes tens of seconds to process (more than 100 times slower than the normal insertion process). -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4065551.html
Re: Solr french search optimisation
You can also think about using a SynonymFilter if you can list the misspelled words. That's a quick and dirty solution, but it's easier to add a "pomppe => pompe" line to a synonyms list than to tune a phonetic filter. NB: reindexing is required whenever the synonyms file changes.

Franck Brisbart

Le jeudi 23 mai 2013 à 08:59 +0200, Cristian Cascetta a écrit : Hello, I think you're confusing three different things: 1) schema and fields definition is for precision/recall: treating differently a field means different search results and results ranking 2) the pomppe a chaler problem is more a spellchecking problem http://wiki.apache.org/solr/SpellCheckComponent 3) solère and solaire is a phonetic search problem http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory Hope this helps a little, cristian 2013/5/23 It-forum it-fo...@meseo.fr Hello again, Is any one could help me, plase David Le 22/05/2013 18:09, It-forum a écrit : Hello to all, I'm trying to setup solr 4.2 to index and search into french content.
I defined a special fieldtype for french content : fieldType name=text_fr class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.**MappingCharFilterFactory mapping=mapping-**ISOLatin1Accent.txt/ tokenizer class=solr.** WhitespaceTokenizerFactory/ filter class=solr.**WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.**LowerCaseFilterFactory/ filter class=solr.**SnowballPorterFilterFactory language=French protected=protwords.txt/ /analyzer analyzer type=query charFilter class=solr.**MappingCharFilterFactory mapping=mapping-**ISOLatin1Accent.txt/ tokenizer class=solr.** WhitespaceTokenizerFactory/ filter class=solr.**WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.**LowerCaseFilterFactory/ filter class=solr.**SnowballPorterFilterFactory language=French protected=protwords.txt/ /analyzer /fieldType unfortunately, this field does not behave as I wish. I'd like to be able to get results from unwell spelled word. IE : I wish to get the same result typing Pompe à chaleur than typing pomppe a chaler or with solère and solaire I'm do not find the right way to create a fieldtype to reach this aim. thanks in advance for your help, do not hesitate for more information if need. Regards David
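Editor's note: Franck's synonym suggestion would look roughly like this in the index-time analyzer (the filename "misspellings.txt" is made up for this sketch):

```xml
<!-- Index-time expansion of known misspellings onto their correct form. -->
<filter class="solr.SynonymFilterFactory" synonyms="misspellings.txt"
        ignoreCase="true" expand="true"/>
```

with misspellings.txt containing one mapping per line, e.g. "pomppe => pompe" and "chaler => chaleur". As noted above, the collection must be reindexed whenever the file changes, since the mapping is applied at index time.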
Re: [ANNOUNCE] Web Crawler
Hi,

Release 3.0.3 was tested with:
* Oracle Java 6 (but should work fine with version 7)
* Tomcat 5.5, 6 and 7
* PHP 5.2.x and 5.3.x
* Apache 2.2.x
* MongoDB 64 bits 2.2 (known issue with 2.4)

The new release 4.0.0-alpha-2 is available on Github - https://github.com/bejean/crawl-anywhere

The prerequisites are:
* Oracle Java 6
* Tomcat 5.5
* Apache 2.2
* PHP 5.2.x, 5.3.x or 5.4.x
* MongoDB 64 bits 2.2
* Solr 3.x or 4.x (configuration files provided for Solr 4.3.0)

And the up-to-date installation instructions are here: http://www.crawl-anywhere.com/installation-v400/ Please read the Github project home page; all the information is provided there.

Regards,
Dominique

Le 23/05/13 07:38, Rajesh Nikam a écrit : Hi, Crawl Anywhere seems to be using old versions of Java, Tomcat, etc. http://www.crawl-anywhere.com/installation-v300/ Will it work with new versions of the required software? Is there an updated installation guide available? Thanks Rajesh

On Wed, May 22, 2013 at 6:48 PM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, Crawl-Anywhere is now open source - https://github.com/bejean/crawl-anywhere Best regards.

Le 02/03/11 10:02, findbestopensource a écrit : Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how far yours would be different from the rest. Your license states that it is not open source, but it is free for personal use. Regards Aditya www.findbestopensource.com

On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler.
It includes:
* a crawler
* a document processing pipeline
* a Solr indexer

The crawler has a web administration interface in order to manage the web sites to be crawled. Each web site crawl is configured with a lot of possible parameters (not all mandatory):
* number of simultaneous items crawled by site
* recrawl period rules based on item type (html, PDF, …)
* item type inclusion / exclusion rules
* item path inclusion / exclusion / strategy rules
* max depth
* web site authentication
* language
* country
* tags
* collections
* ...

The pipeline includes various ready-to-use stages (text extraction, language detection, a Solr ready-to-index XML writer, ...). Everything is very configurable and extensible, either by scripting or by Java coding. With scripting technology, you can help the crawler handle javascript links, or help the pipeline extract a relevant title and clean up the html pages (remove menus, headers, footers, ...). With Java coding, you can develop your own pipeline stage.

The Crawl Anywhere web site provides good explanations and screen shots. Everything is documented in a wiki. The current version is 1.1.4. You can download and try it out from here: www.crawl-anywhere.com

Regards
Dominique

--
Dominique Béjean
+33 6 08 46 12 43
skype: dbejean
www.eolya.fr
www.crawl-anywhere.com
www.mysolrserver.com
Distributed query: strange behavior.
Hello, guys! I'm running Solr 4.3.0 and I've noticed a strange behavior during distributed query execution. Currently I have three Solr servers as shards, and when I do the following query...

http://localhost:11080/twitter/data/select?q=*:*&rows=10&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

numFound = 47131

I've queried each Solr shard server one by one and the total number of documents is correct. However, when I change the rows parameter from 10 to 100, the total numFound changes:

http://localhost:11080/twitter/data/select?q=*:*&rows=100&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

numFound = 47124

And if I set rows=50, the numFound count changes again:

http://localhost:11080/twitter/data/select?q=*:*&rows=50&shards=localhost:11080/twitter/data,localhost:12080/twitter/data,localhost:13080/twitter/data&wt=json

numFound = 47129

What's happening here? Does anybody know? Is it a distributed search bug or something? Thank you very much in advance!

Best regards,
--
- Luis Cappa
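Editor's note: a numFound that shifts with the rows parameter is the classic symptom of the same uniqueKey existing on more than one shard. In distributed search, Solr only de-duplicates the ids it actually retrieves for the current page, so the reported total varies with how many documents are fetched. A hedged way to check (the uniqueKey field name "id" is an assumption) is to count a suspect id on each shard individually:

```text
# If any id returns numFound > 0 on more than one shard, the documents
# are duplicated across shards:
http://localhost:11080/twitter/data/select?q=id:SUSPECT_ID&rows=0
http://localhost:12080/twitter/data/select?q=id:SUSPECT_ID&rows=0
http://localhost:13080/twitter/data/select?q=id:SUSPECT_ID&rows=0
```

The fix is to ensure the indexing process routes each document to exactly one shard.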
Re: Boosting Documents
Oh thank you Chris, this is much clearer, and thank you for updating the wiki too.

On 05/22/2013 08:29 PM, Chris Hostetter wrote:

: NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for
: any fields where the index-time boost should be stored.
:
: In my case where I only need to boost the whole document (not a specific
: field), do I have to activate omitNorms=false for all the fields
: in the schema?

docBoost is really just syntactic sugar for a field boost on each field in the document -- it's factored into the norm value for each field in the document. (I'll update the wiki to make this more clear.) If you do a query that doesn't utilize any field which has norms, then the docBoost you specified when indexing the document never comes into play.

In general, doc boosts and field boosts, and the way they come into play as part of the field norm, are fairly inflexible and (in my opinion) antiquated. A much better way of dealing with this type of problem is also discussed in the section of the wiki you linked to. Immediately below...

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

...you'll find...

http://wiki.apache.org/solr/SolrRelevancyFAQ#Field_Based_Boosting

-Hoss
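Editor's note: the field-based alternative Hoss points to stores the boost as an ordinary numeric field and applies it at query time instead of baking it into norms. A sketch, assuming edismax and a made-up float field named "popularity":

```text
# Multiplicative boost by a function of a stored field (edismax):
q=solr&defType=edismax&qf=title^2 text&boost=popularity

# Or an additive boost function:
q=solr&defType=edismax&qf=title^2 text&bf=log(popularity)
```

Unlike an index-time docBoost, this can be tuned or removed without reindexing.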
Re: Solr french search optimisation
Hello, thanks Cristian for your details. I totally agree with your explanation; these are two different aspects which I need to solve. Could you clarify a few more things:

- SpellcheckComponent and Phonetic: should they be used while indexing or only while querying?
- Does the spellcheck component return only the right spelling, or is it used to search within results?
- If I want to solve spelling, phonetic and stemming problems in the French language, can I use only one field, or should I use several with different filters?

Regards
David

Le 23/05/2013 08:59, Cristian Cascetta a écrit : Hello, I think you're confusing three different things: 1) schema and fields definition is for precision/recall: treating differently a field means different search results and results ranking 2) the pomppe a chaler problem is more a spellchecking problem http://wiki.apache.org/solr/SpellCheckComponent 3) solère and solaire is a phonetic search problem http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PhoneticFilterFactory Hope this helps a little, cristian 2013/5/23 It-forum it-fo...@meseo.fr Hello again, Is any one could help me, plase David Le 22/05/2013 18:09, It-forum a écrit : Hello to all, I'm trying to setup solr 4.2 to index and search into french content.
I defined a special fieldtype for french content : fieldType name=text_fr class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.**MappingCharFilterFactory mapping=mapping-**ISOLatin1Accent.txt/ tokenizer class=solr.** WhitespaceTokenizerFactory/ filter class=solr.**WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.**LowerCaseFilterFactory/ filter class=solr.**SnowballPorterFilterFactory language=French protected=protwords.txt/ /analyzer analyzer type=query charFilter class=solr.**MappingCharFilterFactory mapping=mapping-**ISOLatin1Accent.txt/ tokenizer class=solr.** WhitespaceTokenizerFactory/ filter class=solr.**WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.**LowerCaseFilterFactory/ filter class=solr.**SnowballPorterFilterFactory language=French protected=protwords.txt/ /analyzer /fieldType unfortunately, this field does not behave as I wish. I'd like to be able to get results from unwell spelled word. IE : I wish to get the same result typing Pompe à chaleur than typing pomppe a chaler or with solère and solaire I'm do not find the right way to create a fieldtype to reach this aim. thanks in advance for your help, do not hesitate for more information if need. Regards David
Re: Facet pivot 50.000.000 different values
In case anyone is interested, I solved my problem using the grouping feature:

query -- filter query (if any)
field -- the field whose distinct values you want to count (in my case field B)

    SolrQuery solrQuery = new SolrQuery(query);
    solrQuery.add("group", "true");
    solrQuery.add("group.field", "B"); // group by the field
    solrQuery.add("group.ngroups", "true");
    solrQuery.setRows(0);

And in the response, getNGroups() will give you the total number of distinct values (the total number of distinct B values).

Cheers,
Carlos.

2013/5/18 Carlos Bonilla carlosbonill...@gmail.com Hi Mikhail, yes, the thing is that I need to take different queries into account and that's why I can't use the Terms component. Cheers.

2013/5/17 Mikhail Khludnev mkhlud...@griddynamics.com On Fri, May 17, 2013 at 12:47 PM, Carlos Bonilla carlosbonill...@gmail.com wrote: We only need to calculate how many different B values have more than 1 document but it takes ages. Carlos, It's not clear whether you need to take the results of a query into account or just gather statistics from the index. If the latter, you can just enumerate terms and look at TermsEnum.docFreq(). Am I getting it right? -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Solr french search optimisation
Could you clarify a few more things:

- SpellcheckComponent and Phonetic, should they be used while indexing or only while querying?

SpellCheck: you can define a specific field for spellchecking (in this sense it's a query/schema-time concern), or you can create a specific vocabulary for spell-checking. I strongly suggest going through the documentation for this component, http://wiki.apache.org/solr/SpellCheckComponent — every time I used it I've had the need to customize and adapt the configuration.

- Does the spellcheck component return only the right spelling, or is it used to search within results?

I'm not sure, please check the documentation, but I remember that you can configure it to directly re-execute the spell-corrected query AND show some alternatives/suggestions to the user (obviously this is a display/frontend choice).

- If I want to solve spelling, phonetic and stemming problems in the French language, can I use only one field or should I use several with different filters?

I don't think it's possible to use only one field. In my experience I can suggest using multiple fields for multiple scopes; if you're scared by the index size, remember that fields that are indexed and NOT stored don't grow your index that much. Set as stored only the fields you need to display to the end user.
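Editor's note: for completeness, a minimal Solr 4.x spellcheck configuration in solrconfig.xml might look like the sketch below (the field name "content_fr" is an assumption; DirectSolrSpellChecker reads terms straight from the main index, so no separate spellcheck index build is needed):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">content_fr</str> <!-- field name is illustrative -->
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```

Queries then enable it with spellcheck=true, and spellcheck.collate=true asks Solr to build a corrected query that can be re-executed, which matches the behavior Cristian describes.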
Re: Solr Faceting doesn't return values.
org.apache.solr.search.SyntaxError: Cannot parse 'mm_state_code:(TX)': Encountered ":" at line 1, column 14. Was expecting one of: ...

This suggests to me that you kept the df parameter in the query, hence it was forming mm_state_code:mm_state_code:(TX). Can you try it exactly the way I gave you - i.e. without the df parameter? Also, can you post schema.xml and the /select handler config from solrconfig.xml?

On 22 May 2013 18:36, samabhiK qed...@gmail.com wrote:

When I use your query, I get:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="df">mm_state_code</str>
      <str name="indent">true</str>
      <str name="q">mm_state_code:(TX)</str>
      <str name="_">1369244078714</str>
      <str name="debug">all</str>
      <str name="facet.field">sa_site_city</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'mm_state_code:(TX)': Encountered ":" at line 1, column 14. Was expecting one of: EOF AND ... OR ... NOT ... "+" ... "-" ... BAREOPER ... "(" ... "*" ... "^" ... QUOTED ... TERM ... FUZZY_SLOP ... PREFIXTERM ... WILDTERM ... REGEXPTERM ... "[" ... "{" ... LPARAMS ... NUMBER ...</str>
    <int name="code">400</int>
  </lst>
</response>

Not sure why the data won't show up. Almost all the records have data in the field sa_site_city, and it is also indexed. :(

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
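Editor's note: following Upayavira's advice, the request without the df override might look like this (host, core and handler are placeholders; facet.mincount simply hides empty buckets):

```text
http://localhost:8983/solr/collection1/select?q=mm_state_code:TX&rows=0&facet=true&facet.field=sa_site_city&facet.mincount=1&wt=xml
```

Since the field is named explicitly in q, no default-field parameter is needed at all.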
Re: Solr4.2 - Fuzzy Search Problems
Thanks Chris. For my 2nd query (~1 returning words at an edit distance of 2), that may be the issue. I am still looking into my last issue; hopefully the JIRA helps to resolve that.

Chris Hostetter-3 wrote:

: 2) although I set editing distance to 1 in my query (e.g. worde~1), solr
: returns me results having 2 editing distance (like WORDOES, WORHEE, WORKEE,
: .. etc.)

fuzzy search works on *terms* in your index -- if you use a stemmer when you index your data (your schema shows that you are) then a word in your input like WORDOES might wind up in your index as a term within the edit distance you specified (ie: wordo or word or something similar)

: 3) Last and major issue, I had very little data at startup in my solr core (say
: around 1K - 2K); at that time, when I searched with worde~1, it was
: returning many records (around 450).
:
: Then I ingested a few more records into my solr core (say around 1K). They were
: ingested successfully, no errors or warnings in the log. After that, when I
: performed the same fuzzy search (worde~1) on the previous records only, not the
: newly ingested records, it did not return the previous results (around 450)
: either, and returned only 1 record in total, with the highlight WORD!N.

This sounds like the same issue as described in SOLR-4824...

https://issues.apache.org/jira/browse/SOLR-4824

-Hoss

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr4-2-Fuzzy-Search-Problems-tp4063199p4065576.html
index multiple files into one index entity
Hello Solr team,

I want to index multiple files into one Solr index entity with the same id. We are using Solr 4.1. I am trying it with the following source fragment:

    public void addContentSet(ContentSet contentSet) throws SearchProviderException {
        ...
        ContentStreamUpdateRequest csur = generateCSURequest(contentSet.getIndexId(), contentSet);
        String indexId = contentSet.getIndexId();
        ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId);
        server.request(csur);
        ...
    }

    private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet) throws IOException {
        ContentStreamUpdateRequest csur = new ContentStreamUpdateRequest(confStore.getExtractUrl());
        ModifiableSolrParams parameters = csur.getParams();
        if (parameters == null) {
            parameters = new ModifiableSolrParams();
        }
        parameters.set("literalsOverride", false);
        // map the Tika default content attribute to the attribute named 'fulltext'
        parameters.set("fmap.content", SearchSystemAttributeDef.FULLTEXT.getName());
        // create an empty content stream; this seems necessary for ContentStreamUpdateRequest
        csur.addContentStream(new ImaContentStream());
        for (Content content : contentSet.getContentList()) {
            csur.addContentStream(new ImaContentStream(content));
            // for each content stream add additional attributes
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_ID.getName(), content.getBinaryObjectId().toString());
            parameters.add("literal." + SearchSystemAttributeDef.CONTENT_KEY.getName(), content.getContentKey());
            parameters.add("literal." + SearchSystemAttributeDef.FILE_NAME.getName(), content.getContentName());
            parameters.add("literal." + SearchSystemAttributeDef.MIME_TYPE.getName(), content.getMimeType());
        }
        parameters.set("literal.id", indexId);
        // adding some other attributes
        ...
        csur.setParams(parameters);
        return csur;
    }

During debugging I can see that the method server.request(csur) reads the buffer of each ImaContentStream.
When I look at the Solr Catalina log, I see that the attached files reach the Solr servlet:

INFO: Releasing directory:/data/V-4-1/master0/data/index
Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [master0] webapp=/solr-4-1 path=/update/extract params={literal.searchconnectortest15_c8150e41_cc49_4a .. literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1 .} {add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58

But only the last one in the content list gets indexed. My schema.xml has the following field definitions:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<field name="contentkey" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="contentid" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="contentfilename" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="contentmimetype" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="fulltext" type="text_general" indexed="true" stored="true" multiValued="true"/>

I'm using the Tika ExtractingRequestHandler, which can extract binary files:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

Is it possible to index multiple files with the same id? Is it necessary to implement my own RequestHandler?

With best regards
Mark
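Editor's note: the log excerpt above shows six separate adds for the same id, which explains the observed behavior — a Solr uniqueKey acts like a primary key, so each add with the same literal.id replaces the previous document rather than appending to it, and only the last stream survives. The usual workaround (a sketch; the suffix scheme below is made up) is to index one document per file with derived ids:

```text
# One extract request per file, each with its own id:
/update/extract?literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1_1&literal.parentid=26afa5dc-40ad-442a-ac79-0e7880c06aa1&...
/update/extract?literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1_2&literal.parentid=26afa5dc-40ad-442a-ac79-0e7880c06aa1&...
```

Searches can then group or filter on the shared parent field (here the hypothetical "parentid") to treat the files as one logical entity.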
Solr DIH - Small index still take time?
Hi,

This is the situation: I have two sources of data in my data import handler, one huge, the other tiny:

Source A: 10-20 records
Source B: 50,000,000 records

I was wondering what happens if I run the DIH just on Source A every 10 minutes, and only run the DIH on Source B every 24 hours. Would running my DIH on Source A be extremely quick, because the data we are importing is small, or would it still be time-consuming, because it would have to rebuild the index of the entire Solr core (i.e. 50,000,010 records)?

Thank you!

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-DIH-Small-index-still-take-time-tp4065582.html
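Editor's note: the answer depends mostly on the DIH command used — a full-import with the default clean=true deletes everything and rebuilds, while clean=false (or a delta-import) touches only the imported documents, leaving the other 50M alone. A sketch of the relevant requests (core, entity and host names are placeholders):

```text
# Re-import only Source A without wiping the rest of the index:
http://localhost:8983/solr/core/dataimport?command=full-import&entity=sourceA&clean=false&commit=true

# Or an incremental update driven by the entity's deltaQuery:
http://localhost:8983/solr/core/dataimport?command=delta-import&commit=true
```

With clean=false, a 10-20 record import should be quick regardless of the total index size (plus commit/warming overhead).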
Question about Coordination factor
Hello Folks,

Sorry, my last email was a bit messy, so I am sending it again. I have a question about the coordination factor, to ensure my understanding of this value is correct. If I have documents that contain some keywords like the following:

Doc1: A, B, C
Doc2: A, C
Doc3: B, C

And my query is A OR B OR C OR D. In this case, the coord factor value for each document will be the following:

Doc1: 3/4
Doc2: 2/4
Doc3: 2/4

In the same fashion, the respective coord factor values are the following if I have the query C OR D:

Doc1: 1/2
Doc2: 1/2
Doc3: 1/2

Is this correct, or did I miss something? Please correct me if I am wrong.

Regards,
Kazuaki
Re: Question about Coordination factor
This looks correct. On Thu, May 23, 2013 at 7:37 AM, Kazuaki Hiraga kazuaki.hir...@gmail.comwrote: Hello Folks, Sorry, my last email was a bit messy, so I am sending it again. I have a question about coordination factor to ensure my understanding of this value is correct. If I have documents that contain some keywords like the following: Doc1: A, B, C Doc2: A, C Doc3: B, C And my query is A OR B OR C OR D. In this case, Coord factor value for each documents will be the following: Doc1: 3/4 Doc2: 2/4 Doc3: 2/4 In the same fashion, respective value of coord factor is the following if I have a query C OR D: Doc1: 1/2 Doc2: 1/2 Doc3: 1/2 Is this correct? or Did I miss something? Please correct me if I am wrong. Regards, Kazuaki -- Anshum Gupta http://www.anshumgupta.net
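Editor's note: for reference, Lucene's DefaultSimilarity computes this as coord(overlap, maxOverlap) = overlap / maxOverlap — the number of query terms a document matches divided by the number of terms in the query. A quick sketch of the arithmetic from the example (just the formula, not Lucene code):

```java
public class CoordExample {
    // coord = matched query terms / total query terms, as in
    // DefaultSimilarity.coord(overlap, maxOverlap).
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / maxOverlap;
    }

    public static void main(String[] args) {
        // Query: A OR B OR C OR D (4 terms)
        System.out.println(coord(3, 4)); // Doc1 {A,B,C} matches 3 of 4 -> 0.75
        System.out.println(coord(2, 4)); // Doc2 {A,C} and Doc3 {B,C}   -> 0.5
        // Query: C OR D (2 terms); all three docs match only C
        System.out.println(coord(1, 2)); // -> 0.5
    }
}
```

This factor is multiplied into the document score, so documents matching more of the optional clauses rank higher, exactly as in Kazuaki's tables.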
Re: Can anyone explain this Solr query behavior?
Please post the results of adding debug=query to the URL. That'll tell us what the query parser spits out which is much easier to analyze. Best Erick On Wed, May 22, 2013 at 12:16 PM, Shankar Sundararaju shan...@ebrary.com wrote: This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar
Re: fq facet on double and non-indexed field
bq: So cant we do fq on non-indexed field No. By definition the fq clause is a search and you can only search on indexed fields. Best Erick On Wed, May 22, 2013 at 5:08 PM, gpssolr2020 psgoms...@gmail.com wrote: Hi, I am trying to apply filtering on a non-indexed double field, but it's not returning any results. So can't we do fq on a non-indexed field? can not use FieldCache on a field which is neither indexed nor has doc values: EXCH_RT_AMT</str> <int name="code">400</int> We are using Solr 4.2. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/fq-facet-on-double-and-non-indexed-field-tp4065457.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Approach to apply full index from master to slaves?
What's your max warming searcher value? About warming queries, that may be _adding_ to your problem. I'd first try removing many of them, especially if you have your cache autowarm settings very high, try 16 or so. Autowarming is all about pre-loading the caches etc, but you reach diminishing returns pretty quickly. And what are all the threads doing? Best Erick On Wed, May 22, 2013 at 11:14 PM, William Bell billnb...@gmail.com wrote: We have a 3GB index. We index on the master and then replicate to the slaves. But the issue is that after the slaves switch over - we get deadlocking, # of threads increase to 500, and most times the SOLR instance just plain locks up. We tried adding a bunch of warming queries, but we still have a major performance hit and same issues. Are there any other tweaks and recommendations? Are others experiencing this? -- Bill Bell billnb...@gmail.com cell 720-256-8076
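For reference, the autowarm setting Erick mentions lives on the cache declarations in solrconfig.xml. A sketch with illustrative sizes, not a tuning recommendation:

```xml
<!-- solrconfig.xml sketch: a modest autowarmCount so a new searcher
     doesn't spend its warm-up replaying hundreds of cached filters -->
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="16"/>
```

The same autowarmCount attribute applies to queryResultCache; per the advice above, lowering it is usually the first thing to try when searcher switch-over stalls.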
hook to know when a DOC is committed.
I need to know when a document is committed in SOLR - i.e. is searchable. Is there anyone who has a solution on how to do this. I'm aware of three methods to create hooks for knowing when a doc is added or a commit is performed, but the doc(id) does not seem to be included for the commit-hooks (naturally I guess): A. subclass DirectUpdateHandler2 and override commit and/or addDoc B. subclass UpdateRequestProcessor (and include it in the update-chain) and override processAdd and/or processCommit C. implement SolrEventListener and implement postCommit and/or postSoftCommit The use-case is to let other parts of a system know that a document is searchable without having to create a poller which has to have state on when/how it polls. Any ideas or tricks out there? Fredrik -- Fredrik Rødland Mail:fredrik.rodl...@finn.no FINN.no Cell:+47 99 21 98 17 Twitter: @fredrikr Oslo, NORWAY Web: http://about.me/fmr
Re: search filter
On 23 May 2013 11:19, Kamal Palei palei.ka...@gmail.com wrote: HI Rafał Kuć I tried fq=Salary:[5+TO+10]+OR+Salary:0 and as well as fq=Salary:[5 TO 10] OR Salary:0 both, both the cases I retrieved 0 results. [...] Please try the suggested filter query from the Solr admin. interface, or by typing it directly into the browser URL bar. My guess is that there is still some issue with your Drupal/Solr integration. Regards, Gora
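Gora's suspicion about the Drupal/Solr integration can be tested from any client: build the fq with literal spaces and let the HTTP layer do the percent-encoding, rather than hand-inserting '+' in PHP. If '+' is inserted by hand and then encoded again, Solr receives literal plus signs, which is a plausible cause of the NumberFormatException above. A Python sketch of the intended request (values are the ones from the thread):

```python
from urllib.parse import urlencode

min_ctc, max_ctc = 5, 10
# Build the raw filter query with literal spaces; do NOT pre-insert '+'.
fq = f"salary:[{min_ctc} TO {max_ctc}] OR salary:0"
# The HTTP client turns spaces into '+' (and escapes [, ], :) here:
params = urlencode({"q": "*:*", "fq": fq})
print(params)  # q=%2A%3A%2A&fq=salary%3A%5B5+TO+10%5D+OR+salary%3A0
```

Note the salary field must be numeric (e.g. a trie long/int) in schema.xml for the range clause to behave as expected.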
Re: OPENNLP current patch compiling problem for 4.x branch
by definition, there is no LUCENE_44 constant in a 4.3 distro! Just change it to LUCENE_43 (or whatever you find in the Version class that suits your needs) or try this on a 4.x checkout. Best Erick On Thu, May 23, 2013 at 2:08 AM, Patrick Mi patrick...@touchpointgroup.com wrote: Hi, I checked out from here http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch LUCENE-2899-current.patch. Applied the patch ok but when I did 'ant compile' I got the following error: == [javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol [javac] super(Version.LUCENE_44, input); [javac] ^ [javac] symbol: variable LUCENE_44 [javac] location: class Version [javac] 1 error == Compiled it on trunk without problem. Is this patch supposed to work for 4.X? Regards, Patrick
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
Tangential to the issue you raise is that this is a huge tlog. It indicates that you aren't doing a hard commit (openSearcher=false) very often. That operation will truncate your tlog which should speed recovery/startup. You're also chewing up some memory with a tlog that size since pointers to the tlog are kept for each document. This comment doesn't address your comment about the change to ZkController, I'll leave that to someone who knows the code. Best Erick On Thu, May 23, 2013 at 3:14 AM, AlexeyK lex.kudi...@gmail.com wrote: a small change: it's not an endless loop, but a painfully slow processing which includes running a delete query and then insertion. Each document from the tlog takes tens of seconds to process (more than 100 times slower than during normal insertion process) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4065551.html Sent from the Solr - User mailing list archive at Nabble.com.
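The hard-commit setup Erick describes can be expressed in solrconfig.xml roughly as follows; the 15-second interval is only an example, not a recommendation:

```xml
<!-- solrconfig.xml sketch: frequent automatic hard commits that truncate
     the transaction log without opening a new searcher, so they do not
     affect search visibility (soft commits still control that) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>           <!-- hard commit every 15 s -->
    <openSearcher>false</openSearcher> <!-- flush/truncate tlog only -->
  </autoCommit>
</updateHandler>
```

With this in place the tlog stays small, which keeps recovery and startup replay short.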
Re: hook to know when a DOC is committed.
A poller really is the most sensible, practical, and easiest route to go. If you add the versions=true parameter to your update request and have the transaction log enabled the update response will have the version numbers for each document id, then the poller can also tell if an update has been committed as well. Also, with soft commit, documents should be visible much more rapidly. Do you have some other, unmentioned requirement that you feel is biasing you against a sensible poller? Clue us in as to the nature of such a requirement. -- Jack Krupansky -Original Message- From: Fredrik Rødland Sent: Thursday, May 23, 2013 7:53 AM To: solr-user@lucene.apache.org Subject: hook to know when a DOC is committed. I need to know when a document is committed in SOLR - i.e. is searchable. Is there anyone who has a solution on how to do this. I'm aware of three methods to create hooks for knowing when a doc is added or a commit is performed, but the doc(id) does not seem to be included for the commit-hooks (naturally I guess): A. subclass DirectUpdateHandler2 and override commit and/or addDoc B. subclass UpdateRequestProcessor (and include it in the update-chain) and override processAdd and/or processCommit C. implement SolrEventListener and implement postCommit and/or postSoftCommit The use-case is to let other parts of a system know that a document is searchable without having to create a poller which has to have state on when/how it polls. Any ideas or tricks out there? Fredrik -- Fredrik Rødland Mail:fredrik.rodl...@finn.no FINN.no Cell:+47 99 21 98 17 Twitter: @fredrikr Oslo, NORWAY Web: http://about.me/fmr
Re: Solr DIH - Small index still take time?
That should work. Just watch out for (set value of) preImportDeleteQuery. Otherwise, when you do full import you may accidentally delete items from the other set. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 23, 2013 at 6:25 AM, Spadez james_will...@hotmail.com wrote: Hi, This is the situation, I have two sources of data in my dataimport handler, one is huge, the other is tiny: Source A: 10-20 records Source B: 50,000,000 records I was wondering what happens if I was to do a DIH just on Source A every 10 mins, and only run the DIH on source B every 24 hours. Would running my DIH on Source A be extremely quick, because the data we are importing is small, or would it still be time consuming, because it would have to rebuild the index of the entire SOLR (i.e 50,000,010 records). Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-DIH-Small-index-still-take-time-tp4065582.html Sent from the Solr - User mailing list archive at Nabble.com.
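One way to follow this advice, assuming each document carries a field naming its source (the field name, entity names, and SQL below are invented for illustration): scope preImportDeleteQuery so a full-import of the small source deletes only its own documents.

```xml
<!-- data-config.xml sketch: a full-import of sourceA deletes only docs
     previously tagged source:A, leaving sourceB's 50M docs untouched -->
<document>
  <entity name="sourceA"
          preImportDeleteQuery="source:A"
          query="SELECT id, title, 'A' AS source FROM source_a"/>
  <entity name="sourceB"
          preImportDeleteQuery="source:B"
          query="SELECT id, title, 'B' AS source FROM source_b"/>
</document>
```

Running the import with the entity parameter (e.g. command=full-import&amp;entity=sourceA) then touches only that entity, and the 10-minute job stays proportional to the 10-20 records, not the whole index.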
Re: index multiple files into one index entity
I just skimmed your post, but I'm responding to the last bit. If you have uniqueKey defined as id in schema.xml then no, you cannot have multiple documents with the same ID. Whenever a new doc comes in it replaces the old doc with that ID. You can remove the uniqueKey definition and do what you want, but there are very few Solr installations with no uniqueKey and it's probably a better idea to make your id's truly unique. Best Erick On Thu, May 23, 2013 at 6:14 AM, mark.ka...@t-systems.com wrote: Hello solr team, I want to index multiple fields into one solr index entity, with the same id. We are using solr 4.1 I try it with following source fragment: public void addContentSet(ContentSet contentSet) throws SearchProviderException { ... ContentStreamUpdateRequest csur = generateCSURequest(contentSet.getIndexId(), contentSet); String indexId = contentSet.getIndexId(); ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId); server.request(csur); ... } private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet) throws IOException { ContentStreamUpdateRequest csur = new ContentStreamUpdateRequest(confStore.getExtractUrl()); ModifiableSolrParams parameters = csur.getParams(); if (parameters == null) { parameters = new ModifiableSolrParams(); } parameters.set(literalsOverride, false); // maps the tika default content attribute to the Attribute with name 'fulltext' parameters.set(fmap.content, SearchSystemAttributeDef.FULLTEXT.getName()); // create an empty content stream, this seams necessary for ContentStreamUpdateRequest csur.addContentStream(new ImaContentStream()); for (Content content : contentSet.getContentList()) { csur.addContentStream(new ImaContentStream(content)); // for each content stream add additional attributes parameters.add(literal. + SearchSystemAttributeDef.CONTENT_ID.getName(), content.getBinaryObjectId().toString()); parameters.add(literal. 
+ SearchSystemAttributeDef.CONTENT_KEY.getName(), content.getContentKey()); parameters.add(literal. + SearchSystemAttributeDef.FILE_NAME.getName(), content.getContentName()); parameters.add(literal. + SearchSystemAttributeDef.MIME_TYPE.getName(), content.getMimeType()); } parameters.set(literal.id , indexId); // adding some other attributes ... csur.setParams(parameters); return csur; } During debugging I can see that the method 'server.request(csur)' read for each ImaContentStream the buffer. When I'm looking on solr catalina log I see that the attached files reach the solr servlet. INFO: Releasing directory:/data/V-4-1/master0/data/index Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [master0] webapp=/solr-4-1 path=/update/extract params={literal.searchconnectortest15_c8150e41_cc49_4a .. literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1 . {add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58 But only the latest in the content list will be indexed. My schema.xml has the following field definitions: field name=id type=string indexed=true stored=true required=true / field name=content type=text_general indexed=false stored=true multiValued=true/ field name=contentkey type=string indexed=true stored=true multiValued=true/ field name=contentid type=string indexed=true stored=true multiValued=true/ field name=contentfilename type=string indexed=true stored=true multiValued=true/ field name=contentmimetype type=string indexed=true stored=true multiValued=true/ field name=fulltext type=text_general indexed=true stored=true multiValued=true/ I'm using the tika ExtractingRequestHandler which can extract binary files. 
requestHandler name=/update/extract startup=lazy class=solr.extraction.ExtractingRequestHandler lst name=defaults str name=lowernamestrue/str str name=uprefixignored_/str !-- capture link hrefs but ignore div attributes -- str name=captureAttrtrue/str str name=fmap.alinks/str str name=fmap.divignored_/str /lst /requestHandler Is it possible to index multiple files with the same id? It is necessary to implement my own
Re: hook to know when a DOC is committed.
On 23 May 2013, at 14:05, Jack Krupansky j...@basetechnology.com wrote: Hi Jack, thanks for your answer. A poller really is the most sensible, practical, and easiest route to go. If you add the versions=true parameter to your update request and have the transaction log enabled the update response will have the version numbers for each document id, then the poller can also tell if an update has been committed as well. The poller will still have to retry before advertising a doc as searchable - won't it? Do you have some other, unmentioned requirement that you feel is biasing you against a sensible poller? Clue us in as to the nature of such a requirement. My plan was to link Solr with our already established high-volume messaging system. So each time a document is searchable a message would be broadcast on a given channel. Our system consists of approx 10 indexes and 8 replications of each of these, so keeping track of all these by pollers would require a whole bunch of logic. Having a push-based system would facilitate knowing when a document is searchable quite a lot. regards, Fredrik -- Fredrik Rødland Mail:fredrik.rodl...@finn.no FINN.no Cell:+47 99 21 98 17 Twitter: @fredrikr Oslo, NORWAY Web: http://about.me/fmr
Re: fq facet on double and non-indexed field
Thanks Erick. I take it we can't do q on a non-indexed field either. What is the difference between q and fq, other than caching? Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/fq-facet-on-double-and-non-indexed-field-tp4065457p4065604.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr 4.3 fails to load MySQL driver
Hi, in my attempt to migrate from 3.6.x to 4.3.0 I stumbled upon an issue loading the MySQL driver from the [instance]/lib dir: Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:266) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448) ... 18 more To narrow it down, I use the plain example configuration with the following changes: - Add a dataimport requestHandler to example/conf/solrconfig.xml (copied from a working solr 3.6.x) - Created example/conf/data-config.xml with <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... and SQL statement (both copied from a working solr 3.6.x) - placed the current driver mysql-connector-java-5.1.25-bin.jar in example/lib To my knowledge the lib dir is included in the path automatically. To make sure I tried to: - add <lib dir="./lib" /> explicitly to solrconfig.xml - add an absolute path to solrconfig.xml - changed solr.xml to use <solr persistent="true" sharedLib="lib"> All to no avail. System Info: - OpenJDK Runtime Environment 1.7.0_19 - Solr 4.3.0 - mysql-connector-java-5.1.25-bin.jar The same configuration ran fine with solr 3.6.x on the very same machine. Any help is appreciated! Cheers Chris -- Christian Köhler
Re: hook to know when a DOC is committed.
Yes, by definition, a poller retries. But by picking a sensible default for initial poll and retry (possibly an initial delay tuned to match average update/commit time) coupled with a traditional exponential backoff, that should not be a problem at all. In other words, an average request would not require a retry. Even so, do you feel that there is some sort of problem with retry? If so, please state what it is. Again, if you utilize soft commit, the time to commit will be significantly reduced. Or, just go ahead and force a commit on every update where the delay of a poll request is not acceptable. But I'd recommend the tuned poller. would require a whole bunch of logic - and you think the commit hooks and your push model implementation (on both Solr and client side) will be less logic?!! -- Jack Krupansky -Original Message- From: Fredrik Rødland Sent: Thursday, May 23, 2013 8:18 AM To: solr-user@lucene.apache.org Subject: Re: hook to know when a DOC is committed. On 23 May 2013, at 14:05, Jack Krupansky j...@basetechnology.com wrote: Hi Jack, thanks for your answer. A poller really is the most sensible, practical, and easiest route to go. If you add the versions=true parameter to your update request and have the transaction log enabled the update response will have the version numbers for each document id, then the poller can also tell if an update has been committed as well. The poller will still have to retry before advertising a doc as searchable - won't it? Do you have some other, unmentioned requirement that you feel is biasing you against a sensible poller? Clue us in as to the nature of such a requirement. My plan was to link Solr with our already established high-volume messaging system. So each time a document is searchable a message would be broadcast on a given channel. Our system consists of approx 10 indexes and 8 replications of each of these, so keeping track of all these by pollers would require a whole bunch of logic.
Having a push-based system would facilitate knowing when a document is searchable quite a lot. regards, Fredrik -- Fredrik Rødland Mail:fredrik.rodl...@finn.no FINN.no Cell:+47 99 21 98 17 Twitter: @fredrikr Oslo, NORWAY Web: http://about.me/fmr
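The tuned poller Jack recommends is only a few lines. A Python sketch with exponential backoff; the function and parameter names are invented, and the initial delay would be tuned to your average commit latency:

```python
import time

def wait_until_searchable(is_visible, first_delay=0.25, factor=2.0, max_tries=6):
    """Poll is_visible() with exponential backoff until it returns True.

    is_visible would wrap a Solr query (e.g. a real-time get or an id
    search) that reports whether the document is committed/searchable.
    Returns the number of attempts used; raises TimeoutError if exhausted.
    """
    delay = first_delay
    for attempt in range(1, max_tries + 1):
        if is_visible():
            return attempt
        time.sleep(delay)
        delay *= factor  # back off: 0.25s, 0.5s, 1s, ...
    raise TimeoutError("document still not searchable after %d tries" % max_tries)

# Simulated check that succeeds on the third poll:
state = {"polls": 0}
def fake_check():
    state["polls"] += 1
    return state["polls"] >= 3

print(wait_until_searchable(fake_check, first_delay=0.0))  # 3
```

With an initial delay matched to typical commit time, the common case returns on the first attempt, which is Jack's point that an average request would not require a retry.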
Bug in spellcheck.alternativeTermCount
I was playing around with spellcheck.alternativeTermCount and noticed that if it is set to zero, Solr gives an exception with certain queries. Maybe the value isn't supposed to be zero, but I don't think an exception is the expected behaviour. Rounak
Restaurant availability from database
Hi, I am building a website that lists restaurant information and I would also like to include availability information. I've created a custom ValueSourceParser and ValueSource that retrieve the availability information from a MySQL database. An example query is as follows. http://localhost:8983/solr/collection1/select?q=restaurant_id:*&fl=*,available:availability(2013-05-23, 2, 1700, 2359) This results in a pseudo (boolean) field available per document result and this works as expected. But my problem is that I also need the total number of available restaurants. Is there a way to count the number of available restaurants over the whole result set? I tried the stats component, but it doesn't seem to work with pseudo fields. Thanks in advance, Ronald -- View this message in context: http://lucene.472066.n3.nabble.com/Restaurant-availability-from-database-tp4065609.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.3 fails to load MySQL driver
Check the Solr log on startup - it will explicitly state which lib directories/files will be used. Make sure they agree with where the DIH jars reside. Keep in mind that the directory structure of Solr changed - use the lib from 4.3 solrconfig. Try to use DIH in the standard Solr 4.3 example first. Then mimic that in your customization. -- Jack Krupansky -Original Message- From: Christian Köhler Sent: Thursday, May 23, 2013 8:25 AM To: solr-user@lucene.apache.org Subject: Solr 4.3 fails to load MySQL driver Hi, in my attempt to migrate for m 3.6.x to 4.3.0 I stumbled upon an issue loading the MySQL driver from the [instance]/lib dir: Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:266) at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448) ... 18 more To narrow it down, I use the plain example configuration with the following changes: - Add a dataimport requestHandler to example/conf/solrconfig.xml (copied from a working solr 3.6.x) - Created example/conf/data-config.xml with dataSource type=JdbcDataSource driver=com.mysql.jdbc.Driver ... and SQL statement (both copied from a working solr 3.6.x) - placed the current driver mysql-connector-java-5.1.25-bin.jar in example/lib As to my knowledge the lib dir is included automatically to the path. 
To make sure I tried to: - add lib dir=./lib / to explicit to solrconf.xml - add absolute path to solrarconf.xml - changed solr.xml to use solr persistent=true sharedLib=lib All to no avail. System Info: - OpenJDK Runtime Environmentm 1.7.0_19 - Solr 4.3.0 - mysql-connector-java-5.1.25-bin.jar The same configuration run fine with a solr 3.6.x on the very same machine. Any help is appreciated! Cheers Chris -- Christian Köhler
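If the startup log shows the DIH jars are not on the path, declaring them explicitly in solrconfig.xml usually fixes the ClassNotFoundException for DataImportHandler. A sketch; the relative paths match the stock 4.3 example layout (relative to the core's instanceDir) and may need adjusting:

```xml
<!-- solrconfig.xml sketch: load the DIH jars shipped in dist/ and the
     JDBC driver dropped into the core's lib/ directory -->
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
<lib dir="./lib" regex="mysql-connector-java-.*\.jar" />
```

Note the exception above is for DataImportHandler itself, not the MySQL driver, so it is the solr-dataimporthandler jar that is missing first; the driver jar only matters once DIH loads.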
Shardsplitting
Hi, when I have a collection with 3 shards and 2 replicas for each shard and want to split shard1, does it matter where in the cloud I start the splitshard command, or should it be started on the master of that shard? BR, Arkadi
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
Huge tlogs seem to be a common problem. Should we make the tlog flush automatically above a certain file size? Could it be configurable on the updateLog tag? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 23 May 2013 at 14:03, Erick Erickson erickerick...@gmail.com wrote: Tangential to the issue you raise is that this is a huge tlog. It indicates that you aren't doing a hard commit (openSearcher=false) very often. That operation will truncate your tlog which should speed recovery/startup. You're also chewing up some memory with a tlog that size since pointers to the tlog are kept for each document. This comment doesn't address your comment about the change to ZkController, I'll leave that to someone who knows the code. Best Erick On Thu, May 23, 2013 at 3:14 AM, AlexeyK lex.kudi...@gmail.com wrote: a small change: it's not an endless loop, but a painfully slow processing which includes running a delete query and then insertion. Each document from the tlog takes tens of seconds to process (more than 100 times slower than during normal insertion process) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4065551.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Regular expression in solr
Regex expressions work on individual terms. Positional information is irrelevant when it comes to regex matching - it's not matching across terms*. The syntax allowed is documented here https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/util/automaton/RegExp.html - it's not quite the full standard syntax. ^ and $ aren't mentioned there. The beginning of the regex implicitly starts at the beginning of the term. So whatever constitutes a term is the granularity of what matches. string fields operate on the entire string. A text field that is analyzed will regex match on the individual terms that emerge from the index-time analysis process. Erik * Though with the surround query parser you can do proximity matching using wildcarded terms in sophisticated ways. On May 22, 2013, at 16:42 , Lance Norskog wrote: If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results you get will basicly depend on your way of indexing, if you use the regex on a tokenized field and that is not what you want, try to use a copy field wich is not tokenized and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure but I heard it works with the Java Regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times result is not as expected. Is there some guide on use of regex in solr? 
-Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean no additional settings required? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 3:37 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr You can write a regular expression query like this (you need to specify the regex between slashes / ) : fieldName:/[rR]egular.*/ On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote: Hi, How do we search based upon regular expressions in solr? Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
AW: index multiple files into one index entity
Hello Erick, Thank you for your fast answer. Maybe I didn't express my question clearly. I want to index many files into one index entity. I want the same behavior as any other multivalued field, which can be indexed under one unique id. So I think every ContentStreamUpdateRequest represents one index entity, doesn't it? And with each addContentStream I will add one file to this entity. Thank you and best regards, Mark -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, 23 May 2013 14:11 To: solr-user@lucene.apache.org Subject: Re: index multiple files into one index entity I just skimmed your post, but I'm responding to the last bit. If you have uniqueKey defined as id in schema.xml then no, you cannot have multiple documents with the same ID. Whenever a new doc comes in it replaces the old doc with that ID. You can remove the uniqueKey definition and do what you want, but there are very few Solr installations with no uniqueKey and it's probably a better idea to make your id's truly unique. Best Erick On Thu, May 23, 2013 at 6:14 AM, mark.ka...@t-systems.com wrote: Hello solr team, I want to index multiple fields into one solr index entity, with the same id. We are using solr 4.1 I try it with following source fragment: public void addContentSet(ContentSet contentSet) throws SearchProviderException { ... ContentStreamUpdateRequest csur = generateCSURequest(contentSet.getIndexId(), contentSet); String indexId = contentSet.getIndexId(); ConcurrentUpdateSolrServer server = serverPool.getUpdateServer(indexId); server.request(csur); ...
} private ContentStreamUpdateRequest generateCSURequest(String indexId, ContentSet contentSet) throws IOException { ContentStreamUpdateRequest csur = new ContentStreamUpdateRequest(confStore.getExtractUrl()); ModifiableSolrParams parameters = csur.getParams(); if (parameters == null) { parameters = new ModifiableSolrParams(); } parameters.set(literalsOverride, false); // maps the tika default content attribute to the Attribute with name 'fulltext' parameters.set(fmap.content, SearchSystemAttributeDef.FULLTEXT.getName()); // create an empty content stream, this seams necessary for ContentStreamUpdateRequest csur.addContentStream(new ImaContentStream()); for (Content content : contentSet.getContentList()) { csur.addContentStream(new ImaContentStream(content)); // for each content stream add additional attributes parameters.add(literal. + SearchSystemAttributeDef.CONTENT_ID.getName(), content.getBinaryObjectId().toString()); parameters.add(literal. + SearchSystemAttributeDef.CONTENT_KEY.getName(), content.getContentKey()); parameters.add(literal. + SearchSystemAttributeDef.FILE_NAME.getName(), content.getContentName()); parameters.add(literal. + SearchSystemAttributeDef.MIME_TYPE.getName(), content.getMimeType()); } parameters.set(literal.id , indexId); // adding some other attributes ... csur.setParams(parameters); return csur; } During debugging I can see that the method 'server.request(csur)' read for each ImaContentStream the buffer. When I'm looking on solr catalina log I see that the attached files reach the solr servlet. INFO: Releasing directory:/data/V-4-1/master0/data/index Apr 25, 2013 5:48:07 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: [master0] webapp=/solr-4-1 path=/update/extract params={literal.searchconnectortest15_c8150e41_cc49_4a .. literal.id=26afa5dc-40ad-442a-ac79-0e7880c06aa1 . 
{add=[26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910940958720), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910971367424), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910976610304), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910983950336), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910989193216), 26afa5dc-40ad-442a-ac79-0e7880c06aa1 (1433265910995484672)]} 0 58 But only the latest in the content list will be indexed. My schema.xml has the following field definitions: field name=id type=string indexed=true stored=true required=true / field name=content type=text_general indexed=false stored=true multiValued=true/ field name=contentkey type=string indexed=true stored=true multiValued=true/ field name=contentid type=string indexed=true stored=true multiValued=true/ field name=contentfilename type=string indexed=true stored=true multiValued=true/ field name=contentmimetype type=string indexed=true stored=true multiValued=true/ field name=fulltext type=text_general
Re: Solr 4.3: node is seen as active in Zk while in recovery mode + endless recovery
the hard commit is set to about 20 minutes, while the RAM buffer is 256MB. We will add more frequent hard commits without reopening the searcher, thanks for the tip. From what I understood from the code, for each 'add' command there is a test for a 'delete by query'. If there is an older DBQ, it's run after the 'add' operation if its version > the 'add' version. In my case, there are a lot of documents to be inserted and a single large DBQ. My question is: shouldn't this be done in bulk? Why is it necessary to run the DBQ after each insertion? Suppose there are 1000 insertions: it's run 1000 times. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-4-3-node-is-seen-as-active-in-Zk-while-in-recovery-mode-endless-recovery-tp4065549p4065628.html Sent from the Solr - User mailing list archive at Nabble.com.
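To make the concern concrete, here is a rough model of the behaviour described, re-applying a reordered delete-by-query after every single add (illustrative only, not Solr's actual code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class ReorderedDbq {
    static List<String> index = new ArrayList<>();
    static int dbqRuns = 0;
    // Stand-in for a delete-by-query whose version is newer than the adds being replayed.
    static Predicate<String> pendingDbq = doc -> doc.startsWith("old-");

    static void add(String doc) {
        index.add(doc);
        // The reordered DBQ is re-applied after every single add.
        index.removeIf(pendingDbq);
        dbqRuns++;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            add("doc-" + i);
        }
        System.out.println(dbqRuns); // 1000: one DBQ execution per insertion
    }
}
```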
Broken pipe
Any idea why I got a Broken pipe? INFO - 2013-05-23 13:37:19.881; org.apache.solr.core.SolrCore; [messages_shard3_replica1] webapp=/solr path=/select/ params={sort=score+desc&fl=id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_description_ngram,smsc_content,smsc_content_ngram,smsc_courseid,smsc_lastdate,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subject&debugQuery=true&defaultOperator=AND&indent=on&start=0&q=(smsc_content:banaan+||+smsc_content_ngram:banaan+||+smsc_description:banaan+||+smsc_description_ngram:banaan)+%26%26+(smsc_lastdate:[2000-04-23T15:14:40Z+TO+2013-05-23T15:14:40Z])+%26%26+(smsc_ssid:9)&collection=messages&wt=xml&rows=50&version=2.2} hits=119 status=0 QTime=81108 ERROR - 2013-05-23 13:37:19.892; org.apache.solr.common.SolrException; null:ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) at org.apache.solr.util.FastWriter.flush(FastWriter.java:141) at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155) at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:85) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:41) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:644) at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:366) at 
org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:240) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:117) at org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:192) at org.apache.coyote.Response.doWrite(Response.java:505) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:401) ... 30 more ERROR - 2013-05-23 13:37:19.893; org.apache.solr.common.SolrException; null:ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419) at
Re: Distributed query: strange behavior.
On 5/23/2013 1:51 AM, Luis Cappa Banda wrote: I've queried each Solr shard server one by one and the total number of documents is correct. However, when I change the rows parameter from 10 to 100 the total numFound of documents changes: I've seen this problem on the list before and the cause has been determined each time to be documents with the same uniqueKey value appearing in more than one shard. What I think happens here: With rows=10, you get the top ten docs from each of the three shards, and each shard sends its numFound for that query to the core that's coordinating the search. The coordinator adds up numFound, looks through those thirty docs, and arranges them according to the requested sort order, returning only the top 10. In this case, there happen to be no duplicates. With rows=100, you get a total of 300 docs. This time, duplicates are found and removed by the coordinator. I think that the coordinator adjusts the total numFound by the number of duplicate documents it removed, in an attempt to be more accurate. I don't know if adjusting numFound when duplicates are found in a sharded query is the right thing to do; I'll leave that for smarter people. Perhaps Solr should return a message with the results saying that duplicates were found, and if a config option is not enabled, the server should throw an exception and return a 4xx HTTP error code. One idea for a config parameter name would be allowShardDuplicates, but something better can probably be found. Thanks, Shawn
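A sketch of the merge logic described above, with illustrative names (this is a simplification for discussion, not the actual coordinator code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ShardMerge {
    // Sum the per-shard numFound values, then subtract duplicates discovered
    // while merging the documents each shard actually returned.
    static long mergedNumFound(long[] shardNumFound, List<List<String>> returnedIds) {
        long total = 0;
        for (long n : shardNumFound) {
            total += n;
        }
        Set<String> seen = new HashSet<>();
        long duplicates = 0;
        for (List<String> ids : returnedIds) {
            for (String id : ids) {
                if (!seen.add(id)) {
                    duplicates++;
                }
            }
        }
        return total - duplicates;
    }

    public static void main(String[] args) {
        long[] numFound = {100, 100, 100};
        // Small rows: the returned top docs happen to contain no duplicates.
        System.out.println(mergedNumFound(numFound,
                List.of(List.of("a", "b"), List.of("c"), List.of("d")))); // 300
        // Larger rows: two duplicate ids surface during the merge, so the total shrinks.
        System.out.println(mergedNumFound(numFound,
                List.of(List.of("a", "b"), List.of("b"), List.of("a")))); // 298
    }
}
```

This shows why the reported total depends on rows: only duplicates that happen to appear among the returned documents can be detected and subtracted.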
AW: Broken pipe
This usually happens when the client sending the request to Solr has given up waiting for the response (terminated the connection). In your example, we see that the Solr query time is 81 seconds. Probably the client issuing the request has a time-out of maybe 30 or 60 seconds. André Von: Arkadi Colson [ark...@smartbit.be] Gesendet: Donnerstag, 23. Mai 2013 15:40 An: solr-user@lucene.apache.org Betreff: Broken pipe Any idea why I got a Broken pipe? INFO - 2013-05-23 13:37:19.881; org.apache.solr.core.SolrCore; [messages_shard3_replica1] webapp=/solr path=/select/ params={sort=score+descfl=id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_description_ngram,smsc_content,smsc_content_ngram,smsc_courseid,smsc_lastdate,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subjectdebugQuery=truedefaultOperator=ANDindent=onstart=0q=(smsc_content:banaan+||+smsc_content_ngram:banaan+||+smsc_description:banaan+||+smsc_description_ngram:banaan)+%26%26+(smsc_lastdate:[2000-04-23T15:14:40Z+TO+2013-05-23T15:14:40Z])+%26%26+(smsc_ssid:9)collection=messageswt=xmlrows=50version=2.2} hits=119 status=0 QTime=81108 ERROR - 2013-05-23 13:37:19.892; org.apache.solr.common.SolrException; null:ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342) at org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) at 
org.apache.solr.util.FastWriter.flush(FastWriter.java:141) at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155) at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:85) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:41) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:644) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at 
java.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:366) at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:240) at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:117) at org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:192) at org.apache.coyote.Response.doWrite(Response.java:505) at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:401) ... 30 more ERROR -
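The mechanism André describes is easy to reproduce outside Solr: once the client has hung up, the server's later writes fail, which Tomcat surfaces as ClientAbortException: Broken pipe. A small self-contained demonstration with plain sockets (no Solr involved; exact exception text varies by OS):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class BrokenPipeDemo {
    // Returns true if writing to a peer that already disconnected raises an
    // IOException -- the situation Tomcat reports as ClientAbortException.
    static boolean lateWriteFails() {
        try (ServerSocket server = new ServerSocket(0);
             Socket client = new Socket("127.0.0.1", server.getLocalPort());
             Socket accepted = server.accept()) {
            client.close();      // the client gives up (e.g. its read timeout fired)
            Thread.sleep(100);   // give the close a moment to propagate
            OutputStream out = accepted.getOutputStream();
            byte[] chunk = new byte[8192];
            for (int i = 0; i < 1000; i++) {
                out.write(chunk); // the "response body" the server still tries to send
            }
            return false;        // all writes succeeded (not expected)
        } catch (IOException | InterruptedException e) {
            return true;         // broken pipe / connection reset
        }
    }

    public static void main(String[] args) {
        System.out.println(lateWriteFails()); // true on typical systems
    }
}
```

The fix is on the client side: raise the client's read timeout (or speed up the query) so it does not abandon the connection before Solr finishes writing the response.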
Re: Solr 4.3 fails to load MySQL driver
On 5/23/2013 6:25 AM, Christian Köhler wrote: in my attempt to migrate from 3.6.x to 4.3.0 I stumbled upon an issue loading the MySQL driver from the [instance]/lib dir: Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler The best thing to do is take the <lib> directives out of solrconfig.xml and put your extra jars in ${solr.solr.home}/lib, where solr.solr.home is the directory where solr.xml lives. NB: There might be two solr.xml files in your setup, but if there are, one of them will tell your servlet container how to start solr, the correct file tells solr about cores. Normally, you can set up another global lib directory, absolute or relative to solr.solr.home, with the sharedLib attribute in solr.xml, but that doesn't work in 4.3.0 - only ${solr.solr.home}/lib works in that specific version. Here's the bug report: https://issues.apache.org/jira/browse/SOLR-4791 I discovered another glitch last night in the 4.4 development version and filed a bug report, but I've been informed that I've been doing it wrong for the last couple of years: https://issues.apache.org/jira/browse/SOLR-4852 Thanks, Shawn
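For reference, a sketch of an old-style solr.xml using the sharedLib attribute (core names and paths here are illustrative; as noted above, per SOLR-4791 the attribute is ignored in 4.3.0, where only ${solr.solr.home}/lib works):

```xml
<!-- sharedLib is resolved relative to solr.solr.home unless absolute -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```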
Problem with document routing with Solr 4.2.1
Hi All, I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents to the proper shard. Following the instructions found here: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud I set my document id to something like 'fieldA!id' where fieldA is the key I want to use to distribute my documents. (All documents with the same value for fieldA will be sent to the same shard). When I query my index, I can see that the number of documents increases but there are no fields at all in the index. http://10.0.5.211:8201/solr/Current/select?q=*:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">11</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="26318" start="0" maxScore="1.0"/>
</response>

Specifying fields in the 'fl' parameter does nothing. What am I doing wrong?
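As a side note on the routing itself: with composite-id routing, only the part before the '!' determines the target shard. Solr hashes the route key with MurmurHash3 over the shards' hash ranges; the sketch below uses String.hashCode purely to illustrate that the shard choice depends solely on the prefix, so all documents sharing a fieldA value land together:

```java
public class CompositeIdRouting {
    // Illustrative only: real Solr uses MurmurHash3 and per-shard hash ranges,
    // not hashCode modulo numShards.
    static int shardFor(String docId, int numShards) {
        int bang = docId.indexOf('!');
        String routeKey = bang >= 0 ? docId.substring(0, bang) : docId;
        return Math.floorMod(routeKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("fieldA-value!doc-1", 15)
                == shardFor("fieldA-value!doc-2", 15)); // true: same prefix, same shard
    }
}
```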
Re: Problem with document routing with Solr 4.2.1
That's strange. The default value of rows param is 10 so you should be getting 10 results back unless your StandardRequestHandler config in solrconfig has set rows to 0 or if none of your fields are stored. On Thu, May 23, 2013 at 7:40 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi All, I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents on the proper shard. Following the instruction found here: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud I set my document id to something like this 'fieldA!id' where fieldA is the key I want to use to distribute my documents. (All documents with the same value for fieldA will be sent to the same shard). When I query my index, I can see that the number of documents increase but there are no fields at all in the index. http://10.0.5.211:8201/solr/Current/select?q=*:* response lst name=responseHeader int name=status0/int int name=QTime11/int lst name=params str name=q*:*/str /lst /lst result name=response numFound=26318 start=0 maxScore=1.0/ /response Specifying fields in the 'fl' parameter does nothing. What am I doing wrong? -- Regards, Shalin Shekhar Mangar.
RE: Bug in spellcheck.alternativeTermCount
Can you give instructions on how to reproduce the problem? James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Rounak Jain [mailto:rouna...@gmail.com] Sent: Thursday, May 23, 2013 7:36 AM To: solr-user@lucene.apache.org Subject: Bug in spellcheck.alternativeTermCount I was playing around with spellcheck.alternativeTermCount and noticed that if it is set to zero, Solr gives an exception with certain queries. Maybe the value isn't supposed to be zero, but I don't think an exception is the expected behaviour. Rounak
Re: Broken pipe
Also happens (same reason) if you are behind a smart load-balance and it decides to time out and fail over. Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 23, 2013 at 9:59 AM, André Widhani andre.widh...@digicol.de wrote: This usually happens when the client sending the request to Solr has given up waiting for the response (terminated the connection). In your example, we see that the Solr query time is 81 seconds. Probably the client issuing the request has a time-out of maybe 30 or 60 seconds. André Von: Arkadi Colson [ark...@smartbit.be] Gesendet: Donnerstag, 23. Mai 2013 15:40 An: solr-user@lucene.apache.org Betreff: Broken pipe Any idea why I got a Broken pipe? INFO - 2013-05-23 13:37:19.881; org.apache.solr.core.SolrCore; [messages_shard3_replica1] webapp=/solr path=/select/ params={sort=score+descfl=id,smsc_module,smsc_modulekey,smsc_userid,smsc_ssid,smsc_description,smsc_description_ngram,smsc_content,smsc_content_ngram,smsc_courseid,smsc_lastdate,score,metadata_stream_size,metadata_stream_source_info,metadata_stream_name,metadata_stream_content_type,last_modified,author,title,subjectdebugQuery=truedefaultOperator=ANDindent=onstart=0q=(smsc_content:banaan+||+smsc_content_ngram:banaan+||+smsc_description:banaan+||+smsc_description_ngram:banaan)+%26%26+(smsc_lastdate:[2000-04-23T15:14:40Z+TO+2013-05-23T15:14:40Z])+%26%26+(smsc_ssid:9)collection=messageswt=xmlrows=50version=2.2} hits=119 status=0 QTime=81108 ERROR - 2013-05-23 13:37:19.892; org.apache.solr.common.SolrException; null:ClientAbortException: java.net.SocketException: Broken pipe at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:406) at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:342) at 
org.apache.catalina.connector.OutputBuffer.writeBytes(OutputBuffer.java:431) at org.apache.catalina.connector.OutputBuffer.write(OutputBuffer.java:419) at org.apache.catalina.connector.CoyoteOutputStream.write(CoyoteOutputStream.java:91) at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) at org.apache.solr.util.FastWriter.flush(FastWriter.java:141) at org.apache.solr.util.FastWriter.flushBuffer(FastWriter.java:155) at org.apache.solr.response.TextResponseWriter.close(TextResponseWriter.java:85) at org.apache.solr.response.XMLResponseWriter.write(XMLResponseWriter.java:41) at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:644) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:372) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408) at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008) at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589) at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.net.SocketException: Broken pipe at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109) at java.net.SocketOutputStream.write(SocketOutputStream.java:153) at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215) at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480) at
Re: Restaurant availability from database
Check out Gilt's presentation. It might give you some ideas, including possibly on refactoring your entities around 'availability' as a document: http://www.lucenerevolution.org/sites/default/files/Personalized%20Search%20on%20the%20Largest%20Flash%20Sale%20Site%20in%20America.pdf Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Thu, May 23, 2013 at 8:36 AM, rajh ron...@trimm.nl wrote: Hi, I am building a website that lists restaurant information and I'd also like to include the availability information. I've created a custom ValueSourceParser and ValueSource that retrieve the availability information from a MySQL database. An example query is as follows. http://localhost:8983/solr/collection1/select?q=restaurant_id:*&fl=*,available:availability(2013-05-23, 2, 1700, 2359) This results in a pseudo (boolean) field available per document result and this works as expected. But my problem is that I also need the total number of available restaurants. Is there a way to count the number of available restaurants over the whole result set? I tried the stats component, but it doesn't seem to work with pseudo fields. Thanks in advance, Ronald -- View this message in context: http://lucene.472066.n3.nabble.com/Restaurant-availability-from-database-tp4065609.html Sent from the Solr - User mailing list archive at Nabble.com.
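One approach worth trying for the count itself, assuming the custom availability() function can be evaluated inside a function query and returns 0/1: wrap it in an frange filter and read numFound, which then counts the available restaurants without fetching any rows. Untested sketch against the original query:

```
http://localhost:8983/solr/collection1/select?q=restaurant_id:*&rows=0
    &fq={!frange l=1 u=1}availability(2013-05-23, 2, 1700, 2359)
```

With rows=0 no documents are returned; numFound in the response is the number of matching (available) restaurants.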
RE: Problem with document routing with Solr 4.2.1
I know. If I stop routing the documents and simply use a standard 'id' field then I am getting back my fields. I forgot to tell you how the collection was created. http://localhost:8201/solr/admin/collections?action=CREATE&name=Current&numShards=15&replicationFactor=3&maxShardsPerNode=9 Since I am using the numShards parameter, composite routing should be working... unless I misunderstood something -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: May-23-13 10:27 AM To: solr-user@lucene.apache.org Subject: Re: Problem with document routing with Solr 4.2.1 That's strange. The default value of rows param is 10 so you should be getting 10 results back unless your StandardRequestHandler config in solrconfig has set rows to 0 or if none of your fields are stored. On Thu, May 23, 2013 at 7:40 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi All, I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents on the proper shard. Following the instruction found here: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud I set my document id to something like this 'fieldA!id' where fieldA is the key I want to use to distribute my documents. (All documents with the same value for fieldA will be sent to the same shard). When I query my index, I can see that the number of documents increase but there are no fields at all in the index. http://10.0.5.211:8201/solr/Current/select?q=*:* response lst name=responseHeader int name=status0/int int name=QTime11/int lst name=params str name=q*:*/str /lst /lst result name=response numFound=26318 start=0 maxScore=1.0/ /response Specifying fields in the 'fl' parameter does nothing. What am I doing wrong? -- Regards, Shalin Shekhar Mangar.
Core admin action CREATE fails for existing core
It seems to me that the behavior of the Core admin action CREATE has changed when going from Solr 4.1 to 4.3. With 4.1, I could re-configure an existing core (changing path/name to solrconfig.xml for example). In 4.3, I get an error message: SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 'core-tex69b6iom1djrbzmlmg83-index2' already exists. Is this change intended? André
Re: Shardsplitting
Hi Arkadi, It does not matter where you invoke that command because ultimately that command is executed by the Overseer node. That being said, shard splitting has some bugs whose fixes will be released with Solr 4.3.1 so I'd suggest that you wait until then to use this feature. On Thu, May 23, 2013 at 6:09 PM, Arkadi Colson ark...@smartbit.be wrote: Hi When having a collection with 3 shards and 2 replicas for each shard and I want to split shard1: does it matter where to start the splitshard command in the cloud or should it be started on the master of that shard? BR, Arkadi -- Regards, Shalin Shekhar Mangar.
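For reference, the split itself is issued through the Collections API, against any node in the cluster (host, port, and collection name below are placeholders):

```
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1
```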
Re: OPENNLP current patch compiling problem for 4.x branch
Hi Patrick, I think you should check out and apply the patch to branch_4x, rather than the lucene_solr_4_3_0 tag: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x Steve On May 23, 2013, at 2:08 AM, Patrick Mi patrick...@touchpointgroup.com wrote: Hi, I checked out from here http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_3_0 and downloaded the latest patch LUCENE-2899-current.patch. Applied the patch ok but when I did 'ant compile' I got the following error: == [javac] /home/lucene_solr_4_3_0/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol [javac] super(Version.LUCENE_44, input); [javac] ^ [javac] symbol: variable LUCENE_44 [javac] location: class Version [javac] 1 error == Compiled it on trunk without problem. Is this patch supposed to work for 4.X? Regards, Patrick
RE: Problem with document routing with Solr 4.2.1
If that can help.. adding distrib=false or shard.keys= is giving back results. -Original Message- From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com] Sent: May-23-13 10:39 AM To: solr-user@lucene.apache.org Subject: RE: Problem with document routing with Solr 4.2.1 I know. If a stop routing the documents and simply use a standard 'id' field then I am getting back my fields. I forgot to tell you how the collection was created. http://localhost:8201/solr/admin/collections?action=CREATEname=CurrentnumShards=15replicationFactor=3maxShardsPerNode=9 Since I am using the numshards parameter then composite routing should be working... unless I misunderstood something -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: May-23-13 10:27 AM To: solr-user@lucene.apache.org Subject: Re: Problem with document routing with Solr 4.2.1 That's strange. The default value of rows param is 10 so you should be getting 10 results back unless your StandardRequestHandler config in solrconfig has set rows to 0 or if none of your fields are stored. On Thu, May 23, 2013 at 7:40 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi All, I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents on the proper shard. Following the instruction found here: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+So lrCloud I set my document id to something like this 'fieldA!id' where fieldA is the key I want to use to distribute my documents. (All documents with the same value for fieldA will be sent to the same shard). When I query my index, I can see that the number of documents increase but there are no fields at all in the index. 
http://10.0.5.211:8201/solr/Current/select?q=*:* response lst name=responseHeader int name=status0/int int name=QTime11/int lst name=params str name=q*:*/str /lst /lst result name=response numFound=26318 start=0 maxScore=1.0/ /response Specifying fields in the 'fl' parameter does nothing. What am I doing wrong? -- Regards, Shalin Shekhar Mangar.
Re: Core admin action CREATE fails for existing core
Yes, this did change - it's actually a protection for a previous change though. There was a time when you did a core reload by just making a new core with the same name and closing the old core - that is no longer really supported though - the proper way to do this is to use SolrCore#reload, and that has been the case for all of 4.x release if I remember right. I supported making this change to force people who might still be doing what is likely quite a buggy operation to switch to the correct code. Sorry about the inconvenience. - Mark On May 23, 2013, at 10:45 AM, André Widhani andre.widh...@digicol.de wrote: It seems to me that the behavior of the Core admin action CREATE has changed when going from Solr 4.1 to 4.3. With 4.1, I could re-configure an existing core (changing path/name to solrconfig.xml for example). In 4.3, I get an error message: SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 'core-tex69b6iom1djrbzmlmg83-index2' already exists. Is this change intended? André
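For completeness, a reload via the CoreAdmin API looks like this (host and port are placeholders; the core name is taken from the error above):

```
http://localhost:8983/solr/admin/cores?action=RELOAD&core=core-tex69b6iom1djrbzmlmg83-index2
```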
Re: fq facet on double and non-indexed field
On May 23, 2013, at 14:25, gpssolr2020 psgoms...@gmail.com wrote: Thanks Erick.. i hope we can't do q either on a non-indexed field. What is the difference between q and fq other than cache? Thanks. How do you expect to search on a field that is non-indexed (and thus non-searchable)?
RE: .skip.autorecovery=Y + restart solr after crash + losing many documents
Hi Otis, Thank you for your reply. I'm in the middle of that upgrade and will report back when testing is complete. I'd like to get some nice set of reproducible steps so I'm not just ranting on. :) Regards, Gilles -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: 20 May 2013 04:29 To: solr-user@lucene.apache.org Subject: Re: .skip.autorecovery=Y + restart solr after crash + losing many documents Hi Gilles, Could you upgrade to 4.3.0 and see if you can reproduce? Otis -- Solr ElasticSearch Support http://sematext.com/ On Mon, May 13, 2013 at 5:26 PM, Gilles Comeau gilles.com...@polecat.co wrote: Hi all, We write to two same-named cores in the same collection for redundancy, and are not taking advantage of the full benefits of solr cloud replication. We use solrcloud.skip.autorecovery=true so that Solr doesn't try to sync the indexes when it starts up. However, we find that if the core is not optimized prior to shutting it down (in a crash situation), we can lose all of the data after starting up. The files are written to disk, but we can lose a full 24 hours worth of data as they are all removed when we start SOLR. (I don't think it is a commit issue) If we optimize before shutting down, we never lose any data. Sadly, sometimes SOLR is in a state where optimizing is not an option. Can anyone think of why that might be? Is there any special configuration you need if you want to write directly to two cores rather than use replication? Version 4.0, this used to work in our 4.0 nightly build, but broke when we migrated to 4.0 production.(until we test and migrate to the replication setup - it won't be too long and I'm a bit embarrassed to be asking this question!) Regards, Gilles
Re: Core admin action CREATE fails for existing core
I think the wiki needs to be updated to reflect this? http://wiki.apache.org/solr/CoreAdmin If somebody adds me as an editor (AlanWoodward), I'll do it. Alan Woodward www.flax.co.uk On 23 May 2013, at 16:43, Mark Miller wrote: Yes, this did change - it's actually a protection for a previous change though. There was a time when you did a core reload by just making a new core with the same name and closing the old core - that is no longer really supported though - the proper way to do this is to use SolrCore#reload, and that has been the case for all of 4.x release if I remember right. I supported making this change to force people who might still be doing what is likely quite a buggy operation to switch to the correct code. Sorry about the inconvenience. - Mark On May 23, 2013, at 10:45 AM, André Widhani andre.widh...@digicol.de wrote: It seems to me that the behavior of the Core admin action CREATE has changed when going from Solr 4.1 to 4.3. With 4.1, I could re-configure an existing core (changing path/name to solrconfig.xml for example). In 4.3, I get an error message: SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 'core-tex69b6iom1djrbzmlmg83-index2' already exists. Is this change intended? André
Re: Core admin action CREATE fails for existing core
Alan, I've added AlanWoodward to the Solr AdminGroup page. On May 23, 2013, at 12:29 PM, Alan Woodward a...@flax.co.uk wrote: I think the wiki needs to be updated to reflect this? http://wiki.apache.org/solr/CoreAdmin If somebody adds me as an editor (AlanWoodward), I'll do it. Alan Woodward www.flax.co.uk
Re: Restaurant availability from database
Thank you for your answer. Do you mean I should index the availability data as a document in Solr? Because the availability data in our databases is around 6,509,972 records and contains the availability per number of seats and per 15 minutes. I also tried this method, and as far as I know it's only possible to join the availability documents and not to include that information per result document. An example API response (created from the Solr response):

{
  "restaurants": [
    { "id": 13906, "name": "Allerlei", "zipcode": "6511DP", "house_number": 59, "available": true },
    { "id": 13907, "name": "Voorbeeld", "zipcode": "6512DP", "house_number": 39, "available": false }
  ],
  "resultCount": 12156,
  "resultCountAvailable": 55
}

I'm currently hacking around the problem by executing the search again with a very high value for the rows parameter and counting the number of available restaurants on the backend, but this causes a big performance impact (as expected).
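The workaround described above can be sketched in a few lines. This is an illustration only, assuming the response has already been parsed into the dict shape of the example above (field names taken from that example):

```python
# Count available restaurants on the backend after re-querying with a large
# `rows` value -- the client-side tally described in the message above.
# The response shape mirrors the example API response.
response = {
    "restaurants": [
        {"id": 13906, "name": "Allerlei", "available": True},
        {"id": 13907, "name": "Voorbeeld", "available": False},
    ],
    "resultCount": 12156,
}
response["resultCountAvailable"] = sum(
    1 for r in response["restaurants"] if r["available"]
)
print(response["resultCountAvailable"])
```

The performance cost comes from fetching every matching row just to compute one count, which is why the poster calls it a hack.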
Re: Core admin action CREATE fails for existing core
Thanks! Alan Woodward www.flax.co.uk On 23 May 2013, at 17:38, Steve Rowe wrote: Alan, I've added AlanWoodward to the Solr AdminGroup page.
Re: Solr 4.3 fails to load MySQL driver
Hi, thanks for pointing this out to me.

1152 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader
org.apache.solr.core.SolrResourceLoader – Adding 'file:/home/christian/zfmk/solr/solr-4.3.0/example/lib/mysql-connector-java-5.1.25-bin.jar' to classloader

The mysql-connector-java jar DOES get loaded, but is not available to org.apache.solr.core.SolrResourceLoader.findClass. Has something changed in the syntax for creating a dataimport handler?

solrconfig.xml:
---
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
  </lst>
</requestHandler>

data-config.xml:
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/koehler_zfmk" user="my_user" password="secret"/>
  <document name="content">
    <entity name="rawidentificationid" query="SELECT * FROM foobar;">
    </entity>
  </document>
</dataConfig>

I use this configuration successfully with 3.6. Regards, Chris

On 23.05.2013 14:39, Jack Krupansky wrote: Check the Solr log on startup - it will explicitly state which lib directories/files will be used. Make sure they agree with where the DIH jars reside. Keep in mind that the directory structure of Solr changed - use the lib from the 4.3 solrconfig. Try to use DIH in the standard Solr 4.3 example first. Then mimic that in your customization.
-- Jack Krupansky -Original Message- From: Christian Köhler Sent: Thursday, May 23, 2013 8:25 AM To: solr-user@lucene.apache.org Subject: Solr 4.3 fails to load MySQL driver Hi, in my attempt to migrate from 3.6.x to 4.3.0 I stumbled upon an issue loading the MySQL driver from the [instance]/lib dir:

Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.dataimport.DataImportHandler
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:448)
... 18 more

To narrow it down, I use the plain example configuration with the following changes:
- Added a dataimport requestHandler to example/conf/solrconfig.xml (copied from a working Solr 3.6.x)
- Created example/conf/data-config.xml with <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" ... and SQL statement (both copied from a working Solr 3.6.x)
- Placed the current driver mysql-connector-java-5.1.25-bin.jar in example/lib

To my knowledge the lib dir is included automatically in the path. To make sure, I tried to:
- add <lib dir="./lib" /> explicitly to solrconfig.xml
- add the absolute path to solrconfig.xml
- change solr.xml to use <solr persistent="true" sharedLib="lib">

All to no avail. System Info:
- OpenJDK Runtime Environment 1.7.0_19
- Solr 4.3.0
- mysql-connector-java-5.1.25-bin.jar

The same configuration ran fine with Solr 3.6.x on the very same machine. Any help is appreciated!
Cheers Chris -- Christian Köhler ganzgraph gmbh Bornheimer Straße 37 53111 Bonn koeh...@ganzgraph.de http://www.ganzgraph.de/ Tel.: +49-(0)228-227 99 400 Fax : +49-(0)228-227 99 409 Geschäftsführer: Christian Köhler, Thorsten Orth Unternehmenssitz: Bonn Handelsregister-Nummer: HRB 19066 beim Amtsgericht: Bonn UstId-Nr: DE 280482111
Re: Solr 4.3 fails to load MySQL driver
: in my attempt to migrate from 3.6.x to 4.3.0 I stumbled upon an issue loading
: the MySQL driver from the [instance]/lib dir:
:
: Caused by: java.lang.ClassNotFoundException:
: org.apache.solr.handler.dataimport.DataImportHandler

One of us is mistaken about what that error means. You say it means that the MySQL driver isn't being loaded, but nothing in your mail suggests to me that there is a problem loading the MySQL driver. What I see is that Solr can't seem to load the DIH class, suggesting that the dataimporthandler jar is not getting loaded. There may or may not also be a problem loading the MySQL driver, but nothing is even going to attempt to do so unless Solr can successfully construct an instance of the DataImportHandler. So unless there are more details in your error that start mentioning the MySQL classes, I would check your lib settings for loading the DIH jars and make sure those are right. -Hoss
Re: Fast faceting over large number of distinct terms
Interesting solution. My concern is how to select the most frequent terms in the story_text field in a way that would make sense to the user. Only including the X most common non-stopword terms in a document could easily cause important patterns to be missed. There's a similar issue with only returning counts for terms in the top N documents matching a particular query. Also is there an efficient way to add term counts on the client side? I thought of using the TermVectorComponent to get document level frequency counts and then using something like Hadoop to add them up. However, I couldn't find any documentation on using the results of a solr query to feed a map reduce operation. -- David On Wed, May 22, 2013 at 11:12 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Here's a possibility: At index time extract important terms (and/or phrases) from this story_text and store top N of them in a separate field (which will be much smaller/shorter). Then facet on that. Or just retrieve it and manually parse and count in the client if that turns out to be faster. I did this in the previous decade before Solr was available and it worked well. I limited my counting to top N (200?) hits. Otis -- Solr ElasticSearch Support http://sematext.com/ On Wed, May 22, 2013 at 10:54 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: The goal of the system is to obtain data that can be used to generate word clouds so that users can quickly get a sense of the aggregate contents of all documents matching a particular query. For example, a user might want to see a word cloud of all documents discussing 'Iraq' in a particular new papers. Faceting on story_text gives counts of individual words rather than entire text strings. I think this is because of the tokenization that happens automatically as part of the text_general type. 
I'm happy to look at alternatives to faceting, but I wasn't able to find one that provided aggregate word counts for just the documents matching a particular query, rather than for individual documents or the entire index. -- David On Wed, May 22, 2013 at 10:32 PM, Brendan Grainger brendan.grain...@gmail.com wrote: Hi David, Out of interest, what are you trying to accomplish by faceting over the story_text field? Is it generally the case that the story_text field will contain values that are repeated or categorize your documents somehow? From your description: story_text is used to store free-form text obtained by crawling newspapers and blogs, it doesn't seem that way, so I'm not sure faceting is what you want in this situation. Cheers, Brendan On Wed, May 22, 2013 at 9:49 PM, David Larochelle dlaroche...@cyber.law.harvard.edu wrote: I'm trying to quickly obtain cumulative word frequency counts over all documents matching a particular query. I'm running Solr 4.3.0 on a machine with 16GB of RAM. My index is 2.5 GB and has around 350,000 documents. My schema includes the following fields:

<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<field name="media_id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="story_text" type="text_general" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />

story_text is used to store free-form text obtained by crawling newspapers and blogs. Running faceted searches with the fc or fcs methods fails with the error "Too many values for UnInvertedField faceting on field story_text": http://localhost:8983/solr/query?q=id:106714828_6621&facet=true&facet.limit=10&facet.pivot=publish_date,story_text&rows=0&facet.method=fcs Running a faceted search with the 'enum' method succeeds but takes a very long time.
http://localhost:8983/solr/query?q=includes:foobar&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0 http://localhost:8983/solr/query?q=includes:mccain&facet=true&facet.limit=100&facet.pivot=media_id,includes&facet.method=enum&rows=0 The frustrating thing is that even if the query only returns a few hundred documents, it still takes 10 minutes or longer to get the cumulative word count results. Eventually we're hoping to build a system that will return results in a few seconds and scale to hundreds of millions of documents. Is there any way to get this level of performance out of Solr/Lucene? Thanks, David -- Brendan Grainger www.kuripai.com
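On the client-side aggregation question raised earlier in the thread: once per-document term frequencies have been fetched (e.g. parsed out of TermVectorComponent responses), summing them is cheap and needs no Hadoop job for a few hundred hits. A sketch with made-up data, assuming the per-document frequency maps are already in hand:

```python
from collections import Counter

# Per-document term-frequency maps; in practice these would be parsed from
# TermVectorComponent output for the documents matching the query.
# The terms and counts below are invented for illustration.
per_doc_tf = [
    {"iraq": 4, "report": 1},
    {"iraq": 2, "election": 3},
    {"election": 1, "report": 2},
]

# Sum into corpus-wide counts for the matching set -- word-cloud input.
total = Counter()
for tf in per_doc_tf:
    total.update(tf)

top_terms = total.most_common(2)
print(top_terms)
```

This addresses the counting step only; selecting which terms to keep (the stopword and top-N concerns above) is a separate question.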
Re: Upgrading from SOLR 3.5 to 4.2.1 Results.
Actually, it's pretty high end for most users. Rishi, you can post the real h/w details and our typical deployment: no. of CPUs per node, no. of disks per host, VMs per host, GC params, no. of cores per instance. Noble Paul Sent from phone On 21 May 2013 01:47, Rishi Easwaran rishi.easwa...@aol.com wrote: No, we just upgraded to 4.2.1. With the size of our complex and the effort required to apply our patches and roll out, our upgrades are not that often. -Original Message- From: Noureddine Bouhlel nouredd...@ecotour.com To: solr-user solr-user@lucene.apache.org Sent: Mon, May 20, 2013 3:36 pm Subject: Re: Upgrading from SOLR 3.5 to 4.2.1 Results. Hi Rishi, Have you done any tests with Solr 4.3? Regards, Cordialement, BOUHLEL Noureddine On 17 May 2013 21:29, Rishi Easwaran rishi.easwa...@aol.com wrote: Hi All, It's Friday 3:00pm, warm and sunny outside, and it was a good week. Figured I'd share some good news. I work for the AOL mail team and we use SOLR for our mail search backend. We have been using it since pre-SOLR 1.4 and are strong supporters of the SOLR community. We deal with millions of indexes and billions of requests a day across our complex. We finished the full rollout of SOLR 4.2.1 into our production last week. Some key highlights: - ~75% reduction in search response times - ~50% reduction in SOLR disk busy, which in turn helped with a ~90% reduction in errors - Garbage collection total stop reduction by over 50%, moving application throughput into the 99.8% - 99.9% range - ~15% reduction in CPU usage We did not tune our application moving from 3.5 to 4.2.1, nor update Java. For the most part it was a binary upgrade, with patches for our special use case. Going forward we are looking at prototyping SOLR Cloud for our search system, upgrading Java and Tomcat, and tuning our application further. Lots of fun stuff :) Have a great weekend everyone. Thanks, Rishi.
Re: Solr 4.3 fails to load MySQL driver
Hi, one of us is mistaken about what that error means. You say it means that the MySQL driver isn't being loaded, but nothing in your mail suggests to me that there is a problem loading the MySQL driver. What I see is that Solr can't seem to load the DIH class, suggesting that the dataimporthandler jar is not getting loaded. I corrected myself in my last mail: the MySQL driver IS loaded (thanks for pointing out to me where to look). There may or may not also be a problem loading the MySQL driver, but I only SUSPECT the MySQL driver of being the culprit for the dataimporthandler jar not getting loaded. Not sure! ... the MySQL classes, I would check your lib settings for loading the DIH jars ... I am not using DIH. IMHO it's just the plain example code in solr-4.3.0/example/solr/collection1/ that is being called. I include the full trace to clarify my problem (hopefully). Cheers, Chris

/home/solr-4.3.0/example# java -jar start.jar
0 [main] INFO org.eclipse.jetty.server.Server – jetty-8.1.8.v20121106
19 [main] INFO org.eclipse.jetty.deploy.providers.ScanningAppProvider – Deployment monitor /home/solr/solr-4.3.0/example/contexts at interval 0
24 [main] INFO org.eclipse.jetty.deploy.DeploymentManager – Deployable added: /home/solr/solr-4.3.0/example/contexts/solr-jetty-context.xml
653 [main] INFO org.eclipse.jetty.webapp.StandardDescriptorProcessor – NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
Null identity service, trying login service: null
Finding identity service: null
674 [main] INFO org.eclipse.jetty.server.handler.ContextHandler – started o.e.j.w.WebAppContext{/solr,file:/home/solr/solr-4.3.0/example/solr-webapp/webapp/},/home/solr/solr-4.3.0/example/webapps/solr.war
674 [main] INFO org.eclipse.jetty.server.handler.ContextHandler – started o.e.j.w.WebAppContext{/solr,file:/home/solr/solr-4.3.0/example/solr-webapp/webapp/},/home/solr/solr-4.3.0/example/webapps/solr.war
688 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
SolrDispatchFilter.init() 703 [main] INFO org.apache.solr.core.SolrResourceLoader – JNDI not configured for solr (NoInitialContextEx) 704 [main] INFO org.apache.solr.core.SolrResourceLoader – solr home defaulted to 'solr/' (could not find system property or JNDI) 713 [main] INFO org.apache.solr.core.CoreContainer – looking for solr config file: /home/solr/solr-4.3.0/example/solr/solr.xml 715 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 1857140958 716 [main] INFO org.apache.solr.core.CoreContainer – Loading CoreContainer using Solr Home: 'solr/' 716 [main] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: 'solr/' 962 [main] INFO org.apache.solr.core.CoreContainer – loading shared library: /home/solr/solr-4.3.0/example/solr/lib 962 [main] ERROR org.apache.solr.core.SolrResourceLoader – Can't find (or read) file to add to classloader: solr/lib 971 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting socketTimeout to: 0 973 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting urlScheme to: http:// 973 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting connTimeout to: 0 974 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maxConnectionsPerHost to: 20 974 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting corePoolSize to: 0 974 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maximumPoolSize to: 2147483647 974 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting maxThreadIdleTime to: 5 974 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting sizeOfQueue to: -1 975 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – Setting fairnessPolicy to: false 980 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating new http client, 
config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false 1073 [main] INFO org.apache.solr.core.CoreContainer – Registering Log Listener 1087 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CoreContainer – Creating SolrCore 'collection1' using instanceDir: solr/collection1 1088 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for directory: 'solr/collection1/' 1143 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrConfig – Adding specified lib dirs to ClassLoader 1144 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/home/solr/solr-4.3.0/example/lib/jetty-util-8.1.8.v20121106.jar' to classloader 1144 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader – Adding 'file:/home/solr/solr-4.3.0/example/lib/servlet-api-3.0.jar' to
RE: Problem with document routing with Solr 4.2.1
I must add that shard.keys= does not return anything on two of my nodes. But that is to be expected since I'm using a replication factor of 3 on a cloud of 5 servers. -Original Message- From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com] Sent: May-23-13 11:27 AM To: solr-user@lucene.apache.org Subject: RE: Problem with document routing with Solr 4.2.1 If it helps: adding distrib=false or shard.keys= gives back results. -Original Message- From: Jean-Sebastien Vachon [mailto:jean-sebastien.vac...@wantedanalytics.com] Sent: May-23-13 10:39 AM To: solr-user@lucene.apache.org Subject: RE: Problem with document routing with Solr 4.2.1 I know. If I stop routing the documents and simply use a standard 'id' field then I am getting back my fields. I forgot to tell you how the collection was created: http://localhost:8201/solr/admin/collections?action=CREATE&name=Current&numShards=15&replicationFactor=3&maxShardsPerNode=9 Since I am using the numShards parameter, composite routing should be working... unless I misunderstood something. -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: May-23-13 10:27 AM To: solr-user@lucene.apache.org Subject: Re: Problem with document routing with Solr 4.2.1 That's strange. The default value of the rows param is 10, so you should be getting 10 results back unless your StandardRequestHandler config in solrconfig has set rows to 0, or if none of your fields are stored. On Thu, May 23, 2013 at 7:40 PM, Jean-Sebastien Vachon jean-sebastien.vac...@wantedanalytics.com wrote: Hi All, I just started indexing data in my brand new Solr Cloud running on 4.2.1. Since I am a big user of the grouping feature, I need to route my documents to the proper shard. Following the instructions found here: http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud I set my document id to something like 'fieldA!id', where fieldA is the key I want to use to distribute my documents.
(All documents with the same value for fieldA will be sent to the same shard). When I query my index, I can see that the number of documents increases, but there are no fields at all in the index. http://10.0.5.211:8201/solr/Current/select?q=*:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">11</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <result name="response" numFound="26318" start="0" maxScore="1.0"/>
</response>

Specifying fields in the 'fl' parameter does nothing. What am I doing wrong? -- Regards, Shalin Shekhar Mangar.
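The 'fieldA!id' convention above can be illustrated with a small sketch. This is not Solr's actual router: the compositeId router hashes the part before '!' with its own hash function, and md5 is used here purely to show the property that matters for grouping, namely that documents sharing a route key always map to the same shard:

```python
import hashlib

def shard_for(doc_id: str, num_shards: int) -> int:
    """Illustrative only: hash the route key (the part before '!')."""
    route_key = doc_id.split("!", 1)[0]
    digest = hashlib.md5(route_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# All documents with the same fieldA value land on the same shard,
# so grouping on fieldA can be computed shard-locally.
a = shard_for("companyA!doc1", 15)
b = shard_for("companyA!doc2", 15)
print(a == b)  # prints True
```

Note this only demonstrates the routing idea; it says nothing about the missing-fields symptom in the thread, which turned out to be query-side.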
Re: Solr 4.3 fails to load MySQL driver
: I only SUSPECT the MySQL driver of being the culprit for the dataimporthandler
: jar not getting loaded. Not sure!

The dataimporthandler *class* is not getting loaded because the dataimporthandler *jar* is not getting loaded.

: MySQL classes, I would check your lib settings for loading the DIH
: jars
:
: I am not using DIH. IMHO it's just the plain example code in
: solr-4.3.0/example/solr/collection1/ that is being called.

I'm totally confused ... DIH == DataImportHandler ... it's just an acronym. You say you aren't using DIH, but you are having a problem loading DIH, so DIH is used in your configs.

: I include the full trace to clarify my problem (hopefully)
...
: org.apache.solr.core.SolrResourceLoader – new SolrResourceLoader for
: directory: 'solr/collection1/'
: 1143 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrConfig –
: Adding specified lib dirs to ClassLoader
: 1144 [coreLoadExecutor-3-thread-1] INFO
: org.apache.solr.core.SolrResourceLoader – Adding
: 'file:/home/solr/solr-4.3.0/example/lib/jetty-util-8.1.8.v20121106.jar' to
: classloader

...ok, for starters this makes no sense, and may be the cause of some problems. You apparently have your collection1 configs set up to load all of the classes from the /home/solr/solr-4.3.0/example/lib directory as part of the collection1 classloader. You really don't want to do that. It will most likely cause you all sorts of problems, even if it's unrelated to the current problem. Second, note in particular all of the lines that look like the line above -- specifically lines that say org.apache.solr.core.SolrResourceLoader - Adding ... to classloader. Besides the ones referring to /home/solr/solr-4.3.0/example/lib/ (which is almost certainly not what you want) you then have a bunch referring to contrib/extraction, contrib/langid, and contrib/velocity -- all of which is great; those plugins and their dependencies are now available to use.
But nowhere does it ever say anything about adding the contrib/dataimporthandler jars to the classloader, which means your config isn't set up to load any of the dataimporthandler jars as plugins. Which means when it's done loading plugins, and it starts to initialize things like RequestHandlers, and it finds a reference to the DataImportHandler, it doesn't know what that means...

: Caused by: java.lang.ClassNotFoundException:
: org.apache.solr.handler.dataimport.DataImportHandler

If you look at the 4.3 DIH examples, you'll note that the only solrconfig.xml files that mention DataImportHandler also include lib directives like the following, in order to load dataimporthandler as a plugin...

<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />
...
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler"

-Hoss
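As a quick sanity check on the directive above: the regex attribute is matched against file names in the given directory, so the pattern from the DIH example picks up the dataimporthandler jars and nothing else. A small sketch (jar names assumed from a stock 4.3.0 dist/ directory):

```python
import re

# Pattern taken from the lib directive in the 4.3 DIH example solrconfig.xml.
pattern = re.compile(r"solr-dataimporthandler-.*\.jar")

# Hypothetical directory listing.
files = [
    "solr-dataimporthandler-4.3.0.jar",
    "solr-dataimporthandler-extras-4.3.0.jar",
    "mysql-connector-java-5.1.25-bin.jar",
]
matched = [f for f in files if pattern.fullmatch(f)]
print(matched)
```

The MySQL driver jar deliberately falls outside the pattern; it needs its own lib directive (or a shared lib dir) to be visible.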
AW: Core admin action CREATE fails for existing core
Mark, Alan, thanks for explaining and updating the wiki. When reloading the core using action=CREATE with Solr 4.1 I could specify the path to schema and config. In fact I used this to reconfigure the core to use a specific one of two prepared config files, depending on some external index state (instead of making changes to one and the same config file). action=RELOAD does not understand the corresponding request parameters schema and config (which is why I used CREATE, not RELOAD, in the first place). So the functionality to switch to a different config file for an existing core is no longer there, I guess? Thanks, André From: Alan Woodward [a...@flax.co.uk] Sent: Thursday, 23 May 2013 18:43 To: solr-user@lucene.apache.org Subject: Re: Core admin action CREATE fails for existing core Thanks! Alan Woodward www.flax.co.uk
Re: Core admin action CREATE fails for existing core
You're right - that does seem to be a new limitation. Could you create a JIRA issue for it? It would be fairly simple to add another reload method that also took the name of a new solrconfig/schema file. - Mark On May 23, 2013, at 4:11 PM, André Widhani andre.widh...@digicol.de wrote: Mark, Alan, thanks for explaining and updating the wiki. When reloading the core using action=CREATE with Solr 4.1 I could specify the path to schema and config. In fact I used this to reconfigure the core to use a specific one of two prepared config files depending on some external index state (instead of making changes to one and the same config file). action=RELOAD does not understand the corresponding request parameters schema and config (which is why I used CREATE, not RELOAD in the first place). So the functionality to switch to a different config file for an existing core is no longer there, I guess? Thanks, André
Re: Solr 4.3 fails to load MySQL driver
Hi, I'm totally confused ... DIH == DataImportHandler ... it's just an acronym. You say you aren't using DIH, but you are having a problem loading DIH, so DIH is used in your configs. Sorry for the confusion. I was just trying to say: I use the example code from solr-4.3.0/example/solr and not from solr-4.3.0/example/example-DIH ... OK, for starters this makes no sense, and may be the cause of some problems. You apparently have your collection1 configs set up to load all of the classes from the /home/solr/solr-4.3.0/example/example/lib directory as part of the collection1 classloader. You really don't want to do that. It will most likely cause you all sorts of problems, even if it's unrelated to the current problem. For Solr it was recommended to place the MySQL driver in solr_3.6.2/example/lib/. This dir is loaded by default in 3.6 (as I did not add any additional lib dirs). That's why I did this in 4.3 as well. What's the best practice for placing third-party libs? I added example/lib/ to collection1/conf/solrconfig.xml as a lib dir. Without this, the MySQL driver is not loaded according to the org.apache.solr.core.SolrResourceLoader – Adding xxx messages. But nowhere does it ever say anything about adding contrib/dataimporthandler jars to the classloader. collection1/conf/solrconfig.xml has the following lib dirs by default:

<lib dir="../../../contrib/extraction/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-cell-\d.*\.jar" />
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
<lib dir="../../../contrib/langid/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-langid-\d.*\.jar" />
<lib dir="../../../contrib/velocity/lib" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-velocity-\d.*\.jar" />

Looks the same to me as in 3.6. ...which means your config isn't set up to load any of the dataimporthandler jars as plugins. That means I have to configure the dataimporthandler manually in 4.3?
If yes, this is the root of all problems ... which means when it's done loading plugins, and it starts to initialize things like RequestHandlers, and it finds a reference to the DataImportHandler, it doesn't know what that means... : Caused by: java.lang.ClassNotFoundException: : org.apache.solr.handler.dataimport.DataImportHandler If you look at the 4.3 DIH examples, you'll note that the only solrconfig.xml files that mention DataImportHandler also include lib directives like the following in order to load dataimporthandler as a plugin...

<lib dir="../../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

Included this ... to no avail.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">

still does not load. Regards Chris
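Putting the pieces of this thread together, a minimal sketch of what the wiring in collection1/conf/solrconfig.xml might look like — assuming the stock 4.3 example directory layout; the db-data-config.xml file name is a placeholder taken from the DIH examples, and relative paths must be adjusted to your install:

```xml
<!-- collection1/conf/solrconfig.xml (sketch only) -->

<!-- load the DataImportHandler plugin jars from the dist directory -->
<lib dir="../../../dist/" regex="solr-dataimporthandler-.*\.jar" />

<!-- register the handler; "db-data-config.xml" names your DIH config -->
<requestHandler name="/dataimport"
                class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>
```

The MySQL driver jar itself can typically go in the core's instanceDir/lib directory (which SolrResourceLoader picks up automatically) rather than in a shared example/lib referenced by every core.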
Core admin action CREATE fails to persist some settings in solr.xml with Solr 4.3
When I create a core with the Core admin handler using these request parameters: action=CREATE name=core-tex69bbum21ctk1kq6lmkir-index3 schema=/etc/opt/dcx/solr/conf/schema.xml instanceDir=/etc/opt/dcx/solr/ config=/etc/opt/dcx/solr/conf/solrconfig.xml dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3 in Solr 4.1, solr.xml would have the following entry:

<core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" config="/etc/opt/dcx/solr/conf/solrconfig.xml" dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3" collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

while in Solr 4.3 schema, config and dataDir will be missing:

<core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

The new core would use the settings specified during CREATE, but after a Solr restart they are lost (fall back to some defaults), as they are not persisted in solr.xml. Is this a bug or am I doing something wrong here? André
AW: Core admin action CREATE fails for existing core
Ok - yes, will do so tomorrow. Thanks, André From: Mark Miller [markrmil...@gmail.com] Sent: Thursday, 23 May 2013 22:46 To: solr-user@lucene.apache.org Subject: Re: Core admin action CREATE fails for existing core You're right - that does seem to be a new limitation. Could you create a JIRA issue for it? It would be fairly simple to add another reload method that also took the name of a new solrconfig/schema file. - Mark On May 23, 2013, at 4:11 PM, André Widhani andre.widh...@digicol.de wrote: Mark, Alan, thanks for explaining and updating the wiki. When reloading the core using action=CREATE with Solr 4.1 I could specify the path to schema and config. In fact I used this to reconfigure the core to use a specific one of two prepared config files depending on some external index state (instead of making changes to one and the same config file). action=RELOAD does not understand the corresponding request parameters schema and config (which is why I used CREATE, not RELOAD, in the first place). So the functionality to switch to a different config file for an existing core is no longer there, I guess? Thanks, André From: Alan Woodward [a...@flax.co.uk] Sent: Thursday, 23 May 2013 18:43 To: solr-user@lucene.apache.org Subject: Re: Core admin action CREATE fails for existing core Thanks! Alan Woodward www.flax.co.uk On 23 May 2013, at 17:38, Steve Rowe wrote: Alan, I've added AlanWoodward to the Solr AdminGroup page. On May 23, 2013, at 12:29 PM, Alan Woodward a...@flax.co.uk wrote: I think the wiki needs to be updated to reflect this? http://wiki.apache.org/solr/CoreAdmin If somebody adds me as an editor (AlanWoodward), I'll do it. Alan Woodward www.flax.co.uk On 23 May 2013, at 16:43, Mark Miller wrote: Yes, this did change - it's actually a protection for a previous change though.
There was a time when you did a core reload by just making a new core with the same name and closing the old core - that is no longer really supported though - the proper way to do this is to use SolrCore#reload, and that has been the case for all of the 4.x releases if I remember right. I supported making this change to force people who might still be doing what is likely quite a buggy operation to switch to the correct code. Sorry about the inconvenience. - Mark On May 23, 2013, at 10:45 AM, André Widhani andre.widh...@digicol.de wrote: It seems to me that the behavior of the Core admin action CREATE has changed when going from Solr 4.1 to 4.3. With 4.1, I could re-configure an existing core (changing path/name to solrconfig.xml for example). In 4.3, I get an error message: SEVERE: org.apache.solr.common.SolrException: Error CREATEing SolrCore 'core-tex69b6iom1djrbzmlmg83-index2': Core with name 'core-tex69b6iom1djrbzmlmg83-index2' already exists. Is this change intended? André
Warning: no uniqueKey specified in schema.
Hi, I just downloaded Apache Solr 4.3.0 from http://lucene.apache.org/solr/. I then got into the /example directory and started Solr with: java -Djava.util.logging.config.file=etc/logging.properties -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar I have not made any changes at this point and I get the following Warning: no uniqueKey specified in schema. I have no clue why this warning occurs, because the schema.xml has <uniqueKey>id</uniqueKey>. Isn’t this correctly defined? I have not changed the examples in any way, just ran them. I would like to add that if I use the normal Solr (not the one with the DataImportHandler): java -Djava.util.logging.config.file=etc/logging.properties -jar start.jar this warning does not occur. I’d appreciate any clues on why this warning occurs in the example-DIH. Thank you, O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Warning-no-uniqueKey-specified-in-schema-tp4065791.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Warning: no uniqueKey specified in schema.
On 5/23/2013 3:50 PM, O. Olson wrote: I just downloaded Apache Solr 4.3.0 from http://lucene.apache.org/solr/. I then got into the /example directory and started Solr with: java -Djava.util.logging.config.file=etc/logging.properties -Dsolr.solr.home=./example-DIH/solr/ -jar start.jar I have not made any changes at this point and I get the following Warning: no uniqueKey specified in schema. One of the cores defined in example-DIH, specifically the one named tika, does not have uniqueKey in its schema. example/example-DIH/solr/tika/conf/schema.xml Thanks, Shawn
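If you want the warning gone for that core, declaring a uniqueKey in the tika schema is enough. A minimal sketch — the field name "id" and type "string" are conventional choices here, not something the tika example already defines, and the string fieldType must exist in that schema:

```xml
<!-- example/example-DIH/solr/tika/conf/schema.xml (sketch only);
     every indexed document must then supply a non-empty "id" value -->
<field name="id" type="string" indexed="true" stored="true" required="true" />
<uniqueKey>id</uniqueKey>
```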
Re: Restaurant availability from database
Hossman did a presentation on something similar to this using spatial data at a Solr meetup some months ago. http://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/ May be helpful to you. On Thu, May 23, 2013 at 9:40 AM, rajh ron...@trimm.nl wrote: Thank you for your answer. Do you mean I should index the availability data as a document in Solr? Because the availability data in our databases is around 6,509,972 records and contains the availability per number of seats and per 15 minutes. I also tried this method, and as far as I know it's only possible to join the availability documents and not to include that information per result document. An example API response (created from the Solr response): { restaurants: [ { id: 13906, name: Allerlei, zipcode: 6511DP, house_number: 59, available: true }, { id: 13907, name: Voorbeeld, zipcode: 6512DP, house_number: 39, available: false } ], resultCount: 12156, resultCountAvailable: 55, } I'm currently hacking around the problem by executing the search again with a very high value for the rows parameter and counting the number of available restaurants on the backend, but this causes a big performance impact (as expected). -- View this message in context: http://lucene.472066.n3.nabble.com/Restaurant-availability-from-database-tp4065609p4065710.html Sent from the Solr - User mailing list archive at Nabble.com.
Note on The Book
To those of you who may have heard about the Lucene/Solr book that I and two others are writing on Lucene and Solr, some bad and good news. The bad news: The book contract with O’Reilly has been canceled. The good news: I’m going to proceed with self-publishing (possibly on Lulu or even Amazon) a somewhat reduced scope Solr-only Reference Guide (with hints of Lucene). The scope of the previous effort was too great, even for O’Reilly – a book larger than 800 pages (or even 600) that was heavy on reference and lighter on “guide” just wasn’t fitting in with their traditional “guide” model. In truth, Solr is just too complex for a simple guide that covers it all, let alone Lucene as well. I’ll announce more details in the coming weeks, but I expect to publish an e-book-only version of the book, focused on Solr reference (and plenty of guide as well), possibly on Lulu, plus eventually publish 4-8 individual print volumes for people who really want the paper. One model I may pursue is to offer the current, incomplete, raw, rough, draft as a $7.99 e-book, with the promise of updates every two weeks or a month as new and revised content and new releases of Solr become available. Maybe the individual e-book volumes would be $2 or $3. These are just preliminary ideas. Feel free to let me know what seems reasonable or excessive. For paper: Do people really want perfect bound, or would you prefer spiral bound that lies flat and folds back easily? I suppose we could offer both – which should be considered “premium”? I’ll announce more details next week. The immediate goal will be to get the “raw rough draft” available to everyone ASAP. For those of you who have been early reviewers – your effort will not have been in vain. I have all your comments and will address them over the next month or two or three. 
Just for some clarity, the existing Solr Wiki and even the recent contribution of the LucidWorks Solr Reference to Apache really are still great contributions to general knowledge about Solr, but the book is intended to go much deeper into detail, especially with loads of examples and a lot more narrative guide. For example, the book has a complete list of the analyzer filters, each with a clean one-liner description. Ditto for every parameter (although I would note that the LucidWorks Solr Reference does a decent job of that as well.) Maybe, eventually, everything in the book COULD (and will) be integrated into the standard Solr doc, but until then, a single, integrated reference really is sorely needed. And, the book has a lot of narrative guide and walking through examples as well. Over time, I’m sure both will evolve. And just to be clear, the book is not a simple repurposing of the Solr wiki content – EVERY description of everything has been written fresh, from scratch. So, for example, analyzer filters get both short one-liner summary descriptions as well as more detailed descriptions, plus formal attribute specifications and numerous examples, including sample input and outputs (the LucidWorks Solr Reference does a better job with examples as well.) The book has been written in parallel with branch_4x and that will continue. -- Jack Krupansky
Re: Can anyone explain this Solr query behavior?
Hi Erick, Here's the output after turning on the debug flag:

*q=text:()&debug=query* yields:

<response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">17</int> <lst name="params"> <str name="indent">true</str> <str name="q">text:()</str> <str name="debug">query</str> </lst> </lst> <result name="response" numFound="0" start="0" maxScore="0.0"/> <lst name="debug"> <str name="rawquerystring">text:()</str> <str name="querystring">text:()</str> <str name="parsedquery">(+())/no_coord</str> <str name="parsedquery_toString">+()</str> <str name="QParser">ExtendedDismaxQParser</str> <null name="altquerystring"/> <null name="boost_queries"/> <arr name="parsed_boost_queries"/> <null name="boostfuncs"/> </lst> </response>

*q=doc-id:3000&debug=query* yields:

<response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">17</int> <lst name="params"> <str name="q">doc-id:3000</str> <str name="debug">query</str> </lst> </lst> <result name="response" numFound="1" start="0" maxScore="11.682044"> <doc> : </doc> </result> <lst name="debug"> <str name="rawquerystring">doc-id:3000</str> <str name="querystring">doc-id:3000</str> <str name="parsedquery">(+doc-id:3000)/no_coord</str> <str name="parsedquery_toString">+doc-id:`#8;#0;#0;#23;8</str> <str name="QParser">ExtendedDismaxQParser</str> <null name="altquerystring"/> <null name="boost_queries"/> <arr name="parsed_boost_queries"/> <null name="boostfuncs"/> </lst> </response>

*q=doc-id:3000 AND text:()&debug=query* yields:

<response> <lst name="responseHeader"> <int name="status">0</int> <int name="QTime">23</int> <lst name="params"> <str name="q">doc-id:3000 AND text:()</str> <str name="debug">query</str> </lst> </lst> <result name="response" numFound="631647" start="0" maxScore="8.056607"> <doc> : </doc> <doc> : </doc> <doc> : </doc> <doc> : </doc> <doc> : </doc> </result> <lst name="debug"> <str name="rawquerystring">doc-id:3000 AND text:()</str> <str name="querystring">doc-id:3000 AND text:()</str> <str name="parsedquery">(+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord</str> <str name="parsedquery_toString">+(doc-id:`#8;#0;#0;#23;8 (Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0))</str> <str name="QParser">ExtendedDismaxQParser</str> <null name="altquerystring"/> <null name="boost_queries"/> <arr name="parsed_boost_queries"/> <null name="boostfuncs"/> </lst> </response>

*solrconfig.xml:*

<requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <str name="df">text</str> <str name="defType">edismax</str> <str name="qf">text^1.0 Title^3.0 Classification^2.0 Contributors^2.0 Publisher^2.0</str> </lst>

*schema.xml:*

<field name="text" type="my_text" indexed="true" stored="false" required="false"/> <dynamicField name="*" type="my_text" indexed="true" stored="true" multiValued="false"/> <fieldType name="my_text" class="solr.TextField"> <analyzer type="index" class="MyAnalyzer"/> <analyzer type="query" class="MyAnalyzer"/> <analyzer type="multiterm" class="MyAnalyzer"/> </fieldType>

*Note:* MyAnalyzer, among a few other customizations, uses WhitespaceTokenizer and LowerCaseFilter. Thanks a lot. -Shankar On Thu, May 23, 2013 at 4:34 AM, Erick Erickson erickerick...@gmail.com wrote: Please post the results of adding debug=query to the URL. That'll tell us what the query parser spits out, which is much easier to analyze. Best Erick On Wed, May 22, 2013 at 12:16 PM, Shankar Sundararaju shan...@ebrary.com wrote: This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar -- Regards, *Shankar Sundararaju *Sr. Software Architect ebrary, a ProQuest company 410 Cambridge Avenue, Palo Alto, CA 94306 USA shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)
FW: howto: get the value from a multivalued field?
hi, all - how can I retrieve the value out of a multivalued field in a customized function query? I want to implement a function query whose first parameter is a multi-valued field, from which values are retrieved and manipulated. However, when I use the code below I get the exception "can not use FieldCache on multivalued field":

public ValueSource parse(FunctionQParser fp) throws ParseException {
    try {
        ValueSource vs = fp.parseValueSource();
    } catch (...) {
    }
}

Thanks. - Frank
Re: howto: get the value from a multivalued field?
Yeah, you can't do that. You'll need to keep a copy of whichever value from the multi-valued field you wish to be considered the value in a separate, non-multi-valued field. Possibly using an update processor, such as one of: FirstFieldValueUpdateProcessorFactory, LastFieldValueUpdateProcessorFactory, MaxFieldValueUpdateProcessorFactory, MinFieldValueUpdateProcessorFactory -- Jack Krupansky -Original Message- From: world hello Sent: Thursday, May 23, 2013 7:50 PM To: solr-user@lucene.apache.org Subject: FW: howto: get the value from a multivalued field? hi, all - how can I retrieve the value out of a multivalued field in a customized function query?I want to implement a function query whose first parameter is a multi-value fileld, from which values are retrieved and manipulated. however, I used the code but get exceptions - can not use FieldCache on multivalued field /public ValueSource parse(FunctionQParser fp) throws ParseException { try { ValueSource vs = fp.parseValueSource(); } catch (...) { } Thanks. - Frank
RE: howto: get the value from a multivalued field?
thanks, jack. could you please give more details on using update processor? Thanks. - frank From: j...@basetechnology.com To: solr-user@lucene.apache.org Subject: Re: howto: get the value from a multivalued field? Date: Thu, 23 May 2013 20:06:34 -0400 Yeah, you can't do that. You'll need to keep a copy of whichever value from the multi-valued field you wish to be considered the value in a separate, non-multi-valued field. Possibly using an update processor, such as one of: FirstFieldValueUpdateProcessorFactory, LastFieldValueUpdateProcessorFactory, MaxFieldValueUpdateProcessorFactory, MinFieldValueUpdateProcessorFactory -- Jack Krupansky -Original Message- From: world hello Sent: Thursday, May 23, 2013 7:50 PM To: solr-user@lucene.apache.org Subject: FW: howto: get the value from a multivalued field? hi, all - how can I retrieve the value out of a multivalued field in a customized function query?I want to implement a function query whose first parameter is a multi-value fileld, from which values are retrieved and manipulated. however, I used the code but get exceptions - can not use FieldCache on multivalued field /public ValueSource parse(FunctionQParser fp) throws ParseException { try { ValueSource vs = fp.parseValueSource(); } catch (...) { } Thanks. - Frank
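As a follow-up, a sketch of how the processors Jack lists might be wired into an update chain in solrconfig.xml. The field names ("prices", "first_price") and the chain name are made-up placeholders; the idea is to clone the multi-valued field into a single-valued one and keep only its first value at index time:

```xml
<!-- solrconfig.xml (sketch): copy multi-valued "prices" into
     single-valued "first_price", keeping only the first value -->
<updateRequestProcessorChain name="first-value" default="true">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <str name="source">prices</str>
    <str name="dest">first_price</str>
  </processor>
  <processor class="solr.FirstFieldValueUpdateProcessorFactory">
    <str name="fieldName">first_price</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With "first_price" declared non-multi-valued in schema.xml, it can then be used safely in a function query.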
Re: Can anyone explain this Solr query behavior?
(+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord You're using edismax, not lucene. So AND is being considered as a search term, not an operator, and the word 'and' probably exists in 631580 documents. Why is it triggering dismax? Probably because field:() is not valid syntax, so edismax is dropping to dismax because it isn't a valid Lucene query. What do you expect text:() to do? If you want to match any docs that have a value in the text field, use q=text:[* TO *]. To match docs that *don't* have a value in the text field: q=-text:[* TO *] Upayavira On Fri, May 24, 2013, at 12:23 AM, Shankar Sundararaju wrote: Hi Erick, Here's the output after turning on the debug flag: *q=text:()debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime17/int lst name=params str name=indenttrue/str str name=qtext:()/str str name=debugquery/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/result lst name=debug str name=rawquerystringtext:()/str str name=querystringtext:()/str str name=parsedquery(+())/no_coord/str str name=parsedquery_toString+()/str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *q=doc-id:3000debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime17/int lst name=params str name=qdoc-id:3000/str str name=debugquery/str /lst /lst result name=response numFound=1 start=0 maxScore=11.682044 doc : : /doc /result lst name=debug str name=rawquerystringdoc-id:3000/str str name=querystringdoc-id:3000/str str name=parsedquery(+doc-id:3000)/no_coord/str str name=parsedquery_toString+doc-id:`#8;#0;#0;#23;8/str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *q=doc-id:3000 AND
text:()debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime23/int lst name=params str name=qdoc-id:3000 AND text:()/str str name=debugquery/str /lst /lst result name=response numFound=631647 start=0 maxScore=8.056607 doc : /doc : /doc doc : /doc doc : /doc doc : /doc doc : /doc /result lst name=debug str name=rawquerystringdoc-id:3000 AND text:()/str str name=querystringdoc-id:3000 AND text:()/str str name=parsedquery (+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord /str str name=parsedquery_toString +(doc-id:`#8;#0;#0;#23;8 (Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0)) /str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *solrconfig.xml:* requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=dftext/str str name=defTypeedismax/str str name=qftext^1.0 Title^3.0 Classification^2.0 Contributors^2.0 Publisher^2.0/str /lst *schema.xml:* field name=text type=my_text indexed=true stored=false required= false/* * dynamicField name=* type=my_text indexed=true stored=true multiValued=false/ fieldType name=my_text class=solr.TextField analyzer type=index class=MyAnalyzer/ analyzer type=query class=MyAnalyzer/ analyzer type=multiterm class=MyAnalyzer/ /fieldType * * *Note:* MyAnalyzer among few other customizations, uses WhitespaceTokenizer and LoweCaseFilter Thanks a lot. -Shankar On Thu, May 23, 2013 at 4:34 AM, Erick Erickson erickerick...@gmail.comwrote: Please post the results of adding debug=query to the URL. That'll tell us what the query parser spits out which is much easier to analyze. 
Best Erick On Wed, May 22, 2013 at 12:16 PM, Shankar Sundararaju shan...@ebrary.com wrote: This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar -- Regards, *Shankar Sundararaju *Sr. Software Architect ebrary, a ProQuest company 410 Cambridge Avenue, Palo Alto, CA 94306 USA shan...@ebrary.com | www.ebrary.com | 650-475-8776 (w) | 408-426-3057 (c)
Re: Can anyone explain this Solr query behavior?
Okay... sorry I wasn't paying close enough attention. What is happening is that the empty parentheses are illegal in Lucene query syntax:

<str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'id:* AND text:()': Encountered ")" at line 1, column 15. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... <NUMBER> ... <TERM> ... "*" ...</str> <int name="code">400</int>

Edismax traps such errors and then escapes the query so that Lucene will no longer throw an error. In this case, it puts quotes around the AND operator, which is why you see "and" included in the parsed query as if it were a term. And I believe it turns text:() into text:"()", which makes the original Lucene error go away, but the "()" analyzes to nothing and generates no term in the query. So, fix your syntax error and the anomaly should go away. -- Jack Krupansky -Original Message- From: Shankar Sundararaju Sent: Thursday, May 23, 2013 7:23 PM To: solr-user@lucene.apache.org Subject: Re: Can anyone explain this Solr query behavior?
Hi Erick, Here's the output after turning on the debug flag: *q=text:()debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime17/int lst name=params str name=indenttrue/str str name=qtext:()/str str name=debugquery/str /lst /lst result name=response numFound=0 start=0 maxScore=0.0/result lst name=debug str name=rawquerystringtext:()/str str name=querystringtext:()/str str name=parsedquery(+())/no_coord/str str name=parsedquery_toString+()/str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *q=doc-id:3000debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime17/int lst name=params str name=qdoc-id:3000/str str name=debugquery/str /lst /lst result name=response numFound=1 start=0 maxScore=11.682044 doc : : /doc /result lst name=debug str name=rawquerystringdoc-id:3000/str str name=querystringdoc-id:3000/str str name=parsedquery(+doc-id:3000)/no_coord/str str name=parsedquery_toString+doc-id:`#8;#0;#0;#23;8/str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *q=doc-id:3000 AND text:()debug=query* yields response lst name=responseHeader int name=status0/int int name=QTime23/int lst name=params str name=qdoc-id:3000 AND text:()/str str name=debugquery/str /lst /lst result name=response numFound=631647 start=0 maxScore=8.056607 doc : /doc : /doc doc : /doc doc : /doc doc : /doc doc : /doc /result lst name=debug str name=rawquerystringdoc-id:3000 AND text:()/str str name=querystringdoc-id:3000 AND text:()/str str name=parsedquery (+(doc-id:3000 DisjunctionMaxQuery((Publisher:and^2.0 | text:and | Classification:and^2.0 | Contributors:and^2.0 | Title:and^3.0/no_coord /str str name=parsedquery_toString +(doc-id:`#8;#0;#0;#23;8 (Publisher:and^2.0 | text:and | Classification:and^2.0 | 
Contributors:and^2.0 | Title:and^3.0)) /str str name=QParserExtendedDismaxQParser/str null name=altquerystring/ null name=boost_queries/ arr name=parsed_boost_queries/ null name=boostfuncs/ /lst /response *solrconfig.xml:* requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=dftext/str str name=defTypeedismax/str str name=qftext^1.0 Title^3.0 Classification^2.0 Contributors^2.0 Publisher^2.0/str /lst *schema.xml:* field name=text type=my_text indexed=true stored=false required= false/* * dynamicField name=* type=my_text indexed=true stored=true multiValued=false/ fieldType name=my_text class=solr.TextField analyzer type=index class=MyAnalyzer/ analyzer type=query class=MyAnalyzer/ analyzer type=multiterm class=MyAnalyzer/ /fieldType * * *Note:* MyAnalyzer among few other customizations, uses WhitespaceTokenizer and LoweCaseFilter Thanks a lot. -Shankar On Thu, May 23, 2013 at 4:34 AM, Erick Erickson erickerick...@gmail.comwrote: Please post the results of adding debug=query to the URL. That'll tell us what the query parser spits out which is much easier to analyze. Best Erick On Wed, May 22, 2013 at 12:16 PM, Shankar Sundararaju shan...@ebrary.com wrote: This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar -- Regards, *Shankar Sundararaju *Sr. Software
RE: Question about Coordination factor
Thank you for your comment. For historical reasons, our organization uses a trunk version of Solr 4.0, which is a bit old and unofficial. And edismax always returns 1/2 as the coordination value, so I wanted to make sure what this value should be. This will be a good reason to upgrade our system to Solr 4.3 or a later version. Thank you very much! -Kazu From: ans...@anshumgupta.net Date: Thu, 23 May 2013 16:58:46 +0530 Subject: Re: Question about Coordination factor To: solr-user@lucene.apache.org This looks correct. On Thu, May 23, 2013 at 7:37 AM, Kazuaki Hiraga kazuaki.hir...@gmail.com wrote: Hello folks, Sorry, my last email was a bit messy, so I am sending it again. I have a question about the coordination factor, to ensure my understanding of this value is correct. If I have documents that contain some keywords like the following: Doc1: A, B, C Doc2: A, C Doc3: B, C and my query is A OR B OR C OR D, the coord factor value for each document will be the following: Doc1: 3/4 Doc2: 2/4 Doc3: 2/4 In the same fashion, the respective coord factor values are the following if I have the query C OR D: Doc1: 1/2 Doc2: 1/2 Doc3: 1/2 Is this correct, or did I miss something? Please correct me if I am wrong. Regards, Kazuaki -- Anshum Gupta http://www.anshumgupta.net
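For reference, Lucene's default coord(q, d) is simply overlap/maxOverlap: the fraction of optional query clauses a document matches. A quick sketch of the arithmetic behind the numbers in this thread (plain Python, not the Solr API):

```python
def coord(overlap, max_overlap):
    # Lucene DefaultSimilarity.coord(): fraction of query clauses matched
    return overlap / float(max_overlap)

# Query: A OR B OR C OR D  (maxOverlap = 4)
print(coord(3, 4))  # Doc1 {A, B, C} matches 3 of 4 terms -> 0.75
print(coord(2, 4))  # Doc2 {A, C} and Doc3 {B, C}         -> 0.5

# Query: C OR D  (maxOverlap = 2); all three docs contain only C
print(coord(1, 2))  # -> 0.5
```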