Re: Anybody uses Solr JMX?
Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context rather than a long-term monitoring one. JMX is multi-purpose, so in principle it can offer considerably more. I've seen discussions of quite a few JMX variables describing Garbage Collector activity (e.g. young generation, …) which I can't remember having seen in the Admin UI. The advantage of JMX is the interface and what you can do with it. For example, plotting values is not available in the Admin UI, and it can really make a difference in detecting, say, the cause of sudden bursts. paul
Re: Solr relevancy tuning
One good thing about kelvin is that it's more of a programmatic task, so you could execute the scripts after a few changes/deployments and get a general idea of whether the new changes have impacted the search experience; yeah, sure, the changing catalog is still a problem, but I kind of like being able to execute a few commands and presto, get it done. This could become a must-run test in the test suite of the app. I kind of do this already, but testing from the user interface, using the test library provided by symfony2 (the framework I'm using) and the functional tests. It's not test-driven search relevancy per se, but we make sure not to mess up some basic queries we use to test the search feature.
- Original Message - From: Giovanni Bricconi giovanni.bricc...@banzai.it To: solr-user solr-user@lucene.apache.org Cc: Ahmet Arslan iori...@yahoo.com Sent: Friday, April 11, 2014 5:15:56 AM Subject: Re: Solr relevancy tuning
Hello Doug, I have just watched the Quepid demonstration video, and I strongly agree with your introduction: it is very hard to involve marketing/business people in repeated testing sessions, and spreadsheets or other kinds of files are not the right tool to use. Currently I'm quite alone in my tuning task and having a visual approach could be beneficial for me; you are giving me many good inputs! I see that kelvin (my scripted tool) and Quepid follow the same path. In Quepid someone quickly watches the results and applies colours to them; in kelvin you enter one or more queries (network cable, ethernet cable) and state that the results must contain ethernet in the title, or must come from a list of product categories. I also do diffs of results, before and after changes, to check what is going on; but I have to do that in a very unix-scripted way. Have you considered placing a counter of total red/bad results in Quepid? I use this index to get a quick overview of the impact of changes across all queries.
Actually I repeat the tests in production from time to time, and if I see the kelvin temperature rising (the number of errors going up) I know I have to check what's going on, because new products may be having a bad impact on the index. I also keep counters of products with low-quality images, no images at all, or too-short listings; sometimes they are useful to better understand what will happen if you change some bq/fq in the application. I also see that after changes in Quepid someone has to check the gray results and assign them a colour; in kelvin's case sometimes the conditions can do a bit of magic (new product names still contain SM-G900F) but sometimes they can introduce false errors (the new product name contains only Galaxy 5 and not the product code SM-G900F). So some checks are needed, but with Quepid everybody can do the check, while with kelvin you have to change some lines of a script, and not everybody is able/willing to do that. The idea of a static index is a good suggestion; I will try to have it in the next round of search engine improvement. Thank you Doug!
2014-04-09 17:48 GMT+02:00 Doug Turnbull dturnb...@opensourceconnections.com:
Hey Giovanni, nice to meet you. I'm the person that did the Test Driven Relevancy talk. We've got a product, Quepid (http://quepid.com), that lets you gather good/bad results for queries and do a sort of test-driven development against search relevancy. Sounds similar to your existing scripted approach. Have you considered keeping a static catalog for testing purposes? We had a project with a lot of updates and date-dependent relevancy. This lets you create some test scenarios against a static data set. However, one downside is that you can't recreate problems in production in your test setup exactly; you have to find a similar issue that reflects what you're seeing. Cheers, -Doug
On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:
Thank you for the links.
The book is really useful; I will definitely have to spend some time reformatting the logs to access the number of results found, session id and much more. I'm also quite happy that my test cases produce results similar to the precision reports shown at the beginning of the book. Giovanni
2014-04-09 12:59 GMT+02:00 Ahmet Arslan iori...@yahoo.com:
Hi Giovanni, Here are some relevant pointers: http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy http://rosenfeldmedia.com/books/search-analytics/ http://www.sematext.com/search-analytics/index.html Ahmet
On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote:
I have been working on an e-commerce site for about one year, and unfortunately I have no information retrieval background, so I am probably missing some important practices about relevance tuning and search engines. During this period I had
stats pse-udo field score
Hey everyone, in our application we are using Solr 4.6. I had the idea of using the stats component on the score pseudo-field. Is there a workaround for using …stats=true&stats.field=score…? Thanks a lot! -- View this message in context: http://lucene.472066.n3.nabble.com/stats-pse-udo-field-score-tp4134635.html Sent from the Solr - User mailing list archive at Nabble.com.
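One possible client-side workaround, as a rough sketch only: request the score with each document (for example fl=*,score) and compute the statistics outside Solr. The function name score_stats and the sample response are made up for illustration; the sketch assumes the usual JSON response layout with a response.docs array, and it only covers the rows actually returned, so the rows parameter would have to be large enough for the statistics to be meaningful.

```python
import json


def score_stats(response_json):
    """Compute simple statistics over the `score` of the returned documents.

    Hypothetical helper: it only sees the documents present in the response,
    not the full result set.
    """
    docs = json.loads(response_json)["response"]["docs"]
    scores = [doc["score"] for doc in docs if "score" in doc]
    if not scores:
        return None
    total = sum(scores)
    return {
        "min": min(scores),
        "max": max(scores),
        "sum": total,
        "count": len(scores),
        "mean": total / len(scores),
    }


# Made-up response body in the shape returned with wt=json and fl=*,score.
sample = (
    '{"response": {"numFound": 3, "start": 0, "docs": ['
    '{"id": "a", "score": 2.0}, {"id": "b", "score": 1.0}, {"id": "c", "score": 0.5}]}}'
)
stats = score_stats(sample)
print(stats)
```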
Re: Solr does not recognize language
Thank you very much for your help, Ahmet. However, the language detection is still not working. :( My solrconfig.xml didn't contain that lst section inside the update requestHandler. This is the content I added:
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">text</str>
        <str name="langid.langField">lang</str>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid returns
  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">14</int>
    </lst>
  </response>
And there is still no lang field in my documents. Any idea what I am doing wrong?
On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote:
Hi, /solr/update should be used, not /solr/select:
  curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'
By the way, don't you have the following definition in your solrconfig.xml?
  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
On Tuesday, April 29, 2014 4:50 PM, Victor Pascual vic...@mobilemediacontent.com wrote:
Hi Ahmet, thanks for your reply. Adding update.chain=langid to my query doesn't work: IP:8080/solr/select/?q=*%3A*&update.chain=langid Regarding defining the chain in an UpdateRequestHandler... sorry for the lame question, but shall I paste those three lines into solrconfig.xml, or shall I add them somewhere else? There is no UpdateRequestHandler in my solrconfig. Thanks!
On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote:
Hi, Did you attach your chain to an UpdateRequestHandler? You can do it by adding update.chain=langid to the URL or by defining it in a defaults section as follows:
  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>
On Tuesday, April 29, 2014 3:18 PM, Victor Pascual vic...@mobilemediacontent.com wrote:
Dear all, I'm a new user of Solr. I've managed to index a bunch of documents (in fact, they are tweets) and everything works quite smoothly. Nevertheless, it looks like Solr doesn't detect the language of my documents, nor does it remove stopwords accordingly so that I can extract the most frequent terms. I've added this piece of XML to my solrconfig.xml, as well as the Tika lib jars.
  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">text</str>
        <str name="langid.langField">lang</str>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>
There is no error in the Tomcat log file, so I have no clue why this isn't working. Any hint on how to solve this problem will be much appreciated!
Re: Solr does not recognize language
I think you should check that your schema.xml and solrconfig.xml are both encoded as UTF-8. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134643.html
Re: Solr does not recognize language
Why should this be a problem? Both files start with <?xml version="1.0" encoding="UTF-8" ?>
Re: Solr does not recognize language
Because if the two files are not both encoded as UTF-8, building the index will produce garbled text, and of course you will not get the expected result. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-does-not-recognize-language-tp4133711p4134647.html
Wildcard malfunctioning
Hi all! Sorry in advance if this question has already been posted, but I was unable to find it with search engines. The SpanishLightStemFilterFactory filter is not working properly with wildcards, or I'm misunderstanding something. I have the field
  <field name="cultivo_es" type="text_es" indexed="true" stored="true" />
with this type:
  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
      <filter class="solr.SpanishLightStemFilterFactory"/>
      <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
    </analyzer>
  </fieldType>
But I'm getting these results:
  q = cultivo_es:uva       Getting 50 correct results
  q = cultivo_es:uva*      Getting the same 50 correct results
  q = cultivo_es:naranja   Getting the 50 correct results of "naranja"
  q = cultivo_es:naranja*  Getting 0 results!
It works fine if I remove the SpanishLightStemFilterFactory filter, but I need it in order to filter diacritics according to Spanish rules. Thank you!!
interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/
Hi everybody, can anyone give me a suitable interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/ (slide 15)? Thanks
Re: Solr does not recognize language
Hi Victor, How do you index your documents? Your last config looks correct. However, if you use the data import handler, for example, you need to add update.chain there too. The same goes for the extracting request handler if you are using Solr Cell.
  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>
By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was just an example, meant for feeding XML update messages via the POST method, not for use in a browser. Ahmet
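To make the point about POSTing XML update messages concrete, here is a minimal sketch using only the Python standard library. The helper name build_update_request and the sample document are assumptions for illustration; the host, port and chain name are taken from the thread. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr


def build_update_request(base_url, docs, chain="langid"):
    """Build (but do not send) a POST carrying an XML <add> message.

    Hypothetical helper: `docs` is a list of dicts mapping field names to values.
    """
    # urlencode joins the parameters with "&", the separator the archive dropped.
    params = urlencode({"commit": "true", "update.chain": chain})
    doc_xml = []
    for doc in docs:
        fields = "".join(
            "<field name=%s>%s</field>" % (quoteattr(name), escape(str(value)))
            for name, value in doc.items()
        )
        doc_xml.append("<doc>%s</doc>" % fields)
    body = "<add>%s</add>" % "".join(doc_xml)
    return Request(
        base_url + "/update?" + params,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_update_request(
    "http://localhost:8080/solr", [{"id": "1", "text": "Hola & adiós"}]
)
print(req.get_method(), req.full_url)
# Sending it would be urllib.request.urlopen(req) against a running Solr instance.
```

Opening the same URL in a browser issues a GET with no documents, which is why it returns status 0 without indexing or tagging anything.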
Re: Wildcard malfunctioning
Hi Roman, What you are experiencing is OK and known. Stemming and wildcard searches can be counterintuitive sometimes, but luckily a remedy is available. Use the following filters, and your wildcard searches will be happy. Please note that this change will require a Solr restart and a re-index.
  <filter class="solr.KeywordRepeatFilterFactory"/>
  <filter class="solr.SpanishLightStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
Regarding diacritics, please see http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ASCIIFoldingFilterFactory and http://wiki.apache.org/solr/MultitermQueryAnalysis Ahmet
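A toy simulation may help show why naranja* finds nothing while uva* works, and why keeping the unstemmed token fixes it. The stemmer below is a deliberately simplified stand-in (an assumption, not the real SpanishLightStemFilterFactory algorithm), and all function names are hypothetical.

```python
def toy_light_stem(token):
    """Toy stand-in for a light stemmer (NOT the real Spanish algorithm):
    drop one trailing vowel from tokens of five or more characters."""
    if len(token) >= 5 and token[-1] in "aeo":
        return token[:-1]
    return token


def index_terms(token, keep_original):
    """Terms written to the index for one input token.

    With keep_original=True this mimics KeywordRepeat + stemmer +
    RemoveDuplicates: the original and the stem are both indexed, and
    identical pairs collapse (a set does that for us here).
    """
    terms = {toy_light_stem(token)}
    if keep_original:
        terms.add(token)
    return terms


def prefix_matches(prefix, terms):
    """Wildcard terms are not stemmed at query time, so the raw prefix
    is compared against the indexed terms as they are."""
    return any(term.startswith(prefix) for term in terms)


# "uva" is too short for the toy stemmer, so uva* still matches.
print(prefix_matches("uva", index_terms("uva", keep_original=False)))
# "naranja" is indexed as "naranj", so the prefix "naranja" finds nothing.
print(prefix_matches("naranja", index_terms("naranja", keep_original=False)))
# With the original token kept alongside the stem, naranja* matches again.
print(prefix_matches("naranja", index_terms("naranja", keep_original=True)))
```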
Re: Wildcard malfunctioning
Generally, stemming filters are not supported when wildcards are present. Only a small subset of filters work with wildcards, such as the case conversion filters. But you say that you are using the stemmer to remove diacritical marks... you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory. -- Jack Krupansky
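For readers unsure what diacritic folding does, here is a small illustration in plain Python. It is only an approximation of the idea (Unicode decomposition, then dropping combining marks), not the implementation of either Solr filter, and fold_diacritics is a made-up name.

```python
import unicodedata


def fold_diacritics(text):
    """Strip accents by decomposing characters (NFD) and dropping combining marks.

    Illustration only; Solr's folding filters use their own mapping tables.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_diacritics("plátano"))
print(fold_diacritics("limón"))
print(fold_diacritics("piña"))
```

Note that this kind of blanket folding also turns ñ into n. If that is not acceptable under Spanish rules, a character mapping with an explicit, hand-written mapping file is probably the more controllable option.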
RE: Wildcard malfunctioning
SOLVED! The first solution I tried (Ahmet's) worked fine! Thank you!
Solr Not Searching while INDEXING the DATA
I am not able to search the data while indexing. Indexing is done via the DataImportHandler. When I search for documents while indexing is in progress, Solr throws a broken pipe exception and doesn't return anything. What would be the proper solution for this problem? Am I missing something? Help me! -- Regards, *Sohan Kalsariya*
Explain Solr Query Execution
How will a query like the one below get executed, and in which order? I understand that when this query is executed, the fields mentioned in fieldList will be returned. What I don't understand is how samplestring1 and samplestring2 will be searched against the query fields specified. I think I will be able to understand how the search happens if this can be illustrated in SQL (just to understand what happens behind the scenes). Following is the query. Please have a look at it and let me know how this works internally.
  query=samplestring1 AND samplestring2
  defType: edismax
  queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
  fieldList: Column1, Column2
  resultRows: 10
  startRow: 0
P.S. samplestring1 and samplestring2 are some test strings in the query.
Sample of the schema for the fields:
  <fieldType name="sampletype1" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="5" maxGramSize="10"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldtype name="sampletype2" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
  <field name="Field1" compressed="true" type="sampletype1" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Field2" compressed="true" type="sampletype1" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Exact_Field1" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="sampletype2" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Exact_Field2" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="sampletype2" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>
  <copyField source="Field1" dest="Exact_Field1"/>
  <copyField source="Field2" dest="Exact_Field2"/>
-- View this message in context: http://lucene.472066.n3.nabble.com/Explain-Solr-Query-Execution-tp4134681.html
Re: Solr does not recognize language
Hi there, I'm indexing my documents using mysolr. I mainly generate a list of JSON objects and then run: solr.update(documents_array,'json')
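One thing that may be worth checking, offered as a guess rather than a diagnosis: if the client posts JSON to a handler path other than /update (the path /update/json below is an assumption; check what the client actually requests), the defaults configured on /update would not apply to those requests. Passing update.chain explicitly as a request parameter sidesteps the question. The sketch builds such a request with the standard library only; build_json_update and the sample document are hypothetical, and the request is not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_json_update(base_url, docs, handler_path="/update/json", chain="langid"):
    """Build (but do not send) a JSON update request that names the chain explicitly.

    Hypothetical helper; handler_path is an assumption about the Solr setup.
    """
    params = urlencode({"commit": "true", "update.chain": chain})
    body = json.dumps(docs).encode("utf-8")
    return Request(
        base_url + handler_path + "?" + params,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


documents_array = [{"id": "1", "text": "Esto es un tweet de prueba"}]
json_req = build_json_update("http://localhost:8080/solr", documents_array)
print(json_req.get_method(), json_req.full_url)
```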
Help to Understand a Solr Query
Hi all, I am completely new to Solr and hoping to understand the basics. Can one of you help me understand what the following query does and in which order it gets executed? I understand that when this query is executed, the fields mentioned in fieldList will be returned. What I don't understand is how samplestring1 and samplestring2 will be searched against the query fields specified. I think I will be able to understand how the search happens if this can be illustrated in SQL (just to understand what happens behind the scenes). Following is the query. Please have a look at it and let me know how this works internally.
  query=samplestring1 AND samplestring2
  defType: edismax
  queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
  fieldList: Column1, Column2
  resultRows: 10
  startRow: 0
P.S. samplestring1 and samplestring2 are some test strings in the query.
Sample of the schema for the fields:
  <fieldType name="sampletype1" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.NGramFilterFactory" minGramSize="5" maxGramSize="10"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <fieldtype name="sampletype2" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
  <field name="Field1" compressed="true" type="sampletype1" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Field2" compressed="true" type="sampletype1" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Exact_Field1" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="sampletype2" multiValued="false" indexed="true" stored="true" required="true" omitNorms="true"/>
  <field name="Exact_Field2" omitPositions="true" termVectors="false" omitTermFreqAndPositions="true" compressed="true" type="sampletype2" multiValued="false" indexed="true" stored="true" required="false" omitNorms="true"/>
  <copyField source="Field1" dest="Exact_Field1"/>
  <copyField source="Field2" dest="Exact_Field2"/>
-- View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686.html
Re: Help to Understand a Solr Query
Read up on the edismax query parser first: http://wiki.apache.org/solr/ExtendedDisMax The ^ operator is known as boosting or field boosting and is used to influence document scores for relevancy. It has no analog in SQL. -- Jack Krupansky
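As a very rough mental model of how the boosts in queryFields combine, the sketch below mimics the disjunction-max idea: each search term is tried against every listed field, the best boosted field score wins for that term (plus an optional tie share of the others), and with AND every term has to match somewhere. The raw per-field scores are invented numbers, the function names are hypothetical, and real Lucene scoring involves further factors that are ignored here.

```python
def dismax_clause_score(field_scores, boosts, tie=0.0):
    """Score one term: best boosted field score, plus `tie` times the rest.

    field_scores maps only the fields where the term matched to a raw score
    (invented values in this sketch). Returns None if nothing matched.
    """
    weighted = [score * boosts[name] for name, score in field_scores.items() if name in boosts]
    if not weighted:
        return None
    best = max(weighted)
    return best + tie * (sum(weighted) - best)


def and_query_score(per_term_field_scores, boosts, tie=0.0):
    """Score `term1 AND term2 ...`: every term must match in at least one field."""
    total = 0.0
    for field_scores in per_term_field_scores:
        clause = dismax_clause_score(field_scores, boosts, tie)
        if clause is None:
            return None  # one required term is missing, so the document is excluded
        total += clause
    return total


# Boosts copied from the queryFields line of the question.
boosts = {"Exact_Field1": 1.0, "Exact_Field2": 0.9, "Field1": 0.8, "Field2": 0.7}

# Invented document: samplestring1 matches two fields, samplestring2 matches one.
score = and_query_score(
    [{"Exact_Field1": 1.0, "Field1": 1.0}, {"Field2": 2.0}],
    boosts,
)
print(score)
```

fieldList plays no part in matching or scoring; it only selects which stored fields come back for the top resultRows documents starting at startRow.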
Re: Block Join Score Highlighting
I changed the hardcoded BlockJoinChildQParser setting to use the parent scoring and that seems to work, so I think I got rid of the scoring issue :). I also voted for the issue! I haven't found a solution for the highlighting issue yet, but I am considering omitting highlighting for now, as it also makes the index grow quickly because the fields need to be stored to support highlighting. -- View this message in context: http://lucene.472066.n3.nabble.com/Block-Join-Score-Highlighting-tp4134045p4134702.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't make GET request to solr in android app
Thanks. Basically, I'm running Solr on my localhost (my computer) and trying to access it through the emulator in Eclipse, NOT on a physical phone. Can it be done? -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134706.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Core failure when a lot of processes are indexing
The index was built with the same version of Solr that is searching it (4.6.0); the config file (solrconfig.xml) and schema.xml are the same too. The only way I have found to solve this issue is to let only one process index at a time. Wouldn't a message queue layer resolve this issue? 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org: On 5/4/2014 9:30 AM, Hakim Benoudjit wrote: Ok. These files contain what you've requested: First (the xml error): http://pastebin.com/ZcagK3T7 Second (java params): http://pastebin.com/JtWQpp6s Third (Solr version): http://pastebin.com/wYdpdsAW Are you running with an index originally built by an earlier version of Solr? If you are, you may be running into a known bug. The last "caused by" section of the java stacktrace looks similar to the one in this issue -- which is indeed index corruption: https://issues.apache.org/jira/browse/LUCENE-5377 If that's the problem you're experiencing, upgrading your Solr version will hopefully fix it. Simply dropping in the 4.6.1 war file and any contrib jars should cause zero problems for your 4.6.0 install. Upgrading to 4.7.2 or 4.8.0 should be done with more care. Thanks, Shawn -- Hakim Benoudjit.
Re: can't make GET request to solr in android app
Hi, it's not an error. If you look at my code, there is a catch statement which contains the FAIL message, and it always shows that message. -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134709.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcard malfunctioning
On 5/5/2014 5:19 AM, Jack Krupansky wrote: But, you say that you are using the stemmer to remove diacritical marks... you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory. I like ICUFoldingFilterFactory for this, but it does require additional contrib jars (included in the Solr download). It lowercases too. http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ICUFoldingFilterFactory Thanks, Shawn
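To give a rough feel for what "folding" means here, the following is a minimal plain-Java sketch of the idea, not the code of any Solr filter. It assumes that decomposing characters and dropping combining marks is a fair approximation; the real folding filters cover many more cases (characters with no decomposition, for example). The class and method names are made up for illustration.

```java
import java.text.Normalizer;
import java.util.Locale;

// Illustrative only: approximates diacritic folding plus lowercasing.
// This is NOT the implementation of ASCIIFoldingFilter or ICUFoldingFilter.
final class DiacriticFoldSketch {

    static String fold(String input) {
        // Split accented characters into base character + combining mark(s).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Drop the combining marks, keeping only the base characters.
        String stripped = decomposed.replaceAll("\\p{M}+", "");
        // Lowercase in a locale-independent way.
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "Caf\u00e9 \u00d1and\u00fa" is "Café Ñandú" written with escapes.
        System.out.println(fold("Caf\u00e9 \u00d1and\u00fa")); // prints "cafe nandu"
    }
}
```

Because the same folding runs at index and query time, a wildcard prefix typed without accents can still line up with the indexed terms, which is the point of doing this in the analysis chain instead of in a stemmer.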
Re: Core failure when a lot of processes are indexing
Is there an option in Solr (solrconfig.xml or somewhere else) to regularize commits to the index. I meant to do a 'sleep' between each commit to the index, when data to-be-indexed is waiting inside a stack. 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com: The index is made with the same version of solr, that is searching (4.6.0), the config file (solrconfig.xml) schema.xml is the same too. The only way for me to solve this issue is to let only one process to index at the same time. Wouldnt a layer of message queue resolve this issue? 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org: On 5/4/2014 9:30 AM, Hakim Benoudjit wrote: Ok. These files contain what you've requested: First (the xml error): http://pastebin.com/ZcagK3T7 Second (java params): http://pastebin.com/JtWQpp6s Third (Solr version): http://pastebin.com/wYdpdsAW Are you running with an index originally built by an earlier version of Solr? If you are, you may be running into a known bug. The last caused by section of the java stacktrace looks similar to the one in this issue -- which is indeed index corruption: https://issues.apache.org/jira/browse/LUCENE-5377 If that's the problem you're experiencing, upgrading your Solr version will hopefully fix it. Simply dropping in the 4.6.1 war file and any contrib jars should cause zero problems for your 4.6.0 install. Upgrading to 4.7.2 or 4.8.0 should be done with more care. Thanks, Shawn -- Hakim Benoudjit. -- Hakim Benoudjit.
Re: Solr does not recognize language
Hi Victor, I don't know mysolr. I assume you are using /update/json, so let's add your chain to its defaults section:

    <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="stream.contentType">application/json</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Monday, May 5, 2014 4:06 PM, Victor Pascual vic...@mobilemediacontent.com wrote: Hi there, I'm indexing my documents using mysolr. I mainly generate a list of json objects and then run: solr.update(documents_array,'json')

On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi Victor, How do you index your documents? Your last config looks correct. However, if for example you use the data import handler, you need to add update.chain there too. The same goes for the extracting request handler if you are using solr-cell.

    <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">/home/username/data-config.xml</str>
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was just an example, meant for feeding XML update messages by the POST method, not for use in a browser. Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual vic...@mobilemediacontent.com wrote: Thank you very much for your help Ahmet. However the language detection is still not working. :( My solrconfig.xml didn't contain that lst section inside the update requestHandler. That's the content I added:

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
          <str name="langid.fl">text</str>
          <str name="langid.langField">lang</str>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid returns

    <response>
      <lst name="responseHeader">
        <int name="status">0</int>
        <int name="QTime">14</int>
      </lst>
    </response>

And there is still no lang field in my documents. Any idea what I am doing wrong?

On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, /solr/update should be used, not /solr/select:

    curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'

By the way, don't you have the following definition in your solrconfig.xml?

    <requestHandler name="/update" class="solr.UpdateRequestHandler">
      <lst name="defaults">
        <str name="update.chain">langid</str>
      </lst>
    </requestHandler>

On Tuesday, April 29, 2014 4:50 PM, Victor Pascual vic...@mobilemediacontent.com wrote: Hi Ahmet, thanks for your reply. Adding update.chain=langid to my query doesn't work: IP:8080/solr/select/?q=*%3A*&update.chain=langid Regarding defining the chain in an UpdateRequestHandler... sorry for the lame question, but shall I paste those three lines into solrconfig.xml, or shall I add them somewhere else? There is no UpdateRequestHandler in my solrconfig. Thanks!

On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan iori...@yahoo.com wrote: Hi, Did you attach your chain to an UpdateRequestHandler? You can do it by adding update.chain=langid to the URL or by defining it in a defaults section as follows:

    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>

On Tuesday, April 29, 2014 3:18 PM, Victor Pascual vic...@mobilemediacontent.com wrote: Dear all, I'm a new user of Solr. I've managed to index a bunch of documents (in fact, they are tweets) and everything works quite smoothly. Nevertheless, it looks like Solr doesn't detect the language of my documents, nor does it remove stopwords accordingly so that I can extract the most frequent terms. I've added this piece of XML to my solrconfig.xml, as well as the Tika lib jars.

    <updateRequestProcessorChain name="langid">
      <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="defaults">
          <str name="langid.fl">text</str>
          <str name="langid.langField">lang</str>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>

There is no error in the tomcat log file, so I have no clue why this isn't working. Any hint on how to solve this problem will be much appreciated!
Re: Help to Understand a Solr Query
I already went through the link. I understand the boosting factor for relevancy.

    query=samplestring1 AND samplestring2
    defType: edismax
    queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
    fieldList: Column1, Column2

I need to understand whether samplestring1 and samplestring2 will both be searched in each field mentioned in queryFields. What I meant was, e.g.:

    (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND
    (Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND
    (Field1:samplestring1 AND Field1:samplestring2) AND
    (Field2:samplestring1 AND Field2:samplestring2)

Is the above correct? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sort groups by the sum of the scores of the documents within each group
I don't think so. Solr excels at getting the score of single documents, not aggregation. It's not at all clear to me, though, that the sum of documents' scores is a reasonable thing to sort by. Consider grouping on a very common term. You'd never do this, but group on the elements of a text field. Then the group 'a' would sort to the top almost always (or maybe 'the' or...). This sounds like an XY problem, what use-case are you trying to solve? Best, Erick

On Sun, May 4, 2014 at 9:31 PM, frank shi finalxc...@gmail.com wrote: Currently, solr grouping (http://wiki.apache.org/solr/FieldCollapsing) sorts groups by the score of the top document within each group. E.g.

    [...]
    "groups":[{
      "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
      "doclist":{"numFound":9,"start":0,"maxScore":4.729042,"docs":[
        { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
        { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
        { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
    }},
    [...]

Is there an out-of-the-box way to sort groups by the sum of the scores of the documents within each group? E.g.

    [...]
    "groups":[{
      "groupValue":"81cb63020d0339adb019a924b2a9e0c2",
      "doclist":{"numFound":9,"start":0,"scoreSum":13.739738,"docs":[
        { "id":"7481df771afe39fab368ce19dfeeb528", [...], "score":4.729042},
        { "id":"c879e95b5f16343dad8b1248133727c2", [...], "score":4.6635237},
        { "id":"485b9aec90fd3ef381f013c51ab6a4df", [...], "score":4.347174}]
    }},
    [...]

With the release of sorting by Function Query (https://issues.apache.org/jira/browse/SOLR-1297), it seems that there should be a way to use the sum() function (http://wiki.apache.org/solr/FunctionQuery). But it's not quite close enough since the score field is not part of the documents. I feel like I'm close but I'm missing some obvious piece. I'm using Solr 4.6. -- View this message in context: http://lucene.472066.n3.nabble.com/sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134607.html Sent from the Solr - User mailing list archive at Nabble.com.
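Since nothing out of the box was identified in this thread, one workaround would be to re-sort the groups on the client after the response comes back. The sketch below is plain Java and makes several assumptions: the per-group scores have already been pulled out of the grouped response into a map (that extraction is not shown), and the class and method names are hypothetical. Note that it can only sum the documents actually returned per group (three of numFound 9 in the example above, governed by group.limit), so the total is partial unless every document of each group is fetched.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper: orders group values by the sum of the
// scores of the documents returned for each group. Not a Solr feature.
final class GroupScoreSumSketch {

    static List<String> orderBySummedScore(Map<String, List<Double>> scoresByGroup) {
        List<Map.Entry<String, Double>> sums = new ArrayList<>();
        for (Map.Entry<String, List<Double>> group : scoresByGroup.entrySet()) {
            double sum = 0.0;
            for (double score : group.getValue()) {
                sum += score;
            }
            sums.add(new AbstractMap.SimpleEntry<>(group.getKey(), sum));
        }
        // Highest summed score first.
        sums.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));

        List<String> ordered = new ArrayList<>();
        for (Map.Entry<String, Double> entry : sums) {
            ordered.add(entry.getKey());
        }
        return ordered;
    }

    public static void main(String[] args) {
        // Made-up group values and scores, for illustration only.
        Map<String, List<Double>> scores = new LinkedHashMap<>();
        scores.put("groupA", Arrays.asList(9.0));
        scores.put("groupB", Arrays.asList(4.7, 4.6, 4.3));
        System.out.println(orderBySummedScore(scores)); // prints [groupB, groupA]
    }
}
```

Erick's caveat still applies to this approach: a summed score favours groups with many mediocre matches over groups with one excellent match, so it is worth checking that this is really the ordering the use case needs.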
Re: Solr Not Searching while INDEXING the DATA
On 5/5/2014 5:39 AM, Sohan Kalsariya wrote: I am not able to search for the data while indexing. Indexing is done via the dataimport handler. While searching for documents (while indexing is in progress), it gives a broken pipe exception and won't return anything. What should be the proper solution for this problem? A broken pipe exception means that your client gave up and timed out before Solr could respond, so it closed the TCP connection. When Solr finally was able to respond, the connection was gone, so the servlet container logged that exception. The most common cause of the underlying performance issues behind problems like this is not having enough RAM. It could be something else, of course. A number of possible options are covered on this wiki page: http://wiki.apache.org/solr/SolrPerformanceProblems I see that you asked the same question on the IRC channel early this morning (in my timezone), but you were gone before I was awake to see that. Thanks, Shawn
Re: Core failure when a lot of processes are indexing
You should not be committing from the client by and large, use the autoCommit and autoSoftCommit options in solrconfig.xml. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com wrote: Is there an option in Solr (solrconfig.xml or somewhere else) to regularize commits to the index. I meant to do a 'sleep' between each commit to the index, when data to-be-indexed is waiting inside a stack. 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com: The index is made with the same version of solr, that is searching (4.6.0), the config file (solrconfig.xml) schema.xml is the same too. The only way for me to solve this issue is to let only one process to index at the same time. Wouldnt a layer of message queue resolve this issue? 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org: On 5/4/2014 9:30 AM, Hakim Benoudjit wrote: Ok. These files contain what you've requested: First (the xml error): http://pastebin.com/ZcagK3T7 Second (java params): http://pastebin.com/JtWQpp6s Third (Solr version): http://pastebin.com/wYdpdsAW Are you running with an index originally built by an earlier version of Solr? If you are, you may be running into a known bug. The last caused by section of the java stacktrace looks similar to the one in this issue -- which is indeed index corruption: https://issues.apache.org/jira/browse/LUCENE-5377 If that's the problem you're experiencing, upgrading your Solr version will hopefully fix it. Simply dropping in the 4.6.1 war file and any contrib jars should cause zero problems for your 4.6.0 install. Upgrading to 4.7.2 or 4.8.0 should be done with more care. Thanks, Shawn -- Hakim Benoudjit. -- Hakim Benoudjit.
Re: can't make GET request to solr in android app
On 5/5/2014 9:02 AM, blach wrote: It's not an error if you see my code, there is a catch statement, which contains the FAIL message, it does always show it. In your code, you are not printing the stack trace or throwing the exception. If you want to see it in your own code, you'll need to include code to write out the stacktrace from the exception. If you don't want to do that, you can look on the server log to see what the exception is. Since you are basically writing Java code (I'm aware that Dalvik is not *actually* Java, but I've never written code for android), can you use SolrJ instead of HttpClient? Thanks, Shawn
Re: Help to Understand a Solr Query
dismax means Disjunction Maximum, which means Lucene takes the highest scoring clause (field), for each search term. This is effectively an OR of the clauses. -- Jack Krupansky -Original Message- From: nativecoder Sent: Monday, May 5, 2014 11:21 AM To: solr-user@lucene.apache.org Subject: Re: Help to Understand a Solr Query I already went through the link. I understand about the boosting factor for the relevancy query=samplestring1 AND samplestring2 defType: edismax queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7 fieldList: Column1, Column2 I need to understand whether the samplestring1 and samplestring 2 both will be searched in each field mentioned in queryFields. What I meant was ; e.g (Exact_Field1:samplestring1 AND Exact_Field1:samplestring2) AND (Exact_Field2:samplestring1 AND Exact_Field2:samplestring2) AND (Field1:samplestring1 AND Field1:samplestring2) AND (Field2:samplestring1 AND Field2:samplestring2) Is the above correct ? -- View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134714.html Sent from the Solr - User mailing list archive at Nabble.com.
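To make the shape of that expansion concrete, here is a small plain-Java sketch that prints the approximate structure edismax builds for the query in this thread. It is an illustration only, not Solr's parser: it ignores analysis (lowercasing, n-grams), the tie and mm parameters, and the class and method names are invented. The point it shows is that the AND sits between the search terms, while the per-field alternatives for each term are combined as a disjunction ("|").

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics the structure of an edismax expansion,
// one required disjunction-max clause per search term.
final class DismaxExpansionSketch {

    static String expand(List<String> terms, Map<String, String> qf) {
        List<String> requiredClauses = new ArrayList<>();
        for (String term : terms) {
            List<String> perField = new ArrayList<>();
            for (Map.Entry<String, String> field : qf.entrySet()) {
                // field:term^boost, one alternative per qf field
                perField.add(field.getKey() + ":" + term + "^" + field.getValue());
            }
            // "+" marks the clause as required (the AND between terms);
            // "|" separates alternatives, of which the best-scoring one wins.
            requiredClauses.add("+(" + String.join(" | ", perField) + ")");
        }
        return String.join(" ", requiredClauses);
    }

    public static void main(String[] args) {
        Map<String, String> qf = new LinkedHashMap<>();
        qf.put("Exact_Field1", "1.0");
        qf.put("Exact_Field2", "0.9");
        qf.put("Field1", "0.8");
        qf.put("Field2", "0.7");
        System.out.println(expand(Arrays.asList("samplestring1", "samplestring2"), qf));
    }
}
```

Read this way, a document matches when samplestring1 is found in at least one of the four fields and samplestring2 is found in at least one of the four fields; the two terms do not have to match in the same field, and neither has to match in every field. Running the real query with debugQuery=true shows the actual parsed query for comparison.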
Re: can't make GET request to solr in android app
Yes, I'm reading about SolrJ now. I wrote this code for it, but it's the same problem; in this case the whole app stops. This is the code:

    String urlString = "http://localhost:8983/solr";
    SolrServer solr = new HttpSolrServer(urlString);
    SolrQuery query = new SolrQuery();
    query.set("q", "mem");
    QueryResponse response = null;
    try {
        response = solr.query(query);
    } catch (SolrServerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    SolrDocumentList results = response.getResults();
    for (int i = 0; i < results.size(); ++i) {
        etxt2.setText((CharSequence) results.get(i));
    }

-- View this message in context: http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134735.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't make GET request to solr in android app
On 5/5/2014 11:05 AM, blach wrote: I wrote this code for it, but it's the same problem; in this case the whole app stops. This is the code:

    String urlString = "http://localhost:8983/solr";
    SolrServer solr = new HttpSolrServer(urlString);
    SolrQuery query = new SolrQuery();
    query.set("q", "mem");
    QueryResponse response = null;
    try {
        response = solr.query(query);
    } catch (SolrServerException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    SolrDocumentList results = response.getResults();
    for (int i = 0; i < results.size(); ++i) {
        etxt2.setText((CharSequence) results.get(i));
    }

Do you get any output to stderr? Have you looked in the solr logfile to see if there's an error logged there? Note that you should add the core name to the URL -- using a path of just /solr is deprecated in the newest Solr versions. http://localhost:8983/solr/corename Thanks, Shawn
Odd XSLT behavior
Solr 4.7.2 (and 4.6.1) Tomcat 7.0.52 Java 1.7.0_45 (and _55)

I'm getting some really odd behavior with some XSLT documents. I've been doing some upgrades to Java and Solr and I'm trying to narrow down where the problems are happening. I have a few XSLT docs that I put into the conf/xslt directory for my indexes. I haven't changed them in a while, and they were working fine for a 3.X Solr, and seemed to work fine on an earlier 4.X release. The problem is that sometimes I get an error saying that a field can't be found. Here's a slice of the XSLT:

    <xsl:template match="doc">
      <xsl:variable name="id" select="str[@name='id']"/>
      <xsl:variable name="url" select="str[@name='url']"/>
      <xsl:variable name="title" select="str[@name='title']"/>
      <xsl:variable name="description" select="str[@name='description']"/>
      <entry xmlns="http://www.w3.org/2005/Atom">
        <title><xsl:value-of select="str[@name='title']"/></title>
        <link>
          <xsl:attribute name="href"><xsl:value-of select="str[@name='url']"/></xsl:attribute>
        </link>
        <summary>
          <xsl:choose>
            <xsl:when test="string-length($description) &gt; 255">
              <xsl:value-of select="concat(substring($description, 1, 255), '...')"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="$description"/>
            </xsl:otherwise>
          </xsl:choose>
        </summary>
        .
    </xsl:template>

I get messages saying that it can't find the description variable. This was working perfectly well, but I can't seem to narrow down a specific change that caused this.

    Caused by: javax.xml.transform.TransformerConfigurationException: solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is undefined.
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
        at org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)

Has anyone run into a problem like this? Thanks! -- Chris
Re: What are the best practices on Multiple Language support in Solr Cloud ?
Thanks Nicole. Leveraging dynamic field definitions is a great idea. It will probably work for me, as I have a bunch of fields which are indexed as String. Just curious about the sharding: are you using Solr Cloud? I thought of taking the dedicated shard/core route, but since I'm using a composite key (for dedup), managing a dedicated core can cause issues at times. As for the single-field representation, thanks for validating my concern. It is probably best used when you have to address multi-lingual search. -- View this message in context: http://lucene.472066.n3.nabble.com/What-are-the-best-practices-on-Multiple-Language-support-in-Solr-Cloud-tp4134006p4134743.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Core failure when a lot of processes are indexing
I've tried it and it worked by letting Solr do the commit instead of my Solr client. In solrconfig.xml, the autoCommit maxTime has been set to 5 minutes and the autoSoftCommit maxTime to something bigger. Thanks a lot guys! 2014-05-05 16:30 GMT+01:00 Erick Erickson erickerick...@gmail.com: You should not be committing from the client by and large, use the autoCommit and autoSoftCommit options in solrconfig.xml. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com wrote: Is there an option in Solr (solrconfig.xml or somewhere else) to regularize commits to the index? I meant to do a 'sleep' between each commit to the index, when data to-be-indexed is waiting inside a stack. 2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com: The index was built with the same version of Solr that is searching it (4.6.0); the config file (solrconfig.xml) and schema.xml are the same too. The only way I have found to solve this issue is to let only one process index at a time. Wouldn't a message queue layer resolve this issue? 2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org: On 5/4/2014 9:30 AM, Hakim Benoudjit wrote: Ok. These files contain what you've requested: First (the xml error): http://pastebin.com/ZcagK3T7 Second (java params): http://pastebin.com/JtWQpp6s Third (Solr version): http://pastebin.com/wYdpdsAW Are you running with an index originally built by an earlier version of Solr? If you are, you may be running into a known bug. The last "caused by" section of the java stacktrace looks similar to the one in this issue -- which is indeed index corruption: https://issues.apache.org/jira/browse/LUCENE-5377 If that's the problem you're experiencing, upgrading your Solr version will hopefully fix it. Simply dropping in the 4.6.1 war file and any contrib jars should cause zero problems for your 4.6.0 install.
Upgrading to 4.7.2 or 4.8.0 should be done with more care. Thanks, Shawn -- Hakim Benoudjit. -- Hakim Benoudjit. -- Hakim Benoudjit.
Re: Help to Understand a Solr Query
That answer helps a lot. Where would the OR clause be?

    (Exact_Field1:samplestring1 *OR* Exact_Field1:samplestring2) AND
    (Exact_Field2:samplestring1 *OR* Exact_Field2:samplestring2) AND
    (Field1:samplestring1 *OR* Field1:samplestring2) AND
    (Field2:samplestring1 *OR* Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand where the AND fits in.

    *query=samplestring1 AND samplestring2*
    defType: edismax
    queryFields: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
    fieldList: Column1, Column2

-- View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134763.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Odd XSLT behavior
Shot in the dark: perhaps you have a doc w/o a value in the description field, which means the xsl:variable's select doesn't match anything; which perhaps means that your XSLT engine then leaves the variable undefined.

: Solr 4.7.2 (and 4.6.1)
: Tomcat 7.0.52
: Java 1.7.0_45 (and _55)
:
: I'm getting some really odd behavior with some XSLT documents. I've been
: doing some upgrades to Java and Solr and I'm trying to narrow down where the
: problems are happening.
:
: I have a few XSLT docs that I put into the conf/xslt directory for my
: indexes. I haven't changed them in a while, and they were working fine for a
: 3.X Solr, and seemed to work fine on an earlier 4.X release.
:
: The problem is that sometimes I get an error saying that a field can't be
: found. Here's a slice of the XSLT:
: <xsl:template match="doc">
:   <xsl:variable name="id" select="str[@name='id']"/>
:   <xsl:variable name="url" select="str[@name='url']"/>
:   <xsl:variable name="title" select="str[@name='title']"/>
:   <xsl:variable name="description" select="str[@name='description']"/>
:
:   <entry xmlns="http://www.w3.org/2005/Atom">
:     <title><xsl:value-of select="str[@name='title']"/></title>
:     <link>
:       <xsl:attribute name="href"><xsl:value-of select="str[@name='url']"/></xsl:attribute>
:     </link>
:     <summary>
:       <xsl:choose>
:         <xsl:when test="string-length($description) &gt; 255">
:           <xsl:value-of select="concat(substring($description, 1, 255), '...')"/>
:         </xsl:when>
:         <xsl:otherwise>
:           <xsl:value-of select="$description"/>
:         </xsl:otherwise>
:       </xsl:choose>
:     </summary>
:     .
: </xsl:template>
:
: I get messages saying that it can't find the description variable.
: This was working perfectly well, but I can't seem to narrow down a specific
: change that caused this.
:
: Caused by: javax.xml.transform.TransformerConfigurationException:
: solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
: undefined.
: at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
: at org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
:
: Has anyone run into a problem like this? Thanks!
:
: -- Chris

-Hoss http://www.lucidworks.com/
Re: Stored vs non-stored very large text fields
I found out that storing the documents as separate docs+id does not help either. You must have a completely separate collection/core to get things to work fast. Kind regards, Jochen Quoting Jochen Barth ba...@ub.uni-heidelberg.de: Ok, https://wiki.apache.org/solr/SolrPerformanceFactors states that: "Retrieving the stored fields of a query result can be a significant expense. This cost is affected largely by the number of bytes stored per document--the higher byte count, the sparser the documents will be distributed on disk and more I/O is necessary to retrieve the fields (usually this is a concern when storing large fields, like the entire contents of a document)." But in my case (with docValues=true) there should be no reason to access *.fdt. Kind regards, Jochen Quoting Jochen Barth ba...@ub.uni-heidelberg.de: Something is really strange here: even when configuring fields id + sort_... to docValues=true -- so there's nothing to get from the stored documents file -- performance is still terrible with ocr stored=true, _even_ with my patch which stores uncompressed like solr4.0.0 (checked with strings -a on *.fdt). Just reading http://lucene.472066.n3.nabble.com/Can-Solr-handle-large-text-files-td3439504.html .. perhaps things will clear up soon (will check if splitting into index+non-stored and non-indexed+stored could help here) Kind regards, J. Barth Quoting Shawn Heisey s...@elyograg.org: On 4/29/2014 4:20 AM, Jochen Barth wrote: BTW: stored field compression: are all stored fields within a document put into one compressed chunk, or is it done on a per-field basis? Here's the issue that added the compression to Lucene: https://issues.apache.org/jira/browse/LUCENE-4226 It was made the default stored field format for Lucene, which also made it the default for Solr. At this time, there is no way to remove compression on Solr without writing custom code. I filed an issue to make it configurable, but I don't know how to do it. Nobody else has offered a solution either. One day I might find some time to take a look at the issue and see if I can solve it myself. https://issues.apache.org/jira/browse/SOLR-4375 Here's the author's blog post that goes into more detail than the LUCENE issue: http://blog.jpountz.net/post/33247161884/efficient-compressed-stored-fields-with-lucene Thanks, Shawn
Re: can't make GET request to solr in android app
Thank you Shawn, I did what you told me. Now this is my code:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    //import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.*;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocumentList;
    import java.io.InputStream;

        @Override
        public void onClick(View v) {
            // TODO Auto-generated method stub
            //etxt2.setText(etxt1.getText());
            //ALERT MESSAGE
            // Toast.makeText(getBaseContext(),"Please wait, connecting to server.",Toast.LENGTH_LONG).show();
            SolrServer solr;
            String urlString = "http://localhost:8983/solr/collection1";
            solr = new HttpSolrServer(urlString);
            SolrQuery query = new SolrQuery();
            query.set("qt", "/select");
            query.set("q", "mem");
            QueryResponse response = null;
            try {
                response = solr.query(query);
                SolrDocumentList results = response.getResults();
                for (int i = 0; i < results.size(); ++i) {
                    //System.out.println(results.get(i));
                    etxt2.setText((CharSequence) results.get(i));
                }
            } catch (SolrServerException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }});
    }

It gives me an error that org.apache.solr.client.solrj is not found. -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134769.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: can't make GET request to solr in android app
On 5/5/2014 12:17 PM, blach wrote: Thank you Shawn, I did what you told me. Now this is my code: <snip> It gives me an error that org.apache.solr.client.solrj is not found. I don't know how to do classpath management in the Android environment. You'll need to add the solrj jar to your application classpath. In the download that I have extracted on my computer, this is named dist/solr-solrj-4.7.2.jar ... the version number is usually in the filename. A number of other jars are also required. You can find these in the dist/solrj-lib directory. If you need a newer or slightly older version of one of the dependent jars for your own code, it is usually OK to use a slightly different version. Thanks, Shawn
Re: interpretation of cat_rank in http://people.apache.org/~hossman/ac2012eu/
: Hi everybody : can anyone give me a suitable interpretation for cat_rank in : http://people.apache.org/~hossman/ac2012eu/ slide 15 Have you seen the video? http://vimeopro.com/user11514798/apache-lucene-eurocon-2012/video/55822630 That slide starts ~ 23:00 and I go through a description of this example. TL;DW: cat_rank in this example would be a numeric ranking of the category the product is in - so cat_rank==N means the product is in the Nth most popular category on the site (so lower is better, but the number is always a positive integer) -Hoss http://www.lucidworks.com/
Re: Help to Understand a Solr Query
That answer helps a lot. Where would the OR clause be?

    (Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
    (Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
    (Field1:samplestring1 OR Field1:samplestring2) AND
    (Field2:samplestring1 OR Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand where the AND fits in. To be more precise, my query is as below:

    q=samplestring1 AND samplestring2
    defType: edismax
    qf: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
    fl= Column1, Column2

-- View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134775.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Odd XSLT behavior
Checked that first -- it's a test site with a small sample size. The field is set in all of the items. And refreshing the query a few times can yield either result (with/without the error). I'm reverting back to an old version of my stack (my code, plus Tomcat & Solr), and I'll step through my previous work slowly to see if I can pinpoint what breaks it. If I can (ever) determine what caused it then I'll post it. Thanks! -- Chris

On Mon, May 5, 2014 at 2:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote:

Shot in the dark: perhaps you have a doc w/o a value in the description field, which means the xsl:variable's select doesn't match anything; which perhaps means that your XSLT engine then leaves the variable undefined.

: Solr 4.7.2 (and 4.6.1)
: Tomcat 7.0.52
: Java 1.7.0_45 (and _55)
:
: I'm getting some really odd behavior with some XSLT documents. I've been
: doing some upgrades to Java & Solr and I'm trying to narrow down where the
: problems are happening.
:
: I have a few XSLT docs that I put into the conf/xslt directory for my
: indexes. I haven't changed them in a while, and they were working fine for a
: 3.X Solr, and seemed to work fine on an earlier 4.X release.
:
: The problem is that sometimes I get an error saying that a field can't be
: found. Here's a slice of the XSLT:
: <xsl:template match="doc">
:   <xsl:variable name="id" select="str[@name='id']"/>
:   <xsl:variable name="url" select="str[@name='url']"/>
:   <xsl:variable name="title" select="str[@name='title']"/>
:   <xsl:variable name="description" select="str[@name='description']"/>
:
:   <entry xmlns="http://www.w3.org/2005/Atom">
:     <title><xsl:value-of select="str[@name='title']"/></title>
:     <link>
:       <xsl:attribute name="href"><xsl:value-of select="str[@name='url']"
: /></xsl:attribute>
:     </link>
:     <summary>
:       <xsl:choose>
:         <xsl:when test="string-length($description) &gt; 255">
:           <xsl:value-of select="concat(substring($description, 1, 255),
: '...')"/>
:         </xsl:when>
:         <xsl:otherwise>
:           <xsl:value-of select="$description"/>
:         </xsl:otherwise>
:       </xsl:choose>
:     </summary>
: .
: </xsl:template>
:
: I get messages saying that it can't find the description variable.
: This was working perfectly well, but I can't seem to narrow down a specific
: change that caused this.
:
: Caused by: javax.xml.transform.TransformerConfigurationException:
: solrres:/xslt/osatom.xsl: line 115: Variable or parameter 'description' is
: undefined.
: at
: com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl.newTemplates(TransformerFactoryImpl.java:964)
: at
: org.apache.solr.util.xslt.TransformerProvider.getTemplates(TransformerProvider.java:110)
:
: Has anyone run into a problem like this? Thanks!
:
: -- Chris

-Hoss http://www.lucidworks.com/
Re: can't make GET request to solr in android app
I have added the reference to this library correctly, but it is still giving me the same error. -- View this message in context: http://lucene.472066.n3.nabble.com/can-t-make-GET-request-to-solr-in-android-app-tp4134584p4134785.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Core failure when a lot of processes are indexing
Take a look through the article I linked. 5 minutes may be an issue, since the transaction log will hold all 5 minutes' worth of input. In batch processes this can be quite a bit of data. Worse, when a Solr instance terminates unexpectedly, the entire transaction log can be replayed. Consider setting your autoCommit maxTime to something much shorter, say 30 seconds, or even less. NOTE: openSearcher should be false. Then set your soft commit time to the latency you can stand, i.e. if the users don't need to see newly indexed documents in search results for a long time, you can set this to hours. FWIW, Erick

On Mon, May 5, 2014 at 11:03 AM, Hakim Benoudjit h.benoud...@gmail.com wrote:
I've tried it and it worked by letting Solr do the commit instead of my Solr client. In solrconfig.xml: autoCommit maxTime has been set to 5 minutes and autoSoftCommit maxTime to something bigger. Thanks a lot guys!

2014-05-05 16:30 GMT+01:00 Erick Erickson erickerick...@gmail.com:
You should not be committing from the client by and large; use the autoCommit and autoSoftCommit options in solrconfig.xml. See: http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/ Best, Erick

On Mon, May 5, 2014 at 8:12 AM, Hakim Benoudjit h.benoud...@gmail.com wrote:
Is there an option in Solr (solrconfig.xml or somewhere else) to regulate commits to the index? I mean doing a 'sleep' between commits to the index while data to be indexed is waiting inside a stack.

2014-05-05 15:58 GMT+01:00 Hakim Benoudjit h.benoud...@gmail.com:
The index was built with the same version of Solr that is searching it (4.6.0), and the config files (solrconfig.xml & schema.xml) are the same too. The only way for me to solve this issue is to let only one process index at a time. Wouldn't a message queue layer resolve this issue?

2014-05-04 18:33 GMT+01:00 Shawn Heisey s...@elyograg.org:
On 5/4/2014 9:30 AM, Hakim Benoudjit wrote:
Ok. These files contain what you've requested: First (the xml error): http://pastebin.com/ZcagK3T7 Second (java params): http://pastebin.com/JtWQpp6s Third (Solr version): http://pastebin.com/wYdpdsAW

Are you running with an index originally built by an earlier version of Solr? If you are, you may be running into a known bug. The last "caused by" section of the java stacktrace looks similar to the one in this issue -- which is indeed index corruption: https://issues.apache.org/jira/browse/LUCENE-5377 If that's the problem you're experiencing, upgrading your Solr version will hopefully fix it. Simply dropping in the 4.6.1 war file and any contrib jars should cause zero problems for your 4.6.0 install. Upgrading to 4.7.2 or 4.8.0 should be done with more care. Thanks, Shawn

-- Hakim Benoudjit.
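For reference, the commit settings discussed at the top of this thread live inside the updateHandler section of solrconfig.xml. The sketch below just renders that fragment from two millisecond values so they are easy to vary per environment. The 30 second hard commit comes from the advice above; the 10 minute soft commit is an assumed placeholder, not a recommendation from the thread.

```python
import xml.etree.ElementTree as ET


def commit_settings(hard_ms, soft_ms):
    """Render an updateHandler fragment with autoCommit and autoSoftCommit."""
    handler = ET.Element("updateHandler", {"class": "solr.DirectUpdateHandler2"})
    hard = ET.SubElement(handler, "autoCommit")
    ET.SubElement(hard, "maxTime").text = str(hard_ms)
    # openSearcher=false: the hard commit makes data durable but does not
    # make new documents visible to searches.
    ET.SubElement(hard, "openSearcher").text = "false"
    soft = ET.SubElement(handler, "autoSoftCommit")
    # The soft commit interval controls how soon new documents become searchable.
    ET.SubElement(soft, "maxTime").text = str(soft_ms)
    return ET.tostring(handler, encoding="unicode")


# 30 s hard commit as suggested above; the soft commit value is assumed.
fragment = commit_settings(hard_ms=30_000, soft_ms=600_000)
print(fragment)
```

With settings like these the client never needs to send its own commit; it only adds documents and lets Solr commit on its own schedule.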
Turning on KeywordRepeat and RemoveDups on an existing fieldType.
As per the stemming docs ( https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want to score the original term higher than the stemmed version by adding:
<filter class="solr.KeywordRepeatFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
to a field type that is already created (with stemming). I have 100M documents in this index, and it gets slowly reindexed every month as records change. My question is, can I add this to the existing fieldType, or do I need to make a new fieldType, and copyField the data over to it, and after it's all reindexed switch my code? I'd rather be able to just add the lines to my fieldType because I don't think I have enough disk space on my cloud members to hold my primary fulltext field twice. Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks like this:
<fieldType name="keywordText" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="keyword_stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
Thanks,
M.
Re: Error initializing QueryElevationComponent
The full details are farther down in the stack...

: null:org.apache.solr.common.SolrException: SolrCore 'master' is not
: available due to init failure: Error initializing QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException: Error initializing
: QueryElevationComponent.
...
: Caused by: org.apache.solr.common.SolrException:
: org.xml.sax.SAXParseException; systemId: solrres:/elevate.xml; lineNumber:
: 28; columnNumber: 80; The reference to entity "ver" must end with the ';'
: delimiter.

The problem is that your elevate.xml is not a valid XML file at all -- you have a bare "&" character in there (as part of your id), which is not valid in XML -- you are confusing the parser into thinking that you intend for "&ver" to be an XML entity but you are missing the ";" at the end (and even if you had that, then you'd get an error that the entity "&ver;" is not defined)
...
: id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&ver=1"/>

you need to use valid XML, so that id attribute should be something like...

id="sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&amp;ver=1"

-Hoss http://www.lucidworks.com/
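The escaping problem in the reply above is easy to reproduce outside Solr. The sketch below is only an illustration: the surrounding elevate/query/doc structure follows the usual elevate.xml layout, and the query text "x" is a made-up placeholder. It shows that the raw id fails to parse and that escaping the attribute value fixes it.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

doc_id = "sitecore://master/{137f5eb3-eb84-4165-bef0-5be1fbbc3201}?lang=en&ver=1"

# A bare '&' in an attribute value makes the whole file unparseable.
broken = '<elevate><query text="x"><doc id="%s"/></query></elevate>' % doc_id
try:
    ET.fromstring(broken)
    parsed_broken = True
except ET.ParseError:
    parsed_broken = False

# quoteattr() escapes '&' to '&amp;' and adds the surrounding quotes.
fixed = "<elevate><query text=\"x\"><doc id=%s/></query></elevate>" % quoteattr(doc_id)
round_trip = ET.fromstring(fixed).find("query/doc").get("id")

print(parsed_broken, round_trip == doc_id)
```

If elevate.xml is generated by a script, running every id through an escaping helper like this avoids the init failure; the parser hands the original, unescaped id back to Solr.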
Strict Search in Apache Solr
How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results. Current: If you were to query: your future, then 10 results would return and print to the page. Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page. Regards, Mark IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages sent from Bridgepoint Education may contain information that is confidential and may be legally privileged. Please do not read, copy, forward or store this message unless you are an intended recipient of it. If you received this transmission in error, please notify the sender by reply e-mail and delete the message and any attachments.
Relevancy help
Hello, I have a weird relevancy requirement. We search news content, hence chronology is very important and so is relevancy, although the two are mutually exclusive. For example, if the search terms are - malaysia airline crash blackbox - my requirements are as follows: docs containing all words should be on top, but the editorial team also wants them sorted in reverse chronological order without losing relevancy. Why? If on day 1 there is an article about the search for the blackbox, on day 2 the blackbox is found, and on day 3 there is an article about the blackbox being unusable, then from the user's standpoint it makes sense that we show the most recent content on top. I already boost recency of docs with boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of 3 months. However, when I do the boost the chronology is messed up. I know relevancy and sorting are mutually exclusive concepts. Is there any magic that we can do in Solr which can achieve both? Thanks, Ravi Kiran bhaskar
Re: Turning on KeywordRepeat and RemoveDups on an existing fieldType.
I haven't personally used this technique, but I gather that the intent is that the unstemmed term will have a lower term frequency (more unique) than the stemmed term which may generate the same stemmed term from a number of different source terms. To answer your question, no, you don't need a separate field or type for this feature, but it will tend to generate a lot more terms in your index since it will index a stemmed term as two terms. Only use the repeat/remove filters for the index analyzer. You will need to reindex to see the full effect immediately, but you can do the reindex incrementally (as you replace existing documents) as well if you don't mind if the difference in relevancy takes an extended time to become apparent. -- Jack Krupansky -Original Message- From: Michael Tracey Sent: Monday, May 5, 2014 4:52 PM To: solr-user@lucene.apache.org Subject: Turning on KeywordRepeat and RemoveDups on an existing fieldType. As per the stemming docs ( https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want to score the original term higher than the stemmed version by adding: filter class=solr.KeywordRepeatFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ to a field type that is already created (with Stemming). I have 100M documents in this index, and it gets slowly reindexed every month as records change. My question is, can I add this to the existing fieldType, or do I need to make a new fieldType, and copyField the data over to it, and after it's all reindexed switch my code? I'd rather be able to just add the lines to my fieldType because I don't think I have enough disk space on my cloud members to hold my primary fulltext field twice. 
Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks like this: fieldType name=keywordText class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=keyword_stopwords.txt enablePositionIncrements=true / filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=keyword_stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Thanks, M.
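To make the effect of the two filters concrete, here is a toy simulation of the index-time chain (repeat, then stem, then remove duplicates). It is only a sketch: toy_stem is a stand-in for the Snowball stemmer, and the real filters work on Lucene token attributes rather than Python tuples.

```python
def toy_stem(word):
    """Stand-in for a real stemmer: strips a trailing 'ing' or 's'."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def analyze(text):
    """Return (position, term) pairs after repeat -> stem -> dedupe."""
    out = []
    for pos, word in enumerate(text.lower().split()):
        # KeywordRepeat: each token is emitted twice at the same position;
        # the copy flagged as a keyword is left alone by the stemmer.
        candidates = [word, toy_stem(word)]
        # RemoveDuplicates: drop a token whose text equals an earlier token
        # at the same position (this happens when stemming changes nothing).
        seen = []
        for term in candidates:
            if term not in seen:
                seen.append(term)
        out.extend((pos, term) for term in seen)
    return out


print(analyze("Running cables fast"))
```

Words that the stemmer changes end up indexed twice at the same position, which is where the extra index size mentioned in the reply comes from; words it leaves alone are stored once.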
Re: Relevancy help
The recip function query is the proper way to boost by reverse chronological order, but you may have to play around with the boost factor so that date does not completely overwhelm the natural relevancy. Use the debugQuery=true parameter and look at the explain section to see what the document scores look like. -- Jack Krupansky -Original Message- From: Ravi Solr Sent: Monday, May 5, 2014 5:41 PM To: solr-user@lucene.apache.org Subject: Relevancy help Hello, I have a weird relevancy requirement. We search news content hence chronology is very important and also relevancy, although both are mutually exclusive. For example, if the search terms are - malaysia airline crash blackbox - my requirements are as follows docs containing all words should be on top, but the editorial also wants them sorted reverse by chronological order without loosing relevancy. Why ?? If on day 1 there is an article about search for blackbox but on day 2 the blackbox is found and day 3 there is an article about blackbox being unusable...from the user's standpoint it makes sense that we show most recent content on top. I already boost recency of docs with boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of 3 months However when I do the boost the chronology is messed up. I know relevancy and sorting are mutually exclusive concepts. Is there any magic that we can do in SOLR which can achieve both ??? Thanks, Ravi Kiran bhaskar
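When tuning the boost factor as suggested in the reply above, it helps to see what the constants do. Solr documents recip(x,m,a,b) as a/(m*x+b); the sketch below evaluates it for a few document ages. If the arithmetic is right, m=7.889e-10 halves the boost after about two weeks rather than three months, since three months is roughly 7.889e9 ms and would correspond to m of about 1.27e-10. Treat that reading as an observation to verify against debugQuery output, not a diagnosis.

```python
def recip(x, m, a, b):
    """Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


MS_PER_DAY = 24 * 60 * 60 * 1000
M = 7.889e-10  # constant used in the boost quoted in this thread

# With a = b = 1 the boost is 1.0 for a brand-new document and drops to
# 0.5 once m*x reaches 1, i.e. at an age of 1/m milliseconds.
half_life_days = (1 / M) / MS_PER_DAY

for age_days in (0, 1, 7, 30, 90):
    boost = recip(age_days * MS_PER_DAY, M, 1, 1)
    print(f"{age_days:>3} days old -> boost {boost:.3f}")
print(f"boost halves after about {half_life_days:.1f} days")
```

A smaller m flattens the curve so that date matters less against the text score; a larger m makes recency dominate.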
Re: Relevancy help
Hi Ravi, Regarding recency please see : http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr Regarding docs containing all words there is function query that elevates those docs to top. Search existing mailing list past posts. Ahmet On Tuesday, May 6, 2014 12:42 AM, Ravi Solr ravis...@gmail.com wrote: Hello, I have a weird relevancy requirement. We search news content hence chronology is very important and also relevancy, although both are mutually exclusive. For example, if the search terms are - malaysia airline crash blackbox - my requirements are as follows docs containing all words should be on top, but the editorial also wants them sorted reverse by chronological order without loosing relevancy. Why ?? If on day 1 there is an article about search for blackbox but on day 2 the blackbox is found and day 3 there is an article about blackbox being unusable...from the user's standpoint it makes sense that we show most recent content on top. I already boost recency of docs with boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1) i.e. increments of 3 months However when I do the boost the chronology is messed up. I know relevancy and sorting are mutually exclusive concepts. Is there any magic that we can do in SOLR which can achieve both ??? Thanks, Ravi Kiran bhaskar
Re: Strict Search in Apache Solr
Hi Reyes, I think your question is not clear. Please see: https://wiki.apache.org/solr/UsingMailingLists Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com wrote:
How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results. Current: If you were to query: your future, then 10 results would return and print to the page. Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page. Regards, Mark
Histogram facet?
Hi, I am trying to plot a non-date field by time in order to draw a histogram showing its evolution during the week. For example, if I have a tweet index where each Tweet has the fields date and retweetCount, and 3 tweets are indexed:

Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day, that is easy with a date range facet:
Day 1: 2
Day 2: 1
But counting the number of retweets by day is not possible natively:
Day 1: 200
Day 2: 100
One current workaround would be to do a date range facet to get the date slots, ask only for the retweet field, and compute the sums in the client. We could compute other stats like the average, etc. too. The closest I could see was https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be slightly different. Basically I am trying to do something very similar to the Date Histogram Facet http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet in ES. Is there a way to move the counting logic to the Solr server? Thanks! Romain
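The client-side workaround described above might look roughly like this. It is a sketch only: it assumes the documents come back with an ISO-8601 date string and an integer retweetCount, and the year and time of day in the sample data are invented to match the three example tweets.

```python
from collections import defaultdict


def retweets_per_day(docs):
    """Sum retweetCount per calendar day for docs returned by Solr."""
    totals = defaultdict(int)
    for doc in docs:
        day = doc["date"][:10]  # 'YYYY-MM-DD' prefix of the timestamp
        totals[day] += doc["retweetCount"]
    return dict(sorted(totals.items()))


# The three tweets from the example; year and time of day are made up.
docs = [
    {"id": "A", "date": "2014-01-01T08:00:00Z", "retweetCount": 100},
    {"id": "B", "date": "2014-01-01T17:30:00Z", "retweetCount": 100},
    {"id": "C", "date": "2014-01-02T09:15:00Z", "retweetCount": 100},
]
print(retweets_per_day(docs))
```

The obvious cost is that every matching document has to travel to the client, which is exactly why a server-side option would be preferable for large result sets.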
Re: Strict Search in Apache Solr
Okay, let’s try it this way…
CURRENTLY: Step 1: Type, your future into the search bar. Step 2: 10 search results return.
I’D LIKE TO SEE THIS: Step 1: Type, “your future” into the search bar. Step 2: 1 search result returns.
Can this be accomplished through the Solr UI? Thanks, Mark

On 5/5/14, 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:
Hi Reyes, I think it is not clear your question. Please see : https://wiki.apache.org/solr/UsingMailingLists Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com wrote:
How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results. Current: If you were to query: your future, then 10 results would return and print to the page. Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page. Regards, Mark
Re: Strict Search in Apache Solr
The term "strict search" is not in the Lucene/Solr nomenclature - it could mean any number of things. It sounds as if maybe you want to do a phrase search, looking for an exact phrase - yes, you can do that by enclosing the phrase in quotes. -- Jack Krupansky

-Original Message- From: Reyes, Mark Sent: Monday, May 5, 2014 5:23 PM To: solr-user@lucene.apache.org Subject: Strict Search in Apache Solr
How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results. Current: If you were to query: your future, then 10 results would return and print to the page. Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page. Regards, Mark
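On the phrase-search suggestion in the reply above: the difference between the two requests is only whether the q parameter is wrapped in double quotes. A small sketch, assuming a local Solr at the default port and a core named collection1 (both placeholders):

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/collection1"  # assumed host and core name


def select_url(base, user_input, phrase=False):
    """Build a /select URL; wrap the input in double quotes for a phrase search."""
    if phrase:
        # Escape any embedded double quotes so the phrase stays intact.
        q = '"%s"' % user_input.replace('"', '\\"')
    else:
        q = user_input
    return base + "/select?" + urlencode({"q": q, "wt": "json"})


print(select_url(BASE, "your future"))               # matches on the individual terms
print(select_url(BASE, "your future", phrase=True))  # matches the exact phrase
```

The same quoted query can be typed directly into the q box of the Admin UI query page.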
Linking Two Fields Together
I'm using Solr to create image search functionality that allows users to search for an existing image in the site to add to new content. A given piece of content has a field that can store multiple images, so I will need to use a multi-value Solr field to store image data. Currently, I'm storing the path and file name in a tom_* field, since I want to be able to search on file name. However, another piece of data that I need to store and retrieve is the file id used to identify the file in the database (in the same table as the image path). What is the best way to store this data so that the file id and path values are properly synced, since there can be multiple images for each piece of content? I could just store the file path/name (I need that data to be searchable, so it has to be stored in Solr), and then query the db for the fid once I get the results back, but I'd rather not do that if I don't have to. Searching around, it doesn't appear that I can store multiple pieces of data in one field without doing some sort of concatenation and then splitting at query time. If I just use two separate fields in each document, is it safe to assume that the values will be synchronized in the search results? In other words, if I put two values each into tom_image_path and im_image_file_id, when I query and the document is returned, can I assume the values in the two fields are synchronized? Or, is there a way to store multiple pieces of data in one field so that they can be indexed together and then retrieved together? Thanks. Steve
Re: dynamic field assignments
: My understanding is that DynamicField can do something like
: FOO_BAR_TEXT_* but what I really need is *_TEXT_* as I might have
: FOO_BAR_TEXT_1 but I also might have WIDGET_BAR_TEXT_2. Both of those
: field names need to map to a field type of 'fullText'.

I'm pretty sure you can get what you are after with the new Managed Schema functionality...

https://cwiki.apache.org/confluence/display/solr/Schemaless+Mode
https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig

Assuming you have managed schema enabled in solrconfig.xml, and you define both of your fieldTypes using names like "text" and "select", then something like this should work in your processor chain...

<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="fieldRegex">.*_TEXT_.*</str>
  <str name="defaultFieldType">text</str>
</processor>
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="fieldRegex">.*_SELECT_.*</str>
  <str name="defaultFieldType">select</str>
</processor>

(Normally that processor is used once with multiple value-type mappings -- but in your case you don't care about the run-time value, just the run-time field name regex, which should also be configurable according to the various FieldNameSelector rules...)

https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/AddSchemaFieldsUpdateProcessorFactory.html
https://lucene.apache.org/solr/4_8_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html

-Hoss http://www.lucidworks.com/
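Before wiring regexes like these into a processor chain, it can be worth checking them against real field names. The sketch below mimics the name-to-type mapping in plain Python; it is not how Solr evaluates fieldRegex internally, and whether Solr anchors the pattern does not matter here because both patterns start and end with ".*".

```python
import re

# Same patterns as the fieldRegex values above; the type names are the
# ones assumed in the reply ("text" and "select").
RULES = [
    (re.compile(r".*_TEXT_.*"), "text"),
    (re.compile(r".*_SELECT_.*"), "select"),
]


def field_type_for(name):
    """Return the type of the first rule whose regex matches the whole name."""
    for pattern, field_type in RULES:
        if pattern.fullmatch(name):
            return field_type
    return None  # no rule applies; the field would be handled some other way


for name in ("FOO_BAR_TEXT_1", "WIDGET_BAR_TEXT_2", "FOO_SELECT_3", "OTHER"):
    print(name, "->", field_type_for(name))
```

A check like this catches overlapping patterns early, since the first matching rule wins in this sketch.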
Re: sort groups by the sum of the scores of the documents within each group
my scheme.xml: schema name=example core one version=1.1 types fieldtype name=string class=solr.StrField sortMissingLast=true omitNorms=true/ fieldType name=long class=solr.TrieLongField precisionStep=0 positionIncrementGap=0/ fieldType name=uuid class=solr.UUIDField indexed=true / fieldtype name=textComplex class=solr.TextField positionIncrementGap=100 omitNorms=false autoGeneratePhraseQueries=false analyzer type=query tokenizer class=com.chenlb.mmseg4j.solr.MMSegTokenizerFactory mode=complex dicPath=E:\solr-4.6.1\example\solr\dict/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=false expand=true/ /analyzer analyzer type=index tokenizer class=com.chenlb.mmseg4j.solr.MMSegTokenizerFactory mode=complex dicPath=E:\solr-4.6.1\example\solr\dict/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=false expand=true/ /analyzer /fieldtype /types fields field name=idtype=uuidindexed=true stored=true multiValued=false required=true / field name=name type=textComplexindexed=true stored=true multiValued=false / field name=type type=stringindexed=true stored=true multiValued=false / field name=price type=longindexed=true stored=true / field name=_version_ type=long indexed=true stored=true/ /fields uniqueKeyid/uniqueKey defaultSearchFieldname/defaultSearchField solrQueryParser defaultOperator=OR/ /schema update docs: docs: [ { name: 苹果4s, type: 手机, price: 2000, id: 4017e35a-6b19-45b6-b945-382340ca1eec, _version_: 1466799722505175000 }, { name: 苹果5, type: 手机, price: 5000, id: 4052d9f3-f6d9-458f-8bb0-477b17852f37, _version_: 1466799735745544200 }, { name: 三星, type: 手机, price: 3000, id: 468abce8-8bb9-4f51-9900-8d4d6abc02ac, _version_: 1466799747596550100 }, { name: 摩托罗拉i3, type: 电脑, price: 1000, id: db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd, _version_: 1466799757491961900 }, { name: 摩托罗拉i5, type: 
电脑, price: 1500, id: f211525f-bc3c-4ea7-aded-1c46a94ecd1c, _version_: 1466799766311534600 } ]

Thank you, Erick. I want to sort groups by the sum of the documents' scores within each group. As you said, Solr excels at getting the score of single documents. In Solr 4.6, the default ordering of groups depends on the maxScore of the documents within each group, not on the sum of the documents' scores. I can compute the sum of the scores in the client program, but that is not a good idea. I know that the stats component of Solr can compute statistics on a long field, so I had the idea of using it on the score, but score is a pseudo-field and stats.field doesn't support it. In addition, as the schema.xml shows, I group on the values of a string field (type), which is not tokenized. -- View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Anybody uses Solr JMX?
Alexandre, you could use something like http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly dump everything out of JMX and see if there is anything there Solr Admin UI doesn't expose. I think you'll find there is more in JMX than Solr Admin UI shows. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context, rather than long-term monitoring one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote: On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: I have religiously kept jmx statement in my solrconfig.xml, thinking it was enabling the web interface statistics output. But looking at the server logs really closely, I can see that JMX is actually disabled without server present. And the Admin UI does not actually seem to care after a quick test. Does anybody have a real experience with Solr JMX? Does it expose more information than Admin UI's Plugins/Stats page? Is it good for Have not been using JMX lately, but we were using it in the past. It does allow monitoring many useful details. As others have commented, it also integrates well with other monitoring tools as JMX is a standard. Regards, Gora
Re: Anybody uses Solr JMX?
Thanks Otis, JMXC looks interesting, though I cannot seem to find the Open Source section on your website it used to link to. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Tue, May 6, 2014 at 9:43 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Alexandre, you could use something like http://blog.sematext.com/2012/09/25/new-tool-jmxc-jmx-console/ to quickly dump everything out of JMX and see if there is anything there Solr Admin UI doesn't expose. I think you'll find there is more in JMX than Solr Admin UI shows. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr Elasticsearch Support * http://sematext.com/ On Mon, May 5, 2014 at 1:56 AM, Alexandre Rafalovitch arafa...@gmail.comwrote: Thank you everybody for the links and explanations. I am still curious whether JMX exposes more details than the Admin UI? I am thinking of a troubleshooting context, rather than long-term monitoring one. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Mon, May 5, 2014 at 12:21 PM, Gora Mohanty g...@mimirtech.com wrote: On May 5, 2014 7:09 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: I have religiously kept jmx statement in my solrconfig.xml, thinking it was enabling the web interface statistics output. But looking at the server logs really closely, I can see that JMX is actually disabled without server present. And the Admin UI does not actually seem to care after a quick test. Does anybody have a real experience with Solr JMX? Does it expose more information than Admin UI's Plugins/Stats page? Is it good for Have not been using JMX lately, but we were using it in the past. It does allow monitoring many useful details. As others have commented, it also integrates well with other monitoring tools as JMX is a standard. Regards, Gora
Re: Help to Understand a Solr Query
If you are looking for that level of understanding, you are best off enabling the debug flag. Then you will get a full breakdown of what matched which field and why, including scores, preferences, etc. Possibly with debug.explain.structured enabled: http://wiki.apache.org/solr/CommonQueryParameters#debug.explain.structured

Most people do not want to deep dive into debug info. But I am getting the feeling this would be right where you want to go.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, May 6, 2014 at 1:47 AM, nativecoder romrom...@gmail.com wrote:

That answer helps a lot. Where would the OR clause be?

(Exact_Field1:samplestring1 OR Exact_Field1:samplestring2) AND
(Exact_Field2:samplestring1 OR Exact_Field2:samplestring2) AND
(Field1:samplestring1 OR Field1:samplestring2) AND
(Field2:samplestring1 OR Field2:samplestring2)

Please note that in my query it is an AND clause. I am trying to understand where the AND fits in. To be more precise, my query is as below:

q=samplestring1 AND samplestring2
defType: edismax
qf: Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7
fl= Column1, Column2

--
View this message in context: http://lucene.472066.n3.nabble.com/Help-to-Understand-a-Solr-Query-tp4134686p4134775.html
Sent from the Solr - User mailing list archive at Nabble.com.
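A minimal sketch of the debug request suggested above, for anyone who wants to try it from a script. It only builds the URL; the host, port, and core name (`localhost:8983`, `collection1`) are assumptions, while `debugQuery` and `debug.explain.structured` are the standard Solr parameters referred to in the reply.

```python
from urllib.parse import urlencode


def build_debug_query(base_url, q, qf, fl):
    """Return a /select URL that asks Solr for a structured score explanation."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "fl": fl,
        "debugQuery": "true",                # turn on debug output
        "debug.explain.structured": "true",  # nested explanation instead of one text blob
        "wt": "json",
    }
    return base_url + "/select?" + urlencode(params)


url = build_debug_query(
    "http://localhost:8983/solr/collection1",  # assumed host and core name
    "samplestring1 AND samplestring2",
    "Exact_Field1^1.0 Exact_Field2^0.9 Field1^0.8 Field2^0.7",
    "Column1,Column2",
)
print(url)
```

In the response, the `debug` section should contain the parsed query (which shows where the AND ends up across the `qf` fields) and a per-document `explain` entry.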
Re: Linking Two Fields Together
You can have two parallel multi-value fields and, as long as you don't introduce null/empty values, they will be kept together. However, for recent Solr (4.7? certainly 4.8), you may want to look at parent/child entries and join/parent/child queries.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, May 6, 2014 at 7:20 AM, Steve Edwards killsho...@gmail.com wrote:

I'm using Solr to create an image search functionality that allows users to search for an existing image in the site to add to new content. A given piece of content has a field that can store multiple images, so I will need to use a multi-value Solr field to store image data.

Currently, I'm storing the path and file name in a tom_* field, since I want to be able to search on file name. However, another piece of data that I need to store and retrieve is the file id used to identify the file in the database (in the same table as the image path). What is the best way to store this data so that the file id and path values are properly synced, since there can be multiple images for each piece of content?

I could just store the file path/name (I need that data to be searchable, so it has to be stored in Solr), and then query the db for the fid once I get the results back, but I'd rather not do that if I don't have to.

Searching around, it doesn't appear that I can store multiple pieces of data in one field without doing some sort of concatenation and then splitting at query time. If I just use two separate fields in each document, is it safe to assume that the values will be synchronized in the search results? In other words, if I put two values each into tom_image_path and im_image_file_id, when I query and the document is returned, can I assume the values in the two fields are synchronized?

Or, is there a way to store multiple pieces of data in one field so that they can be indexed together and then retrieved together?

Thanks.
Steve
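A small client-side sketch of the parallel-field approach described in the reply. It assumes, as stated there, that Solr returns the stored values of both multi-valued fields in the order they were added and that no empty values were indexed. The field names come from the question; the sample document is made up for illustration.

```python
def pair_images(doc):
    """Zip the two parallel multi-valued fields into (file_id, path) pairs."""
    paths = doc.get("tom_image_path", [])
    ids = doc.get("im_image_file_id", [])
    if len(paths) != len(ids):
        # A length mismatch means a null/empty value was dropped somewhere,
        # so the positions can no longer be trusted.
        raise ValueError("parallel fields are out of sync")
    return list(zip(ids, paths))


# Hypothetical document as it might come back from a query.
doc = {
    "tom_image_path": ["sites/default/files/cat.jpg", "sites/default/files/dog.jpg"],
    "im_image_file_id": [17, 42],
}
print(pair_images(doc))
```

The length check is the only safeguard available on the client: if one field loses a value, every later pair would silently shift.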
Re: Solr does not recognize language
Hi iorixxx, I'm Frankcis, not Victor. Did you send this to the wrong email address?

2014-05-05 23:20 GMT+08:00 iorixxx [via Lucene] ml-node+s472066n4134713...@n3.nabble.com:

Hi Victor,

I don't know mysolr. I assume you are using /update/json, so let's add your chain to the defaults section.

  <requestHandler name="/update/json" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="stream.contentType">application/json</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

On Monday, May 5, 2014 4:06 PM, Victor Pascual [hidden email] wrote:

Hi there,

I'm indexing my documents using mysolr. I mainly generate a list of JSON objects and then run: solr.update(documents_array,'json')

On Mon, May 5, 2014 at 1:08 PM, Ahmet Arslan [hidden email] wrote:

Hi Victor,

How do you index your documents? Your last config looks correct. However, if for example you use the data import handler, you need to add update.chain there too. The same goes for the extraction request handler if you are using Solr Cell.

  <requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">/home/username/data-config.xml</str>
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

By the way, the URL http://localhost:8080/solr/update?commit=true&update.chain=langid was just an example and is meant to receive XML update messages via the POST method. It is not meant to be used in a browser.

Ahmet

On Monday, May 5, 2014 11:04 AM, Victor Pascual [hidden email] wrote:

Thank you very much for your help, Ahmet. However, the language detection is still not working. :( My solrconfig.xml didn't contain that lst section inside the update requestHandler. That's the content I added:

  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="langid">
    <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
      <lst name="defaults">
        <str name="langid.fl">text</str>
        <str name="langid.langField">lang</str>
      </lst>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory" />
    <processor class="solr.RunUpdateProcessorFactory" />
  </updateRequestProcessorChain>

Now, your suggested query http://localhost:8080/solr/update?commit=true&update.chain=langid returns

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">14</int>
    </lst>
  </response>

And there is still no lang field in my documents. Any idea what I am doing wrong?

On Tue, Apr 29, 2014 at 5:33 PM, Ahmet Arslan [hidden email] wrote:

Hi,

/solr/update should be used, not /solr/select:

  curl 'http://localhost:8983/solr/update?commit=true&update.chain=langid'

By the way, don't you have the following definition in your solrconfig.xml?

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">langid</str>
    </lst>
  </requestHandler>

On Tuesday, April 29, 2014 4:50 PM, Victor Pascual [hidden email] wrote:

Hi Ahmet,

thanks for your reply. Adding update.chain=langid to my query doesn't work: IP:8080/solr/select/?q=*%3A*&update.chain=langid

Regarding defining the chain in an UpdateRequestHandler... sorry for the lame question, but shall I paste those three lines into solrconfig.xml, or shall I add them somewhere else? There is no UpdateRequestHandler in my solrconfig.

Thanks!

On Tue, Apr 29, 2014 at 3:13 PM, Ahmet Arslan [hidden email] wrote:

Hi,

Did you attach your chain to an UpdateRequestHandler? You can do it by adding update.chain=langid to the URL or by defining it in a defaults section as follows:

  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>

On Tuesday, April 29, 2014 3:18 PM, Victor Pascual [hidden email] wrote:

Dear all,

I'm a new user of Solr. I've managed to index a bunch of documents (in fact, they are tweets) and everything works quite smoothly. Nevertheless, it looks like Solr doesn't detect the language of my documents nor remove stopwords accordingly so I can extract the most frequent terms. I've added this piece of XML to my solrconfig.xml as well as the Tika lib jars.

  <updateRequestProcessorChain name="langid"> <processor
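As a rough illustration of the point made in this thread (the chain only runs on update requests, and a browser GET against /update or /select indexes nothing), here is a sketch that builds a JSON POST with `update.chain=langid` on the URL. The base URL and the sample document are assumptions; the `text` field matches the `langid.fl` setting quoted above. This is plain `urllib`, not the mysolr API, and the request is built but not sent so the sketch runs without a server.

```python
import json
from urllib.request import Request


def build_langid_update(docs, base_url="http://localhost:8080/solr"):
    """Build a POST to /update/json that routes documents through the langid chain."""
    url = base_url + "/update/json?commit=true&update.chain=langid"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical tweet document; "text" is the field named in langid.fl.
req = build_langid_update([{"id": "1", "text": "Hola, esto es un tweet de prueba"}])
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send it.
```

If the chain is already declared in the handler's defaults, the `update.chain` parameter on the URL is redundant but harmless.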
Re: Strict Search in Apache Solr
You can do phrase search explicitly with quotes. Or you could look at something like the Term query parser: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermQueryParser

You can also enable autoGeneratePhraseQueries on the field type to try the phrase queries, but that's in addition to trying individual terms: https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, May 6, 2014 at 5:35 AM, Jack Krupansky j...@basetechnology.com wrote:

The term "strict search" is not in the Lucene/Solr nomenclature - it could mean any number of things. It sounds as if maybe you want to do a phrase search, looking for an exact phrase - yes, you can do that by enclosing the phrase in quotes.

-- Jack Krupansky

-Original Message-
From: Reyes, Mark
Sent: Monday, May 5, 2014 5:23 PM
To: solr-user@lucene.apache.org
Subject: Strict Search in Apache Solr

How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results.

Current: If you were to query: your future, then 10 results would return and print to the page.

Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page.

Regards, Mark
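A small sketch of the quoting step on the application side, in case the front end has to turn user input into a phrase query. It assumes the query parser only treats the plain ASCII double quote as a phrase delimiter, so typographic quotes like the ones in Mark's example are normalized first. The helper name is made up for illustration.

```python
def as_phrase(user_input):
    """Wrap user input in ASCII double quotes so it is parsed as one phrase."""
    # Drop surrounding straight or curly quotes the user may have typed.
    text = user_input.strip().strip('"\u201c\u201d')
    # Escape backslashes and any quotes left inside the phrase.
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + text + '"'


print(as_phrase("your future"))
print(as_phrase("\u201cyour future\u201d"))
```

The result can be passed as the `q` parameter; matching then requires the terms to appear next to each other in the analyzed field.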
Re: Relevancy help
Can you sort by score, then date? Assuming similar articles will get the same score (you may need to discount frequency/length). There is also the QueryRescore API introduced in Lucene 4.8 that might be relevant, though I have no idea how that would get exposed in Solr.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, May 6, 2014 at 5:12 AM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Ravi,

Regarding recency, please see: http://www.slideshare.net/lucenerevolution/potter-timothy-boosting-documents-in-solr

Regarding docs containing all words, there is a function query that elevates those docs to the top. Search the mailing list's past posts.

Ahmet

On Tuesday, May 6, 2014 12:42 AM, Ravi Solr ravis...@gmail.com wrote:

Hello,

I have a weird relevancy requirement. We search news content, hence chronology is very important and also relevancy, although both are mutually exclusive. For example, if the search terms are - malaysia airline crash blackbox - my requirements are as follows: docs containing all words should be on top, but the editorial team also wants them sorted in reverse chronological order without losing relevancy.

Why?? If on day 1 there is an article about the search for the blackbox, but on day 2 the blackbox is found, and on day 3 there is an article about the blackbox being unusable... from the user's standpoint it makes sense that we show the most recent content on top.

I already boost recency of docs with boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1), i.e. increments of 3 months. However, when I do the boost, the chronology is messed up. I know relevancy and sorting are mutually exclusive concepts. Is there any magic that we can do in Solr which can achieve both???

Thanks,
Ravi Kiran Bhaskar
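To see what the recency boost quoted above does numerically, here is a small sketch of Solr's `recip(x,m,a,b)` function query, which evaluates to `a / (m*x + b)`. The constants are the ones from Ravi's `boost` parameter; the ages passed in are arbitrary examples. The boost multiplies the relevancy score, so a newer document can still end up below an older one whose text score is much higher, which is one way the chronology gets "messed up".

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

# Constants from boost=recip(ms(NOW/HOUR,displaydatetime),7.889e-10,1,1)
M, A, B = 7.889e-10, 1.0, 1.0


def recip(x, m, a, b):
    """Solr's recip function query: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_days):
    """Boost factor for a document that is age_days old."""
    return recip(age_days * MS_PER_DAY, M, A, B)


for days in (0, 1, 7, 30, 90):
    print(days, round(recency_boost(days), 4))
```

The factor is 1.0 for a brand-new document and drops to 0.5 once the age in milliseconds reaches `1/m`, so printing a few values is a quick way to check whether the chosen `m` matches the intended time scale.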
Re: Histogram facet?
Hmmm, I _think_ pivot faceting works here. One dimension would be day and the other retweet count. The response will have the number of retweets per day; you'd have to sum them up, I suppose.

Best,
Erick

On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:

Hi,

I am trying to plot a non-date field by time in order to draw a histogram showing its evolution during the week. For example, if I have a tweet index:

Tweet: date retweetCount

3 tweets indexed:

Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day: easy with a date range facet:

Day 1: 2
Day 2: 1

But now counting the number of retweets by day is not possible natively:

Day 1: 200
Day 2: 100

One current workaround would be to do a date range facet to get the date slots, ask only for the retweet field, and compute the sums in the client. We could compute other stats like average, etc. too.

The closest I could see was https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be slightly different. Basically I am trying to do something very similar to the Date Histogram Facet http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet in ES.

Is there a way to move the counting logic to the Solr server?

Thanks!
Romain
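The client-side workaround Romain describes can be sketched in a few lines. This assumes the documents have already been fetched with only the two fields requested; the field names `date` and `retweetCount` follow the example in the question.

```python
from collections import defaultdict


def sum_by_day(docs, day_field="date", value_field="retweetCount"):
    """Sum a numeric field per day, given docs that carry a day value."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[day_field]] += doc[value_field]
    return dict(totals)


# The three tweets from the example above.
tweets = [
    {"id": "A", "date": "01/01", "retweetCount": 100},
    {"id": "B", "date": "01/01", "retweetCount": 100},
    {"id": "C", "date": "01/02", "retweetCount": 100},
]
print(sum_by_day(tweets))  # {'01/01': 200, '01/02': 100}
```

The obvious cost is that every matching document has to be transferred to the client, which is exactly why the question asks for a server-side equivalent.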
Re: Wildcard malfunctioning
I mark all the filters that support wildcards with (multi) on my list: http://www.solr-start.com/info/analyzers/ . I use actual interface markers to derive that list, so it should be most up to date.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Mon, May 5, 2014 at 6:19 PM, Jack Krupansky j...@basetechnology.com wrote:

Generally, stemming filters are not supported when wildcards are present. Only a small subset of filters work with wildcards, such as the case conversion filters. But you say that you are using the stemmer to remove diacritical marks... you can/should use ASCIIFoldingFilterFactory or MappingCharFilterFactory.

-- Jack Krupansky

-Original Message-
From: Román González
Sent: Monday, May 5, 2014 7:00 AM
To: solr-user@lucene.apache.org
Subject: Wildcard malfunctioning

Hi all!

Sorry in advance if this question was already posted, but I was unable to find it with search engines. The SpanishLightStemFilterFactory filter is not working properly with wildcards, or I’m misunderstanding something. I have the field

  <field name="cultivo_es" type="text_es" indexed="true" stored="true" />

With this type:

  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball" />
      <filter class="solr.SpanishLightStemFilterFactory"/>
      <!-- more aggressive: <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/> -->
    </analyzer>
  </fieldType>

But I’m getting these results:

q = cultivo_es:uva
Getting 50 correct results

q = cultivo_es:uva*
Getting the same 50 correct results

q = cultivo_es:naranja
Getting the 50 correct results of “naranja”

q = cultivo_es:naranja*
Getting 0 results!

It works fine if I remove the SpanishLightStemFilterFactory filter, but I need it in order to filter diacritics according to Spanish rules.

Thank you!!
Re: sort groups by the sum of the scores of the documents within each group
You haven't answered _why_ this is a good idea. I'm having a hard time understanding what would be _useful_ about sorting this way. Just because the sum of scores in a group is greater than the sum of scores in another says _nothing_ about how relevant any of the docs in the group are relative to each other. I mean, group 1 could have 10M documents all with a score of .01 and group 2 could have 1 document with a score of 1,000, and group 1 would sort first. So unless you have some unusual use-case which you haven't yet articulated, this seems like a bad idea.

Best,
Erick

On Mon, May 5, 2014 at 7:20 PM, Frankcis finalxc...@gmail.com wrote:

My schema.xml:

  <schema name="example core one" version="1.1">
    <types>
      <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
      <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
      <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
      <fieldtype name="textComplex" class="solr.TextField" positionIncrementGap="100" omitNorms="false" autoGeneratePhraseQueries="false">
        <analyzer type="query">
          <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
        </analyzer>
        <analyzer type="index">
          <tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="E:\solr-4.6.1\example\solr\dict"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="false" expand="true"/>
        </analyzer>
      </fieldtype>
    </types>
    <fields>
      <field name="id" type="uuid" indexed="true" stored="true" multiValued="false" required="true" />
      <field name="name" type="textComplex" indexed="true" stored="true" multiValued="false" />
      <field name="type" type="string" indexed="true" stored="true" multiValued="false" />
      <field name="price" type="long" indexed="true" stored="true" />
      <field name="_version_" type="long" indexed="true" stored="true"/>
    </fields>
    <uniqueKey>id</uniqueKey>
    <defaultSearchField>name</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
  </schema>

The indexed docs:

  "docs": [
    { "name": "苹果4s", "type": "手机", "price": 2000, "id": "4017e35a-6b19-45b6-b945-382340ca1eec", "_version_": 1466799722505175000 },
    { "name": "苹果5", "type": "手机", "price": 5000, "id": "4052d9f3-f6d9-458f-8bb0-477b17852f37", "_version_": 1466799735745544200 },
    { "name": "三星", "type": "手机", "price": 3000, "id": "468abce8-8bb9-4f51-9900-8d4d6abc02ac", "_version_": 1466799747596550100 },
    { "name": "摩托罗拉i3", "type": "电脑", "price": 1000, "id": "db66bb02-3d6a-4ab0-9133-2e6e38b3d4dd", "_version_": 1466799757491961900 },
    { "name": "摩托罗拉i5", "type": "电脑", "price": 1500, "id": "f211525f-bc3c-4ea7-aded-1c46a94ecd1c", "_version_": 1466799766311534600 }
  ]

Thank you, Erick. I want to sort groups by the sum of the documents' scores within each group. As you said, Solr excels at scoring single documents. In Solr 4.6, groups are sorted by default on the maxScore of the documents within each group, not on the sum of the documents' scores. I can compute the sum of the scores in the client program, but that is not a good idea. I know that the stats component of Solr can compute statistics over a long field, so I had the idea of using it on the score field, but score is a pseudo-field and stats.field doesn't support it. In addition, as the schema.xml shows, I group on the values of a string field (type), which is not tokenized.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134830.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick. You're right: sorting by the maxScore of the documents within each group is more effective than sorting by the sum of scores in a group, especially in a case like the one you describe (group 1 could have 10M documents all with a score of .01 and group 2 could have 1 document with a score of 1,000, and group 1 would sort first). But the function is required by the client. Can you tell me how to achieve it?

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134856.html
Sent from the Solr - User mailing list archive at Nabble.com.
Indexing scanned PDFs
We are using Solr to index PDF documents, but in some cases the PDFs are scanned documents with no text to extract and index. Is there a plugin or module in Solr that we can integrate so that it would actually extract the text via OCR and then index it?

Thanks in advance,
Chandan Tamrakar
Re: sort groups by the sum of the scores of the documents within each group
Thank you, Erick, you're a good man. This is the client requirement: in the forum, there is a lot of discussion content under different subjects. A search for a keyword returns the documents whose content or subject matches the query. These documents are grouped by subject, and the groups are sorted by the sum of the scores within each subject. I would be glad to hear your suggestions.

--
View this message in context: http://lucene.472066.n3.nabble.com/Re-sort-groups-by-the-sum-of-the-scores-of-the-documents-within-each-group-tp4134715p4134869.html
Sent from the Solr - User mailing list archive at Nabble.com.
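For reference, the client-side variant mentioned earlier in this thread might look like the sketch below. It is not a Solr feature: it assumes the matching documents were fetched with `fl=*,score`, and the `subject` field name and the sample scores are hypothetical. It can only sum over the documents actually returned, so large result sets would need paging.

```python
from collections import defaultdict


def rank_groups_by_score_sum(docs, group_field="subject"):
    """Return (group, summed score) pairs, highest sum first."""
    sums = defaultdict(float)
    for doc in docs:
        sums[doc[group_field]] += doc["score"]
    return sorted(sums.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results: subject "A" has three weak matches, "B" one strong match.
docs = [
    {"id": 1, "subject": "A", "score": 0.5},
    {"id": 2, "subject": "A", "score": 0.5},
    {"id": 3, "subject": "A", "score": 0.5},
    {"id": 4, "subject": "B", "score": 1.25},
]
print(rank_groups_by_score_sum(docs))  # [('A', 1.5), ('B', 1.25)]
```

The sample data also shows Erick's objection in miniature: subject B holds the single best match but sorts second.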
Re: Indexing scanned PDFs
Nothing I am aware of for Solr directly. You may have better luck chasing this on the Tika mailing list, as that's what Solr uses under the covers to index PDFs otherwise. Doing a quick search for Tika and OCR brings up a number of links.

Regards, Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency

On Tue, May 6, 2014 at 12:15 PM, Chandan Tamrakar chandan.tamra...@nepasoft.com wrote:

we are using SOLr to index pdf documents but there are cases where PDFs are usually a scanned document with no text to extract and index . Is there a plugin or module in SOLR that we can integrate so that it would actually extract a text / OCR and then index?

Thanks in advance
Chandan Tamrakar
Re: Histogram facet?
The dates won't match unless you truncate all of them to the day. But then if you want to have slots of 15 minutes it won't work, as you would need to truncate the dates to 15-minute boundaries in the index. In ES, they have one field to make the slots and one field to insert into the bucket, e.g.:

  {
    "query" : { "match_all" : {} },
    "facets" : {
      "histo1" : {
        "date_histogram" : {
          "key_field" : "timestamp",
          "value_field" : "price",
          "interval" : "day"
        }
      }
    }
  }

Romain

On Mon, May 5, 2014 at 9:05 PM, Erick Erickson erickerick...@gmail.com wrote:

Hmmm, I _think_ pivot faceting works here. One dimension would be day and the other retweet count. The response will have the number of retweets per day; you'd have to sum them up, I suppose.

Best,
Erick

On Mon, May 5, 2014 at 3:18 PM, Romain romain@gmail.com wrote:

Hi,

I am trying to plot a non-date field by time in order to draw a histogram showing its evolution during the week. For example, if I have a tweet index:

Tweet: date retweetCount

3 tweets indexed:

Tweet | Date  | Retweet
A     | 01/01 | 100
B     | 01/01 | 100
C     | 01/02 | 100

If I want to plot the number of tweets by day: easy with a date range facet:

Day 1: 2
Day 2: 1

But now counting the number of retweets by day is not possible natively:

Day 1: 200
Day 2: 100

One current workaround would be to do a date range facet to get the date slots, ask only for the retweet field, and compute the sums in the client. We could compute other stats like average, etc. too.

The closest I could see was https://issues.apache.org/jira/browse/SOLR-4772 but it seems to be slightly different. Basically I am trying to do something very similar to the Date Histogram Facet http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-date-histogram-facet.html#search-facets-date-histogram-facet in ES.

Is there a way to move the counting logic to the Solr server?

Thanks!
Romain
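To make the key_field/value_field idea concrete without touching the index, here is a client-side sketch that floors each timestamp to an arbitrary interval (15 minutes in the example) and sums a value field per slot. It mimics the shape of the ES facet above but is not an existing Solr feature; the field names and sample documents are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(ts, interval):
    """Floor a timezone-aware timestamp to the start of its interval."""
    return ts - ((ts - EPOCH) % interval)


def date_histogram(docs, key_field, value_field, interval):
    """Sum value_field per interval-sized slot of key_field."""
    totals = defaultdict(int)
    for doc in docs:
        totals[bucket_start(doc[key_field], interval)] += doc[value_field]
    return dict(sorted(totals.items()))


def at(hour, minute):
    """Helper for the sample data: a UTC time on 2014-01-01."""
    return datetime(2014, 1, 1, hour, minute, tzinfo=timezone.utc)


docs = [
    {"timestamp": at(10, 5), "retweetCount": 100},
    {"timestamp": at(10, 14), "retweetCount": 100},
    {"timestamp": at(10, 15), "retweetCount": 100},
]
histogram = date_histogram(docs, "timestamp", "retweetCount", timedelta(minutes=15))
for slot, total in histogram.items():
    print(slot.isoformat(), total)
```

Because the slot is computed at read time, the same stored timestamp can serve day, hour, or 15-minute histograms without reindexing truncated dates.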