Re: Number of fields in schema.xml and impact on Solr
Thanks Shawn. This is good to know.

Steve

On Wed, Apr 22, 2015 at 9:00 AM, Shawn Heisey elyog...@elyograg.org wrote:

On 4/22/2015 6:33 AM, Steven White wrote:

Is there anything I should be taking into consideration if I have a large number of fields in my Solr's schema.xml file? I will be indexing records into Solr, and as I create documents, each document will have between 20-200 fields. However, due to the nature of my data source, the combined flattened list of fields that I need to include in schema.xml will be upward of 2000 and may reach 3000. My questions are as follows, comparing a schema with 300 fields vs. 3000:

1) Will indexing be slower? Require more memory? CPU?
2) Will the index size be larger? If so, any idea by what factor?
3) Will searches be slower? Require more memory? CPU?
4) Will the field type (float, boolean, date, string, etc.) have any factor?
5) Anything else I should know that I didn't ask?

I should make it clear that only about 5 fields will be stored, while everything else will only be indexed.

The number of fields in your schema is likely not a significant contributor to performance. I'm sure it can have an impact, because there is code that validates everything against the schema, but even with a few thousand entries, that code should execute quickly. The amount of data you are actually indexing is MUCH more relevant. The Lucene index itself is only aware of the fields that actually contain data. The entire Solr schema is not used or recorded by Lucene code at all; it is only used within code specific to Solr.

Thanks,
Shawn
Number of fields in schema.xml and impact on Solr
Hi Everyone,

Is there anything I should be taking into consideration if I have a large number of fields in my Solr's schema.xml file? I will be indexing records into Solr, and as I create documents, each document will have between 20-200 fields. However, due to the nature of my data source, the combined flattened list of fields that I need to include in schema.xml will be upward of 2000 and may reach 3000. My questions are as follows, comparing a schema with 300 fields vs. 3000:

1) Will indexing be slower? Require more memory? CPU?
2) Will the index size be larger? If so, any idea by what factor?
3) Will searches be slower? Require more memory? CPU?
4) Will the field type (float, boolean, date, string, etc.) have any factor?
5) Anything else I should know that I didn't ask?

I should make it clear that only about 5 fields will be stored, while everything else will only be indexed.

Thanks

Steve
Re: Checking of Solr Memory and Disk usage
I see. I'm running SolrCloud with 2 replicas, so I guess mine will probably use much more when my system reaches millions of documents.

Regards,
Edwin

On 22 April 2015 at 20:47, Shawn Heisey apa...@elyograg.org wrote:

On 4/22/2015 12:11 AM, Zheng Lin Edwin Yeo wrote:

Roughly how many collections and how many records do you have in your Solr? I have 8 collections with a total of roughly 227000 records, most of which are CSV records. One of my collections has 142000 records.

The core that shows 82MB for heap usage has 16 million documents and is hit with an average of 1 or 2 queries per second. The entire Solr instance on this machine has about 55 million documents and a 6GB max heap. This is NOT running SolrCloud, though the indexes are distributed. There are 24 cores defined, but during normal operation, only four of them contain documents. All four of those cores show heap memory values less than 100MB, but the overall heap usage on that machine is measured in gigabytes.

Thanks,
Shawn
Odp.: Suggester
For the sake of others who would look for the solution and stumble upon this thread, consider sharing it. I'd expect Solr to return the whole field; if it's a text block, then that's what you get.

@LAFK_PL

Original message
From: Martin Keller
Sent: Wednesday, 22 April 2015 16:36
To: solr-user@lucene.apache.org
Reply to: solr-user@lucene.apache.org
Subject: Re: Suggester

OK, I found the problem, and as so often, it was sitting in front of the display. Now the next problem: the suggestions returned always consist of the complete text block where the match was found. I would have expected a single word or a small phrase.

Thanks in advance
Martin

On 22.04.2015 at 12:50, Martin Keller martin.kel...@unitedplanet.com wrote:

Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn't change anything. The suggest dictionary is freshly built. As I mentioned before, only words or phrases from the source field "content" are not matched. When querying the index, the response only contains "suggestions" field data not coming from the "content" field. The complete schema is a slightly modified techproducts schema. "Normal" searching for words which I would expect to come from "content" works.

Any more ideas?

Thanks
Martin

On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote:

Did you build your suggest dictionary after indexing? Kind of a shot in the dark, but worth a try. Note that the suggest field of your suggester isn't using your text_suggest field type to make suggestions, it's using text_general. IOW, the text may not be analyzed as you expect.

Best,
Erick

On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller martin.kel...@unitedplanet.com wrote:

Hello together, I have some problems with the Solr 5.1.0 suggester. I followed the instructions in https://cwiki.apache.org/confluence/display/solr/Suggester and also tried the techproducts example delivered with the binary package, which is working well.
I added a suggestions field to the schema:

<field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

And added some copyField rules to fill it:

<copyField source="content" dest="suggestions"/>
<copyField source="title" dest="suggestions"/>
<copyField source="author" dest="suggestions"/>
<copyField source="description" dest="suggestions"/>
<copyField source="keywords" dest="suggestions"/>

The field type definition for "text_suggest" is pretty simple:

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also changed solrconfig.xml to use the suggestions field:

<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

For tokens originally coming from "title" or "author" I get suggestions, but not any from the content field. So, what do I have to do? Any help is appreciated.

Martin
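Since DocumentDictionaryFactory suggests whole stored-field values, one client-side workaround (a sketch of my own, not from this thread; the helper name and window size are made up) is to trim each returned suggestion to a few words around the matched term:

```python
def trim_suggestion(suggestion, term, window=3):
    """Trim a whole-field suggestion to `window` words on each side of
    the first word starting with `term` (case-insensitive match)."""
    words = suggestion.split()
    for i, w in enumerate(words):
        if w.lower().startswith(term.lower()):
            start = max(0, i - window)
            return " ".join(words[start:i + window + 1])
    return suggestion  # term not found; return the suggestion unchanged

print(trim_suggestion(
    "the quick brown fox jumps over the lazy dog near the river", "jump"))
# → quick brown fox jumps over the lazy
```

This keeps the Solr side untouched; it only post-processes the suggest response in the client.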
Re: MLT causing Problems
Anything more informative in the Solr logs?

Best,
Erick

On Wed, Apr 22, 2015 at 2:45 AM, Srinivas Rishindra sririshin...@gmail.com wrote:

Hello, I am working on a project in which I have to find similar documents. While implementing it, the following error occurs. Please let me know what to do.

Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8980/solr/rishi: Expected mime type application/octet-stream but got text/html.

HTTP ERROR 404
Problem accessing /solr/rishi/mlt. Reason: Not Found
Powered by Jetty://

at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958)
at MoreLikeThis.main(MoreLikeThis.java:31)
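The 404 on /solr/rishi/mlt suggests no /mlt request handler is defined in that core's solrconfig.xml. One workaround sketch (my own, not from the thread; the core name, seed query, and field name are assumptions) is to use the MoreLikeThis search component through /select instead, which needs no extra handler:

```python
from urllib.parse import urlencode

# Build a MoreLikeThis query against the standard /select handler.
# "rishi" core and "body" field are placeholders from the error above.
base = "http://localhost:8980/solr/rishi/select"
params = {
    "q": "id:some-doc-id",  # seed document to find similar docs for
    "mlt": "true",          # enable the MoreLikeThis search component
    "mlt.fl": "body",       # field(s) to mine for "interesting" terms
    "mlt.mintf": 1,         # minimum term frequency in the seed doc
    "mlt.mindf": 1,         # minimum document frequency in the index
    "wt": "json",
}
url = base + "?" + urlencode(params)
print(url)
```

The resulting URL can be issued with SolrJ's query() as in the original code, just with these parameters instead of a /mlt request.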
After language detection is enabled, SOLR (5.1) isn't indexing anything
Hi guys,

I've enabled language detection in solrconfig.xml:

<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessorFactory">
    <lst name="defaults">
      <str name="langid.fl">content,title</str>
      <str name="langid.fallback">en</str>
      <str name="langid.langField">language_s</str>
      <str name="langid.lcmap">en_GB:en en_US:en</str>
      <str name="langid.map.lcmap">en_GB:en en_US:en</str>
    </lst>
  </processor>
</updateRequestProcessorChain>

Then I have:

<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <!-- See below for information on defining updateRequestProcessorChains
       that can be used by name on each Update Request -->
  <lst name="defaults">
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>

When I try to index a document, it's not added to the SOLR index. If I remove the above code, everything works fine. Do I need to make any specific changes to the schema.xml? Here is an excerpt of it:

<field name="title" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="title_en" type="string" indexed="true" stored="true" required="false" multiValued="false"/>
<field name="content" type="multilang_text_exact" indexed="true" stored="true"/>

<fieldType name="multilang_text_exact" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.LetterTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LetterTokenizerFactory"/>
  </analyzer>
</fieldType>

I don't get any errors in the SOLR console output. Do I need to add _en and other language-ID suffixes to all fields in my schema for the above to work? I mean, do I need to have title, title_en, title_jp, and so on, manually defined in the schema? I still don't understand why a document isn't added at all, without any error being thrown.

Thank you,
Angel
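For readers puzzling over what the langid processor does to fields: when its field-mapping option (langid.map) is on, each field listed in langid.fl is renamed to <field>_<lang>, after langid.map.lcmap collapses locale variants. A rough illustration (my own simplification, not Solr's actual code; the processor does more than this):

```python
# Sketch of langid field mapping: rename mapped fields with a language
# suffix and record the detected language in langid.langField.
LCMAP = {"en_GB": "en", "en_US": "en"}  # from langid.map.lcmap above

def map_fields(doc, langid_fl, detected_lang):
    lang = LCMAP.get(detected_lang, detected_lang)
    mapped = {}
    for name, value in doc.items():
        if name in langid_fl:
            mapped[name + "_" + lang] = value  # e.g. title -> title_en
        else:
            mapped[name] = value
    mapped["language_s"] = lang  # langid.langField
    return mapped

print(map_fields({"id": "1", "title": "Hello", "content": "Hello world"},
                 {"title", "content"}, "en_US"))
```

The renamed targets (title_en, content_en, ...) then have to exist in the schema or match a dynamic field, which may be why documents are rejected when the chain is enabled.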
Re: Boolean filter query not working as expected
A purely negative sub-query is not supported by Lucene - you need to have at least one positive term, such as *:*, at each level of sub-query. Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com wrote:

I have an automated filter query builder that uses the SolrNet nuget package to build out boolean filters. I have a scenario where it is generating an fq in the following format:

((-(field:V1) AND -(field:V2)) AND -(field:V3))

The filter looks legal to me (albeit with extra parentheses), but the above yields 0 total results, even though I know eligible data exists. If I manually re-write the above filter as

(-(field:V1) AND -(field:V2) AND -(field:V3))

I get the expected results. I realize the auto-generated filter could be rewritten in a different way, but the question still remains: why is the first version not returning any results? Solr does not report any errors and returns successfully, just with 0 results.

Thanks
Re: Odp.: solr issue with pdf forms
Are they not _indexed_ correctly, or not being _displayed_ correctly? Take a look at admin UI >> schema browser >> (your field) and press the "load terms" button. That'll show you what is _in_ the index, as opposed to what the raw data looked like. When you return the field in a Solr search, you get a verbatim, un-analyzed copy of your original input. My guess is that your browser isn't using a compatible character encoding for display.

Best,
Erick

On Wed, Apr 22, 2015 at 7:08 AM, steve.sch...@t-systems.com wrote:

Thanks for your answer. Maybe my English is not good enough; what are you trying to say? Sorry, I didn't get the point. :-(

-----Original message-----
From: LAFK [mailto:tomasz.bo...@gmail.com]
Sent: Wednesday, 22 April 2015 14:01
To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
Subject: Odp.: solr issue with pdf forms

Off the top of my head, I'd look into how the writable PDFs are created and encoded.

@LAFK_PL

Original message
From: steve.sch...@t-systems.com
Sent: Wednesday, 22 April 2015 12:41
To: solr-user@lucene.apache.org
Reply to: solr-user@lucene.apache.org
Subject: solr issue with pdf forms

Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue:

- usual PDF files are indexed just fine
- PDF files with writable form-fields look like this:

Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve
Re: Document Created Date
Sorry if my question was too vague. In my mind it wasn't, but you led me in the right direction, which gave me a new issue. I added the following to my schema.xml to bring back the Created Date:

<field name="created" type="date" indexed="false" stored="true"/>

but now I am getting back the created date for PDF files but not for Word documents (specifically .doc and .docx). Has anyone run into this issue? If I look at the properties for all three types of files, the Create Date is called "created", so I am not sure what I am doing wrong.

Thanks for the help in advance.
Eric

Erick Erickson erickerick...@gmail.com 4/21/2015 11:45 AM

Not really sure what you're asking here, I must be missing something. The mapping is through the field name supplied, so as long as your input XML has something like

<add>
  <doc>
    <field name="CreatedDate">your date here</field>
  </doc>
</add>

it should be fine. You can use date math here as well, as:

<field name="CreatedDate">NOW</field>

Best,
Erick

On Tue, Apr 21, 2015 at 7:57 AM, Eric Meisler eric.meis...@veritablelp.com wrote:

I am a newbie and just started using Solr 4.10.3. We have successfully indexed a network drive and are running searches. We now have a request to show the Created Date for all documents (PDF/WORD/TXT/XLS) that come back in our search results. I have successfully filtered on the last_modified date, but I cannot figure out how to add a document's Created Date to the schema.xml. We do not want to search on the created date, since the last_modified date handles this; we just want to display it. To my understanding, I need to add indexed="false" and stored="true" to the field, but I don't know or understand how the field name will map to the document's created-date property. This is my guess:

<field name="CreatedDate" type="date" indexed="false" stored="true"/>

Can someone please supply the correct syntax for the field definition and maybe a brief comment on how Solr maps to the actual document's property? Also, will I need to re-index the drive to make this change apply?
Thanks, Eric
Re: Number of fields in schema.xml and impact on Solr
On 4/22/2015 6:33 AM, Steven White wrote:

Is there anything I should be taking into consideration if I have a large number of fields in my Solr's schema.xml file? I will be indexing records into Solr, and as I create documents, each document will have between 20-200 fields. However, due to the nature of my data source, the combined flattened list of fields that I need to include in schema.xml will be upward of 2000 and may reach 3000. My questions are as follows, comparing a schema with 300 fields vs. 3000:

1) Will indexing be slower? Require more memory? CPU?
2) Will the index size be larger? If so, any idea by what factor?
3) Will searches be slower? Require more memory? CPU?
4) Will the field type (float, boolean, date, string, etc.) have any factor?
5) Anything else I should know that I didn't ask?

I should make it clear that only about 5 fields will be stored, while everything else will only be indexed.

The number of fields in your schema is likely not a significant contributor to performance. I'm sure it can have an impact, because there is code that validates everything against the schema, but even with a few thousand entries, that code should execute quickly. The amount of data you are actually indexing is MUCH more relevant. The Lucene index itself is only aware of the fields that actually contain data. The entire Solr schema is not used or recorded by Lucene code at all; it is only used within code specific to Solr.

Thanks,
Shawn
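Shawn's point that Lucene only records populated fields can be made concrete: even with ~3000 fields declared in schema.xml, each update request need only carry the 20-200 fields a given document actually has. A small sketch (illustrative only; the field names and the helper are made up):

```python
import json

def to_update_payload(record):
    """Keep only populated fields so the update request (and hence the
    Lucene document) carries just the fields that contain data."""
    return {k: v for k, v in record.items() if v not in (None, "", [])}

record = {"id": "doc1", "title": "A title", "price_f": None, "tags": []}
doc = to_update_payload(record)
print(json.dumps([doc]))  # only "id" and "title" are sent
```

The other ~2998 declared fields cost nothing for this document; they exist only in Solr's schema validation.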
Solr Error Message ShutDown
Hi,

We are having an issue with our PROD environment; we see the below message when we access Solr using a browser:

HTTP Status 503 - Server is shutting down or failed to initialize
type: Status report
message: Server is shutting down or failed to initialize
description: The requested service is not currently available.
Apache Tomcat/7.0.59

Any suggestions or similar experiences will help us.

Note: this happens after a Microsoft patch. Solr is in a Windows (2012) environment.

Thanks
Ravi
Re: Checking of Solr Memory and Disk usage
On 4/22/2015 12:11 AM, Zheng Lin Edwin Yeo wrote:

Roughly how many collections and how many records do you have in your Solr? I have 8 collections with a total of roughly 227000 records, most of which are CSV records. One of my collections has 142000 records.

The core that shows 82MB for heap usage has 16 million documents and is hit with an average of 1 or 2 queries per second. The entire Solr instance on this machine has about 55 million documents and a 6GB max heap. This is NOT running SolrCloud, though the indexes are distributed. There are 24 cores defined, but during normal operation, only four of them contain documents. All four of those cores show heap memory values less than 100MB, but the overall heap usage on that machine is measured in gigabytes.

Thanks,
Shawn
Getting error while searching meaningless words
Hello There,

We are using hybris with SOLR (4.6.1). I checked https://issues.apache.org/jira/browse/SOLR-6563 and saw that the problem has been solved. However, we are still getting the same problem on a standalone server. There is no problem on an embedded server. Does anyone have any idea? You can find the log below.

org.springframework.web.util.NestedServletException: Request processing failed; nested exception is org.apache.solr.common.SolrException: org.apache.http.ParseException: Invalid content type:
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:948)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:827)
javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:812)
javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$InternalFilterChain.doFilter(AbstractPlatformFilterChain.java:256)
de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$StatisticsGatewayFilter.doFilter(AbstractPlatformFilterChain.java:345)
de.hybris.platform.servicelayer.web.AbstractPlatformFilterChain$InternalFilterChain.doFilter(AbstractPlatformFilterChain.java:226)
Re: Odp.: phraseFreq vs sloppyFreq
LAFK,

Yes, or even more than 1k. Based on the sloppyFreq component (hopefully the same as phraseFreq), we get documents where keywords occur near each other ranked higher, as if we used slop=10 or something.

On Wed, Apr 22, 2015 at 2:59 PM, LAFK tomasz.bo...@gmail.com wrote:

Out of curiosity, why proximity 1k?

@LAFK_PL

Original message
From: Dmitry Kan
Sent: Wednesday, 22 April 2015 09:26
To: solr-user@lucene.apache.org
Reply to: solr-user@lucene.apache.org
Subject: phraseFreq vs sloppyFreq

Hi guys. I'm executing the following proximity query: "leader the"~1000. In the debugQuery output I see phraseFreq=0.032258064. Is phraseFreq the same thing as sloppyFreq from https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html ? Does a higher phraseFreq increase the final similarity score?

--
Dmitry Kan
Luke Toolbox: http://github.com/DmitryKey/luke
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan
SemanticAnalyzer: www.semanticanalyzer.info
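The observed number is consistent with sloppyFreq: DefaultSimilarity defines sloppyFreq(distance) = 1 / (distance + 1), so a single sloppy-phrase match at distance 30 gives 1/31 ≈ 0.032258064, exactly the phraseFreq Dmitry saw (assuming one occurrence, so phraseFreq is the sloppyFreq of that one match):

```python
def sloppy_freq(distance):
    # DefaultSimilarity.sloppyFreq in Lucene 4.x
    return 1.0 / (distance + 1)

observed = 0.032258064  # phraseFreq from Dmitry's debugQuery output
print(sloppy_freq(30))  # 1/31, matching the observed value
```

Higher phraseFreq does feed into the score the same way tf does for term queries, so closer matches (smaller distance, larger sloppyFreq) rank higher.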
Highlighting in Solr
Hi,

I'm currently implementing highlighting on my Solr-5.0.0. When I issue the following command:

http://localhost:8983/solr/collection1/select?q=conducted&hl=true&hl.fl=Content,Summary&wt=json&indent=true&rows=10

the highlighting result is listed at the bottom of the output, instead of together with the rest of the response above. The result is shown below:

"response":{"numFound":10,"start":0,"docs":[
    {
      "id":"1-1",
      "Summary":"i) Trial conducted",
      "Content":"Completed",
      "_version_":1498407036159787020},
...
"highlighting":{
    "1-1":{
      "Summary":["i) Trial <em>conducted</em>"]}

Is there any way to get the highlighted output displayed together with the rest of the response, instead of having it displayed separately at the bottom? Something like this:

"response":{"numFound":10,"start":0,"docs":[
    {
      "id":"1-1",
      "Summary":"i) Trial <em>conducted</em>",
      "Content":"Completed",
      "_version_":1498407036159787020},

Regards,
Edwin
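As far as I know there is no server-side option in Solr of that era to inline highlights into the docs; the usual approach is to merge them client-side by document id. A sketch (my own, using the field names from Edwin's example):

```python
def merge_highlights(response):
    """Replace each doc's stored field value with its highlight snippet,
    keyed by the doc's id, so highlights appear inline with the docs."""
    docs = response["response"]["docs"]
    hl = response.get("highlighting", {})
    for doc in docs:
        for field, snippets in hl.get(doc["id"], {}).items():
            doc[field] = snippets[0]  # take the first snippet
    return docs

resp = {
    "response": {"numFound": 1, "start": 0, "docs": [
        {"id": "1-1", "Summary": "i) Trial conducted",
         "Content": "Completed"}]},
    "highlighting": {"1-1": {"Summary": ["i) Trial <em>conducted</em>"]}},
}
print(merge_highlights(resp))
```

Fields without a highlight (Content here) keep their stored values unchanged.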
Boolean filter query not working as expected
I have an automated filter query builder that uses the SolrNet nuget package to build out boolean filters. I have a scenario where it is generating an fq in the following format:

((-(field:V1) AND -(field:V2)) AND -(field:V3))

The filter looks legal to me (albeit with extra parentheses), but the above yields 0 total results, even though I know eligible data exists. If I manually re-write the above filter as

(-(field:V1) AND -(field:V2) AND -(field:V3))

I get the expected results. I realize the auto-generated filter could be rewritten in a different way, but the question still remains: why is the first version not returning any results? Solr does not report any errors and returns successfully, just with 0 results.

Thanks
Re: Boolean filter query not working as expected
If I upgrade to using the edismax parser in my fq, I get the desired results. The default lucene parser on fq must not be able to parse the more complex nested clauses.

q=*:*&fq={!type=edismax}((-(field:V1) AND -(field:V2)) AND -(field:V3)) - Works

On 4/22/15, 3:27 PM, Dhutia, Devansh ddhu...@gannett.com wrote:

I don't know if that's completely true, or maybe I'm misunderstanding something. If it doesn't support purely negative subqueries, this shouldn't work, but does:

q=*:*&fq=(-(field:V1))

However, for me, the following is a summary of what works and what doesn't:

q=*:*&fq=(-(field:V1)) - Works
q=*:*&fq=((-(field:V1) AND -(field:V2)) AND -(field:V3)) - Doesn't work
q=*:*&fq=(-(field:V1) AND -(field:V2) AND -(field:V3)) - Works
q=*:*&fq=((*:* -(field:V1) AND -(field:V2)) AND -(field:V3)) - Works

On 4/22/15, 3:02 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

A purely negative sub-query is not supported by Lucene - you need to have at least one positive term, such as *:*, at each level of sub-query. Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com wrote:

I have an automated filter query builder that uses the SolrNet nuget package to build out boolean filters. I have a scenario where it is generating an fq in the following format:

((-(field:V1) AND -(field:V2)) AND -(field:V3))

The filter looks legal to me (albeit with extra parentheses), but the above yields 0 total results, even though I know eligible data exists. If I manually re-write the above filter as

(-(field:V1) AND -(field:V2) AND -(field:V3))

I get the expected results. I realize the auto-generated filter could be rewritten in a different way, but the question still remains: why is the first version not returning any results? Solr does not report any errors and returns successfully, just with 0 results.

Thanks
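If the filter builder can be changed, Jack's advice (anchor each negative-only group with *:*) can be applied directly when generating the fq. A sketch of such a builder (the helper name is mine, not part of SolrNet):

```python
def negative_fq(field, values):
    """Build a filter query excluding all `values` of `field`,
    anchored to the match-all query so the clause group contains
    at least one positive term, as Lucene requires."""
    clauses = " ".join("-%s:%s" % (field, v) for v in values)
    return "(*:* %s)" % clauses

print(negative_fq("field", ["V1", "V2", "V3"]))
# → (*:* -field:V1 -field:V2 -field:V3)
```

Emitting a single flat group like this avoids both the pure-negative problem and the nested-parentheses form that the lucene parser mishandled.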
no subject
On 4/22/15, 7:36 AM, Martin Keller martin.kel...@unitedplanet.com wrote:

OK, I found the problem, and as so often, it was sitting in front of the display. Now the next problem: the suggestions returned always consist of the complete text block where the match was found. I would have expected a single word or a small phrase.

Thanks in advance
Martin

On 22.04.2015 at 12:50, Martin Keller martin.kel...@unitedplanet.com wrote:

Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn't change anything. The suggest dictionary is freshly built. As I mentioned before, only words or phrases from the source field "content" are not matched. When querying the index, the response only contains "suggestions" field data not coming from the "content" field. The complete schema is a slightly modified techproducts schema. "Normal" searching for words which I would expect to come from "content" works.

Any more ideas?

Thanks
Martin

On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote:

Did you build your suggest dictionary after indexing? Kind of a shot in the dark, but worth a try. Note that the suggest field of your suggester isn't using your text_suggest field type to make suggestions, it's using text_general. IOW, the text may not be analyzed as you expect.

Best,
Erick

On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller martin.kel...@unitedplanet.com wrote:

Hello together, I have some problems with the Solr 5.1.0 suggester. I followed the instructions in https://cwiki.apache.org/confluence/display/solr/Suggester and also tried the techproducts example delivered with the binary package, which is working well.
I added a suggestions field to the schema:

<field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

And added some copyField rules to fill it:

<copyField source="content" dest="suggestions"/>
<copyField source="title" dest="suggestions"/>
<copyField source="author" dest="suggestions"/>
<copyField source="description" dest="suggestions"/>
<copyField source="keywords" dest="suggestions"/>

The field type definition for "text_suggest" is pretty simple:

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also changed solrconfig.xml to use the suggestions field:

<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

For tokens originally coming from "title" or "author" I get suggestions, but not any from the content field. So, what do I have to do? Any help is appreciated.

Martin
Re: Bad contentType for search handler :text/xml; charset=UTF-8
text/xml is not a safe content-type, because of the way that HTTP handles charsets. Always use application/xml. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote: Looks like Solarium hardcodes a default header Content-Type: text/xml; charset=utf-8 if none provided. Removing it solves the problem. It seems that Solr 5.1 doesn't support this content-type. -- View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Boolean filter query not working as expected
I don't know if that's completely true, or maybe I'm misunderstanding something. If it doesn't support purely negative subqueries, this shouldn't work, but does:

q=*:*&fq=(-(field:V1))

However, for me, the following is a summary of what works and what doesn't:

q=*:*&fq=(-(field:V1)) - Works
q=*:*&fq=((-(field:V1) AND -(field:V2)) AND -(field:V3)) - Doesn't work
q=*:*&fq=(-(field:V1) AND -(field:V2) AND -(field:V3)) - Works
q=*:*&fq=((*:* -(field:V1) AND -(field:V2)) AND -(field:V3)) - Works

On 4/22/15, 3:02 PM, Jack Krupansky jack.krupan...@gmail.com wrote:

A purely negative sub-query is not supported by Lucene - you need to have at least one positive term, such as *:*, at each level of sub-query. Try:

((*:* -(field:V1) AND -(field:V2)) AND -(field:V3))

-- Jack Krupansky

On Wed, Apr 22, 2015 at 10:56 AM, Dhutia, Devansh ddhu...@gannett.com wrote:

I have an automated filter query builder that uses the SolrNet nuget package to build out boolean filters. I have a scenario where it is generating an fq in the following format:

((-(field:V1) AND -(field:V2)) AND -(field:V3))

The filter looks legal to me (albeit with extra parentheses), but the above yields 0 total results, even though I know eligible data exists. If I manually re-write the above filter as

(-(field:V1) AND -(field:V2) AND -(field:V3))

I get the expected results. I realize the auto-generated filter could be rewritten in a different way, but the question still remains: why is the first version not returning any results? Solr does not report any errors and returns successfully, just with 0 results.

Thanks
Re: solr issue with pdf forms
Steve,

Are you using ExtractingRequestHandler / DataImportHandler, or extracting the text content from the PDF outside of Solr?

On Wed, Apr 22, 2015 at 6:40 AM, steve.sch...@t-systems.com wrote:

Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue:

- usual PDF files are indexed just fine
- PDF files with writable form-fields look like this:

Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve
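The � characters in the sample look like U+FFFD REPLACEMENT CHARACTER, which typically means some stage of the extraction pipeline decoded bytes with the wrong charset (an assumption about this case, not a diagnosis from the thread). A quick pre-index check for that symptom:

```python
def has_mojibake(text):
    """True if the text contains U+FFFD, the replacement character
    emitted when bytes are decoded with an incompatible charset."""
    return "\ufffd" in text

# Sample resembling the garbled extraction from the writable-form PDF:
sample = "Ich\ufffdbest\u00e4tige\ufffdmit\ufffdmeiner\ufffdUnterschrift"
print(has_mojibake(sample))  # → True
```

Running such a check on extracted text before sending it to Solr makes it easy to tell whether the corruption happens during extraction or only at display time.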
Re: Odp.: solr issue with pdf forms
+1 - I like Erick's answer. Let me know if that turns out to be the problem - I'm interested in this problem and would be happy to help.

On Wed, Apr 22, 2015 at 11:11 AM, Erick Erickson erickerick...@gmail.com wrote:

Are they not _indexed_ correctly, or not being _displayed_ correctly? Take a look at admin UI >> schema browser >> (your field) and press the "load terms" button. That'll show you what is _in_ the index, as opposed to what the raw data looked like. When you return the field in a Solr search, you get a verbatim, un-analyzed copy of your original input. My guess is that your browser isn't using a compatible character encoding for display.

Best,
Erick

On Wed, Apr 22, 2015 at 7:08 AM, steve.sch...@t-systems.com wrote:

Thanks for your answer. Maybe my English is not good enough; what are you trying to say? Sorry, I didn't get the point. :-(

-----Original message-----
From: LAFK [mailto:tomasz.bo...@gmail.com]
Sent: Wednesday, 22 April 2015 14:01
To: solr-user@lucene.apache.org; solr-user@lucene.apache.org
Subject: Odp.: solr issue with pdf forms

Off the top of my head, I'd look into how the writable PDFs are created and encoded.

@LAFK_PL

Original message
From: steve.sch...@t-systems.com
Sent: Wednesday, 22 April 2015 12:41
To: solr-user@lucene.apache.org
Reply to: solr-user@lucene.apache.org
Subject: solr issue with pdf forms

Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue:

- usual PDF files are indexed just fine
- PDF files with writable form-fields look like this:

Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind

Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea?

Thanks a lot
Best
Steve
Re: Suggester
OK, I found the problem, and as so often, it was sitting in front of the display. Now the next problem: the suggestions returned always consist of the complete text block where the match was found. I would have expected a single word or a small phrase.

Thanks in advance
Martin

On 22.04.2015 at 12:50, Martin Keller martin.kel...@unitedplanet.com wrote:

Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn't change anything. The suggest dictionary is freshly built. As I mentioned before, only words or phrases from the source field "content" are not matched. When querying the index, the response only contains "suggestions" field data not coming from the "content" field. The complete schema is a slightly modified techproducts schema. "Normal" searching for words which I would expect to come from "content" works.

Any more ideas?

Thanks
Martin

On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote:

Did you build your suggest dictionary after indexing? Kind of a shot in the dark, but worth a try. Note that the suggest field of your suggester isn't using your text_suggest field type to make suggestions, it's using text_general. IOW, the text may not be analyzed as you expect.

Best,
Erick

On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller martin.kel...@unitedplanet.com wrote:

Hello together, I have some problems with the Solr 5.1.0 suggester. I followed the instructions in https://cwiki.apache.org/confluence/display/solr/Suggester and also tried the techproducts example delivered with the binary package, which is working well.
I added a "suggestions" field to the schema:

<field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

And added some copies to the field:

<copyField source="content" dest="suggestions"/>
<copyField source="title" dest="suggestions"/>
<copyField source="author" dest="suggestions"/>
<copyField source="description" dest="suggestions"/>
<copyField source="keywords" dest="suggestions"/>

The field type definition for "text_suggest" is pretty simple:

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also changed the solrconfig.xml to use the suggestions field:

<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

For tokens originally coming from "title" or "author", I get suggestions, but not any from the "content" field. So, what do I have to do? Any help is appreciated. Martin
Re: Bad contentType for search handler :text/xml; charset=UTF-8
A similar problem seems to happen when sending application/json to the search handler. Solr returns a NullPointerException for some reason:

vagrant@precise64:~/solr-5.1.0$ curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" -H "Content-type:application/json"
{
  "responseHeader":{
    "status":500,
    "QTime":2,
    "params":{
      "indent":"true",
      "json":"",
      "q":"foundation",
      "wt":"json"}},
  "error":{
    "trace":"java.lang.NullPointerException\n\tat org.apache.solr.request.json.ObjectUtil$ConflictHandler.mergeMap(ObjectUtil.java:60)\n\tat org.apache.solr.request.json.ObjectUtil.mergeObjects(ObjectUtil.java:114)\n\tat org.apache.solr.request.json.RequestUtil.mergeJSON(RequestUtil.java:259)\n\tat org.apache.solr.request.json.RequestUtil.processParams(RequestUtil.java:176)\n\tat org.apache.solr.util.SolrPluginUtils.setDefaults(SolrPluginUtils.java:166)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:140)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)\n\tat org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat java.lang.Thread.run(Thread.java:745)\n",
    "code":500}}

On Wed, Apr 22, 2015 at 9:41 AM, Walter Underwood wun...@wunderwood.org wrote: text/xml is not a safe content-type, because of the way that HTTP handles charsets. Always use application/xml. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote: Looks like Solarium hardcodes a default header "Content-Type: text/xml; charset=utf-8" if none provided. Removing it solves the problem.
It seems that Solr 5.1 doesn't support this content-type. -- View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html Sent from the Solr - User mailing list archive at Nabble.com.
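As a minimal sketch of Walter's advice, the request can be built with a safe content type using only the Python standard library. The host and core name are the quick-start defaults from the curl example above, and the JSON body is illustrative; this only constructs the request object and does not contact a server:

```python
# Prefer application/json (or application/xml) over text/xml, whose
# charset handling in HTTP is unreliable.
from urllib.request import Request

req = Request(
    "http://localhost:8983/solr/gettingstarted/select",
    data=b'{"query": "foundation"}',
    headers={"Content-Type": "application/json"},
)
# urllib normalizes header names to "Xxxx-yyy" capitalization
print(req.get_header("Content-type"))  # application/json
```

The same principle applies to any client library: if it hardcodes text/xml (as Solarium apparently did), override or remove that header.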
Re: Document Created Date
The generic problem with all the semi-structured documents is that the meta-data has no consistent naming. Making up names here, but Word might have created_on, PDF created, etc. It's really frustrating, but each type has to be investigated to figure out which field you want to map to created. Tika and Solr Cell just map what they find. One way to go about this is to map a dynamic glob pattern to a stored field, then look at what pops out. Not satisfactory, but...

<dynamicField name="*" type="string" stored="true" multiValued="true"/>

Best, Erick On Wed, Apr 22, 2015 at 5:44 AM, Eric Meisler eric.meis...@veritablelp.com wrote: Sorry if my question was too vague. In my mind it wasn't, but you led me in the right direction, which gave me a new issue. I added the following to my schema.xml to bring back the Created Date:

<field name="created" type="date" indexed="false" stored="true"/>

but now I am getting back the created date for PDF files but not for Word documents (specifically .doc and .docx). Has anyone run into this issue? If I look at the properties for all three types of files, the Create Date is called "created", so I am not sure what I am doing wrong. Thanks for the help in advance. Eric Erick Erickson erickerick...@gmail.com 4/21/2015 11:45 AM Not really sure what you're asking here, I must be missing something. The mapping is through the field name supplied, so as long as your input XML has something like

<add>
  <doc>
    <field name="CreatedDate">your date here</field>
  </doc>
</add>

it should be fine. You can use date math here as well, as: <field name="CreatedDate">NOW</field> Best, Erick On Tue, Apr 21, 2015 at 7:57 AM, Eric Meisler eric.meis...@veritablelp.com wrote: I am a newbie and just started using Solr 4.10.3. We have successfully indexed a network drive and are running searches. We now have a request to show the Created Date for all documents (PDF/WORD/TXT/XLS) that come back in our search results.
I have successfully filtered on the last_modified date but I cannot figure out how to add a document's Created Date to the schema.xml. We do not want to search on the created date, since the last_modified date handles this, but just want to display it. To my understanding I need to add indexed="false" and stored="true" to the xml field, but I don't know or understand how the xml name will map to the document's created date property. This is my guess:

<field name="CreatedDate" type="date" indexed="false" stored="true"/>

Can someone please supply the correct syntax for the xml and maybe a brief comment on how Solr maps to the actual document's property? Also, will I need to re-index the drive to make this change apply? Thanks, Eric
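Erick's "map a glob pattern, then look at what pops out" approach implies a normalization step before documents reach Solr. The sketch below is hypothetical: the alias table and metadata keys are made up for illustration, since (as he says) the real keys have to be discovered per file type by inspecting what Tika emits:

```python
# Map per-format creation-date metadata names (discovered by inspecting
# Tika's output for each file type) onto one canonical "created" field
# before indexing.
ALIASES = {"created_on": "created", "creation-date": "created", "created": "created"}

def normalize(meta):
    out = {}
    for key, value in meta.items():
        # Unrecognized keys pass through unchanged.
        out[ALIASES.get(key.lower(), key)] = value
    return out

print(normalize({"Created_On": "2015-04-22T00:00:00Z"}))
# {'created': '2015-04-22T00:00:00Z'}
```

Once the per-format names are known, the same mapping can instead be done with Solr Cell's fmap.* request parameters rather than client-side code.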
Re: Solr Error Message ShutDown
What version of Solr? And do the Solr logs show anything useful? Or catalina.out? Best, Erick On Wed, Apr 22, 2015 at 7:23 AM, EXTERNAL Taminidi Ravi (ETI, AA-AS/PAS-PTS) external.ravi.tamin...@us.bosch.com wrote: Hi, We are having an issue with our PROD environment and it shows the below message when we access Solr using a browser: HTTP Status 503 - Server is shutting down or failed to initialize type Status report message Server is shutting down or failed to initialize description The requested service is not currently available. Apache Tomcat/7.0.59 Any suggestions or similar experience will help us. Note: This happens after a Microsoft patch. The Solr is in a Windows environment (2012). Thanks Ravi
Re: Odp.: Suggester
Right, this is what the suggester you're using is built for. Which is actually way cool for certain situations. Try the FreeTextLookupFactory (warning, I'm not too familiar with the nuances here). Or maybe spelling suggestions are more what you're looking for, which look at the terms and return a term at a time. Best, Erick On Wed, Apr 22, 2015 at 7:59 AM, LAFK tomasz.bo...@gmail.com wrote: For the sake of others who would look for the solution and stumble upon this thread, consider sharing. I'd expect Solr to return the whole field; if it's a text block, then that's it. @LAFK_PL Original message From: Martin Keller Sent: Wednesday, 22 April 2015 16:36 To: solr-user@lucene.apache.org Reply-To: solr-user@lucene.apache.org Subject: Re: Suggester OK, I found the problem, and as so often, it was sitting in front of the display. Now the next problem: the suggestions returned always consist of a complete text block where the match was found. I would have expected a single word or a small phrase. Thanks in advance Martin On 22.04.2015 at 12:50, Martin Keller martin.kel...@unitedplanet.com wrote: Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn't change anything. The suggest dictionary is freshly built. As I mentioned before, only words or phrases of the source field "content" are not matched. When querying the index, the response only contains "suggestions" field data not coming from the "content" field. The complete schema is a slightly modified techproducts schema. "Normal" searching for words which I would expect to come from "content" works. Any more ideas? Thanks Martin On 21.04.2015 at 17:39, Erick Erickson erickerick...@gmail.com wrote: Did you build your suggest dictionary after indexing? Kind of a shot in the dark but worth a try. Note that the suggest field of your suggester isn't using your text_suggest field type to make suggestions, it's using text_general. IOW, the text may not be analyzed as you expect.
Best, Erick On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller martin.kel...@unitedplanet.com wrote: Hello together, I have some problems with the Solr 5.1.0 suggester. I followed the instructions in https://cwiki.apache.org/confluence/display/solr/Suggester and also tried the techproducts example delivered with the binary package, which is working well. I added a "suggestions" field to the schema:

<field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/>

And added some copies to the field:

<copyField source="content" dest="suggestions"/>
<copyField source="title" dest="suggestions"/>
<copyField source="author" dest="suggestions"/>
<copyField source="description" dest="suggestions"/>
<copyField source="keywords" dest="suggestions"/>

The field type definition for "text_suggest" is pretty simple:

<fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also changed the solrconfig.xml to use the suggestions field:

<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

For tokens originally coming from "title" or "author", I get suggestions, but not any from the "content" field. So, what do I have to do? Any help is appreciated. Martin
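Erick's FreeTextLookupFactory pointer could be sketched roughly as the following solrconfig.xml fragment. This is an untested assumption on my part: the suggester name and the ngramSize value are made up, so check the Suggester reference page before relying on it:

```xml
<!-- hypothetical sketch: a free-text suggester that returns term-level
     n-gram suggestions instead of whole stored-field values -->
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">myFreeTextSuggester</str>
    <str name="lookupImpl">FreeTextLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <str name="ngramSize">2</str>
    <str name="suggestAnalyzerFieldType">text_suggest</str>
  </lst>
</searchComponent>
```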
Re: Checking of Solr Memory and Disk usage
Roughly how many collections and how many records do you have in your Solr? I have 8 collections with a total of roughly 227,000 records, most of which are CSV records. One of my collections has 142,000 records. Regards, Edwin On 22 April 2015 at 13:49, Shawn Heisey apa...@elyograg.org wrote: On 4/21/2015 11:33 PM, Zheng Lin Edwin Yeo wrote: I've got the amount of disk space used, but for the Heap Memory Usage reading, it is showing the value -1. Do we need to change any settings for it? When I check from the Windows Task Manager, it is showing about 300MB for shard1 and 150MB for shard2. But I suppose that is the usage for the entire Solr and not for individual collections. That -1 sounds like a bug, but I'd like others to have a chance to chime in before you open an issue in Jira. My Solr instances are older -- 4.7.2 and 4.9.1. One of the larger cores on a 4.7.2 server shows a heap memory value of 86656138 -- about 82MB. I have no way to verify, but this seems very low to me. Thanks, Shawn
Suggestion in Solr Cloud
Hi All, I want to use the suggest option, but my Solr is in cloud mode, hence to get suggestions I need to provide the shard URLs with every query, like below: http://node1/solr/city/suggest?suggest.dictionary=solr-suggester&suggest=true&suggest.build=true&suggest.q=Delhi&shards=node1/solr/city,node2/solr/city&shards.qt=/suggest My requirement is: is there any way to get the same suggestions without providing shards in the query? Regards, Swaraj Kumar Senior Software Engineer I MakeMyTrip.com Mob No- 9811774497
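Since the parameters in a hand-built URL like the one above are easy to get wrong (they must be joined with "&"), here is a small stdlib sketch of building the distributed suggest URL. The host and collection names are taken from Swaraj's example and are otherwise arbitrary:

```python
# Build the distributed /suggest URL with properly "&"-separated,
# percent-encoded parameters instead of string concatenation.
from urllib.parse import urlencode

params = {
    "suggest.dictionary": "solr-suggester",
    "suggest": "true",
    "suggest.build": "true",
    "suggest.q": "Delhi",
    "shards": "node1/solr/city,node2/solr/city",
    "shards.qt": "/suggest",
}
url = "http://node1/solr/city/suggest?" + urlencode(params)
print(url)
```

Note that urlencode percent-encodes characters like "/" in parameter values; Solr decodes them on the server side.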
Re: Solr 4.10.x regression in map-reduce contrib
I got the same issue when using 4.10.2. I suspected this issue would cause trouble when using too many reducers, so I tried using fewer reducers, and that made it work. I do not think the map-reduce contrib in this version is stable... anyway, it is free. On Tue, Apr 21, 2015 at 10:56 PM, ralph tice ralph.t...@gmail.com wrote: Hello list, I'm using mapreduce from contrib and I get this stack trace: https://gist.github.com/ralph-tice/b1e84bdeb64532c7ecab whenever I specify <luceneMatchVersion>4.10</luceneMatchVersion> in my solrconfig.xml. 4.9 works fine. I'm using 4.10.4 artifacts for both map reduce runs. I tried raising maxWarmingSearchers to 20 and set openSearcher to false in my configs with no difference. I have started studying the code, but why would BatchWriter invoke warming (autowarming?) on a close, let alone opening a new searcher? Should I be looking in Lucene or Solr code to investigate this regression? I also notice there are interesting defaults for FaultTolerance in SolrReducer that don't appear to be documented: https://github.com/apache/lucene-solr/blob/trunk/solr/contrib/map-reduce/src/java/org/apache/solr/hadoop/SolrReducer.java#L70-L73 but reading https://issues.apache.org/jira/browse/SOLR-5758 sounds like they are either unimportant or overlooked? Also, we will probably be testing mapreduce contrib with 5.x, has anyone been successful with this yet or are there any known issues? I don't see a lot of changes on contrib/map-reduce... Regards, --Ralph Tice ralph.t...@gmail.com -- Regards, Shenghua (Daniel) Wan
Re: Complete list of field type that Solr supports
: To be clear, here is an example of a type from Solr's schema.xml: : : <field name="weight" type="float" indexed="true" stored="true"/> : : Here, the type is "float". I'm looking for the complete list of : out-of-the-box types supported. what you are asking about are just symbolic names that come from <fieldType/> definitions in the schema.xml -- there is no complete list. you can add any arbitrary <fieldType name="foo" .../> you want to your schema, and now you've introduced a new type that solr supports. As far as the list of all FieldType *classes* that exist in solr out of the box (ie: the list of classes that can be specified in <fieldType/> declarations), that is a bit more straightforward... https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr -Hoss http://www.lucidworks.com/
Re: Complete list of field type that Solr supports
: I'm confused. If type="float" is just a symbolic name, how does Solr know : to index the data of field "weight" as float? What about for "date" per : this example: : : <field name="last_modified" type="date" indexed="true" stored="true"/> : : How does Solr apply date-range queries such as: because somewhere else in your schema is a <fieldType/> declaration that defines <fieldType name="date" .../> using class="solr.TrieDateField" you asked for the complete list of all possible values for the "type" attribute on a <field/> -- the answer is infinite because the possible values for the "type" attribute on a <field/> are dictated by whatever you might choose to specify as the "name" attribute on a <fieldType/> : I was always under the impression that there are primitive field-types but : looks like that's not the case? There are FieldType *classes* which can be configured a variety of ways in your schema.xml, and then reused by different fields -- but the *names* of those types are up to you. for example: the exact same TrieDateField *class* can be configured in your schema.xml to implement 2 different *types* named date_foo and date_bar by using different default options (maybe one uses a non-default precisionStep and defaults to stored=true while the other uses the default precisionStep and defaults to stored=false) ... those two different types can then both be used in your schema... <field name="last_modified" type="date_foo" indexed="true"/> <field name="pub_date" type="date_bar" indexed="true"/> ...and have different behavior. : : Thanks : : Steve : : On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter hossman_luc...@fucit.org : wrote: : : : : To be clear, here is an example of a type from Solr's schema.xml: : : : : <field name="weight" type="float" indexed="true" stored="true"/> : : : : Here, the type is "float". I'm looking for the complete list of : : out-of-the-box types supported. : : what you are asking about are just symbolic names that come from <fieldType/> : definitions in the schema.xml -- there is no complete list.
you can add : any arbitrary <fieldType name="foo" .../> you want to your schema, and now : you've introduced a new type that solr supports. : : As far as the list of all FieldType *classes* that exist in solr out of : the box (ie: the list of classes that can be specified in <fieldType/> : declarations), that is a bit more straightforward... : : : https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr : : : : -Hoss : http://www.lucidworks.com/ : : -Hoss http://www.lucidworks.com/
Complete list of field type that Solr supports
Hi Everyone, I Googled for this with no luck. Where can I find a complete list of the field types that Solr supports? In the sample schema.xml that comes with Solr 5 and prior versions, I am able to compile a list such as boolean, float, string, etc., but I cannot find a complete list documented anywhere. To be clear, here is an example of a type from Solr's schema.xml: <field name="weight" type="float" indexed="true" stored="true"/> Here, the type is "float". I'm looking for the complete list of out-of-the-box types supported. Thanks Steve
Re: Complete list of field type that Solr supports
Hi Hoss, I'm confused. If type="float" is just a symbolic name, how does Solr know to index the data of field "weight" as float? What about for "date" per this example: <field name="last_modified" type="date" indexed="true" stored="true"/> How does Solr apply date-range queries such as: last_modified:[NOW-1YEAR/DAY TO NOW/DAY+1DAY] I was always under the impression that there are primitive field-types but looks like that's not the case? Thanks Steve On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : To be clear, here is an example of a type from Solr's schema.xml: : : <field name="weight" type="float" indexed="true" stored="true"/> : : Here, the type is "float". I'm looking for the complete list of : out-of-the-box types supported. what you are asking about are just symbolic names that come from <fieldType/> definitions in the schema.xml -- there is no complete list. you can add any arbitrary <fieldType name="foo" .../> you want to your schema, and now you've introduced a new type that solr supports. As far as the list of all FieldType *classes* that exist in solr out of the box (ie: the list of classes that can be specified in <fieldType/> declarations), that is a bit more straightforward... https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr -Hoss http://www.lucidworks.com/
Re: Boolean filter query not working as expected
1) https://lucidworks.com/blog/why-not-and-or-and-not/ 2) use debug=query to understand how your (filter) query is being parsed. : Date: Wed, 22 Apr 2015 14:56:22 + : From: Dhutia, Devansh ddhu...@gannett.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org solr-user@lucene.apache.org : Subject: Boolean filter query not working as expected : : I have an automated filter query builder that uses the SolrNet NuGet package to build out boolean filters. I have a scenario where it generates an fq in the following format: : : ((-(field:V1) AND -(field:V2)) AND -(field:V3)) : The filter looks legal to me (albeit with extra parentheses), but the above yields 0 total results, even though I know eligible data exists. : : If I manually re-write the above filter as : : (-(field:V1) AND -(field:V2) AND -(field:V3)) : I get the expected results. : : I realize the auto-generated filter could be rewritten in a different way, but the question still remains: why is the first version not returning any results? : : Solr does not report any errors and returns successfully, just with 0 results. : : Thanks : -Hoss http://www.lucidworks.com/
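The first link explains the underlying issue: Lucene cannot evaluate a boolean clause that only excludes documents, and Solr's automatic fix-up for pure-negative queries applies at the top level, not to negatives nested inside parentheses. A common workaround (sketched below with the made-up field/value names from the question) is to make each negative clause explicitly subtract from the match-all query:

```python
# Rewrite each purely-negative sub-clause as "(*:* -clause)" so it
# matches "everything except value" instead of matching nothing.
def negate(field, value):
    return f"(*:* -{field}:{value})"

fq = " AND ".join(negate("field", v) for v in ["V1", "V2", "V3"])
print(fq)  # (*:* -field:V1) AND (*:* -field:V2) AND (*:* -field:V3)
```

Generating clauses in this shape from the filter builder would make the fq safe regardless of how it ends up nested.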
Re: Complete list of field type that Solr supports
I got it now. I have to start from <fieldType/> to create my <field/> list. If I want a list of supported field-types (used in my schema.xml), I have to look at the "class" attribute of <fieldType/> to get that list. The out-of-the-box list of field-types is documented in the link you provided: https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr Thanks Steve On Wed, Apr 22, 2015 at 1:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I'm confused. If type="float" is just a symbolic name, how does Solr know : to index the data of field "weight" as float? What about for "date" per : this example: : : <field name="last_modified" type="date" indexed="true" stored="true"/> : : How does Solr apply date-range queries such as: because somewhere else in your schema is a <fieldType/> declaration that defines <fieldType name="date" .../> using class="solr.TrieDateField" you asked for the complete list of all possible values for the "type" attribute on a <field/> -- the answer is infinite because the possible values for the "type" attribute on a <field/> are dictated by whatever you might choose to specify as the "name" attribute on a <fieldType/> : I was always under the impression that there are primitive field-types but : looks like that's not the case? There are FieldType *classes* which can be configured a variety of ways in your schema.xml, and then reused by different fields -- but the *names* of those types are up to you. for example: the exact same TrieDateField *class* can be configured in your schema.xml to implement 2 different *types* named date_foo and date_bar by using different default options (maybe one uses a non-default precisionStep and defaults to stored=true while the other uses the default precisionStep and defaults to stored=false) ... those two different types can then both be used in your schema... <field name="last_modified" type="date_foo" indexed="true"/> <field name="pub_date" type="date_bar" indexed="true"/> ...and have different behavior.
: : Thanks : : Steve : : On Wed, Apr 22, 2015 at 12:59 PM, Chris Hostetter hossman_luc...@fucit.org : wrote: : : : : To be clear, here is an example of a type from Solr's schema.xml: : : : : <field name="weight" type="float" indexed="true" stored="true"/> : : : : Here, the type is "float". I'm looking for the complete list of : : out-of-the-box types supported. : : what you are asking about are just symbolic names that come from <fieldType/> : definitions in the schema.xml -- there is no complete list. you can add : any arbitrary <fieldType name="foo" .../> you want to your schema, and now : you've introduced a new type that solr supports. : : As far as the list of all FieldType *classes* that exist in solr out of : the box (ie: the list of classes that can be specified in <fieldType/> : declarations), that is a bit more straightforward... : : : https://cwiki.apache.org/confluence/display/solr/Field+Types+Included+with+Solr : : : : -Hoss : http://www.lucidworks.com/ : : -Hoss http://www.lucidworks.com/
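Hoss's date_foo/date_bar example can be pulled together into one schema.xml sketch. This is my own illustrative fragment: the type names follow his description, and the precisionStep values are hypothetical stand-ins for "non-default" vs "default":

```xml
<!-- one FieldType *class*, two named *types* with different defaults -->
<fieldType name="date_foo" class="solr.TrieDateField" precisionStep="6" stored="true"/>
<fieldType name="date_bar" class="solr.TrieDateField"/>

<!-- fields refer to the type *names*, not to the class -->
<field name="last_modified" type="date_foo" indexed="true"/>
<field name="pub_date" type="date_bar" indexed="true"/>
```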
Re: rq breaks wildcard search?
Awesome, thanks! I was on 4.10.2. Ryan On Apr 22, 2015, at 16:44, Joel Bernstein joels...@gmail.com wrote: For your own implementation you'll need to implement the following methods: public Query rewrite(IndexReader reader) throws IOException public void extractTerms(Set<Term> terms) You can review the 4.10.3 version of the ReRankQParserPlugin to see how it implements these methods. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein joels...@gmail.com wrote: Just confirmed that wildcard queries work with Re-Ranking following SOLR-6323. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com wrote: This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323. Solr 4.10.3 Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote: Using edismax, supplying a rq= param, like {!rerank ...} is causing an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like without rq, somehow the qf Queries might turn into ConstantScore instead of WildcardQuery. I don't think this is related to the RankQuery implementation as my own subclass has the same issue. Anyway the effect is that all q's containing ? or * return http 500 because I always have rq on. Can anyone confirm if this is a bug? I will log it in Jira if so. Also, does anyone know how I can work around it? Specifically, can I disable edismax from making WildcardQueries? Ryan
Re: rq breaks wildcard search?
Just confirmed that wildcard queries work with Re-Ranking following SOLR-6323. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com wrote: This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323 . Solr 4.10.3 Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote: Using edismax, supplying a rq= param, like {!rerank ...} is causing an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like without rq, somehow the qf Queries might turn into ConstantScore instead of WildcardQuery. I don't think this is related to the RankQuery implementation as my own subclass has the same issue. Anyway the effect is that all q's containing ? or * return http 500 because I always have rq on. Can anyone confirm if this is a bug? I will log it in Jira if so. Also, does anyone know how I can work around it? Specifically, can I disable edismax from making WildcardQueries? Ryan
Re: rq breaks wildcard search?
This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323. Solr 4.10.3 Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote: Using edismax, supplying a rq= param, like {!rerank ...} is causing an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like without rq, somehow the qf Queries might turn into ConstantScore instead of WildcardQuery. I don't think this is related to the RankQuery implementation as my own subclass has the same issue. Anyway the effect is that all q's containing ? or * return http 500 because I always have rq on. Can anyone confirm if this is a bug? I will log it in Jira if so. Also, does anyone know how I can work around it? Specifically, can I disable edismax from making WildcardQueries? Ryan
Re: rq breaks wildcard search?
For your own implementation you'll need to implement the following methods: public Query rewrite(IndexReader reader) throws IOException public void extractTerms(Set<Term> terms) You can review the 4.10.3 version of the ReRankQParserPlugin to see how it implements these methods. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 22, 2015 at 7:33 PM, Joel Bernstein joels...@gmail.com wrote: Just confirmed that wildcard queries work with Re-Ranking following SOLR-6323. Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 22, 2015 at 7:26 PM, Joel Bernstein joels...@gmail.com wrote: This should be resolved in https://issues.apache.org/jira/browse/SOLR-6323. Solr 4.10.3 Joel Bernstein http://joelsolr.blogspot.com/ On Wed, Apr 15, 2015 at 6:23 PM, Ryan Josal rjo...@gmail.com wrote: Using edismax, supplying a rq= param, like {!rerank ...} is causing an UnsupportedOperationException because the Query doesn't implement createWeight. This is for WildcardQuery in particular. From some preliminary debugging it looks like without rq, somehow the qf Queries might turn into ConstantScore instead of WildcardQuery. I don't think this is related to the RankQuery implementation as my own subclass has the same issue. Anyway the effect is that all q's containing ? or * return http 500 because I always have rq on. Can anyone confirm if this is a bug? I will log it in Jira if so. Also, does anyone know how I can work around it? Specifically, can I disable edismax from making WildcardQueries? Ryan
RE: Solr Index data lost
Just to close this thread – it looks like it's working fine now. Not sure what mistake I had made last time. But now, the index data is still persistent on the pen drive even after server shutdown and restarting it on a different machine where the pen drive is plugged in. Thanks for all your help. Regards Vijay From: Vijaya Narayana Reddy Bhoomi Reddy [mailto:vijaya.bhoomire...@whishworks.com] Sent: 21 April 2015 09:22 To: solr-user@lucene.apache.org Subject: Re: Solr Index data lost Shawn, Yes, I had used java -jar start.jar. I haven't tried moving it to a local hard disk, as I wanted to work on two machines (work and home), so I was using a pen drive as the index storage. Yesterday, I did the complete indexing and then unplugged the drive from the work machine and connected it to my personal laptop. The data folder didn't exist. Erick, As per your earlier suggestion, I am using Tika and SolrJ to index the data (both binary and database content) and the same had been committed using the SolrJ UpdateRequest. I was able to see the data in the admin UI screen and even performed some searches on the index, and it worked fine. Thanks Regards Vijay On 21 April 2015 at 00:42, Erick Erickson erickerick...@gmail.com wrote: Did you commit before you unplugged the drive? Were you able to see data in the admin UI _before_ you unplugged the drive? Best, Erick On Mon, Apr 20, 2015 at 3:58 PM, Vijay Bhoomireddy vijaya.bhoomire...@whishworks.com wrote: Shawn, I haven't changed any DirectoryFactory setting in the solrconfig.xml, as I am using a local setup with the default configurations. The device has been unmounted successfully (confirmed through the Windows message in the lower right corner). I am using Solr 4.10.2. I simply run a Ctrl-C command in the Windows command prompt to stop Solr, in the same window where it was started earlier. Please correct me if something has not been done in the correct fashion.
Thanks Regards Vijay -Original Message- From: Shawn Heisey [mailto:apa...@elyograg.org] Sent: 20 April 2015 22:34 To: solr-user@lucene.apache.org Subject: Re: Solr Index data lost On 4/20/2015 2:55 PM, Vijay Bhoomireddy wrote: I have configured the Solr example server on a pen drive and indexed some content. The data directory was under example/solr/collection1/data, which is the default one. After indexing, I stopped the Solr server, unplugged the pen drive and reconnected it. Now, when I navigate to the Solr Admin UI, I cannot see any data in the index. Any pointers please? Even though the installation is on a pen drive, I think the location of the data directory shouldn't matter to Solr, so I believe the data folder was wiped because of the server shutdown. Will the data folder be wiped if the server is restarted or stopped? How do I preserve the index data across machine failures or planned maintenance? If you are using the default Directory implementation in your solrconfig.xml (NRTCachingDirectoryFactory for 4.x and later, MMapDirectoryFactory for newer 3.x versions), then everything should be persisted correctly. Did you properly unmount/eject the removable volume before you unplugged it? On a non-Windows OS, you might also want to run the 'sync' command. If you didn't do the unmount/eject, you can't be sure that the filesystem was properly closed and fully up-to-date on the device. What version of Solr did you use, and how exactly did you start Solr and the example? How did you stop Solr? Thanks, Shawn -- The contents of this e-mail are confidential and for the exclusive use of the intended recipient. If you receive this e-mail in error please delete it from your system immediately and notify us either by e-mail or telephone. You should not copy, forward or otherwise disclose the content of the e-mail. 
The views expressed in this communication may not necessarily be the view held by WHISHWORKS.
TIKA OCR not working
Hi, I want to use Solr to index some scanned documents. After setting up a Solr document with two fields, content and filename, I tried to upload the attached file, but it seems the extracted content of the file is only \n \n \n. If I run tesseract from the command line, I get the result correctly. The log when Solr receives my request: --- INFO - 2015-04-23 03:49:25.941; org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp=/solr path=/update/extract params={literal.groupid=2&json.nl=flat&resource.name=phplNiPrs&literal.id=4&commit=true&extractOnly=false&literal.historyid=4&omitHeader=true&literal.userid=3&literal.createddate=2015-04-22T15:00:00Z&fmap.content=content&wt=json&literal.filename=\\trunght\test\tesseract_3.png} The document when I check it on the Solr admin page: - { "groupid": 2, "id": 4, "historyid": 4, "userid": 3, "createddate": "2015-04-22T15:00:00Z", "filename": "trunght\\test\\tesseract_3.png", "autocomplete_text": [ "trunght\\test\\tesseract_3.png" ], "content": "\n \n \n \n \n \n \n \n \n", "_version_": 1499213034586898400 } --- Since I am a Solr newbie, I don't know where to look. Can anyone give me advice on where to look for errors, or which settings will make it work? Thanks in advance. Trung.
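Solr's extract handler only returns real text if the bundled Tika actually invokes an OCR engine; in many Solr releases of that era Tika does not trigger Tesseract at all, which is consistent with content that is only whitespace. Independent of the server-side fix, a small client-side guard (a sketch; the field name "content" comes from the post above) can flag such documents before they are indexed with effectively empty content:

```python
def is_effectively_empty(text: str) -> bool:
    # Extracted content like "\n \n \n" contains no indexable tokens,
    # only whitespace left over from page layout.
    return text.strip() == ""

print(is_effectively_empty("\n \n \n \n"))  # True
print(is_effectively_empty("Hello OCR"))    # False
```

A check like this at least makes the failure visible at indexing time instead of at query time, when searches silently return nothing.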
AW: Odp.: solr issue with pdf forms
Thanks for your answer. Maybe my English is not good enough, but what are you trying to say? Sorry, I didn't get the point. :-( -Original Message- From: LAFK [mailto:tomasz.bo...@gmail.com] Sent: Wednesday, 22 April 2015 14:01 To: solr-user@lucene.apache.org; solr-user@lucene.apache.org Subject: Re: solr issue with pdf forms Off the top of my head, I'd look into how writable PDFs are created and encoded. @LAFK_PL Original message From: steve.sch...@t-systems.com Sent: Wednesday, 22 April 2015 12:41 To: solr-user@lucene.apache.org Reply to: solr-user@lucene.apache.org Subject: solr issue with pdf forms Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue: - usual PDF files are indexed just fine - PDF files with writable form fields look like this: Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea? Thanks a lot Best Steve
Re: Bad contentType for search handler :text/xml; charset=UTF-8
On Wed, Apr 22, 2015 at 11:00 AM, didier deshommes dfdes...@gmail.com wrote: curl "http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=foundation" -H "Content-type: application/json" You're telling Solr the body encoding is JSON, but then you don't send any body. We could perhaps catch that error earlier, but it still looks like an error? -Yonik
phraseFreq vs sloppyFreq
Hi guys. I'm executing the following proximity query: "leader the"~1000. In debugQuery I see phraseFreq=0.032258064. Is phraseFreq the same thing as sloppyFreq from https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html ? Does a higher phraseFreq increase the final similarity score? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
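For reference, DefaultSimilarity implements sloppyFreq(distance) = 1/(distance + 1), and the sloppy phrase scorer accumulates these values into the phraseFreq it reports, so the two are directly related, and a higher phraseFreq does raise the tf part of the score (tf = sqrt(freq) in DefaultSimilarity). A small sketch of the arithmetic; the single-match interpretation of 0.032258064 is my reading of the numbers, not something confirmed against the actual index:

```python
def sloppy_freq(distance: int) -> float:
    # DefaultSimilarity.sloppyFreq(distance) = 1 / (distance + 1):
    # the further apart the terms, the smaller the contribution.
    return 1.0 / (distance + 1)

# phraseFreq is the sum of sloppyFreq over all sloppy matches in the doc.
# A reported phraseFreq of 0.032258064 equals 1/31, consistent with a
# single match whose terms are 30 positions apart.
print(sloppy_freq(30))  # 0.03225806451612903
```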
solr issue with pdf forms
Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue: - usual PDF files are indexed just fine - PDF files with writable form fields look like this: Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea? Thanks a lot Best Steve
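The � characters suggest the extracted text carries code points where the form's font encodes the space glyph unusually — often U+00A0 no-break spaces, or characters the extractor could not map and emitted as the U+FFFD replacement character. As a client-side workaround, a sketch (this root-cause guess is an assumption, not a confirmed diagnosis) that normalizes those characters before sending the text to Solr:

```python
import unicodedata

def normalize_extracted(text: str) -> str:
    # Replace no-break spaces (U+00A0) and the Unicode replacement
    # character (U+FFFD) with plain spaces, then normalize to NFC so
    # accented characters have a canonical form for analysis.
    text = text.replace("\u00a0", " ").replace("\ufffd", " ")
    return unicodedata.normalize("NFC", text)

print(normalize_extracted("Ich\ufffdbest\u00e4tige\ufffdmit"))  # Ich bestätige mit
```

The cleaner fix, if this is the cause, is a char filter (e.g. a pattern-replace mapping) in the field's analysis chain, so the normalization lives inside Solr rather than in every indexing client.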
Re: Bad contentType for search handler :text/xml; charset=UTF-8
Looks like Solarium hardcodes a default header Content-Type: text/xml; charset=utf-8 if none provided. Removing it solves the problem. It seems that Solr 5.1 doesn't support this content-type. -- View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201579.html Sent from the Solr - User mailing list archive at Nabble.com.
MLT causing Problems
Hello, I am working on a project in which I have to find similar documents. While implementing this, the following error occurs. Please let me know what to do. Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8980/solr/rishi: Expected mime type application/octet-stream but got text/html. <html> <head> <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/> <title>Error 404 Not Found</title> </head> <body> <h2>HTTP ERROR 404</h2> <p>Problem accessing /solr/rishi/mlt. Reason: <pre>Not Found</pre></p> <hr/> <i><small>Powered by Jetty://</small></i> </body> </html> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:233) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:225) at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:135) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:943) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:958) at MoreLikeThis.main(MoreLikeThis.java:31)
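A 404 at /solr/rishi/mlt usually means no request handler is registered under the /mlt path for that core. One way to address it is to register the standalone MoreLikeThisHandler in solrconfig.xml — a sketch, assuming the similarity field is named "content" (adjust the field and minimum-frequency defaults to your schema):

```xml
<!-- Hypothetical /mlt handler registration for solrconfig.xml -->
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <!-- field(s) to compute document similarity on -->
    <str name="mlt.fl">content</str>
    <!-- minimum term/document frequency for a term to be considered -->
    <int name="mlt.mintf">1</int>
    <int name="mlt.mindf">1</int>
  </lst>
</requestHandler>
```

Alternatively, skip the dedicated handler and query the ordinary /select handler with mlt=true and the same mlt.* parameters, which uses the MoreLikeThis search component instead.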
Re: Suggester
Unfortunately, setting suggestAnalyzerFieldType to text_suggest didn't change anything. The suggest dictionary is freshly built. As I mentioned before, it is only words and phrases from the source field "content" that are not matched. When querying the index, the response contains only "suggestions" field data that does not come from the "content" field. The complete schema is a slightly modified techproducts schema. "Normal" searching for words which I would expect to come from "content" works. Any more ideas? Thanks Martin Am 21.04.2015 um 17:39 schrieb Erick Erickson erickerick...@gmail.com: Did you build your suggest dictionary after indexing? Kind of a shot in the dark but worth a try. Note that the suggest field of your suggester isn't using your text_suggest field type to make suggestions, it's using text_general. IOW, the text may not be analyzed as you expect. Best, Erick On Tue, Apr 21, 2015 at 7:16 AM, Martin Keller martin.kel...@unitedplanet.com wrote: Hello together, I have some problems with the Solr 5.1.0 suggester. I followed the instructions in https://cwiki.apache.org/confluence/display/solr/Suggester and also tried the techproducts example delivered with the binary package, which is working well. 
I added a suggestions field to the schema: <field name="suggestions" type="text_suggest" indexed="true" stored="true" multiValued="true"/> And added some copyField directives targeting it: <copyField source="content" dest="suggestions"/> <copyField source="title" dest="suggestions"/> <copyField source="author" dest="suggestions"/> <copyField source="description" dest="suggestions"/> <copyField source="keywords" dest="suggestions"/> The field type definition for "text_suggest" is pretty simple: <fieldType name="text_suggest" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.LowerCaseFilterFactory"/> </analyzer> </fieldType> I also changed solrconfig.xml to use the suggestions field: <searchComponent class="solr.SuggestComponent" name="suggest"> <lst name="suggester"> <str name="name">mySuggester</str> <str name="lookupImpl">FuzzyLookupFactory</str> <str name="dictionaryImpl">DocumentDictionaryFactory</str> <str name="field">suggestions</str> <str name="suggestAnalyzerFieldType">text_general</str> <str name="buildOnStartup">false</str> </lst> </searchComponent> For tokens originally coming from "title" or "author" I get suggestions, but none from the content field. So, what do I have to do? Any help is appreciated. Martin
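Two things are worth checking here. First, as Erick points out, the suggester's analyzer should match the field's actual type; a sketch of the corrected component (same configuration as above, with only suggestAnalyzerFieldType changed):

```xml
<!-- Sketch: suggester whose analyzer matches the suggestions field type -->
<searchComponent class="solr.SuggestComponent" name="suggest">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">suggestions</str>
    <!-- analyze with the same type the field actually uses -->
    <str name="suggestAnalyzerFieldType">text_suggest</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>
```

Second — and this is my reading, not a confirmed diagnosis — DocumentDictionaryFactory builds the dictionary from the field's stored values, and FuzzyLookupFactory suggests complete stored values rather than individual words. Short values copied from "title" or "author" make natural suggestions, while a long "content" value is a single huge suggestion string that user prefixes will rarely match, which would explain exactly the symptom described. A token-oriented lookup (for example, one of the analyzing-infix suggesters) may fit a body-text source field better.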
Re: Bad contentType for search handler :text/xml; charset=UTF-8
Hello, I've got the same issue after an upgrade from Solr 5.0 to 5.1, even on GET requests. I'm using the PHP Solarium library to perform my requests; this is the error the library now gets on a search handler. The request is transported with cURL. What's weird is that when I copy/paste the URL Solarium generates into my browser, I don't get the error. Maybe Solr 5.1 requires a new header which is automatically sent by the browser but not by cURL. I'll investigate this... Ben -- View this message in context: http://lucene.472066.n3.nabble.com/Bad-contentType-for-search-handler-text-xml-charset-UTF-8-tp4200314p4201564.html Sent from the Solr - User mailing list archive at Nabble.com.
Exception while using group with timeAllowed on SolrCloud
We have the same issue as this JIRA: https://issues.apache.org/jira/browse/SOLR-6156 I have posted my query, response and Solr logs to the JIRA. Could anyone please take a look? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Exception-while-using-group-with-timeAllowed-on-SolrCloud-tp4201570.html Sent from the Solr - User mailing list archive at Nabble.com.
Odp.: phraseFreq vs sloppyFreq
Out of curiosity, why proximity 1k? @LAFK_PL Original message From: Dmitry Kan Sent: Wednesday, 22 April 2015 09:26 To: solr-user@lucene.apache.org Reply to: solr-user@lucene.apache.org Subject: phraseFreq vs sloppyFreq Hi guys. I'm executing the following proximity query: "leader the"~1000. In debugQuery I see phraseFreq=0.032258064. Is phraseFreq the same thing as sloppyFreq from https://lucene.apache.org/core/4_3_0/core/org/apache/lucene/search/similarities/DefaultSimilarity.html ? Does a higher phraseFreq increase the final similarity score? -- Dmitry Kan Luke Toolbox: http://github.com/DmitryKey/luke Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan SemanticAnalyzer: www.semanticanalyzer.info
Odp.: solr issue with pdf forms
Off the top of my head, I'd look into how writable PDFs are created and encoded. @LAFK_PL Original message From: steve.sch...@t-systems.com Sent: Wednesday, 22 April 2015 12:41 To: solr-user@lucene.apache.org Reply to: solr-user@lucene.apache.org Subject: solr issue with pdf forms Hi guys, hopefully you can help me with my issue. We are using a Solr setup and have the following issue: - usual PDF files are indexed just fine - PDF files with writable form fields look like this: Ich�bestätige�mit�meiner�Unterschrift,�dass�alle�Angaben�korrekt�und�vollständig�sind Somehow the blank space character is not indexed correctly. Is this a known issue? Does anybody have an idea? Thanks a lot Best Steve