Re: Missing slf4j jar in solr 1.4.0 distribution?
Thanks, I see. It seems that slf4j-nop-1.5.5.jar is the only jar file missing from solrj-lib, so I suggest that it be included in the next release. Per Halvor -----Original message----- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: 17 November 2009 20:51 To: 'solr-user@lucene.apache.org' Subject: Re: Missing slf4j jar in solr 1.4.0 distribution? : I downloaded solr 1.4.0 but discovered when using solrj 1.4 that a : required slf4j jar was missing in the distribution (i.e. : apache-solr-1.4.0/dist). I got a java.lang.NoClassDefFoundError: : org/slf4j/impl/StaticLoggerBinder when using solrj ... : Have I overlooked something, or are not all classes required : for using solrj included in the solr 1.4.0 distribution? Regrettably, Solr releases aren't particularly consistent about where third-party libraries can be found. If you use the pre-built war, the 'main' dependencies are already bundled into it. If you want to roll your own, you need to look at the ./lib directory -- ./dist is only *supposed* to contain the artifacts built from the Solr source (but that solrj-lib directory can be confusing)... hoss...@brunner:apache-solr-1.4.0$ ls ./lib/slf4j-* lib/slf4j-api-1.5.5.jar lib/slf4j-jdk14-1.5.5.jar -Hoss
Ruby serialization with dismax
Hi, not sure whether this is something new in Solr 1.4, but I just noticed that facet results are serialized differently with standard and dismax when using wt=ruby. Standard returns: 'my_facet'=>{'20344'=>1} whereas dismax has: 'my_facet'=>['20344',1]. Admittedly this is not a big deal and it's easy to work around, but it still feels strange. Am I missing anything, or is it a bug? In that case I'll file an issue. Bye, Andrea
HTMLStripCharFilterFactory does not replace &#233;
Hello, I indexed an HTML document containing decimal HTML entity encodings: the character é (e with an acute accent) is encoded as &#233;. The exact content of the document is:

<html><body>&#231;a va m&#233;m&#233; ?</body></html>

A search for 'mémé' returns no document. If I put the line above in the Solr admin's analysis.jsp it also doesn't match mémé. There is only a match if I replace &#233; with é. This is how I configured the fieldType:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I tried avoiding the problem by using the MappingCharFilterFactory:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

I put the file mapping.txt in the conf directory. It contains just this:

"&#233;" => "é"

This doesn't work either. How can I get this to work? (I am using Solr 1.4.0.) Thank you, Andréas Kündig, World Intellectual Property Organization. Disclaimer: This electronic message may contain privileged, confidential and copyright protected information. If you have received this e-mail by mistake, please immediately notify the sender and delete this e-mail and all its attachments. Please ensure all e-mail attachments are scanned for viruses prior to opening or using.
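For anyone hitting the same symptom, the entity expansion can also be done before the text ever reaches Solr. The sketch below is a minimal client-side pre-processor (not Solr code; the class and method names are made up for illustration) that expands decimal numeric character references such as &#233; into the corresponding characters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: expands decimal numeric character references
// (e.g. "&#233;" -> "é") in a string before it is sent to Solr.
public class EntityDecoder {
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d+);");

    public static String decode(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // appendReplacement treats '\' and '$' specially, so quote the replacement
            m.appendReplacement(sb,
                Matcher.quoteReplacement(new String(Character.toChars(codePoint))));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&#231;a va m&#233;m&#233; ?")); // ça va mémé ?
    }
}
```

This sidesteps the char-filter question entirely, at the cost of duplicating work the HTMLStripCharFilterFactory is supposed to do.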
Index-time field boosting not working?
I have the following field configured in schema.xml:

<field name="title" type="text" indexed="true" stored="true" omitNorms="false" boost="3.0"/>

where "text" is the type which came with the Solr distribution. I have not been able to get this configuration to alter any document scores, and if I look at the indexes in Luke there is no change in the norms (compared to an un-boosted equivalent). I have confirmed that document boosting works (via SolrJ), but our field boosting needs to be done in the schema. Am I doing something wrong? (BTW, I have tried using 3.0f as well, with no difference.) Also, I have seen no debug output during startup which would indicate that field boosting is being configured -- should there be any? I have found no usage examples of this in the Solr 1.4 book, except a vague discouragement -- is this a deprecated feature? TIA, Ian Web design and intelligent Content Management. www.twitter.com/gossinteractive Registered Office: c/o Bishop Fleming, Cobourg House, Mayflower Street, Plymouth, PL1 1LG. Company Registration No: 3553908 This email contains proprietary information, some or all of which may be legally privileged. It is for the intended recipient only. If an addressing or transmission error has misdirected this email, please notify the author by replying to this email. If you are not the intended recipient you may not use, disclose, distribute, copy, print or rely on this email. Email transmission cannot be guaranteed to be secure or error free, as information may be intercepted, corrupted, lost, destroyed, arrive late or incomplete or contain viruses. This email and any files attached to it have been checked with virus detection software before transmission. You should nonetheless carry out your own virus check before opening any attachment. GOSS Interactive Ltd accepts no liability for any loss or damage that may be caused by software viruses.
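One place an index-time boost should be visible is in the field norms Luke displays: Lucene multiplies the boost into the length-normalization factor and compresses the product into a single byte per document per field. The sketch below re-implements that 8-bit float encoding (modeled on SmallFloat.floatToByte315 as used by Lucene 2.9; treat it as an illustration, not the exact shipped code) to show both that a boost of 3.0 would survive the round trip and how coarse the quantization is:

```java
// Simplified re-implementation of the 3-mantissa-bit, zero-exponent-point-15
// float-to-byte encoding Lucene uses to store norms (after SmallFloat.floatToByte315).
public class NormEncoding {
    public static byte floatToByte315(float f) {
        int bits = Float.floatToRawIntBits(f);
        int smallfloat = bits >> (24 - 3);
        if (smallfloat <= ((63 - 15) << 3)) {
            return (bits <= 0) ? (byte) 0 : (byte) 1; // underflow: clamp to smallest
        }
        if (smallfloat >= ((63 - 15) << 3) + 0x100) {
            return -1; // overflow: clamp to largest
        }
        return (byte) (smallfloat - ((63 - 15) << 3));
    }

    public static float byte315ToFloat(byte b) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // A boost of 3.0 round-trips exactly...
        System.out.println(byte315ToFloat(floatToByte315(3.0f))); // 3.0
        // ...but only ~1 significant decimal digit is kept, so nearby
        // norm values collapse onto the same byte.
        System.out.println(byte315ToFloat(floatToByte315(0.3f))); // 0.25
    }
}
```

So if the schema boost were being applied, a 3x difference is large enough that it should show up in Luke's norm column; its complete absence suggests the boost attribute is not being picked up at all.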
Re: Ruby serialization with dismax
Andrea, I'd guess you have json.nl=arrarr set for your dismax handler (or request). Erik
Re: Ruby serialization with dismax
Erik, Erik Hatcher wrote: Andrea, I'd guess you have json.nl=arrarr set for your dismax handler (or request). sigh, you're right, sorry for the noise :/ Andrea
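For anyone else bitten by this: with json.nl=arrarr the facet counts come back as a list of [name, count] pairs instead of a map, so a client either sets json.nl=map or flattens the pairs itself. A hypothetical client-side conversion (names made up for illustration) might look like this:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper: turn the json.nl=arrarr shape, e.g. [["20344", 1]],
// into the map shape, e.g. {"20344": 1}, preserving facet order.
public class NamedListFlattener {
    public static Map<String, Integer> toMap(List<List<Object>> pairs) {
        Map<String, Integer> facet = new LinkedHashMap<String, Integer>();
        for (List<Object> pair : pairs) {
            facet.put((String) pair.get(0), ((Number) pair.get(1)).intValue());
        }
        return facet;
    }

    public static void main(String[] args) {
        List<List<Object>> arrarr = new ArrayList<List<Object>>();
        arrarr.add(Arrays.asList((Object) "20344", 1));
        arrarr.add(Arrays.asList((Object) "20345", 7));
        System.out.println(toMap(arrarr)); // {20344=1, 20345=7}
    }
}
```

The arrarr form exists precisely because a map silently loses ordering and duplicate keys in some client languages, so flattening to a map is only safe when facet names are unique.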
VelocityResponseWriter/Solritas character encoding issue
Hi, I've played around with Solr's VelocityResponseWriter (which is indeed a very useful feature for rapid prototyping). I've realized that Velocity uses ISO-8859-1 as its default character encoding. I've changed this setting to UTF-8 in my velocity.properties file (inside the conf directory), i.e., input.encoding=UTF-8 output.encoding=UTF-8 and checked that the settings were successfully loaded. Within the main Velocity template, browse.vm, the character encoding is set to UTF-8 as well, i.e., <meta http-equiv="content-type" content="text/html; charset=UTF-8"/> After starting Solr (which is deployed in a Tomcat 6 server on an Ubuntu machine), I ran into some character encoding problems. Due to the change of input.encoding to UTF-8, no problems occur when non-ASCII characters are present in the query string, e.g. German umlauts. But unfortunately, something is wrong with the encoding of characters in the HTML page that is generated by VelocityResponseWriter. The non-ASCII characters aren't displayed properly (for example, Firefox prints a black diamond with a white question mark). If I manually set the encoding to ISO-8859-1, the non-ASCII characters are displayed correctly. Does anybody have a clue? Thanks in advance, Sascha
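The replacement-character symptom usually means bytes are produced in one charset and re-decoded in another somewhere in the pipeline. This standalone snippet (unrelated to Velocity itself) shows the two classic failure modes, which can help identify which side is at fault:

```java
import java.io.UnsupportedEncodingException;

// Demonstrates the two classic charset-mismatch symptoms for 'é' (U+00E9).
public class CharsetMismatch {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // UTF-8 bytes decoded as ISO-8859-1: every non-ASCII char becomes two chars.
        byte[] utf8 = "\u00E9".getBytes("UTF-8");           // 0xC3 0xA9
        System.out.println(new String(utf8, "ISO-8859-1")); // prints "Ã©"

        // ISO-8859-1 bytes decoded as UTF-8: the malformed sequence becomes U+FFFD,
        // the black-diamond replacement character described above.
        byte[] latin1 = "\u00E9".getBytes("ISO-8859-1");    // 0xE9
        System.out.println(new String(latin1, "UTF-8").equals("\uFFFD")); // prints "true"
    }
}
```

Seeing U+FFFD in the browser therefore points at ISO-8859-1 (or otherwise non-UTF-8) bytes being served under a UTF-8 content-type declaration, i.e. the writer side, not the template.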
Re: VelocityResponseWriter/Solritas character encoding issue
Hi Erik, Erik Hatcher wrote: Can you give me a test document that causes an issue? (maybe send me a Solr XML document in private e-mail). I'll see what I can do once I can see the issue first hand. Thank you! Just try the utf8-example.xml file in the exampledocs directory. After having indexed the document, the output of the script test_utf8.sh suggests to me that everything works correctly: Solr server is up. HTTP GET is accepting UTF-8 HTTP POST is accepting UTF-8 HTTP POST does not default to UTF-8 HTTP GET is accepting UTF-8 beyond the basic multilingual plane HTTP POST is accepting UTF-8 beyond the basic multilingual plane HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual plane If I'm using the standard QueryResponseWriter and the query q=umlauts, the responding XML page contains properly printed non-ASCII characters. The same query against the VelocityResponseWriter returns a lot of Unicode replacement characters (U+FFFD) instead. -Sascha
Re: Missing slf4j jar in solr 1.4.0 distribution?
Solr includes slf4j-jdk14-1.5.5.jar, if you want to use the nop (or log4j, or loopback) impl you will need to include that in your own project. Solr uses slf4j so that each user can decide their logging implementation, it includes the jdk version so that something works off-the-shelf, but if you want more control, then you can switch in whatever you want. ryan
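Since the failure mode here is a NoClassDefFoundError for org.slf4j.impl.StaticLoggerBinder, a quick way to sanity-check a deployment is to probe the classpath for that class before anything tries to log. This is just a generic reflection sketch (the helper class name is made up):

```java
// Illustrative classpath probe: slf4j fails with NoClassDefFoundError for
// org.slf4j.impl.StaticLoggerBinder when no binding jar (slf4j-jdk14,
// slf4j-log4j12, slf4j-nop, ...) is deployed alongside slf4j-api.
public class BindingCheck {
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath("org.slf4j.impl.StaticLoggerBinder")) {
            System.out.println("an slf4j binding is deployed");
        } else {
            System.out.println("no slf4j binding found: deploy exactly one binding jar");
        }
    }
}
```

Note that two bindings on the classpath is also a problem: slf4j picks one arbitrarily, which matches the "couldn't configure logging inside Hadoop" confusion described later in this thread.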
Re: initiate reindexing in solr for field type changes
Thanks. So going by your reply, can I assume that if there is a configuration change to my schema I have to index the documents again? There is no shortcut for updating the index? We can't afford to index 2 million documents again and again. There should be some utility or command-line tool which does this in the background. I hope I make sense. darniz -- View this message in context: http://old.nabble.com/initiate-reindexing-in-solr-for-field-type-changes-tp26397067p26413172.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: UTF-8 Character Set not specified on OutputStreamWriter in StreamingUpdateSolrServer
Specifying the file.encoding did work, although I don't think it is a suitable workaround for my use case. Any idea what my next step is to get a bug opened? Thanks, Joe Date: Wed, 18 Nov 2009 16:15:55 +0530 Subject: Re: UTF-8 Character Set not specified on OutputStreamWriter in StreamingUpdateSolrServer From: shalinman...@gmail.com To: solr-user@lucene.apache.org On Wed, Nov 18, 2009 at 6:56 AM, Joe Kessel isjust...@hotmail.com wrote: While trying to make use of the StreamingUpdateSolrServer for updates with the release code for Solr 1.4, I noticed some characters such as é did not show up in the index correctly. The code should set the charset name via the constructor of the OutputStreamWriter. I noticed that the CommonsHttpSolrServer seems to set the charset to UTF-8. As a workaround I am able to use the CommonsHttpSolrServer. Being new to Solr, I'm not sure what the bug protocol is, assuming this is a bug. I wrote a simple test case and I'm able to index and query 'é' and other characters using StreamingUpdateSolrServer. Can you use -Dfile.encoding=UTF8 as a JVM parameter and see if that fixes your case? If it does, then it may be a Solr bug. -- Regards, Shalin Shekhar Mangar.
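For context on why -Dfile.encoding changes the behavior: an OutputStreamWriter built without an explicit charset silently uses the platform default, which is exactly the kind of bug being described. A minimal standalone illustration (not the actual StreamingUpdateSolrServer code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

// Shows why an OutputStreamWriter needs an explicit charset: without one it
// falls back to the JVM's platform default (file.encoding), so 'é' (U+00E9)
// may be written as a single 0xE9 byte instead of the UTF-8 pair 0xC3 0xA9.
public class WriterCharsetDemo {
    public static byte[] writeWith(String charset) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer w = (charset == null)
                ? new OutputStreamWriter(buf)           // platform default: fragile
                : new OutputStreamWriter(buf, charset); // explicit: what the fix should do
        w.write("\u00E9");
        w.close();
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(writeWith("UTF-8").length);      // 2 (0xC3 0xA9)
        System.out.println(writeWith("ISO-8859-1").length); // 1 (0xE9)
    }
}
```

That also explains why the test case above passes on one machine and fails on another: the bug only shows when the platform default encoding differs from what the server expects.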
Re: VelocityResponseWriter/Solritas character encoding issue
What platform are you using? Windows does not use UTF-8 by default, and this can cause subtle problems. If you can do the same thing on other platforms (Linux, Mac) that would help narrow down the problem. -- Lance Norskog goks...@gmail.com
Re: Missing slf4j jar in solr 1.4.0 distribution?
Ah, thanks for the tip about switching out the jdk jar with the log4j jar. I think I was running into this issue and couldn't figure out why Solr logging couldn't be configured when running inside Hadoop which uses log4j, maybe this was the issue?
Re: initiate reindexing in solr for field type changes
Darniz, Yes, if there is an incompatible schema change, you need to reindex your documents. Otis P.S. Please include the copy of the response when replying, so the context/background of your question is easy to figure out. -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
Re: HTMLStripCharFilterFactory does not replace &#233;
Your first definition of text_fr seems to be correct and should work as expected. I tested it and it worked fine ('mémé' was highlighted). What was the output of HTMLStripCharFilterFactory in analysis.jsp? In my analysis.jsp, I got 'ça va mémé ?'. Koji -- http://www.rondhuit.com/en/
Re: Disable coord
Thanks for your reply. Nested boolean queries is a valid concern. I also realized that isCoordDisabled needs to be considered in BooleanQuery.hashCode so that a query with coord=false will have a different cache key in Solr. On Thu, Nov 12, 2009 at 12:12 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I want to disable coord for certain queries. For example, if I pass a URL : parameter like disableCoord to Solr, the BooleanQuery generated will have : coord disabled. If it's not currently supported, what would be a good way : to implement it? In order to have something like this on a per-query basis it needs to be handled by the query parsers. The Lucene QueryParser doesn't provide any syntax markup to do this, so you would have to add your own -- you could subclass the LuceneQParserPlugin and just have it *always* ignore the coord if some query param coord=false was set, but you'd have to be careful about whether that's really what you want in a deeply nested set of boolean queries -- ie: (A +B -C +(D E F G H) ((X Y Z) (L M (N O P ... what if you only want to disable the coord on the (X Y Z) boolean query? : Thanks, : Guangwei -Hoss
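The cache-key point generalizes: any flag that participates in equals() must participate in hashCode(), or two queries that differ only in that flag won't be distinguished reliably as hash-based cache keys. A minimal standalone illustration (a toy stand-in, not BooleanQuery itself):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy query key showing why a coord-disabled flag must be part of both
// equals() and hashCode() when queries are used as cache keys.
public class QueryKey {
    final String queryString;
    final boolean coordDisabled;

    QueryKey(String queryString, boolean coordDisabled) {
        this.queryString = queryString;
        this.coordDisabled = coordDisabled;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof QueryKey)) return false;
        QueryKey other = (QueryKey) o;
        return queryString.equals(other.queryString)
                && coordDisabled == other.coordDisabled;
    }

    @Override public int hashCode() {
        // Include the flag so the coord-disabled and coord-enabled variants
        // of the same query string are distinct cache keys.
        return Objects.hash(queryString, coordDisabled);
    }

    public static void main(String[] args) {
        Map<QueryKey, String> cache = new HashMap<QueryKey, String>();
        cache.put(new QueryKey("foo bar", false), "scored with coord");
        cache.put(new QueryKey("foo bar", true), "scored without coord");
        System.out.println(cache.size()); // 2: the flag keeps the entries apart
    }
}
```

If the flag were in hashCode() but missing from equals(), the second put would not overwrite the first entry either, but a lookup could return a result computed under the wrong coord setting; keeping the two methods consistent is what makes the cache correct.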