Near-Realtime-Search, CommitWithin and AtomicUpdates
Hi all, I'm using Solr 6.6 and trying to validate my setup for atomic updates and near-realtime search. Some questions are boggling my mind, so maybe someone can give me a hint to make things clearer.

I am posting regular updates to a collection using the UpdateHandler and the Solr command syntax, including updates and deletes. These changes are committed using the commitWithin setting, every 30 seconds. Now I want to use atomic updates on multiValued fields, so I post the "add" commands for these fields only. Sometimes I have to post multiple Solr commands affecting the same document, but within the same commitWithin interval.

The question is: what is the final value of the field after the atomic-update "add" operations? From my point of view, the final value should be the old value plus the newly added values, committed to the index in the next commitWithin period. So, can I combine multiple atomic-update commands affecting the same document within the same commitWithin interval?

Another thing that puzzles me: can I combine multiple atomic updates for the same document with copyFields? Does Solr use some kind of dirty read of pending uncommitted changes to get the right value of the source field, or is the source always the last committed value? In summary, does Solr's atomic-update mechanism use some kind of dirty read to do its "magic"?

Thanks in advance, Mirko
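For what it's worth, the scenario described above — two atomic "add" operations on the same multiValued field of the same document inside one commitWithin window — corresponds to a JSON body like the one built in this Python sketch. The id and field names are made up; this only illustrates the payload shape, not the answer to the question:

```python
import json

def atomic_add(doc_id, field, values):
    # One atomic-update command: "add" appends values to a multiValued field.
    return {"id": doc_id, field: {"add": values}}

# Two commands for the same document, both within one commitWithin window.
payload = [atomic_add("doc1", "tags", ["a"]),
           atomic_add("doc1", "tags", ["b", "c"])]

# This is the body one would POST to /update?commitWithin=30000.
body = json.dumps(payload)
print(body)
```

My understanding (hedged — this is exactly what the question asks to confirm) is that Solr rewrites each atomic update against the latest version of the document, including pending uncommitted changes from the transaction log, so the field should end up with the old values plus a, b and c after the next commit.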
Re: how to store _text field
Hi guys, I used Erick's suggestions (thanks again!!) to create a new field and copy the _text content into it:

curl -X POST -H 'Content-type:application/json' --data-binary '{ "add-field": { "name":"content", "type":"string", "indexed":true, "stored":true }, "add-copy-field": { "source":"_text", "dest":[ "content" ] } }' http://localhost:8983/solr/Test/schema

That seems a good way, but I discovered extraneous content at the start of every content field. Indeed, they start with a string of this kind:

\n \n stream_content_type text/plain \n stream_size 1556 \n Content-Encoding UTF-8 \n X-Parsed-By org.apache.tika.parser.DefaultParser \n X-Parsed-By org.apache.tika.parser.txt.TXTParser \n Content-Type text/plain; charset=UTF-8 \n resourceName /home/mirko/Desktop/data sample/sample1/TEXT_CRE_20110608_3-114-500.txt

Now I need to cut off this part, but I have no idea how, also because the path (present in the last part) has a dynamic length. For some people it could be a problem to have two fields with the same content (double the space needed). I don't have this problem because I use SolrJ to import, modify and export each document. Maybe I could use it to do this too, but hopefully you know a cleaner method. Cheers, Mirko

On 19 March 2015 at 20:11, Erick Erickson erickerick...@gmail.com wrote: Hmm, not all that sure. That's one thing about schemaless indexing: it has to guess. It does the best it can, but it's quite possible that it guesses wrong. If this is a managed schema, you can use the REST API commands to make whatever field you want. Or you can start over with a concrete schema.xml and use _that_. Otherwise, I'm not sure what to say without actually being on your system. Wish I could help more. Erick

On Thu, Mar 19, 2015 at 5:39 AM, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi Erick, I'm sorry for this delay, but I've just seen this reply.
I'm using the latest version of Solr, and the default setting is to use the new kind of indexing: it doesn't use schema.xml, so I have no idea how to set this field to be stored. The content is grabbed, because I've obtained results using the search function, but it is not shown because it is not set to stored. I hope I am being clear. Thanks very much. All the best, Mirko

On 14/03/15 17:58, Erick Erickson wrote: Right, your schema.xml file will define, perhaps, some dynamic fields. First, ensure that stored="true" is specified. If you change this, you have to re-index the docs. Second, ensure that the fl parameter with the field is specified on the requests, something like q=*:*&fl=eoe_txt. Third, ensure that you are actually sending content to that field when you index docs. If none of this helps, show us the definition from schema.xml, a sample input document, and a query that illustrate the problem, please. Best, Erick

On Fri, Mar 13, 2015 at 1:20 AM, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi Alexandre, I need to visualize the content of _text. For some reason it is actually not shown in the results (the response). I guess this happens because it isn't stored (due to some default setting that I'd like to change). Thanks for your help, Mirko

On 13/03/15 00:27, Alexandre Rafalovitch wrote: Wait, step back. This is confusing. What's the real problem you are trying to solve? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 March 2015 at 19:50, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi folks, I googled and tried without success, so I ask you: how can I modify the settings of a field to store it? It is interesting to note that I did not add the _text field, so I guess it is a default one. Maybe it is normal that it is not shown in the results, but actually this is my real problem.
It would also be grand to copy it into a new field, but I do not know how to do that with the latest Solr (5) and the new kind of schema. I know that I have to use curl, but I do not know how to use it to copy a field. Thank you in advance! Cheers, Mirko
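On the "cut off this part" problem from the first message above: the Tika preamble is a run of known metadata keys, so it can be stripped with a regex regardless of the path length. A quick Python sketch of the idea — the key list comes from the sample shown, and in the SolrJ pipeline the same pattern would work with java.util.regex:

```python
import re

# Metadata keys seen in the sample preamble; extend the list as needed.
TIKA_PREFIX = re.compile(
    r"(?:^[ \t]*(?:stream_content_type|stream_size|Content-Encoding|"
    r"X-Parsed-By|Content-Type|resourceName)[ \t]+\S[^\n]*\n)+",
    re.MULTILINE)

def strip_tika_metadata(text):
    # Remove the leading block of "key value" metadata lines, if present.
    return TIKA_PREFIX.sub("", text, count=1).lstrip()

sample = ("\n \n stream_content_type text/plain \n stream_size 1556 \n"
          " Content-Encoding UTF-8 \n"
          " resourceName /home/mirko/Desktop/data sample/sample1.txt\n"
          "Actual document text starts here.")
print(strip_tika_metadata(sample))
```

This is only a sketch: if a real document happens to begin with a line starting with one of these keys, it would be over-stripped, so anchoring the regex to the start of the field value is safer in practice.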
Addition to Solr wiki editor list
Hi there! I'd like to be added to the list of people who are able to edit the Solr wiki at https://wiki.apache.org/solr. I'm working as a Java developer for a German company that uses Solr a lot (and I like it a lot), and I would like to be able to correct things as soon as I find them, without going to the IRC channel to get things changed. My wiki name should be campfire. Thanks in advance
Re: how to store _text field
Hi Erick, I'm sorry for this delay, but I've just seen this reply. I'm using the latest version of Solr, and the default setting is to use the new kind of indexing: it doesn't use schema.xml, so I have no idea how to set this field to be stored. The content is grabbed, because I've obtained results using the search function, but it is not shown because it is not set to stored. I hope I am being clear. Thanks very much. All the best, Mirko

On 14/03/15 17:58, Erick Erickson wrote: Right, your schema.xml file will define, perhaps, some dynamic fields. First, ensure that stored="true" is specified. If you change this, you have to re-index the docs. Second, ensure that the fl parameter with the field is specified on the requests, something like q=*:*&fl=eoe_txt. Third, ensure that you are actually sending content to that field when you index docs. If none of this helps, show us the definition from schema.xml, a sample input document, and a query that illustrate the problem, please. Best, Erick

On Fri, Mar 13, 2015 at 1:20 AM, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi Alexandre, I need to visualize the content of _text. For some reason it is actually not shown in the results (the response). I guess this happens because it isn't stored (due to some default setting that I'd like to change). Thanks for your help, Mirko

On 13/03/15 00:27, Alexandre Rafalovitch wrote: Wait, step back. This is confusing. What's the real problem you are trying to solve? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 March 2015 at 19:50, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi folks, I googled and tried without success, so I ask you: how can I modify the settings of a field to store it? It is interesting to note that I did not add the _text field, so I guess it is a default one. Maybe it is normal that it is not shown in the results, but actually this is my real problem.
It would also be grand to copy it into a new field, but I do not know how to do that with the latest Solr (5) and the new kind of schema. I know that I have to use curl, but I do not know how to use it to copy a field. Thank you in advance! Cheers, Mirko
Re: how to store _text field
Hi Alexandre, I need to visualize the content of _text. For some reason it is actually not shown in the results (the response). I guess this happens because it isn't stored (due to some default setting that I'd like to change). Thanks for your help, Mirko

On 13/03/15 00:27, Alexandre Rafalovitch wrote: Wait, step back. This is confusing. What's the real problem you are trying to solve? Regards, Alex. Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: http://www.solr-start.com/

On 12 March 2015 at 19:50, Mirko Torrisi mirko.torr...@ucdconnect.ie wrote: Hi folks, I googled and tried without success, so I ask you: how can I modify the settings of a field to store it? It is interesting to note that I did not add the _text field, so I guess it is a default one. Maybe it is normal that it is not shown in the results, but actually this is my real problem. It would also be grand to copy it into a new field, but I do not know how to do that with the latest Solr (5) and the new kind of schema. I know that I have to use curl, but I do not know how to use it to copy a field. Thank you in advance! Cheers, Mirko
Re: Invalid Date String:'1992-07-10T17'
Thanks very much for each of your replies. These resolved my problem and taught me something important. I have just discovered that I have another problem, but I guess I have to open another discussion. Cheers, Mirko

On 10/03/15 20:30, Chris Hostetter wrote: ':' is a syntactically significant character to the query parser, so it's getting confused by it in the text of your query. You're seeing the same problem as if you tried to search for foo:bar in the yak field using q=yak:foo:bar. You either need to backslash-escape the ':' characters, or wrap the date in quotes, or use a different parser that doesn't treat colons as special characters (but remember that since you are building this up as a Java string, you have to deal with *Java* string escaping as well...)

String a = "speechDate:1992-07-10T17\\:33\\:18Z";
String a = "speechDate:\"1992-07-10T17:33:18Z\"";
String a = "speechDate:" + ClientUtils.escapeQueryChars("1992-07-10T17:33:18Z");
String a = "{!field f=speechDate}1992-07-10T17:33:18Z";

: My goal is to group these speeches (hopefully using date math syntax). I would

Unless you are truly searching for only documents that have an *exact* date value matching your input (down to the millisecond), searching for a single date value is almost certainly not what you want -- you most likely want to do a range search...

String a = "speechDate:[1992-07-10T00:00:00Z TO 1992-07-11T00:00:00Z]";

(which doesn't require special escaping, because the query parser is smart enough to know that ':' isn't special inside the [..])

: like to know if you suggest me to use date or tdate or other because I have
: not understood the difference.

The difference between date and tdate has to do with how you want to trade index size (on disk and in RAM) against search speed for range queries like these -- tdate takes up a little more room in the index, but can make range queries faster. -Hoss http://www.lucidworks.com/
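For reference, the escaping that ClientUtils.escapeQueryChars performs can be imitated in a few lines; this Python toy covers only a subset of the special characters (SolrJ's version handles the full set, including multi-character operators like && and ||):

```python
# Characters treated specially by the Lucene query parser (a subset).
SPECIAL = set(':+-!(){}[]^"~*?\\')

def escape_query_chars(s):
    # Backslash-escape each special character, roughly what
    # ClientUtils.escapeQueryChars does in SolrJ.
    return "".join("\\" + c if c in SPECIAL else c for c in s)

q = "speechDate:" + escape_query_chars("1992-07-10T17:33:18Z")
print(q)  # speechDate:1992\-07\-10T17\:33\:18Z
```

Note that the dashes get escaped too, which is harmless: the parser simply treats an escaped character as a literal.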
Invalid Date String:'1992-07-10T17'
Hi all, I am very new to Solr (and Lucene) and I use the latest version of it. I do not understand why I get this:

Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/Collection1: Invalid Date String:'1992-07-10T17' at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:558) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:214) at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:210) at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:91) at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:302) at Update.main(Update.java:18)

Here is the code that produces this error:

SolrQuery query = new SolrQuery();
String a = "speechDate:1992-07-10T17:33:18Z";
query.set("fq", a);
//query.setQuery(a); -- I also tried using this one.

According to https://cwiki.apache.org/confluence/display/solr/Working+with+Dates, it should be right. I tried with other dates, or just yyyy-MM-dd, with no success. My goal is to group these speeches (hopefully using date math syntax). I would like to know if you suggest I use date or tdate or something else, because I have not understood the difference. Thanks in advance, Mirko
Create a date field using the file name
Hi folks, Hopefully this is an easy question, but I couldn't manage it after several hours. I created a new field (adding <field name="date" type="date" indexed="true" stored="true"/>) and I'd like to use the file name to fill it. The file names are like TEXT_CRE_MMGG_X-XXX-XXX.txt or TEXT_CRE_MMGG_X-XXX.txt (where each X is a random digit). I'd like to use a date field type to be able to use some grouping functions. Thanks in advance. Have a nice week, Mirko
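One way to fill the field is to parse the date out of the file name before posting the document. A Python sketch, assuming the digit block is yyyyMMdd as in the sample file name TEXT_CRE_20110608_3-114-500.txt seen elsewhere on this list — adjust the pattern if it is really only month and day:

```python
import re
from datetime import datetime

def date_from_filename(name):
    # Assumes the digit block after TEXT_CRE_ is yyyyMMdd, as in
    # TEXT_CRE_20110608_3-114-500.txt; adjust if the format differs.
    m = re.match(r"TEXT_CRE_(\d{8})_", name)
    if not m:
        return None
    d = datetime.strptime(m.group(1), "%Y%m%d")
    # Solr date fields expect a full ISO-8601 UTC timestamp.
    return d.strftime("%Y-%m-%dT%H:%M:%SZ")

print(date_from_filename("TEXT_CRE_20110608_3-114-500.txt"))
```

The resulting string can be set on the date field (e.g. via SolrJ) when each document is indexed.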
Re: Create a date field using the file name
I forgot to add that the txt files are divided into directories following this rule: //MM/**files**. Regards, Mirko
Solr Suggester ranked by boost
I want to implement a Solr Suggester (http://wiki.apache.org/solr/Suggester) that ranks suggestions by document boost factor. As I understand the documentation, the following config should work:

solrconfig.xml:

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">7</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="last-components">
    <str>suggest</str>
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggesttext</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.WFSTLookupFactory</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

schema.xml:

<field name="suggesttext" type="text" indexed="true" stored="true" multiValued="true"/>
<fieldType name="text" class="solr.TextField" omitNorms="false"/>

I added three documents with a document boost:

{ "add": { "commitWithin": 5000, "overwrite": true, "boost": 3.0, "doc": { "id": 1, "suggesttext": "text bb" } },
  "add": { "commitWithin": 5000, "overwrite": true, "boost": 2.0, "doc": { "id": 2, "suggesttext": "text cc" } },
  "add": { "commitWithin": 5000, "overwrite": true, "boost": 1.0, "doc": { "id": 3, "suggesttext": "text aa" } } }

A query to the suggest handler (with spellcheck.q=te) gives the following response:

{"responseHeader":{"status":0,"QTime":6},
 "command":"build",
 "response":{"numFound":3,"start":0,"docs":[
   {"id":"1","suggesttext":["text bb"]},
   {"id":"2","suggesttext":["text cc"]},
   {"id":"3","suggesttext":["text aa"]}]},
 "spellcheck":{"suggestions":[
   "te",{"numFound":3,"startOffset":0,"endOffset":2,
   "suggestion":["text aa","text bb","text cc"]}]}}

The search results are ranked by boost as expected. However, the suggestions are not ranked by boost (but alphabetically instead). I also tried the TSTLookup and FSTLookup implementations, with the same result. Any ideas what I'm missing? Thanks, Mirko
Re: Automatically build spellcheck dictionary on replicas
Ok, thanks for pointing that out! 2013/12/3 Kydryavtsev Andrey werde...@yandex.ru Yep, sorry, it doesn't work for file-based dictionaries: "In particular, you still need to index the dictionary file once by issuing a search with spellcheck.build=true on the end of the URL; if your system doesn't update that dictionary file, then this only needs to be done once. This manual step may be required even if your configuration sets build=true and reload=true." http://wiki.apache.org/solr/FileBasedSpellChecker 03.12.2013, 21:27, Mirko idonthaveenoughinformat...@googlemail.com: Yes, I have that, but it doesn't help. It seems Solr still needs the query with the spellcheck.build parameter to build the spellchecker index. 2013/12/3 Kydryavtsev Andrey werde...@yandex.ru Did you try adding the <str name="buildOnCommit">true</str> parameter to your slave's spellcheck configuration? 03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com: Hi all, We use a Solr SpellcheckComponent with a file-based dictionary. We run a master and some replica slave servers. To update the dictionary, we copy the dictionary txt file to the master, from where it is automatically replicated to all slaves. However, it seems we need to run the spellcheck.build query on all servers individually. Is there a way to automatically build the spellcheck dictionary on all servers without calling spellcheck.build on each slave individually? We use Solr 4.0.0. Thanks, Mirko
Re: Automatically build spellcheck dictionary on replicas
Yes, I have that, but it doesn't help. It seems Solr still needs the query with the spellcheck.build parameter to build the spellchecker index. 2013/12/3 Kydryavtsev Andrey werde...@yandex.ru Did you try adding the <str name="buildOnCommit">true</str> parameter to your slave's spellcheck configuration? 03.12.2013, 12:04, Mirko idonthaveenoughinformat...@googlemail.com: Hi all, We use a Solr SpellcheckComponent with a file-based dictionary. We run a master and some replica slave servers. To update the dictionary, we copy the dictionary txt file to the master, from where it is automatically replicated to all slaves. However, it seems we need to run the spellcheck.build query on all servers individually. Is there a way to automatically build the spellcheck dictionary on all servers without calling spellcheck.build on each slave individually? We use Solr 4.0.0. Thanks, Mirko
Re: Parse eDisMax queries for keywords
Hi Jack, thanks for your reply. OK, in this case I agree that enriching the query in the application layer is a good idea. We are still a bit puzzled about what the enriched query should look like. I'll post here when we have found a solution. If somebody has suggestions, I'd be happy to hear them. Mirko

2013/11/21 Jack Krupansky j...@basetechnology.com The query parser does its own tokenization and parsing before your analyzer tokenizer and filters are called, assuring that only one whitespace-delimited token is analyzed at a time. You're probably best off having an application-layer preprocessor for the query that enriches the query in the manner you're describing. Or simply settle for a heuristic approach that may give you 70% of what you want using only existing Solr features on the server side. -- Jack Krupansky

-Original Message- From: Mirko Sent: Thursday, November 21, 2013 5:30 AM To: solr-user@lucene.apache.org Subject: Parse eDisMax queries for keywords

Hi, We would like to implement special handling for queries that contain certain keywords. Our particular use case: in the example query "Footitle season 1" we want to discover the keyword "season", get the subsequent number, and boost (or filter for) documents that match 1 on the field named season. We have two fields in our schema:

<!-- title contains titles -->
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>

<fieldType name="text" class="solr.TextField" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... -->
  </analyzer>
</fieldType>

<field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>

<!-- season contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
  </analyzer>
</fieldType>

Our idea was to use a keyword tokenizer and a regex on the season field to extract the season number from the complete query. However, we use the ExtendedDisMax query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title season</str>
  </lst>
</requestHandler>

The problem is that eDisMax tokenizes the query, so our season field receives the tokens [Foo, season, 1], without any order, instead of the complete query. How can we pass the complete query (untokenized) to the season field? We don't understand which tokenizer is used here and why our season field receives tokens instead of the complete query. Or is there another approach to solve this use case with Solr? Thanks, Mirko
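As an illustration of the application-layer preprocessing Jack suggests, here is a Python sketch that pulls "season <n>" out of the raw query and turns it into a filter query. The filter-vs-boost choice and all names are illustrative, not a recommendation:

```python
import re

SEASON = re.compile(r"\bseason\s*0*(\d+)\b", re.IGNORECASE)

def enrich(query):
    # App-layer preprocessing: extract "season <n>" from the raw query and
    # turn it into a field query, leaving the rest for the title field.
    m = SEASON.search(query)
    if not m:
        return {"q": query}
    rest = (query[:m.start()] + query[m.end():]).strip()
    return {"q": rest, "fq": "season:" + m.group(1)}

print(enrich("Footitle season 1"))
```

Because this runs before the query reaches Solr, the whole query string is still available, sidestepping the eDisMax tokenization problem described above.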
Re: Suggester - how to return exact match?
Thanks! We solved this issue in the front-end now, i.e. we add the exact match to the list of suggestions there. Mirko

2013/11/22 Developer bbar...@gmail.com Might not be a perfect solution, but you can use an edge-ngram filter, copy all your field data to that field, and use it for suggestions.

<fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="250"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

http://localhost:8983/solr/core1/select?q=name:iphone

The above query will return: iphone, iphone5c, iphone4g.

-- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-how-to-return-exact-match-tp4102203p4102521.html Sent from the Solr - User mailing list archive at Nabble.com.
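The front-end workaround Mirko mentions (adding the exact match to the suggestion list when it exists in the index) could look like this sketch — a hypothetical helper, not part of Solr:

```python
def with_exact_match(query, suggestions, dictionary):
    # Front-end fix from the thread: prepend the query itself as a
    # suggestion, but only if it really exists in the index/dictionary.
    q = query.lower()
    if q in dictionary and q not in suggestions:
        return [q] + suggestions
    return suggestions

dictionary = {"foo", "foo 1", "foo 2"}
print(with_exact_match("foo", ["foo 1", "foo 2"], dictionary))
```

Here the membership check stands in for whatever the front-end uses to verify the term exists (a numFound > 0 check against the index would serve the same purpose).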
Re: Suggester - how to return exact match?
Hi, I'd like to clarify our use case a bit more. We want to return the exact search query as a suggestion only if it is present in the index. So in my example we would expect to get the suggestion "foo" for the query "foo", but no suggestion "abc" for the query "abc" (because "abc" is not in the dictionary). To me this use case seems quite common. Say we have three products in our store: "foo", "foo 1", "foo 2". If the user types "foo" in the product search, we want to suggest all our products in the dropdown. Is this something we can do with the Solr suggester? Mirko 2013/11/20 Developer bbar...@gmail.com There may be a way to do this, but it doesn't make sense to return the same search query as a suggestion (the search query is not a suggestion, as it might or might not be present in the index). AFAIK you can use various lookup algorithms to get the suggestion list, and they look up the terms based on the query value (some algorithms implement fuzzy logic too), so searching "Foo" will return "FooBar", "Foo2" but not "foo". You should fetch the suggestions only if numFound is greater than 0; otherwise you don't have any suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-how-to-return-exact-match-tp4102203p4102259.html Sent from the Solr - User mailing list archive at Nabble.com.
Parse eDisMax queries for keywords
Hi, We would like to implement special handling for queries that contain certain keywords. Our particular use case: in the example query "Footitle season 1" we want to discover the keyword "season", get the subsequent number, and boost (or filter for) documents that match 1 on the field named season. We have two fields in our schema:

<!-- title contains titles -->
<field name="title" type="text" indexed="true" stored="true" multiValued="false"/>

<fieldType name="text" class="solr.TextField" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... -->
  </analyzer>
</fieldType>

<field name="season" type="season_number" indexed="true" stored="false" multiValued="false"/>

<!-- season contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
  </analyzer>
</fieldType>

Our idea was to use a keyword tokenizer and a regex on the season field to extract the season number from the complete query. However, we use the ExtendedDisMax query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title season</str>
  </lst>
</requestHandler>

The problem is that eDisMax tokenizes the query, so our season field receives the tokens [Foo, season, 1], without any order, instead of the complete query. How can we pass the complete query (untokenized) to the season field? We don't understand which tokenizer is used here and why our season field receives tokens instead of the complete query. Or is there another approach to solve this use case with Solr? Thanks, Mirko
Suggester - how to return exact match?
Hi, we implemented a Solr suggester (http://wiki.apache.org/solr/Suggester) that uses a file-based dictionary. We use the results of the suggester to populate a dropdown of a search field on a webpage. Our dictionary (autosuggest.txt) contains:

foo
bar

Our suggester has the following behavior: we can make a request with the search query "fo" and get a response with the suggestion "foo". This is great. However, if we make a request with the query "foo" (an exact match), we get no suggestions. We would expect the response to return the suggestion "foo". How can we configure the suggester to also return the perfect match as a suggestion? This is the config for our search component:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <str name="queryAnalyzerFieldType">spellCheck</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="sourceLocation">autosuggest.txt</str>
  </lst>
</searchComponent>

Thanks for the help! Mirko
problem with schema.xml
Hi, I just started playing around with Solr 1.2. It has some nice improvements. I noticed that errors in schema.xml get reported in a verbose way now, but the following steps cause a problem for me:

1. start with a correct schema.xml -> Solr works fine
2. edit it in a way that is no longer correct (say, remove the </schema> closing tag) -> Solr works fine
3. restart the webapp (through the Tomcat manager interface) -> Solr complains that schema.xml does not parse, fine
4. now restart again (without fixing schema.xml!) -> Solr won't even start up
5. fix the above problem (add the closing tag) and restart via Tomcat's manager -> the webapp cannot restart, showing that there is a problem: FAIL - Application at context path /furness could not be started

These steps might seem artificial, but assume you don't manage to fix all the typos in your schema.xml on the first attempt. It seems that after the restart Solr gets stuck in some state, and I cannot get it up and running via Tomcat's manager, only by restarting Tomcat. Am I missing something? Thanks, mirko
Re: problem with schema.xml
Hi Ryan, I have my .war file located outside the webapps folder (I am using multiple Solr instances with a config as suggested on the wiki: http://wiki.apache.org/solr/SolrTomcat). Nevertheless, I touched the .war file, the config file, the directory under webapps, but nothing seems to be working. Any other suggestions? Is someone else experiencing the same problem? thanks, mirko Quoting Ryan McKinley [EMAIL PROTECTED]: I don't use tomcat, so I can't be particularly useful. The behavior you describe does not happen with resin or jetty... My guess is that tomcat is caching the error state. Since fixing the problem is outside the webapp directory, it does not think it has changed so it stays in a broken state. if you touch the .war file, does it restart ok? but i'm just guessing...
SolrSearchGenerator for Cocoon (2.1)
Hi, I looked at the SolrSearchGenerator (this is the part which is of interest to me), but I could not get it to work with Cocoon 2.1 yet. It seems that there is no getParameters method in the org.apache.cocoon.environment Request interface: http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/environment/Request.html I guess using the getParameterNames and getParameter methods instead should do the trick. Or am I missing something? mirko Quoting Thorsten Scherler [EMAIL PROTECTED]: On Mon, 2007-03-26 at 09:30 -0400, Winona Salesky wrote: Thanks Chris, I'll take another look at the forrest plugin. Have a look as well at http://wiki.apache.org/solr/SolrForrest -- it points out the cocoon components. salu2 -- Thorsten Scherler thorsten.at.apache.org Open Source Java XML consulting, training and solutions
Re: Filter query doesn't always work...
Hi, you might want to use the sint (sortable integer) field type instead. If you use the integer field type, I guess the range queries are treated as string prefixes (like in [Ab TO Ch]). You can find some documentation about it in the example schema.xml: http://svn.apache.org/viewvc/lucene/solr/trunk/example/solr/conf/schema.xml mirko Quoting escher2k [EMAIL PROTECTED]: I have a strange problem, and I don't seem to see any issue with the data. I am filtering on a field called reviews_positive_6_mos. The field is declared as an integer. If I specify:

(a) fq=reviews_positive_6mos%3A[*+TO+*] -> 36033 records are retrieved.
(b) fq=reviews_positive_6mos%3A[*+TO+100] -> 35996 records are retrieved.
(c) fq=reviews_positive_6mos%3A[80+TO+100] -> 0 records are retrieved.
(d) fq=reviews_positive_6mos%3A[80+TO+*] -> 9 records are retrieved.
(e) fq=reviews_positive_6mos%3A[100+TO+100] -> 764 records are retrieved.

I am not sure what could be wrong in cases (c) and (d), especially when there is a lot of data where reviews_positive_6mos = 100. Any suggestions would be most appreciated. Thanks. -- View this message in context: http://www.nabble.com/Filter-query-doesn%27t-always-work...-tf3474766.html#a9698269 Sent from the Solr - User mailing list archive at Nabble.com.
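For the archives, the sint suggestion matters because with the plain integer type the range endpoints compare as strings. A tiny Python sketch of why case (c) returns nothing:

```python
# With a string-indexed integer field, range checks are lexicographic,
# which is why [80 TO 100] matches nothing: as strings, "80" > "100".
values = ["9", "36", "80", "100", "764"]

string_range = [v for v in values if "80" <= v <= "100"]
numeric_range = [v for v in values if 80 <= int(v) <= 100]

print(string_range)   # []
print(numeric_range)  # ['80', '100']
```

The same effect explains (d): only values that sort lexicographically after "80" (like "9") survive, not the numerically larger "100" or "764".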
Re: solr + cocoon problem
Hi, I agree, this is not a legal URL. But the thing is that Cocoon itself is sending the unescaped URL. That is why I thought I was not using the right tools from Cocoon. mirko Quoting Chris Hostetter [EMAIL PROTECTED]: : java.io.IOException: Server returned HTTP response code: 505 for URL: : http://hostname/solr/select/?q=a b : : : The interesting thing is that if I access http://hostname/solr/select/?q=a b : directly it works. i don't know anything about cocoon, but that is not a legal URL; URLs can't have spaces in them ... if you type a space into your browser, it's probably being nice and URL-escaping it for you (that's what most browsers seem to do nowadays). i'm guessing Cocoon automatically un-escapes the input to your app, and you need to re-URL-escape it before sending it to Solr. -Hoss
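Hoss's point — re-escape the query before handing it to Solr — is a one-liner in most languages. A Python sketch of the same fix (the host name and path are from the thread; the helper name is made up):

```python
from urllib.parse import quote_plus

def solr_select_url(base, query):
    # Re-escape the user query before building the Solr request URL.
    return base + "/solr/select/?q=" + quote_plus(query)

print(solr_select_url("http://hostname", "a b"))  # http://hostname/solr/select/?q=a+b
```

In the Cocoon sitemap itself, the equivalent is to run the request parameter through an escaping module rather than interpolating it raw, which is where this thread eventually lands.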
Re: solr + cocoon problem
Thanks Thorsten, that really was helpful. Cocoon's url-encode module does solve my problem. mirko Quoting Thorsten Scherler [EMAIL PROTECTED]: On Wed, 2007-01-17 at 10:25 -0500, [EMAIL PROTECTED] wrote: Hi, I agree, this is not a legal URL. But the thing is that cocoon itself is sending the unescaped URL. ...because you told it so. You use: <map:generate src="http://hostname/solr/select/?q={request-param:q}" type="file"/> The request-param module will not escape the param by default. salu2
solr + cocoon problem
Hi, I am trying to implement a Cocoon-based application using Solr for searching. In particular, I would like to forward the request from my response page to Solr. I have tried several alternatives, but none of them worked for me. One which seemed a logical way to me is to have a response page which is forwarded to Solr with Cocoon's file generator. It works fine if I perform queries which contain only alphanumeric characters, but it gives the following error if I try to query for a string containing non-alphanumeric characters: http://hostname/cocoon/mywebapp/response?q=a+b

java.io.IOException: Server returned HTTP response code: 505 for URL: http://hostname/solr/select/?q=a b

The interesting thing is that if I access http://hostname/solr/select/?q=a b directly, it works. The relevant part of my sitemap.xmap:

<map:match pattern="response">
  <map:generate src="http://hostname/solr/select/?q={request-param:q}" type="file"/>
  <map:serialize type="xml"/>
</map:match>

Any ideas on how to implement a Cocoon layer above Solr? thanks, mirko

ps. I realize this question might be more of a Cocoon question, but I am posting it here because I got the idea of using Cocoon on top of Solr from http://wiki.apache.org/solr/XsltResponseWriter. So, I assume some of you have already run into similar issues and/or know the solution...
Re: Indexing XML files
Thank you all for the quick responses. They were very helpful. My XML is well-formed, so I ended up implementing my own FieldType: public class XMLField extends TextField { public void write(XMLWriter xmlWriter, String name, Fieldable f) throws IOException { xmlWriter.writePrim("xml", name, f.stringValue(), false); } } I looked at the XSD and there is one thing I don't understand: if the desired way is to conform to the XSD (and hence the types used in the XSD), then how would it be possible to use user-defined fieldtypes as plugins? Wouldn't they violate the same principle? thanks, mirko Quoting Chris Hostetter [EMAIL PROTECTED]: ... I think Walter's got the right idea ... as a general rule, we want to make the XmlResponseWriter bullet proof so that no matter what data you put into your index, it is guaranteed to produce a well-formed XML document that conforms to a specified DTD or XSD (see SOLR-17 for one we already have but we haven't figured out what to do with yet) ... if you're interested in writing a bit of custom java code you could in fact write a new FieldType (which could easily subclass TextField) with a custom write method that just outputs the raw value directly, and then load your field type as a plugin... http://wiki.apache.org/solr/SolrPlugins -Hoss
Indexing XML files
Hi, I am trying to index an xml file as a field in lucene, see the example below: <add> <doc> <field name="title">As You Like it</field> <field name="author">Shakespeare, William</field> <field name="record"><myxml>here goes the xml...</myxml></field> </doc> </add> I can index the title and author fields because they are strings, but the record field is an xml itself and I bump into some problems, as I cannot directly input an xml file using the post.sh script (solr complains). I wonder what would be the correct (and relatively simple) way of doing it. Ideally, I would like to store the xml as is, and index only the content, removing the xml tags (I believe there is HTMLStripWhitespaceAnalyzer for that), and output the result as xml (so, simple escaping does not work for me). So far, I had the idea of escaping the xml record and then unescaping it for inner storage and using the analyzer for indexing (which would possibly require creating a class like XMLField or such). thanks, mirko
Re: Indexing XML files
Hi, Thanks for the quick response. Now, I have one more question. Is it possible to get the result for a query back in the following form (considering the input is the escaped xml, what you mentioned before): <response> <responseHeader> <status>0</status> <QTime>0</QTime> </responseHeader> <result numFound="1" start="0"> <doc> <str name="label">As You Like It (Promptbook of McVicars 1860)</str> <str name="author">Shakespeare, William,</str> <str name="record"><myxml>...</myxml></str> </doc> </result> </response> Note that here the xml data is not escaped. If yes, what do I have to do to get such results back? Would str need to be replaced with a type, say, xml, which has a different write method? Or will I only be able to display escaped xml within str (and any other types)? If so, why? thanks, mirko Quoting Chris Hostetter [EMAIL PROTECTED]: Since XML is the transport for sending data to Solr, you need to make sure all field values are XML-escaped. If you wanted to index a plain text title and that title contained an ampersand character Sense & Sensability ...you would need to XML-escape that as... Sense &amp; Sensability ...Solr internally will treat that consistently as the Java string Sense & Sensability and when it comes time to return that string back to your query clients, will output it in whatever form is appropriate for your ResponseWriter -- if that's XML, then it will be XML-escaped again; if it's JSON or something like it, it can probably be left alone. The same holds true for any other characters you want to include in your field values: Solr doesn't care that the *value* itself is an XML string, just that you properly escape the value in your XML <add><doc> message to Solr... <add> <doc> <field name="title">As You Like it</field> <field name="author">Shakespeare, William</field> <field name="record">&lt;myxml&gt;here goes the xml...&lt;/myxml&gt;</field> </doc> </add> ...does that make sense?
: Ideally, I would like to store the xml as is, and index only the content : removing the xml-tags (I believe there is HTMLStripWhitespaceAnalyzer for : that). : And output the result as an xml (so, simple escaping does not work for me). the escaping is just to send the data to Solr -- once sent, Solr will process the unescaped string when dealing with analyzers, etc., exactly as you'd expect. -Hoss
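The escaping rule Hoss describes is mechanical; here is a rough sketch of it in Java. A real client would normally let an XML library or a utility class do this rather than hand-rolling it; the point is only that the ampersand must be replaced before the angle brackets.

```java
public class XmlEscape {

    // Minimal escaping for a field value placed in element content of an
    // <add><doc> message: replace & first, then < and >, so already-produced
    // entities are not double-escaped in the wrong order.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(escape("Sense & Sensability"));
        // prints Sense &amp; Sensability
        System.out.println(escape("<myxml>here goes the xml...</myxml>"));
        // prints &lt;myxml&gt;here goes the xml...&lt;/myxml&gt;
    }
}
```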
Re: Indexing XML files
Hi, the idea is to apply XSLT transformation on the result. But it seems that I would have to apply two transformations in a row, one which unescapes the escaped node and a second which performs the actual transformation... mirko Quoting Yonik Seeley [EMAIL PROTECTED]: On 12/5/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: You are right, it is escaped. But my question is: (how) can I make it unescaped? For what purpose? If you use an XML parser, the values it gives back to you will be unescaped. -Yonik
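Yonik's point -- that an XML parser hands the value back already unescaped, so no separate "unescape" pass is needed before the XSLT step -- can be demonstrated with the JDK's DOM parser. The `<str name="record">` element below mimics a field in a Solr XML response; the element name and value are illustrative only.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class UnescapeDemo {

    // Parse a Solr-style response fragment; getTextContent() returns the
    // field value with the XML entities already decoded by the parser.
    static String fieldValue(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<str name=\"record\">&lt;myxml&gt;text&lt;/myxml&gt;</str>";
        System.out.println(fieldValue(xml));
        // prints <myxml>text</myxml>
    }
}
```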