Near-Realtime-Search, CommitWithin and AtomicUpdates

2020-06-16 Thread Mirko Sertic
I'm using Solr 6.6 and trying to validate my setup for AtomicUpdates and
Near-Realtime-Search. Some questions are puzzling me, so maybe someone can give me a hint
to make things clearer.
I am posting regular updates to a collection using the UpdateHandler and
Solr Command Syntax, including updates and deletes. These changes are
committed using the commitWithin configuration every 30 seconds.
Now I want to use AtomicUpdates on MultiValue'd fields, so I post the
"add" commands for these fields only. Sometimes I have to post multiple
Solr commands affecting the same document, but within the same
commitWithin interval. The question now is: what is the final value
of the field after the atomic update add operations? From my point of
view, the final value should be the old value plus the newly added
values, which is committed to the index in the next commitWithin period.
So can I combine multiple AtomicUpdate commands affecting the same
document within the same commitWithin interval?
Another thing that is puzzling me: can I combine multiple AtomicUpdates
for the same document with CopyFields? Does Solr use some kind of
dirty read or pending uncommitted changes to get the right value of the
source field, or is the source always the last committed value?
So in summary, does Solr AtomicUpdates use some kind of dirty-read
mechanism to do its "magic"?
Thanks in advance,
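
For reference, a minimal sketch of the kind of update message described above, assuming the JSON update format. The class name, the document id doc-1 and the multi-valued field tags are made up for illustration. The sketch only builds the request body; sending it to the update handler is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

final class AtomicAddPayload {
    // Builds a JSON "add" command whose field value is an atomic-update
    // modifier ({"add": [...]}) instead of a plain value.
    // The field name passed in is an assumption for this example.
    static String build(String id, String field, List<String> values, int commitWithinMs) {
        String joined = values.stream()
                // minimal JSON string escaping: backslash and double quote only
                .map(v -> "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"")
                .collect(Collectors.joining(","));
        return "{\"add\":{\"commitWithin\":" + commitWithinMs
                + ",\"doc\":{\"id\":\"" + id + "\",\"" + field
                + "\":{\"add\":[" + joined + "]}}}}";
    }

    public static void main(String[] args) {
        System.out.println(build("doc-1", "tags", List.of("a", "b"), 30000));
    }
}
```

Two such messages for the same id within one commitWithin window would simply be two calls to build with different value lists.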

Re: how to store _text field

2015-04-28 Thread Mirko Torrisi
Hi guys,

I used Erick's suggestions (thanks again!!) to create a new field and
copy the _text content into it.

curl -X POST -H 'Content-type:application/json' --data-binary '{
  "add-field" : { "name":"content", "type":"string", "indexed":true,
    "stored":true},
  "add-copy-field" : { "source":"_text", "dest": [ "content"]}
}' http://localhost:8983/solr/Test/schema

That seems a good way, but I discovered unwanted metadata at the start of every
content field. Indeed, each one starts with a string of this kind:

 \n \n stream_content_type text/plain  \n stream_size 1556  \n
Content-Encoding UTF-8  \n X-Parsed-By
org.apache.tika.parser.DefaultParser  \n X-Parsed-By
org.apache.tika.parser.txt.TXTParser  \n Content-Type text/plain;
charset=UTF-8  \n resourceName /home/mirko/Desktop/data

Now I need to cut off this part, but I have no idea how, also because the path
(present in the last part) has a dynamic length.

For some people it could be a problem to have two fields with the same content
(double the space needed). I do not have this problem because I use SolrJ to
import, modify and export each document. Maybe I could also use it to do
this, but hopefully you know a cleaner method.
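
One possible client-side approach, sketched under assumptions: the metadata block always ends with "resourceName" followed by a path, and the path contains no whitespace. The class and method names are hypothetical. If either assumption does not hold for a document, the text is returned unchanged or cut at the wrong place, so this would need checking against real data.

```java
import java.util.regex.Pattern;

final class TikaPrefixStripper {
    // Lazily match everything from the start up to "resourceName <path>",
    // plus the whitespace that follows the path.
    // Assumption: the path itself contains no whitespace.
    private static final Pattern PREFIX =
            Pattern.compile("^.*?resourceName\\s+\\S+\\s*", Pattern.DOTALL);

    static String strip(String content) {
        // If there is no "resourceName" entry, nothing matches and the input is returned as-is.
        return PREFIX.matcher(content).replaceFirst("");
    }

    public static void main(String[] args) {
        String raw = " \n \n stream_content_type text/plain  \n stream_size 1556  \n"
                + " resourceName /home/mirko/Desktop/data/example.txt  \n Actual text";
        System.out.println(strip(raw));
    }
}
```

Because the regular expression stops at the first whitespace after the path, the dynamic length of the path does not matter.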



On 19 March 2015 at 20:11, Erick Erickson wrote:

 Hmm, not all that sure. That's one thing about schemaless indexing, it
 has to guess. It does the best it can, but it's quite possible that it
 guesses wrong.

 If this is a managed schema, you can use the REST API commands to
 make whatever field you want. Or you can start over with a concrete
 schema.xml and use _that_. Otherwise, I'm not sure what to say without
 actually being on your system.

 Wish I could help more.


Addition to solr wiki editor list

2015-04-19 Thread Mirko Cegledi
Hi there!

I'd like to be added to the list of people who are able to edit the Solr
wiki. I'm working as a Java developer for a
German company using Solr (and like it a lot) and I would like to be
able to correct things as soon as I find them without going to the
IRC channel to get things changed.

My wiki name should be campfire.

Thanks in advance

Re: how to store _text field

2015-03-19 Thread Mirko Torrisi

Hi Erick,

I'm sorry for the delay, but I've only just seen this reply.

I'm using the latest version of Solr, and the default setting is to use the
new kind of indexing. It doesn't use schema.xml, so I have no
idea how to set this field to be stored.
The content is indexed, because I get results using the search
function, but it is not shown because it is not set to be stored.

I hope that is clear.
Thanks very much.

All the best,


On 14/03/15 17:58, Erick Erickson wrote:

Right, your schema.xml file will define, perhaps, some dynamic
fields. First insure that stored=true is specified. If you change
this, you have to re-index the docs.

Second, insure that your fl parameter with the field is specified on
the requests, something like q=*:*&fl=eoe_txt.

Third, insure that you are actually sending content to that field when
you index docs.

If none of this helps, show us the definition from schema.xml and a
sample input document and a query that illustrate the problem please.




Re: how to store _text field

2015-03-13 Thread Mirko Torrisi

Hi Alexandre,

I need to visualize the content of _txt. For some reason, it is currently
not shown in the results (the response).
I guess that it isn't shown because it isn't stored (due to some default
setting that I'd like to change).

Thanks for your help,


On 13/03/15 00:27, Alexandre Rafalovitch wrote:

Wait, step back. This is confusing. What's your real problem you are
trying to solve?


Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:

On 12 March 2015 at 19:50, Mirko Torrisi wrote:

Hi folks,

I googled and tried without success, so I ask you: how can I modify the
setting of a field to store it?

It is interesting to note that I did not add the _text field, so I guess it is a
default one. Maybe it is normal that it is not shown in the results, but
actually this is my real problem. It would also be great to copy it into a new
field, but I do not know how to do it with the latest Solr (5) and the new kind
of schema. I know that I have to use curl, but I do not know how to use it to
copy a field.

Thank you in advance!


Re: Invalid Date String:'1992-07-10T17'

2015-03-11 Thread Mirko Torrisi
Thanks very much for each of your replies. They resolved my problem and
taught me something important.
I have just discovered that I have another problem, but I guess that I
have to open another discussion.



On 10/03/15 20:30, Chris Hostetter wrote:

: is a syntactically significant character to the query parser, so it's
getting confused by it in the text of your query.

you're seeing the same problem as if you tried to search for foo:bar in
the yak field using q=yak:foo:bar

you either need to backslash escape the : characters, or wrap the date
in quotes, or use a diff parser that doesn't treat colons as special
characters (but remember that since you are building this up as a java
string, you have to deal with *java* string escaping as well...

String a = "speechDate:1992-07-10T17\\:33\\:18Z";
String a = "speechDate:\"1992-07-10T17:33:18Z\"";
String a = "speechDate:" + ...;
String a = "{!field f=speechDate}1992-07-10T17:33:18Z";

: My goal is to group these speeches (hopefully using date math syntax). I would

Unless you are truly searching for only documents that have an *exact*
date value matching your input (down to the millisecond) then searching for
a single date value is almost certainly not what you want -- you most
likely want to do a range search...

   String a = "speechDate:[1992-07-10T00:00:00Z TO 1992-07-11T00:00:00Z]";

(which doesn't require special escaping, because the query parser is smart
enough to know that : aren't special inside of the [..])
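
A small stdlib-only sketch of the two options above (escaping the colons, or building a one-day range). The class and method names are invented for the example, and the field name is passed in by the caller. It only builds query strings and does not call SolrJ.

```java
import java.time.LocalDate;

final class DateQueries {
    // Backslash-escape colons so the query parser treats them as literal characters.
    static String escapeColons(String value) {
        return value.replace(":", "\\:");
    }

    // Range from midnight of the given day to midnight of the next day (UTC).
    // Square brackets make both ends inclusive, as in the example above.
    static String dayRange(String field, LocalDate day) {
        return field + ":[" + day + "T00:00:00Z TO " + day.plusDays(1) + "T00:00:00Z]";
    }

    public static void main(String[] args) {
        System.out.println("speechDate:" + escapeColons("1992-07-10T17:33:18Z"));
        System.out.println(dayRange("speechDate", LocalDate.of(1992, 7, 10)));
    }
}
```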

: like to know if you suggest me to use date or tdate or other because I have
: not understood the difference.

the difference between date and tdate has to do with how you want to trade
index size (on disk & in RAM) with search speed for range queries like
these -- tdate takes up a little more room in the index, but can make
range queries faster.


Invalid Date String:'1992-07-10T17'

2015-03-10 Thread Mirko Torrisi

Hi all,

I am very new to Solr (and Lucene) and I use the latest version of it.
I do not understand why I obtain this:

   Exception in thread "main"
   org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
   from server at http://localhost:8983/solr/Collection1: Invalid Date
at Update.main(

Here the code that creates this error:

SolrQuery query = new SolrQuery();
String a = "speechDate:1992-07-10T17:33:18Z";
query.set("fq", a);
//query.setQuery( a );  -- I also tried using this one.

According to, it 
should be right. I tried with others date, or just |-MM-DD, with no 

My goal is to group these speeches (hopefully using date math syntax). I
would like to know whether you suggest using date or tdate or something else,
because I have not understood the difference.

Thanks in advance,


Create field date using name file

2015-03-02 Thread Mirko Torrisi

Hi folks,

Hopefully this is an easy question but I couldn't do it after several 

I created a new field (adding <field name="date" type="date"
indexed="true" stored="true"/>) and I'd like to use the file name to
fill it.
The file names are like: TEXT_CRE_MMGG_X-XXX-XXX.txt or
TEXT_CRE_MMGG_X-XXX.txt (where every X is a random digit).

I'd like to use a date field type to be able to use some group functions.

Thanks in advance.
Have a nice week,
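
A rough sketch of extracting a date from the file name on the client side before indexing. The exact date layout inside the name is not fully visible above, so the pattern below assumes a hypothetical TEXT_CRE_<yyyyMMdd>_... layout. The class name and the sample file name are invented, and the regular expression would need adjusting to the real names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameDate {
    // Hypothetical layout: TEXT_CRE_<yyyyMMdd>_<anything>.txt
    private static final Pattern NAME = Pattern.compile("^TEXT_CRE_(\\d{8})_.*\\.txt$");

    // Returns the date as an ISO-8601 UTC timestamp at midnight, or empty if the
    // name does not match. An impossible date (e.g. month 13) makes
    // LocalDate.parse throw DateTimeParseException.
    static Optional<String> solrDate(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        LocalDate d = LocalDate.parse(m.group(1), DateTimeFormatter.BASIC_ISO_DATE);
        return Optional.of(d + "T00:00:00Z");
    }

    public static void main(String[] args) {
        System.out.println(solrDate("TEXT_CRE_19920710_1-234-567.txt").orElse("no date"));
    }
}
```

The resulting string could then be set as the value of the date field when the document is built.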


Re: Create field date using name file

2015-03-02 Thread Mirko Torrisi
I forgot to add that the txt files are divided into directories following
this rule: //MM/**files**.


Solr Suggester ranked by boost

2013-12-04 Thread Mirko
I want to implement a Solr Suggester (
that ranks suggestions by document boost factor.

As I understand the documentation, the following config should work:


<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.count">7</str>
    <str name="spellcheck.onlyMorePopular">true</str>
  </lst>
  <arr name="last-components">
    ...
  </arr>
</requestHandler>

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">suggesttext</str>
    <str name="buildOnCommit">true</str>
    ...
  </lst>
</searchComponent>

<field name="suggesttext" type="text" indexed="true" stored="true"/>
<fieldType name="text" class="solr.TextField" omitNorms="false"/>

I added three documents with a document boost:


{
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 3.0,
    "doc": {
      "id": 1,
      "suggesttext": "text bb"
    }
  },
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 2.0,
    "doc": {
      "id": 2,
      "suggesttext": "text cc"
    }
  },
  "add": {
    "commitWithin": 5000,
    "overwrite": true,
    "boost": 1.0,
    "doc": {
      "id": 3,
      "suggesttext": "text aa"
    }
  }
}


A query to the suggest handler (with spellcheck.q=te) gives the following

suggesttext:[text bb]},
suggesttext:[text cc]},
suggesttext:[text aa]}]
suggestion:[text aa,
  text bb,
  text cc]}]}}

The search results are ranked by boost as expected. However, the
suggestions are not ranked by boost (but alphabetically instead). I also
tried the TSTLookup and FSTLookup lookup implementations with the same

Any ideas what I'm missing?


Re: Automatically build spellcheck dictionary on replicas

2013-12-04 Thread Mirko
Ok, thanks for pointing that out!

2013/12/3 Kydryavtsev Andrey

 Yep, sorry, it doesn't work for file-based dictionaries:

  In particular, you still need to index the dictionary file once by
 issuing a search with on the end of the URL; if your
 system doesn't update that dictionary file, then this only needs to be done
 once. This manual step may be required even if your configuration sets
 build=true and reload=true.


Re: Automatically build spellcheck dictionary on replicas

2013-12-03 Thread Mirko
Yes, I have that, but it doesn't help. It seems Solr still needs the query
with the parameter to build the spellchecker index.

2013/12/3 Kydryavtsev Andrey

 Did you try to add
   <str name="buildOnCommit">true</str>
  parameter to your slave's spellcheck configuration?

 03.12.2013, 12:04, Mirko
  Hi all,
  We use a Solr SpellcheckComponent with a file-based dictionary. We run a
  master and some replica slave servers. To update the dictionary, we copy
  the dictionary txt file to the master, from where it is automatically
  replicated to all slaves. However, it seems we need to run the query on all servers individually.
  Is there a way to automatically build the spellcheck dictionary on all
  servers without calling on all slaves individually?
  We use Solr 4.0.0

Re: Parse eDisMax queries for keywords

2013-11-25 Thread Mirko
Hi Jack,
thanks for your reply. OK, in this case I agree that enriching the query
in the application layer is a good idea. We are still a bit puzzled about what the
enriched query should look like. I'll post here when we have found a solution.
If somebody has suggestions, I'd be happy to hear them.
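
One possible shape for such an application-layer preprocessor, as a sketch only: it pulls "season N" out of the user query and turns it into a separate boost query that could be sent as the eDisMax bq parameter (or as fq, to filter instead of boost). The class name and the choice of bq are assumptions, not something we have settled on.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SeasonQuery {
    // "season", optional whitespace, optional leading zeros, then the number.
    private static final Pattern SEASON =
            Pattern.compile("\\bseason\\s*0*(\\d+)\\b", Pattern.CASE_INSENSITIVE);

    final String q;   // user query with the "season N" phrase removed
    final String bq;  // boost query such as "season:1", or null if no keyword was found

    private SeasonQuery(String q, String bq) {
        this.q = q;
        this.bq = bq;
    }

    static SeasonQuery parse(String userQuery) {
        Matcher m = SEASON.matcher(userQuery);
        if (!m.find()) {
            return new SeasonQuery(userQuery.trim(), null);
        }
        // Cut the matched phrase out and normalise the remaining whitespace.
        String rest = (userQuery.substring(0, m.start()) + " " + userQuery.substring(m.end()))
                .trim().replaceAll("\\s+", " ");
        return new SeasonQuery(rest, "season:" + m.group(1));
    }

    public static void main(String[] args) {
        SeasonQuery sq = parse("Footitle season 1");
        System.out.println("q=" + sq.q + " bq=" + sq.bq);
    }
}
```

With this split, the title field only sees the title words, and the season field no longer has to parse the whole query.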


2013/11/21 Jack Krupansky

 The query parser does its own tokenization and parsing before your
 analyzer tokenizer and filters are called, assuring that only one white
 space-delimited token is analyzed at a time.

 You're probably best off having an application layer preprocessor for the
 query that enriches the query in the manner that you're describing.

 Or, simply settle for a heuristic approach that may give you 70% of what
 you want using only existing Solr features on the server side.

 -- Jack Krupansky


Re: Suggester - how to return exact match?

2013-11-25 Thread Mirko
Thanks! We solved this issue in the front-end now. I.e. we add the exact
match to the list of suggestions there.


2013/11/22 Developer

 Might not be a perfect solution but you can use edgengram filter and copy
 your field data to that field and use it for suggestion.

 <fieldType name="text_autocomplete" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
             maxGramSize="250"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>


 The above query will return


Re: Suggester - how to return exact match?

2013-11-21 Thread Mirko
I'd like to clarify our use case a bit more.

We want to return the exact search query as a suggestion only if it is
present in the index. So in my example we would expect to get the
suggestion "foo" for the query "foo" but no suggestion "abc" for the query
"abc" (because "abc" is not in the dictionary).

For me this use case seems quite common. Say we have three products in our
store: "foo", "foo 1", "foo 2". If the user types "foo" in the product
search, we want to suggest all our products in the dropdown.

Is this something we can do with the Solr suggester?

2013/11/20 Developer

 Maybe there is a way to do this, but it doesn't make sense to return the
 search query as a suggestion (the search query is not a suggestion, as it might
 or might not be present in the index).

 AFAIK you can use various lookup algorithms to get the suggestion list, and
 they look up the terms based on the query value (some algorithms implement
 fuzzy logic too). So searching Foo will return FooBar, Foo2 but not foo.

 You should fetch the suggestions only if numFound is greater than 0; otherwise
 you don't have any suggestions.


Parse eDisMax queries for keywords

2013-11-21 Thread Mirko
We would like to implement special handling for queries that contain
certain keywords. Our particular use case:

In the example query "Footitle season 1" we want to discover the keyword
"season", get the subsequent number, and boost (or filter for) documents
that match "1" on field name="season".

We have two fields in our schema:

<!-- titles contains titles -->
<field name="title" type="text" indexed="true" stored="true"/>

<fieldType name="text" class="solr.TextField" omitNorms="true">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" .../>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- ... -->
  </analyzer>
</fieldType>

<field name="season" type="season_number" indexed="true" stored="false"/>

<!-- season contains season numbers -->
<fieldType name="season_number" class="solr.TextField" omitNorms="true">
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern=".*(?:season) *0*([0-9]+).*" replacement="$1"/>
  </analyzer>
</fieldType>

Our idea was to use a Keyword tokenizer and a Regex on the season field
to extract the season number from the complete query.

However, we use an ExtendedDisMax query parser in our search handler:

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">
      title season
    </str>
  </lst>
</requestHandler>


The problem is that the eDisMax tokenizes the query, so that our field
season receives the tokens [Foo, season, 1] without any order,
instead of the complete query.

How can we pass the complete query (untokenized) to the season field? We
don't understand which tokenizer is used here and why our season field
received tokens instead of the complete query.

Or is there another approach to solve this use case with Solr?


Suggester - how to return exact match?

2013-11-20 Thread Mirko
we implemented a Solr suggester (
that uses a file based dictionary. We use the results of the suggester to
populate a dropdown field of a search field on a webpage.

Our dictionary (autosuggest.txt) contains:


Our suggester has the following behavior:

We can make a request with the search query "fo" and get a response with
the suggestion "foo". This is great.

However, if we make a request with the query "foo" (an exact match) we get
no suggestions. We would expect that the response returns the suggestion

How can we configure the suggester to return also the perfect match as a

This is the config for our search component:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
  <str name="queryAnalyzerFieldType">spellCheck</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="sourceLocation">autosuggest.txt</str>
  </lst>
</searchComponent>

Thanks for help!

problem with schema.xml

2007-06-08 Thread mirko

I just started playing around with Solr 1.2.  It has some nice improvements.
I noticed that errors in the schema.xml get reported in a verbose way now, but
the following steps cause a problem for me:

1. start with a correct schema.xml - Solr works fine
2. edit it in a way that is no longer correct (say, remove the </schema> closing
tag) - Solr works fine
3. restart the webapp (through the Tomcat manager interface) - Solr complains
that the schema.xml does not parse, fine.
4. now restart again (without fixing the schema.xml!) - Solr won't even start up
5. fix the above problem (add the closing tag) and restart via Tomcat's manager
- the webapp cannot restart showing that there is a problem:
FAIL - Application at context path /furness could not be started

The above steps might seem artificial, but assume you don't manage to fix
all the typos in your schema.xml on the first attempt.  It seems after restart
Solr gets stuck in some state and I cannot get it up and running by Tomcat's
manager, only by restarting Tomcat.

Am I missing something?

Re: problem with schema.xml

2007-06-08 Thread mirko
Hi Ryan,

I have my .war file located outside the webapps folder (I am using multiple
Solr instances with a config as suggested on the wiki:

Nevertheless, I touched the .war file, the config file, the directory under
webapps, but nothing seems to be working.

Any other suggestions?  Is someone else experiencing the same problem?

Quoting Ryan McKinley [EMAIL PROTECTED]:

 I don't use tomcat, so I can't be particularly useful.  The behavior you
 describe does not happen with resin or jetty...

 My guess is that tomcat is caching the error state.  Since fixing the
 problem is outside the webapp directory, it does not think it has
 changed so it stays in a broken state.

 if you touch the .war file, does it restart ok?

 but i'm just guessing...

SolrSearchGenerator for Cocoon (2.1)

2007-03-27 Thread mirko

I looked at the SolrSearchGenerator (this is the part which is of interest to
me), but I could not get it to work for Cocoon 2.1 yet.

It seems that there is no getParameters method for the
org.apache.cocoon.environment interface:
I guess using the getParameterNames and getParameter methods instead should
do the trick.

Or am I missing something?


Quoting Thorsten Scherler [EMAIL PROTECTED]:

 On Mon, 2007-03-26 at 09:30 -0400, Winona Salesky wrote:
  Thanks Chris, I'll take another look at the forest plugin.

 Have a look as well at
 it points out the cocoon components.

 Thorsten Scherler
 Open Source Java & XML consulting, training and solutions

Re: Filter query doesn't always work...

2007-03-27 Thread mirko

you might want to use the sint (sortable integer) fieldtype instead.  If you use
 the integer fieldtype I guess the range queries are treated as string prefixes
(like in [Ab TO Ch]).
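
To illustrate the guess above: if the bounds and values were compared as plain strings rather than as numbers, a range like [80 TO 100] would be empty, because "80" sorts after "100". The following sketch only demonstrates string versus integer comparison in Java; it is not Solr code, and the class and method names are invented for the example.

```java
final class RangeCompare {
    // Inclusive range check using lexicographic (string) ordering.
    static boolean inRangeAsStrings(String v, String lo, String hi) {
        return v.compareTo(lo) >= 0 && v.compareTo(hi) <= 0;
    }

    // Inclusive range check using numeric ordering.
    static boolean inRangeAsInts(int v, int lo, int hi) {
        return v >= lo && v <= hi;
    }

    public static void main(String[] args) {
        System.out.println(inRangeAsStrings("90", "80", "100")); // string ordering
        System.out.println(inRangeAsInts(90, 80, 100));          // numeric ordering
    }
}
```

Under string ordering, "9" also sorts after "80", which would be consistent with an open-ended [80 TO *] query matching only a handful of records.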

You can find some documentation about it in the example schema.xml:


Quoting escher2k [EMAIL PROTECTED]:

 I have a strange problem, and I don't seem to see any issue with the data. I
 am filtering
 on a field called reviews_positive_6_mos. The field is declared as an

 If I specify -
 (a) fq=reviews_positive_6mos%3A[*+TO+*] = 36033 records are retrieved.
 (b) fq=reviews_positive_6mos%3A[*+TO+100] = 35996 records are retrieved.
 (c) fq=reviews_positive_6mos%3A[80+TO+100] = 0 records are retrieved.
 (d) fq=reviews_positive_6mos%3A[80+TO+*] = 9 records are retrieved.
 (e) fq=reviews_positive_6mos%3A[100+TO+100] = 764 records are retrieved.

 I am not sure what could be wrong in cases (c) and (d), especially when
 there is a lot of data where
 reviews_positive_6mos = 100. Any suggestions would be most appreciated.


Re: solr + cocoon problem

2007-01-17 Thread mirko

I agree, this is not a legal URL.  But the thing is that cocoon itself is
sending the unescaped URL.  That is why I thought I am not using the right
tools from cocoon.


Quoting Chris Hostetter [EMAIL PROTECTED]:

 : Server returned HTTP response code: 505 for URL:
 : http://hostname/solr/select/?q=a b
 : The interesting thing is that if I access http://hostname/solr/select/?q=a
 : directly it works.

 i don't know anything about cocoon, but that is not a legal URL, URLs
 can't have spaces in them ... if you type a space into your browser, it's
 probably being nice and URL escaping it for you (that's what most browsers
 seem to do now a days)

 i'm guessing Cocoon automaticaly un-escapes the input to your app, and you
 need to re-URL escape it before sending it to Solr.
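
 In plain Java, re-escaping the parameter before building the Solr URL could look like the sketch below. The class and method names are made up for the example, and the URLEncoder overload taking a Charset assumes Java 10 or later. How to do the same inside a Cocoon sitemap is a separate question.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class SolrUrl {
    // Appends the user query as a form-encoded q parameter
    // (spaces become "+", reserved characters become %XX).
    static String selectUrl(String base, String userQuery) {
        return base + "?q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(selectUrl("http://hostname/solr/select/", "a b"));
    }
}
```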


Re: solr + cocoon problem

2007-01-17 Thread mirko
Thanks Thorsten,

that really was helpful.  Cocoon's url-encode module does solve my problem.


Quoting Thorsten Scherler [EMAIL PROTECTED]:

 On Wed, 2007-01-17 at 10:25 -0500, [EMAIL PROTECTED] wrote:
  I agree, this is not a legal URL.  But the thing is that cocoon itself is
  sending the unescaped URL.

 ...because you told it so.

 You use

 The request param module will not escape the param by default.


solr + cocoon problem

2007-01-16 Thread mirko

I am trying to implement a cocoon based application using solr for searching.
In particular, I would like to forward the request from my response page to
solr.  I have tried several alternatives, but none of them worked for me.

One which would seem a logical way to me is to have response page, which is
forwarded to solr with cocoon's file generator.  It works fine if I perform
queries which contain only alphanumeric characters, but it gives the following
error if I try to query for a string containing nonalphanum characters:

http://hostname/cocoon/mywebapp/response?q=a+b Server returned HTTP response code: 505 for URL:
http://hostname/solr/select/?q=a b

The interesting thing is that if I access http://hostname/solr/select/?q=a b
directly it works.

The relevant part of my sitemap.xmap:

<map:match pattern="response">
  ...
  <map:serialize type="xml"/>
</map:match>

Any ideas on how to implement a cocoon layer above solr?


ps. I realize this question might be more of a cocoon question, but I am
posting it here because I have gotten the idea from to use cocoon on top of solr) 
So, I assume some of you have already run into similar issues and/or know
the solution...

Re: Indexing XML files

2006-12-07 Thread mirko
Thank you all for the quick responses.  They were very helpful.

My XML is well-formed, so I ended up implementing my own FieldType:

public class XMLField extends TextField {
  // Write the stored value as a raw <xml> element instead of an escaped <str>.
  public void write(XMLWriter xmlWriter, String name, Fieldable f)
      throws IOException {
    xmlWriter.writePrim("xml", name, f.stringValue(), false);
  }
}

I looked at the XSD and there is one thing I don't understand:

If the desired way is to conform to the XSD (and hence the types used in XSD),
then how would it possible to use user-defined fieldtypes as plugins?  Wouldn't
they violate the same principle?


Quoting Chris Hostetter [EMAIL PROTECTED]:
 I think Walter's got the right idea ... as a general rule, we want to make
 the XmlResponseWriter bullet proof so that no matter what data you put
 into your index, it is guaranteed to produce a well formed XML document
 that conforms to a specified DTD, or XSD (see SOLR-17 for one we already
 have but we haven't figured out what to do with yet)


 if you're interested in writing a bit of custom java code you could in
 fact write a new FieldType (which could easily subclass TextField) with a
 custom write method that just outputs the raw value directly, and then
 load your field type as a plugin...


Indexing XML files

2006-12-05 Thread mirko

I am trying to index an xml file as a field in lucene, see example below:

  <field name="title">As You Like it</field>
  <field name="author">Shakespeare, William</field>
  <field name="record"><myxml>here goes the xml...</myxml></field>

I can index the title and author fields because they are strings, but the
record field is an xml itself and I bump into some problems as I cannot
directly input an xml file using the script (solr complains).

I wonder what would be the correct (and relatively simple) way of doing it. 
Ideally, I would like to store the xml as is, and index only the content
removing the xml-tags (I believe there is HTMLStripWhitespaceAnalyzer for
that).
And output the result as an xml (so, simple escaping does not work for me).

So far, I had the idea of escaping the xml record and then unescaping it for
inner storage and using the analyzer for indexing (which would possibly
require creating a class like XMLField or such).


Re: Indexing XML files

2006-12-05 Thread mirko

Thanks for the quick response.  Now, I have one more question.
Is it possible to get the result for a query back in the following form
(considering the input is the escaped xml, what you mentioned before):


 <result numFound="1" start="0">
   <str name="label">As You Like It (Promptbook of McVicars 1860)</str>
   <str name="author">Shakespeare, William,</str>
   <str name="record"><myxml>...</myxml></str>

Note that here the xml data is not escaped.  If yes, what do I have to do
to get such results back?  Would <str> need to be replaced with a type, say,
<xml>, which has a different write method?  Or will I only be able to display
escaped xml within <str> (and any other types)?  If so, why?


Quoting Chris Hostetter [EMAIL PROTECTED]:

 Since XML is the transport for sending data to Solr, you need to make sure
 all field values are XML escaped.

 If you wanted to index a plain text title and that title contained an
 ampersand character

   "Sense & Sensability" would need to be XML escaped as...

   Sense &amp; Sensability

 ...Solr internally will treat that consistently as the Java string "Sense
 & Sensability" and when it comes time to return that string back to your
 query clients, will output it in whatever form is appropriate for your
 ResponseWriter -- if that's XML, then it will be XML escaped again, if
 it's JSON or something like it, it can probably be left alone.

 The same holds true for any other characters you want to include in your
 field values: Solr doesn't care that the *value* itself is an XML string,
 just that you properly escape the value in your XML <add><doc> message to

 <field name="title">As You Like it</field>
 <field name="author">Shakespeare, William</field>
 <field name="record">&lt;myxml&gt;here goes the xml...&lt;/myxml&gt;</field>

 ...does that make sense?

 : Ideally, I would like to store the xml as is, and index only the content
 : removing the xml-tags (I believe there is HTMLStripWhitespaceAnalyzer for
 : that).
 : And output the result as an xml (so, simple escaping does not work for me).

 the escaping is just to send the data to Solr -- once sent, Solr will
 process the unescaped string when dealing with analyzers, etc exactly as
 you'd expect.
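
 A minimal sketch of the escaping step on the client side, in plain Java. The class and method names are made up for the example; a real client would more likely rely on an XML library or on SolrJ to do this.

```java
final class XmlEscape {
    // Escapes the characters that are significant in XML text and attribute values.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("<myxml>here goes the xml...</myxml>"));
    }
}
```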


Re: Indexing XML files

2006-12-05 Thread mirko

the idea is to apply XSLT transformation on the result.  But it seems that
I would have to apply two transformations in a row, one which unescapes the
escaped node and a second which performs the actual transformation...


Quoting Yonik Seeley [EMAIL PROTECTED]:

  You are right, it is escaped.  But my question is: (how) can I
  make it unescaped?

 For what purpose?
 If you use an XML parser, the values it gives back to you will be unescaped.
