Re: Scoring for specific field queries
I will try this out. How do 1 and 2 boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 1:29 PM, Avlesh Singh avl...@gmail.com wrote: You would need to boost your startswith matches artificially for the desired behavior. I would do it this way - 1. Create a KeywordTokenized field with an n-gram filter. 2. Create a Whitespace-tokenized field with an n-gram filter. 3. Search on both fields, boosting matches for #1 over #2. Hope this helps. Cheers Avlesh On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote: Hi, How can I get a wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means a higher score. For example, I have multiple documents with titles containing the word champion. Some of the document titles start with the word champion and some are entitled "we are the champions". The ones that start with the keyword need to rank first or score higher. Is there a way to do this? I'm using this query for an auto-suggest feature where the keyword doesn't necessarily need to be the first word. Rihaed
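A side note on the n-gram part, sketched in plain Python rather than Solr's actual analysis chain (a simulation, not the real filter classes): NGramFilterFactory emits every substring up to maxGramSize, so both suggested fields also match mid-word occurrences; if strictly prefix-only matching is wanted, EdgeNGramFilterFactory (side="front") on the keyword-tokenized field emits prefixes only.

```python
def ngrams(tok, lo=1, hi=20):
    # NGramFilterFactory behavior: every substring of length lo..hi
    return {tok[i:i + n] for n in range(lo, hi + 1)
            for i in range(len(tok) - n + 1)}

def edge_ngrams(tok, lo=1, hi=20):
    # EdgeNGramFilterFactory (side="front") behavior: prefixes only
    return {tok[:n] for n in range(lo, min(hi, len(tok)) + 1)}

starts = "champion of the world"
middle = "we are the champions"

# With plain n-grams, "cha" matches both titles...
assert "cha" in ngrams(starts) and "cha" in ngrams(middle)
# ...but edge n-grams of the whole-string (keyword) token contain "cha"
# only when the title actually starts with it.
assert "cha" in edge_ngrams(starts)
assert "cha" not in edge_ngrams(middle)
```

In other words, with plain n-grams the boost in step 3 only nudges startswith matches upward, whereas edge n-grams on the keyword-tokenized field make the distinction hard.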
Re: Scoring for specific field queries
This might work and I also have a single-value field, which makes it cleaner. Can sort be customized (with indexOf()) from the Solr parameters alone? Thanks! On Thu, Oct 8, 2009 at 1:40 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Hi Rihaed, I guess we don't need to depend on scores all the time. You can use a custom sort to sort the results. Take a dynamicField, fill it with the indexOf(keyword) value, and sort the results by that field in ascending order. Then the records which contain the keyword at an earlier position will come first. Regards, Sandeep R. Tan wrote: Hi, How can I get wildcard search (e.g. cha*) to score documents based on the position of the keyword in a field? Closer (to the start) means higher score. -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798657.html Sent from the Solr - User mailing list archive at Nabble.com.
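Sandeep's suggestion can be prototyped outside Solr to see the ordering it produces; a minimal sketch in plain Python (the document data is made up, and inf pushes non-matching docs to the end):

```python
docs = [
    {"id": 1, "title": "we are the champions"},
    {"id": 2, "title": "champion"},
    {"id": 3, "title": "the champion within"},
]

def index_of(title, keyword):
    # position of the keyword's first occurrence, like String.indexOf();
    # non-matching documents sort last
    pos = title.find(keyword)
    return pos if pos >= 0 else float("inf")

keyword = "champion"
ranked = sorted(docs, key=lambda d: index_of(d["title"], keyword))
print([d["id"] for d in ranked])  # titles starting with the keyword come first
```

In Solr the same ordering would come from sorting ascending on the field that was filled with the indexOf value at index time.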
Re: Scoring for specific field queries
Hi Avlesh, Thanks for your attention to my post. 1. If the word computer occurs multiple times in a document, what would you do in that case? Is this dynamic field supposed to be multivalued? I can't even imagine what you would do if the word computer occurs in multiple documents multiple times. = It doesn't matter how many times a word occurs in a document. Consider its first occurrence and use it for sorting. The dynamic field should not be multivalued. If the keyword occurs at the same position in multiple documents, then the document which was inserted first will come first. 2. Multivalued fields cannot be sorted upon. = Yes, I agree. 3. One needs to know the unique number of such keywords before implementing, because you'll potentially end up creating that many fields. = I didn't get this. Why should one know the unique number of keywords before implementation? If we have the logic, it works for all keywords. Most people do the same in the case of geographical sorting: they calculate the distance and sort by it before displaying the results. They don't need to worry about which distance the user requests. Please tell me your thoughts and correct me if I am wrong. Thanks, Sandeep
Sorting by insertion time
Hi, Quite often I want a set of documents ordered by the time they were inserted, i.e. give me the 5 latest items that match query foo. I usually solve this by sorting on a date field. I had a chat with Erik Hatcher when he visited Javazone 2009 and he said that Solr places documents on disk in insertion order. This would make it possible for me to save a sorting step by not sorting on a specific field, but by insertion time in reverse. AFAIK Lucene knows how to do this, but which request parameters should I use in Solr? Kind regards, Tarjei -- Tarjei Huse Mobil: 920 63 413
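For the record, the usual low-cost pattern (field name and type here are illustrative, not from the original schema) is to let Solr stamp each document with an insertion time and sort on it explicitly, rather than relying on on-disk order:

```
<!-- schema.xml: a timestamp filled in automatically at add time -->
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW"/>

<!-- query: the 5 newest documents matching foo -->
http://localhost:8983/solr/select?q=foo&sort=timestamp+desc&rows=5
```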
Re: Scoring for specific field queries
Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single-value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: Facet query pb
clico wrote: That's not a problem. I want to use that in order to drill down a tree. Christian Zambrano wrote: Clico, Because you are doing a wildcard query, the token 'AMERICA' will not be analyzed at all. This means that 'AMERICA*' will NOT match 'america'. On 10/07/2009 12:30 PM, Avlesh Singh wrote: I have no idea what pb means, but this is what you probably want - fq=(location_field:(NORTH AMERICA*)) Cheers Avlesh On Wed, Oct 7, 2009 at 10:40 PM, clico cl...@mairie-marseille.fr wrote: Hello, I have a problem trying to retrieve a tree with facet use. I've got a field location_field. Each doc in my index has a location_field. The location field can be continent/country/city. I have 2 queries: http://server/solr/select?fq=(location_field:NORTH*) : ok, retrieves docs. http://server/solr/select?fq=(location_field:NORTH AMERICA*) : not ok. I think with NORTH AMERICA I have a problem with the space character. Could you help me? I'm sorry, this syntax does not work anymore.
Re: Scoring for specific field queries
I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single-value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: ISOLatin1AccentFilter before or after Snowball?
Now you got me wondering - which one should I like better? I didn't even know there was an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory + mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com
Re: how to rename a schema field, whose values are indexed already?
I guess you can't do it; I tried it before. I had a field with the name 'KEYWORD' and I changed it to 'keyword', and it didn't work. Everything else was normal, but when I searched with 'KEYWORD' I got an exception saying undefined field, and when I searched with 'keyword' I got 0 results. It didn't work even after optimizing. I re-indexed the data and it worked. Regards, Sandeep. M.Noor wrote: how to rename a schema field, if its values are indexed already ??
how to post(index) large file of 5 GB or greater than this
Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
Re: how to rename a schema field, whose values are indexed already?
Without re-indexing the data, how do I rename any one of the schema fields? Sandeep Tagore wrote: I guess you can't do it. I tried it before. I had a field with name 'KEYWORD' and I changed it to 'keyword' and it didn't work. Everything else was normal; when I searched with 'KEYWORD' I got an exception saying undefined field, and when I searched with 'keyword' I got 0 results. It didn't work even after optimizing. I re-indexed the data and it worked. Regards, Sandeep. M.Noor wrote: how to rename a schema field, if its values are indexed already ??
Re: Ranking of search results
Hi Amit, I tried with the options you gave and added debugQuery=true at the end of the URL. I am getting output as:

<lst name="debug">
  <str name="rawquerystring">channel</str>
  <str name="querystring">channel</str>
  <str name="parsedquery">text:channel</str>
  <str name="parsedquery_toString">text:channel</str>
  <lst name="explain">
    <str name="http://hotmail">1.2682627 = (MATCH) fieldWeight(text:channel in 3), product of: 2.828427 = tf(termFreq(text:channel)=8) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=3)</str>
    <str name="http://share">1.0026497 = (MATCH) fieldWeight(text:channel in 19), product of: 2.236068 = tf(termFreq(text:channel)=5) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=19)</str>
    <str name="http://metacreek">0.6341314 = (MATCH) fieldWeight(text:channel in 10), product of: 1.4142135 = tf(termFreq(text:channel)=2) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=10)</str>
    <str name="http://yahoo">0.5124555 = (MATCH) fieldWeight(text:channel in 0), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.25 = fieldNorm(field=text, doc=0)</str>
    <str name="http://sharemarket">0.4483986 = (MATCH) fieldWeight(text:channel in 1), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=1)</str>
    <str name="http://Altavista">0.4483986 = (MATCH) fieldWeight(text:channel in 5), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=5)</str>
  </lst>
</lst>

What do the numeric terms denote? With these numeric values, will I be able to set a preference for my search links? If so, how?
Regards Bhaskar --- On Thu, 10/1/09, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: From: bhaskar chandrasekar bas_s...@yahoo.co.in Subject: Re: Ranking of search results To: solr-user@lucene.apache.org Date: Thursday, October 1, 2009, 7:34 PM Hi Amit, Thanks for your reply. How do I set a preference for which links should appear first and second in the search results? Which configuration file in Solr needs to be modified to achieve this? Regards Bhaskar From: Amit Nithian anith...@gmail.com Subject: Re: Ranking of search results To: solr-user@lucene.apache.org Date: Wednesday, September 23, 2009, 11:33 AM It depends on several things: 1) The query handler that you are using 2) The fields that you are searching on and the default fields specified. For the default handler, it will issue a query for the default field and return results accordingly. To see what is going on, pass debugQuery=true at the end of the URL to see detailed output. If you are using the DisMaxHandler (Disjunction Max) then you will have qf, pf and bf (query fields, phrase fields, boosting function). I would start by looking at http://wiki.apache.org/solr/DisMaxRequestHandler - Amit On Wed, Sep 23, 2009 at 10:25 AM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, When I give an input string for search in Solr, it displays the corresponding results for the given input string. How are the results ranked and displayed? On what basis are the search results displayed? Is there an algorithm followed for ordering the results? Regards Bhaskar
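To answer the question in the message above: each explain entry is a product of term frequency, inverse document frequency and the field norm, per Lucene's DefaultSimilarity (tf = sqrt(termFreq), idf = 1 + ln(numDocs/(docFreq+1))). The top score from the debug output reproduces exactly:

```python
import math

# numbers taken from the explain entry for http://hotmail in the debug output
term_freq, doc_freq, num_docs, field_norm = 8, 6, 20, 0.21875

tf = math.sqrt(term_freq)                      # 2.828427...
idf = 1 + math.log(num_docs / (doc_freq + 1))  # 2.049822...
score = tf * idf * field_norm
print(score)  # ~1.2682627, the reported score
```

These scores are relative, not absolute, so they cannot be used directly to pin specific links to fixed positions; for that, boosting (e.g. the DisMax qf/bf parameters mentioned in Amit's reply) or an explicit sort is the usual route.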
Re: Scoring for specific field queries
Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this:

q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

What should I tweak in the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: Scoring for specific field queries
Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting: [the autoComplete and autoComplete2 fieldType definitions from the previous message - snipped] My query is this: q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak in the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes, it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html. Let us know if you have any issues. Sandeep. R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone?
Re: ISOLatin1AccentFilter before or after Snowball?
In this particular case, I don't think one is better than the other... In general, MappingCharFilter is more flexible than specific TokenFilters, such as ISOLatin1AccentFilter. For example, if you want your own character mapping rules, you can add them to the mapping file. It should be easier than modifying TokenFilters, as you don't need programming. Koji Chantal Ackermann wrote: Now you got me wondering - which one should I like better? I didn't even know there was an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory + mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com
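For example, the mapping file format is just "source" => "target" pairs, one rule per line, so adding a custom rule needs no code (the rules shown here are illustrative additions, not from the stock file):

```
# lines starting with # are comments
"À" => "A"
"é" => "e"
"œ" => "oe"
```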
Re: how to rename a schema field, whose values are indexed already?
On Thu, Oct 8, 2009 at 4:32 PM, noor noo...@opentechindia.com wrote: Without re-indexing the data, how do I rename any one of the schema fields? Solr does not support renaming without re-indexing. Re-indexing is your best bet. If you cannot re-index for some reason and you have all fields stored, then you can write a program to read all documents and write new ones with the same values. -- Regards, Shalin Shekhar Mangar.
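The read-and-rewrite program Shalin describes boils down to copying every stored document with one key renamed before re-posting; sketched here on plain dicts rather than the SolrJ API (field names are made up):

```python
def rename_field(doc, old, new):
    # Copy a stored document, moving the value from the old key to the new one.
    out = {k: v for k, v in doc.items() if k != old}
    if old in doc:
        out[new] = doc[old]
    return out

stored = [{"id": "1", "KEYWORD": "solr"}, {"id": "2", "KEYWORD": "lucene"}]
reindexed = [rename_field(d, "KEYWORD", "keyword") for d in stored]
print(reindexed[0])
```

The same loop in a real program would read pages of documents from a select query and post the renamed copies back through the update handler.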
Re: solr reporting tool adapter
Hi Lance, Thanks a ton, will look into BIRT. Regards, Raakhi On Thu, Oct 8, 2009 at 1:22 AM, Lance Norskog goks...@gmail.com wrote: The BIRT project can do what you want. It has a nice form creator and you can configure HTTP XML input formats. It includes very complete Eclipse plugins and there is a book about it. On 10/7/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Oct 7, 2009 at 2:51 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: we basically want to generate PDF reports which contain tag clouds, bar charts, pie charts, etc. Faceting on a field will give you top terms and frequency information which can be used to create tag clouds. What do you want to plot on a bar chart? I don't know of a reporting tool which can hook into Solr for creating such things. -- Regards, Shalin Shekhar Mangar. -- Lance Norskog goks...@gmail.com
Re: Facet query pb
clico wrote: [earlier messages quoted in full in the first "Facet query pb" message above - snipped] I'm sorry, this syntax does not work anymore. When I try debug mode, here is the result:

<arr name="parsed_filter_queries">
  <str>+location_field:NORTH +location_field:AMERICA*</str>
</arr>

My location_field is of type string, containing values like NORTH AMERICA/NY/NYC. Thanks for helping me.
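A hedged aside (untested against this exact schema): the debug output shows the filter splitting at the whitespace into two clauses. On a string-typed field, the space can be kept inside a single prefix term by escaping it, so the whole path stays one prefix query:

```
fq=location_field:NORTH\ AMERICA*

URL-encoded: fq=location_field%3ANORTH%5C%20AMERICA*
```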
issue in adding data to a multivalued field
Hi, I have a small schema with some of the fields defined as:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multivalued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multivalued="true"/>

where the field author_name is multivalued. However, in the UI (schema browser), the following are the details of the author_name field; it's nowhere mentioned that it's multivalued:

Field: author_name
Field Type: text
Properties: Indexed, Tokenized

When I try creating and adding a document into Solr, I get an exception: ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal. Here's my code snippet:

solrDoc17.addField("id", "id1");
solrDoc17.addField("content", "SOLR");
solrDoc17.addField("author_name", "ninad");
solrDoc17.addField("author_name", "raakhi");
solrDoc17.addField("author_name", "goureya");
solrDoc17.addField("author_name", "sheetal");
server.add(solrDoc17);
server.commit();

Any pointers? Regards, Raakhi
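A guess at the cause, not a confirmed diagnosis: schema attribute names are case-sensitive, and the declarations above use lowercase multivalued, which Solr silently ignores, so author_name defaults to single-valued, which matches both the schema browser output and the error reported. The corrected declaration would be:

```xml
<field name="author_name" type="text" indexed="true" stored="false" multiValued="true"/>
```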
Re: How to retrieve the index of a string within a field?
Sandeep, When I submit a query, I actually make sure the searched phrase is wrapped in double quotes. When I do that, it will only return sentences with 'get what you'. If it does not have double quotes, it will return all the sentences as described in your email, because without double quotes it is a 'get OR what OR you' query. I don't know too much about the concepts behind search; I just make use of whatever works for me. Do you think I am still ok using text as my sentence field type? If the return is hundreds of thousands of results, will Solrj's HTTP call hang up on it? Thanks a lot. Elaine On Thu, Oct 8, 2009 at 1:31 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Elaine, The field type text contains <tokenizer class="solr.WhitespaceTokenizerFactory"/> in its definition, so all the sentences that are indexed / queried will be split into words. So when you search for 'get what you', you will get sentences containing get, what, you, get what, get you, what you, get what you. So when you try to find the indexOf of the keyword in that sentence (from the results), you may not get it all the time. Solrj can give the results in one shot, but it uses an HTTP call; you can't avoid it. You don't need to query multiple times with Solrj. Query once, get the results, store them in Java beans, process them and display the results. Regards, Sandeep Elaine Li wrote: Sandeep, I do get results when I search for get what you, not 0 results. What in my schema makes this difference? I need to learn Solrj. I am currently using JavaScript as a client and invoke HTTP calls to get results to display in the browser. Can Solrj get all the results in one shot without the HTTP call? I need to do some postprocessing on all the results and then display the processed data. Submitting multiple HTTP queries and post-processing after each query does not seem to be the right way.
Re: how to post(index) large file of 5 GB or greater than this
You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin
Re: how to post(index) large file of 5 GB or greater than this
You can write a simple program which streams the file from disk to post it to Solr. On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com wrote: You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin -- - Noble Paul | Principal Engineer| AOL | http://aol.com
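The core of such a program is just bounded-memory reading: grab a fixed-size chunk, ship it, repeat. A minimal sketch of the reading side in plain Python (the HTTP POST to Solr is deliberately left out):

```python
def read_in_chunks(path, chunk_size=1024 * 1024):
    # Yield the file a bounded chunk at a time; memory use stays at
    # roughly chunk_size no matter how large the file is.
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk  # in a real indexer: write this to the HTTP request body

# usage sketch (connection is hypothetical):
# for chunk in read_in_chunks("huge-dump.xml"):
#     connection.send(chunk)
```

With chunked transfer encoding on the HTTP side, neither the client nor a well-behaved server ever needs the whole file in memory at once.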
Re: how to post(index) large file of 5 GB or greater than this
Are you indexing multiple documents? If so, split them into multiple files. A single XML file with all documents is not a good idea; Solr is designed to use batches for indexing. It would be extremely hard to index a 1 TB XML file. I would guess that would need a JVM heap of well over 1 TB. wunder On Oct 8, 2009, at 6:56 AM, Noble Paul നോബിള് नोब्ळ् wrote: you can write a simple program which streams the file from the disk to post it to Solr On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com wrote: You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml. Or I split the file if it is too big. Elaine On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: [original question quoted above - snipped] -- - Noble Paul | Principal Engineer| AOL | http://aol.com
correct syntax for boolean search
Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
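For the archive: in the Lucene query syntax the boolean operators must be uppercase (a lowercase "or" is parsed as a search term), and each parenthesized group needs its own field prefix, otherwise the group falls back to the default field. So the query above would presumably be written as:

```
fieldname1:(word_a1 OR word_b1) AND fieldname1:(word_a2 OR word_b2) AND fieldname1:(word_a3 OR word_b3) AND fieldname2:(...)
```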
Re: Default query parameter for one core
On Wed, Oct 7, 2009 at 1:46 PM, Michael solrco...@gmail.com wrote: Is there a way to not have the shards param at all for most cores, and for core0 to specify it? E.g. core0 requests always get a shards=foo appended, while other cores don't have an shards param at all. Or, barring that, is there a way to tell one core use this chunk of XML for your defaults tag, and tell the other cores use this other chunk of XML for your defaults tag?
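One approach (a sketch; the handler name and shard list are made up) relies on each core pointing at its own solrconfig.xml: only core0's config carries the shards default, and the other cores' configs simply omit it, so their requests have no shards param at all:

```xml
<!-- solrconfig.xml for core0 only -->
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="shards">shard1:8983/solr,shard2:8983/solr</str>
  </lst>
</requestHandler>
```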
Re: how to post(index) large file of 5 GB or greater than this
What is this huge file? Solr XML? CSV? Anyway, if it's a local file, you can get Solr to directly read/stream it via stream.file. There are examples in http://wiki.apache.org/solr/UpdateCSV, but it should work for any update format, not just CSV. -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne pravin_ka...@persistent.co.in wrote: Hi, I am new to Solr. I am able to index, search and update with files of small size (around 500 MB). But if I try to index a file of 5 GB or more, it gives a memory heap exception. While investigating I found that the post jar and post.sh load the whole file into memory. I used a workaround of dividing the file into small files, and it's working. Is there any other way to post a large file, as the above workaround is not feasible for a 1 TB file? Thanks -Pravin
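Concretely, the request would look something like this (host, path and file name are illustrative; the stream.file and commit parameters are the ones documented on the UpdateCSV wiki page):

```
curl "http://localhost:8983/solr/update/csv?stream.file=/tmp/huge-dump.csv&commit=true"
```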
Re: how can I use debugQuery if I have extended QParserPlugin?
I did check the other posts, as well as whatever I could find on the net, but didn't find anything. Has anyone encountered this type of issue, or is what I am doing (extending QParserPlugin) that unusual? gdeconto wrote: ... one thing I noticed is that if I append debugQuery=true to a query that includes the virtual function, I get a NullPointerException, likely because the debugging code looks at the query passed in and not the expanded query that my code generates and that gets used by solr for retrieving data. ... -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html Sent from the Solr - User mailing list archive at Nabble.com.
UTF-8 and latin accents
Hello list, I'm trying to index documents with Latin accents (Italian documents). I extract the text from .doc documents with Tika directly into .xml files. If I open up the XML document with Dashcode (I run Mac OS X) I can see the characters correctly. My XML document has the <?xml version="1.0" encoding="UTF-8"?> and <add><doc> ... headers. When I search and retrieve documents in Solr, the accented characters are replaced by a '?'. What is the problem? I guess the problem could be in (1) the schema or (2) the encoding of the XML document file itself (I don't see the characters correctly if I open it up with vim in a terminal). Any suggestions? thanks -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: UTF-8 and latin accents
On Thu, Oct 8, 2009 at 12:48 PM, Claudio Martella claudio.marte...@tis.bz.it wrote: I'm trying to index documents with Latin accents (Italian documents). I extract the text from .doc documents with Tika directly into .xml files. If I open up the XML document with Dashcode (I run Mac OS X) I can see the characters correctly. My XML document has the <?xml version="1.0" encoding="UTF-8"?> and <add><doc> ... headers. Maybe those documents aren't actually in UTF-8. Why don't you try Solr's example/exampledocs/utf8-example.xml When I search and retrieve documents in Solr, the accented characters are replaced by a '?'. What is the problem? I guess the problem could be in (1) the schema or (2) the encoding of the XML document file itself (I don't see the characters correctly if I open it up with vim in a terminal). in vim/gvim try :set encoding=utf8 -Yonik http://www.lucidimagination.com
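Yonik's suspicion can be checked directly: if the bytes were written as Latin-1 but the XML header declares UTF-8, a UTF-8 reader cannot decode the accented characters. A minimal illustration (the Italian word is just an example):

```python
# Sketch: why a file can "look fine" in one editor and still break in Solr.
# 'perché' encoded as Latin-1 uses the single byte 0xE9 for 'é'; that byte
# sequence is invalid UTF-8, so a UTF-8 reader substitutes a replacement char.
text = "perché"
latin1_bytes = text.encode("latin-1")   # b'perch\xe9'
utf8_bytes = text.encode("utf-8")       # b'perch\xc3\xa9'

decoded = latin1_bytes.decode("utf-8", errors="replace")
print(decoded)  # the accent is lost, much like the '?' seen in search results
```

The XML declaration only states an intent; the bytes on disk decide. Re-saving the files as genuine UTF-8 (or declaring the actual encoding in the header) resolves the mismatch.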
Re: ISOLatin1AccentFilter before or after Snowball?
Hello, I'm following the thread, but I think it still hasn't been answered whether the ISOLatin filter goes before or after the stemmer. Any direct answer? Koji Sekiguchi wrote: In this particular case, I don't think one is better than the other... In general, MappingCharFilter is more flexible than specific TokenFilters, such as ISOLatin1AccentFilter. For example, if you want your own character mapping rules, you can add them to mapping.txt. It should be easier than modifying TokenFilters as you don't need programming. Koji Chantal Ackermann wrote: Now, you got me wondering - which one should I like better? I didn't even know there is an alternative. :-) Chantal Koji Sekiguchi schrieb: No, ISOLatin1AccentFilterFactory is not deprecated. You can use either MappingCharFilterFactory+mapping-ISOLatin1Accent.txt or ISOLatin1AccentFilterFactory, whichever you'd like. Koji Jay Hill wrote: Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory deprecated in favor of: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> in 1.4? -Jay http://www.lucidimagination.com -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: correct syntax for boolean search
q=+fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3)) +fieldname2:... Cheers Avlesh On Thu, Oct 8, 2009 at 7:40 PM, Elaine Li elaine.bing...@gmail.com wrote: Hi, What is the correct syntax for the following boolean search from a field? fieldname1:(word_a1 or word_b1) (word_a2 or word_b2) (word_a3 or word_b3) fieldname2:. Thanks. Elaine
Re: how can I use debugQuery if I have extended QParserPlugin?
On Thu, Oct 8, 2009 at 12:14 PM, gdeconto gerald.deco...@topproducer.com wrote: I did check the other posts, as well as whatever I could find on the net but didnt find anything. Has anyone encountered this type of issue, or is what I am doing (extending QParserPlugin) that unusual?? I think you need to provide some more information such as a stack trace for the NPE, or a more elaborate description of what you think the problem is with the debug component. You said because the debugging code looks at the query passed in and not the expanded query, but I don't understand that. The debug component is passed the actual Query object that the QParserPlugin created. -Yonik http://www.lucidimagination.com gdeconto wrote: ... one thing I noticed is that if I append debugQuery=true to a query that includes the virtual function, I get a NullPointerException, likely because the debugging code looks at the query passed in and not the expanded query that my code generates and that gets used by solr for retrieving data. ... -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: IndexWriter InfoStream in solrconfig not working
I can't get it to work either, so I reopened https://issues.apache.org/jira/browse/SOLR-1145 -Yonik http://www.lucidimagination.com On Wed, Oct 7, 2009 at 1:45 PM, Giovanni Fernandez-Kincade gfernandez-kinc...@capitaliq.com wrote: I had the same problem. I'd be very interested to know how to get this working... -Gio. -Original Message- From: Burton-West, Tom [mailto:tburt...@umich.edu] Sent: Wednesday, October 07, 2009 12:13 PM To: solr-user@lucene.apache.org Subject: IndexWriter InfoStream in solrconfig not working Hello, We are trying to debug an indexing/optimizing problem and have tried setting the infoStream file in solrconfig.xml so that the SolrIndexWriter will write a log file. Here is our setting: <!-- To aid in advanced debugging, you may turn on IndexWriter debug logging. Uncommenting this and setting to true will set the file that the underlying Lucene IndexWriter will write its debug infostream to. --> <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream> After making that change to solrconfig.xml and restarting Solr, we see a message in the Tomcat logs saying that the log is enabled: build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is enabled: /tmp/LuceneIndexWriterDebug.log However, if we then run an optimize we can't see any log file being written. I also looked at the patch for http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test that I might try to run in our system. Do others have this logging working successfully? Is there something else that needs to be set up? Tom
Re: IndexWriter InfoStream in solrconfig not working
OK, move the infoStream part in solrconfig.xml from indexDefaults into mainIndex and it should work. -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 2:40 PM, Yonik Seeley yonik.see...@lucidimagination.com wrote: I can't get it to work either, so I reopened https://issues.apache.org/jira/browse/SOLR-1145
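Applying Yonik's fix, the element from the quoted mail moves inside the mainIndex section (a sketch of the relevant fragment only; the file path is the one from Tom's message):

```xml
<mainIndex>
  <!-- other mainIndex settings ... -->
  <!-- infoStream is picked up here, not under indexDefaults -->
  <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>
</mainIndex>
```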
releasing memory?
Hello- I have an application that can run in the background on a user's desktop -- it will go through phases of being used and not being used. I want to free as many system resources as possible when it is not in use. Currently I have a timer that waits for 10 mins of inactivity and releases a bunch of memory (unrelated to Lucene/Solr). Any suggestion on the best way to do this in Lucene/Solr? Perhaps reload a core? thanks for any pointers ryan
Re: Scoring for specific field queries
Hmm... I don't quite get the desired results. Those starting with cha are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this: q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0 What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query? Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization.
Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
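Sandeep's indexOf idea can be sketched client-side (the titles and helper here are made up; this only illustrates the desired ordering, not a Solr custom sort implementation): rank each title by where the typed prefix first occurs, so titles that start with it come first.

```python
def suggest_order(titles, prefix):
    """Order suggestions by the position of `prefix` (case-insensitive):
    titles starting with it first, later occurrences next, non-matches last."""
    p = prefix.lower()

    def key(title):
        pos = title.lower().find(p)
        return pos if pos >= 0 else len(title)  # non-matches sink to the end

    return sorted(titles, key=key)  # sorted() is stable, so ties keep order

titles = ["We Are the Champions", "Champion of the World", "Chasing Cars"]
print(suggest_order(titles, "cha"))
# → ['Champion of the World', 'Chasing Cars', 'We Are the Champions']
```

In Solr terms this corresponds to sorting ascending on a field holding the precomputed indexOf(keyword) value, as Sandeep suggests.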
Re: delay while adding document to solr index
On Thu, Oct 8, 2009 at 1:58 AM, swapna_here swapna.here...@gmail.com wrote: i don't understand why my solr index is increasing daily when i am adding and deleting the same number of documents daily A delete is just a bit flip, and does not reclaim disk space immediately. Deleted documents are squeezed out when segment merges happen (including an optimize, which merges all segments). If you have large segments that documents are deleted from, those segments may not be involved in a merge and hence the deleted docs can hang around for quite some time. -Yonik http://www.lucidimagination.com i run org.apache.solr.client.solrj.SolrServer.optimize() manually four times a day. is it not the right way to run optimize? if yes, what is the procedure to run optimize? thanks in advance :) -- View this message in context: http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25798789.html Sent from the Solr - User mailing list archive at Nabble.com.
indexing frequently-changing fields
I am using Solr to index data in a SQL database. Most of the data doesn't change after initial commit, except for a single boolean field that indicates whether an item is flagged as 'needing attention'. So I have a need_attention field in the database that I update whenever a user marks an item as needing attention in my UI. The problem I have is that I want to offer the ability to include need_attention in my user's queries, but do not want to incur the expense of having to reindex whenever this flag changes on an individual document. I have thought about different solutions to this problem, including using multi-core and having a smaller core for recently-marked items that I am willing to do 'near-real-time' commits on. Are there are any common solutions to this problem, which I have to imagine is common in this community?
Re: indexing frequently-changing fields
It's a bit round-about but you might be able to use ExternalFileField http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html The fieldType definition would look like <fieldType name="file" keyField="id" defVal="1" stored="false" indexed="false" class="solr.ExternalFileField" valType="float"/> Then you can use frange to include/exclude certain values: http://www.lucidimagination.com/blog/tag/frange/ -Yonik http://www.lucidimagination.com On Thu, Oct 8, 2009 at 4:59 PM, didier deshommes dfdes...@gmail.com wrote: I am using Solr to index data in a SQL database. Most of the data doesn't change after initial commit, except for a single boolean field that indicates whether an item is flagged as 'needing attention'. So I have a need_attention field in the database that I update whenever a user marks an item as needing attention in my UI. The problem I have is that I want to offer the ability to include need_attention in my user's queries, but do not want to incur the expense of having to reindex whenever this flag changes on an individual document. I have thought about different solutions to this problem, including using multi-core and having a smaller core for recently-marked items that I am willing to do 'near-real-time' commits on. Are there are any common solutions to this problem, which I have to imagine is common in this community?
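For the need_attention use case, the external file is just uniqueKey=value lines that Solr reloads on commit, so flipping the flag becomes a file rewrite rather than a document reindex. A sketch of generating it (the doc ids and path are hypothetical; the external_<fieldname> naming convention should be checked against the ExternalFileField javadoc):

```python
def write_external_field(path, flags):
    """Write an ExternalFileField data file: one 'docid=value' line per
    document. Solr re-reads the file on commit, so updating the flag is a
    rewrite of this file instead of a reindex of the document."""
    with open(path, "w") as f:
        for doc_id, needs_attention in flags.items():
            f.write(f"{doc_id}={1.0 if needs_attention else 0.0}\n")

# hypothetical ids; the real file would live in Solr's index data directory
write_external_field("/tmp/external_need_attention", {"doc1": True, "doc2": False})
```

An frange filter over this field (as in Yonik's link) can then include or exclude the flagged documents at query time.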
RE: Problems with WordDelimiterFilterFactory
Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. 
Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Bernadette Houghton, Library Business Applications Developer Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230 Fax: 03 5227 8000 International: +61 3 5227 8000 MSN: bern_hough...@hotmail.com Email: bernadette.hough...@deakin.edu.au Website: http://www.deakin.edu.au Deakin University CRICOS Provider Code 00113B (Vic)
RE: Problems with WordDelimiterFilterFactory
Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
Re: Problems with WordDelimiterFilterFactory
Hi Bern, the problem is the character sequence --. A query is not allowed to contain consecutive minus characters. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup before submitting to the Solr server that replaces each sequence of minus characters with a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
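Patrick's suggested cleanup can be sketched as a small helper (illustrative only): collapse any run of minus characters into one before the query is sent to Solr.

```python
import re

def clean_query(q):
    """Collapse runs of '-' so the Lucene query parser doesn't choke on '--'."""
    return re.sub(r"-{2,}", "-", q)

print(clean_query("(Asia -- Civilization AND status_i:(2))"))
# → (Asia - Civilization AND status_i:(2))
```

Escaping the hyphens (or quoting the phrase) would be an alternative if the literal '--' must reach the index.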
RE: Sorting by insertion time
Hi Tarjei, See https://issues.apache.org/jira/browse/SOLR-1478 - with trunk Solr (and soon, 1.4), you can use the pseudo-field _docid_ for this purpose. Steve -Original Message- From: tarjei [mailto:tar...@nu.no] Sent: Thursday, October 08, 2009 2:18 AM To: solr-user@lucene.apache.org Subject: Sorting by insertion time Hi, Quite often I want a set of documents ordered by the time they were inserted, i.e. give me the 5 latest items that match query foo. I usually solve this by sorting on a date field. I had a chat with Erik Hatcher when he visited JavaZone 2009 and he said that Solr places documents on disk in insertion order. This would make it possible for me to save a sorting step by sorting not on a specific field, but by insertion order in reverse. AFAIK Lucene knows how to do this, but which request parameters should I use in Solr? Kind regards, Tarjei -- Tarjei Huse Mobil: 920 63 413
RE: Problems with WordDelimiterFilterFactory
Thanks for this, marklo; it is a *very* useful page. bern -Original Message- From: marklo [mailto:mar...@pcmall.com] Sent: Thursday, 8 October 2009 1:10 PM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Use http://solr-url/solr/admin/analysis.jsp to see how your data is indexed/queried -- View this message in context: http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html Sent from the Solr - User mailing list archive at Nabble.com.
[slightly off topic] Jetty and NIO
In the Solr example jetty.xml, there is the following setup and comments:

<!-- Use this connector for many frequently idle connections and for threadless continuations.
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">3</Set>
      <Set name="Acceptors">2</Set>
      <Set name="confidentialPort">8443</Set>
    </New>
  </Arg>
</Call>
-->

<!-- Use this connector if NIO is not available. -->
<!-- This connector is currently being used for Solr because the
     nio.SelectChannelConnector showed poor performance under WindowsXP
     from a single client with non-persistent connections
     (35s vs ~3min to complete 10,000 requests) -->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">5</Set>
      <Set name="lowResourceMaxIdleTime">1500</Set>
    </New>
  </Arg>
</Call>

So, if I'm on CentOS 2.6 (64-bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. The other thing I'm noticing is that if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). I realize the answer may just be to get fewer results, but I was wondering if there are other tuning parameters that can make this more efficient, b/c the 50 rows thing is a biz. reqt and I may not be able to get that changed. Thanks, Grant
RE: Problems with WordDelimiterFilterFactory
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml -

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>

to

<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=" " replace="all"/>

i.e. replacing non-alpha chars with a space, looks like it may handle that aspect. Regards Bern -Original Message- From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] Sent: Friday, 9 October 2009 9:03 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Hi Bern, the problem is the character sequence "--". A query is not allowed to have minus characters that directly follow one another. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup, before the submit to the Solr server, that replaces each sequence of minus characters with a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
-Original Message- From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] Sent: Friday, 9 October 2009 8:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: Problems with WordDelimiterFilterFactory Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. 
asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. Our schema.xml contains the following - fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.ISOLatin1AccentFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.ISOLatin1AccentFilterFactory/
Re: [slightly off topic] Jetty and NIO
On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll gsing...@apache.org wrote: So, if I'm on CentOS 2.6 (64-bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. Right - the connector that Solr uses in the example is fine for typical Solr uses - NIO won't help. The other thing I'm noticing is if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). On the Solr server side? That's code that *reads* a request from the client... so if a lot of time is being spent there, it's probably blocking waiting for the rest of the request? The tests could be network bound, or the test client may not be fast enough? If we are saturating the network connection, then use SolrJ w/ the binary response format if you're not already, or use something like the JSON format otherwise. If you end up using a text response format, you could try enabling compression for responses (not sure how with jetty). -Yonik http://www.lucidimagination.com I realize the answer may just be to get fewer results, but I was wondering if there are other tuning parameters that can make this more efficient b/c the 50 rows thing is a biz. reqt and I may not be able to get that changed. Thanks, Grant
Re: how can I use debugQuery if I have extended QParserPlugin?
Hi Yonik; My original post ( http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tt25789546.html ) has the stack trace. =^D I am having trouble reproducing this issue consistently (I sometimes don't get the NPE) so will have to track this down a bit more. Luckily, someone just showed me how to debug the core solr files with Eclipse. Hopefully I can now figure this out on my own. thx Yonik Seeley-2 wrote: I think you need to provide some more information such as a stack trace for the NPE, or a more elaborate description of what you think the problem is with the debug component. You said "because the debugging code looks at the query passed in and not the expanded query", but I don't understand that. The debug component is passed the actual Query object that the QParserPlugin created. -- View this message in context: http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25812899.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Problems with WordDelimiterFilterFactory
Bern, The only way that could be happening is if you are not using the field type you described on your original e-mail. The TokenFilter WordDelimiterFilterFactory should take care of the hyphen. On 10/08/2009 05:30 PM, Bernadette Houghton wrote: Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up the error, but still doesn't find the right record. I see from marklo's analysis page that solr is still parsing it with a hyphen. Changing this part of our schema.xml - filter class=solr.PatternReplaceFilterFactory pattern=([^a-z]) replacement= replace=all / To filter class=solr.PatternReplaceFilterFactory pattern=([^a-z]) replacement= replace=all / i.e. replacing non-alpha chars with a space, looks like it may handle that aspect. Regards Bern -Original Message- From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] Sent: Friday, 9 October 2009 9:03 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Hi Bern, the problem is the character sequence --. A query is not allowed to have minus characters that consequent upon another one. Remove one minus character and the query will be parsed without problems. Because of this parsing problem, I'd recommend a query cleanup before the submit to the Solr server that replaces each sequence of minus characters by a single one. Regards, Patrick Bernadette Houghton schrieb: Sorry, the last line was truncated - HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered - at line 1, column 7. Was expecting one of: ( ... * ...QUOTED ...TERM ...PREFIXTERM ...WILDTERM ... [ ... { ...NUMBER ... 
-Original Message- From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] Sent: Friday, 9 October 2009 8:22 AM To: 'solr-user@lucene.apache.org' Subject: RE: Problems with WordDelimiterFilterFactory Here's the query and the error - Oct 09 08:20:17 [debug] [196] Solr query string:(Asia -- Civilization AND status_i:(2)) Oct 09 08:20:17 [debug] [196] Solr sort by: score desc Oct 09 08:20:17 [error] Error on searching: 400 Status: org.apache.lucene.queryParser.ParseException: Cannot parse ' (Asia -- Civilization AND status_i:(2)) ': Encount Bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 12:48 PM To: solr-user@lucene.apache.org Cc: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Bern, I am interested on the solr query. In other words, the query that your system sends to solr. Thanks, Christian On Oct 7, 2009, at 5:56 PM, Bernadette Houghtonbernadette.hough...@deakin.edu.au wrote: Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601 Either scroll down and click one of the television broadcasting -- asia links, or type it in the Quick Search box. TIA bern -Original Message- From: Christian Zambrano [mailto:czamb...@gmail.com] Sent: Thursday, 8 October 2009 9:43 AM To: solr-user@lucene.apache.org Subject: Re: Problems with WordDelimiterFilterFactory Could you please provide the exact URL of a query where you are experiencing this problem? eg(Not URL encoded): q=fieldName:hot and cold: temperatures On 10/07/2009 05:32 PM, Bernadette Houghton wrote: We are having some issues with our solr parent application not retrieving records as expected. For example, if the input query includes a colon (e.g. hot and cold: temperatures), the relevant record (which contains a colon in the same place) does not get retrieved; if the input query does not include the colon, all is fine. Ditto if the user searches for a query containing hyphens, e.g. 
asia - civilization, although with the qualifier that something like asia-civilization (no spaces either side of the hyphen) works fine, whereas asia - civilization (spaces either side of hyphen) doesn't work. Our schema.xml contains the following - fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ !-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- filter class=solr.ISOLatin1AccentFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer
concatenating tokens
hello *, I'm using a combination of tokenizers and filters that give me the desired tokens; however, for a particular field I want to concatenate these tokens back into a single string. Is there a filter to do that? If not, what are the steps needed to make my own filter to concatenate tokens? For example, I start with "Sprocket (widget) - Blue"; the analyzers churn out the tokens [sprocket, widget, blue]; I want to end up with the string "sprocket widget blue". This is a simple example, and in the general case lowercasing and punctuation removal does not work, hence why I'm looking to concatenate tokens. --joe
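As far as I know there is no stock Solr filter for this, so it would mean a custom TokenFilter whose behavior is: buffer every token the upstream analyzers emit, then output one token joining them at end of stream. The core logic is just a join; a minimal Python sketch of that behavior (the token list stands in for the real analyzer output):

```python
def concatenate_tokens(tokens, sep=" "):
    """What a concatenating TokenFilter would do: consume the whole
    token stream and emit a single token joining the pieces."""
    return sep.join(tokens)

# The analyzers turn "Sprocket (widget) - Blue" into these tokens:
tokens = ["sprocket", "widget", "blue"]
print(concatenate_tokens(tokens))  # sprocket widget blue
```

In a real custom filter the same buffering-and-joining would happen in the filter's token-production method, emitting the joined token once the upstream stream is exhausted.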
Re: [slightly off topic] Jetty and NIO
On Oct 8, 2009, at 7:37 PM, Yonik Seeley wrote: On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll gsing...@apache.org wrote: So, if I'm on Centos 2.6 (64 bit), what connector should I be using? Based on the comments, I'm not sure the top one is the right thing either, but it also sounds like it is my only other choice. Right - the connector that Solr uses in the example is fine for typical Solr uses - NIO won't help. The other thing I'm noticing is if I profile my app and I am retrieving something like 50 rows at a time, 30-60% of the time is spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill(). On the Solr server side? Yes. That's code that *reads* a request from the client... If I change nothing else and set rows=10, the time spent in .fill() is 5%. I'll double check everything on my end. so if a lot of time is being spent there, it's probably blocking waiting for the rest of the request? The tests could be network bound, or the test client may not be fast enough? If we are saturating the network connection, then use SolrJ if you're not, w/ the binary response format, or use something like JSON format otherwise. If you end up using a text response format, you could try enabling compression for responses (not sure how with jetty).
multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Hi list, I worked on a field type and its analyzing chain, in which I want to use the SynonymFilter with entries similar to: foo bar => foo_bar. During the analysis phase, I used the /admin/analysis.jsp view to test the analyzing results produced by the created field type. The output shows that a query "foo bar" will first be separated by the WhitespaceTokenizer into the two tokens foo and bar, and that the SynonymFilter will replace both tokens with foo_bar. But as I tried this at real query time with the request handler standard and also with dismax, the tokens foo and bar were not replaced. The parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. Has anybody experienced this and/or knows a workaround or a solution for it? Thanks, Patrick
Re: issue in adding data to a multivalued field
Hi Rakhi, Use multiValued (capital V), not multivalued. :) Koji Rakhi Khatwani wrote: Hi, I have a small schema with some of the fields defined as:

<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multivalued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multivalued="true"/>

where the field author_name is multivalued. However, in the UI (schema browser), following are the details of the author_name field; it's nowhere mentioned that it's multivalued. Field: author_name Field Type: text Properties: Indexed, Tokenized. When I try creating and adding a document into solr, I get an exception: ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal. Here's my code snippet:

solrDoc17.addField("id", "id1");
solrDoc17.addField("content", "SOLR");
solrDoc17.addField("author_name", "ninad");
solrDoc17.addField("author_name", "raakhi");
solrDoc17.addField("author_name", "goureya");
solrDoc17.addField("author_name", "sheetal");
server.add(solrDoc17);
server.commit();

Any pointers?? regards, Raakhi
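With the attribute spelled correctly (unrecognized attributes like "multivalued" are silently ignored, so the fields fall back to the non-multiValued default), the two broken declarations become (other values unchanged from the original schema):

```
<field name="content" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="author_name" type="text" indexed="true" stored="false" multiValued="true"/>
```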
DIH: Setting rows= on full-import has no effect
In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using:

curl 'http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'

But when 100 docs are imported the process keeps running. Here's the log output:

Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 100
Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 200
Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 300
Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 400
Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument
INFO: Indexing stopped at docCount = 500

and so on. Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08 17:31:22. I've used that exact url in the past and the indexing stopped at the rows number as expected, but I haven't run the command for about two months on a build from back in early July. Here's the dih config:

<dataConfig>
  <dataSource name="dsFiles" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/path/to/files"
            fileName=".*xml" recursive="true" rootEntity="false" dataSource="null">
      <entity name="wikixml" processor="XPathEntityProcessor" forEach="/mediawiki/page"
              url="${f.fileAbsolutePath}" dataSource="dsFiles" onError="skip">
        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

-Jay
Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)
Patrick, parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. I guess because you are searching q=foo bar, this causes an OR query. Use q="foo bar" instead. Koji Patrick Jungermann wrote: Hi list, I worked on a field type and its analyzing chain, in which I want to use the SynonymFilter with entries similar to: foo bar => foo_bar. During the analysis phase, I used the /admin/analysis.jsp view to test the analyzing results produced by the created field type. The output shows that a query "foo bar" will first be separated by the WhitespaceTokenizer into the two tokens foo and bar, and that the SynonymFilter will replace both tokens with foo_bar. But as I tried this at real query time with the request handler standard and also with dismax, the tokens foo and bar were not replaced. The parsedQueryString was something similar to field:foo field:bar. At index time, it works as expected. Has anybody experienced this and/or knows a workaround or a solution for it? Thanks, Patrick
DIH Error in latest Nightly Builds
Hi All, I tried Indexing data and got the following error., Used Solr nightly Oct5th and nightly 8th, The same Configuration/query is working in Older version(May nightly Build) The db-data-config.xml has the simple Select query SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME, SEGMENTATION_ TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE, ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION, LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME, NEW_STYLE, SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY, SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT _ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE, SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT, CARRYOVER_PROD UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE, PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD as UNIQ from prodsearch_atlasatgcombine Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356) at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.sql.SQLException: Unsupported feature at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269) at oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689) at oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363) at org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240) ... 11 more Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback 2009-10-08 18:31:12.149::INFO: Shutdown hook executing 2009-10-08 18:31:12.149::INFO: Shutdown hook complete Thanks and regards, JK
Re: Scoring for specific field queries
Use the field analysis tool to see how the data is being analyzed in both the fields. Cheers Avlesh On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote: Hmm... I don't quite get the desired results. Those starting with "cha" are now randomly ordered. Is there something wrong with the filters I applied? On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote: Filters? I did not mean filters at all. I am in a mad rush right now, but on the face of it your field definitions look right. This is what I asked for - q=(autoComplete2:cha^10 autoComplete:cha) Lemme know if this does not work for you. Cheers Avlesh On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote: Hi Avlesh, I can't seem to get the scores right. I now have these types for the fields I'm targeting:

<fieldType name="autoComplete" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="autoComplete2" class="solr.TextField" positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this:

q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

What should I tweak from the above config and query? Thanks, Rih On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote: I will have to pass on this and try your suggestion first. So, how does your suggestion (1 and 2) boost my startswith query?
Is it because of the n-gram filter? On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com wrote: Yes it can be done but it needs some customization. Search for custom sort implementations/discussions. You can check... http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html . Let us know if you have any issues. Sandeep R. Tan wrote: This might work and I also have a single value field which makes it cleaner. Can sort be customized (with indexOf()) from the solr parameters alone? -- View this message in context: http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html Sent from the Solr - User mailing list archive at Nabble.com.
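A sketch of why the two-field trick in this thread favors starts-with matches. Note the configs above use NGramFilterFactory, which produces grams from every substring; with an edge-n-gram variant (Solr's EdgeNGramFilterFactory) the distinction is sharpest, because the Keyword-tokenized field then matches only true title prefixes, while the whitespace-tokenized field matches a prefix of any word. The helpers below are illustrative Python, not Solr's actual filter code:

```python
def edge_ngrams(token, min_n=1, max_n=20):
    """Prefix-only n-grams of a token, in the spirit of EdgeNGramFilterFactory."""
    return {token[:n] for n in range(min_n, min(max_n, len(token)) + 1)}

def matches_keyword_field(title, q):
    # KeywordTokenizer: the whole title is a single token,
    # so only titles that *start* with q produce a matching gram.
    return q in edge_ngrams(title.lower())

def matches_whitespace_field(title, q):
    # WhitespaceTokenizer: every word is a token,
    # so a prefix match anywhere in the title qualifies.
    return any(q in edge_ngrams(word) for word in title.lower().split())

print(matches_keyword_field("Champion of the world", "cha"))     # True
print(matches_keyword_field("We are the champions", "cha"))      # False
print(matches_whitespace_field("We are the champions", "cha"))   # True
```

Boosting the keyword field over the whitespace field (q=(autoComplete2:cha^10 autoComplete:cha)) then ranks titles that begin with the prefix above titles that merely contain a word with it.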
Re: DIH Error in latest Nightly Builds
raised an issue https://issues.apache.org/jira/browse/SOLR-1500 On Fri, Oct 9, 2009 at 7:10 AM, jayakeerthi s mail2keer...@gmail.com wrote: Hi All, I tried Indexing data and got the following error., Used Solr nightly Oct5th and nightly 8th, The same Configuration/query is working in Older version(May nightly Build) The db-data-config.xml has the simple Select query SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME, SEGMENTATION_ TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE, ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION, LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME, NEW_STYLE, SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY, SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT _ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE, SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT, CARRYOVER_PROD UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE, PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD as UNIQ from prodsearch_atlasatgcombine Processing Document # 1 at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210) at org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58) at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71) at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237) at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.sql.SQLException: Unsupported feature at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179) at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269) at oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689) at oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191) at org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128) at org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363) at org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39) at org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240) ... 11 more Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: start rollback Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback INFO: end_rollback 2009-10-08 18:31:12.149::INFO: Shutdown hook executing 2009-10-08 18:31:12.149::INFO: Shutdown hook complete Thanks and regards, JK -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: DIH: Setting rows= on full-import has no effect
I have raised an issue http://issues.apache.org/jira/browse/SOLR-1501 On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill jayallenh...@gmail.com wrote: In the past setting rows=n with the full-import command has stopped the DIH importing at the number I passed in, but now this doesn't seem to be working. Here is the command I'm using: curl ' http://localhost:8983/solr/indexer/mediawiki?command=full-importrows=100' But when 100 docs are imported the process keeps running. Here's the log output: Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 100 Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 200 Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 300 Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 400 Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder buildDocument INFO: Indexing stopped at docCount = 500 and so on. Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08 17:31:22 I've used that exact url in the past and the indexing stopped at the rows number as expected, but I haven't run the command for about two months on a build from back in early July. 
Here's the DIH config:

<dataConfig>
  <dataSource name="dsFiles" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="f" processor="FileListEntityProcessor" baseDir="/path/to/files"
            fileName=".*xml" recursive="true" rootEntity="false" dataSource="null">
      <entity name="wikixml" processor="XPathEntityProcessor" forEach="/mediawiki/page"
              url="${f.fileAbsolutePath}" dataSource="dsFiles" onError="skip">
        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>
      </entity>
    </entity>
  </document>
</dataConfig>

-Jay

--
Noble Paul | Principal Engineer | AOL | http://aol.com
RE: Solr Quries
Thanks for your help. Can you please provide the detailed configuration for a distributed Solr environment? How do I set up master and slave, and in which file(s) do I make the changes? What are the shard parameters? Can we integrate ZooKeeper with this? Please provide details. Thanks in advance.

-Pravin

-----Original Message-----
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Quries

Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yes. You can achieve this with sharding. For example, install and configure Solr on two machines and declare one of them as the master. Index your data across the shards and pass the shards parameter when you search.

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS? (Note: I am familiar with Hadoop.)
Sorry, no idea.

3. I have 1 TB of employee information (id, name, address, cell no, personal info). To index this data into Solr, do I have to create an XML file with the data and post it to the server, or is there a more optimal way? In the future the data will grow to 10 TB; how can I index that? (Creating XML is a headache.)
I don't think XML is the best way, and I don't suggest it. If you have that 1 TB of data in a database, you can load it with a simple full-import command: configure your DB details in solrconfig.xml and data-config.xml, and add your DB driver JAR to Solr's lib directory. Then import the data in slices (say, department-wise, or by some other category). In the future, you can import the data from a DB, or index it directly via a client API with simple Java beans.

Hope this info helps you.

Regards,
Sandeep Tagore

--
View this message in context: http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.
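As a starting point for the master/slave part of the question: Solr 1.4 ships an HTTP-based ReplicationHandler that is configured in solrconfig.xml on each box. A minimal sketch, assuming hostnames and conf file names you would replace with your own:

```xml
<!-- In the master's solrconfig.xml: replicate the index (and listed
     conf files) to slaves after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- In each slave's solrconfig.xml: poll the master for new versions.
     "master-host" is a placeholder. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Sharding, by contrast, is a query-time parameter rather than a config setting, e.g. shards=host1:8983/solr,host2:8983/solr on the select request. As for ZooKeeper: it is not integrated with Solr 1.4; that came later with SolrCloud.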