Re: Scoring for specific field queries

2009-10-08 Thread R. Tan
I will try this out. How do #1 and #2 boost my startswith query? Is it
because of the n-gram filter?


On Thu, Oct 8, 2009 at 1:29 PM, Avlesh Singh avl...@gmail.com wrote:

 You would need to boost your startswith matches artificially for the
 desired behavior.
 I would do it this way -

   1. Create a KeywordTokenized field with an n-gram filter.
   2. Create a WhitespaceTokenized field with an n-gram filter.
   3. Search on both fields; boost matches for #1 over #2.

 Hope this helps.

 Cheers
 Avlesh

 On Thu, Oct 8, 2009 at 10:30 AM, R. Tan tanrihae...@gmail.com wrote:

  Hi,
  How can I get wildcard search (e.g. cha*) to score documents based on the
  position of the keyword in a field? Closer (to the start) means higher
  score.
 
  For example, I have multiple documents with titles containing the word
  "champion". Some of the document titles start with the word "champion" and
  some are entitled "we are the champions". The ones that start with the
  keyword need to rank first or score higher. Is there a way to do this? I'm
  using this query for an auto-suggest feature where the keyword doesn't
  necessarily need to be the first word.
 
  Rihaed
 



Re: Scoring for specific field queries

2009-10-08 Thread R. Tan
This might work, and I also have a single-valued field, which makes it cleaner.
Can the sort be customized (with indexOf()) from the Solr parameters alone?

Thanks!


On Thu, Oct 8, 2009 at 1:40 PM, Sandeep Tagore sandeep.tag...@gmail.comwrote:


 Hi Rihaed,
 I guess we don't need to depend on scores all the time.
 You can use a custom sort to sort the results. Take a dynamicField, fill it
 with the indexOf(keyword) value, and sort the results by that field in
 ascending order. Then the records which contain the keyword at an earlier
 position will come first.
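A rough sketch of that approach (the field name, keyword, and titles below are hypothetical, not from the thread): compute the keyword's first position per document at index time, store it in a sortable single-valued field, and sort ascending. This plain-Java demo mimics what a Solr sort on such a field would do:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PositionSort {
    // Hypothetical index-time helper: the position of the keyword in the
    // title, which would be stored in a sortable dynamic field.
    static int keywordPosition(String title, String keyword) {
        int i = title.toLowerCase().indexOf(keyword.toLowerCase());
        return i < 0 ? Integer.MAX_VALUE : i; // non-matching docs sort last
    }

    public static void main(String[] args) {
        List<String> titles = new ArrayList<>(Arrays.asList(
                "we are the champions", "champion of the world", "a champion"));
        // Ascending sort on the stored position: titles starting with the
        // keyword rank first, as a Solr sort on that field would.
        titles.sort(Comparator.comparingInt(t -> keywordPosition(t, "champion")));
        System.out.println(titles);
        // prints [champion of the world, a champion, we are the champions]
    }
}
```

The catch, as discussed below, is that the keyword must be known at index time, one stored field per keyword.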

 Regards,
 Sandeep


 R. Tan wrote:
 
  Hi,
  How can I get wildcard search (e.g. cha*) to score documents based on the
  position of the keyword in a field? Closer (to the start) means higher
  score.
 

 --
 View this message in context:
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798657.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Scoring for specific field queries

2009-10-08 Thread Sandeep Tagore

Hi Avlesh,
Thanks for your attention to my post. 

   1. If the word "computer" occurs multiple times in a document, what
   would you do in that case? Is this dynamic field supposed to be
multivalued? I can't even imagine what you would do if the word "computer"
occurs in multiple documents multiple times.
   => It doesn't matter how many times a word occurs in a document.
Consider its first occurrence and use it for sorting. The dynamic field
should not be multivalued. If the keyword occurs at the same position in
multiple documents, then the document which was inserted first will come
first.
   2. Multivalued fields cannot be sorted upon.
   => Yes, I agree.
   3. One needs to know the unique number of such keywords before
   implementing, because you'll potentially end up creating that many
fields.
   => I didn't get this. Why should one know the unique number of keywords
before implementation? If we have the logic, it works for all keywords.
Most people do the same in the case of geographical sorting: they calculate
the distance and sort by it before displaying results. They don't need to
worry in advance about the distance the user requests.

Please tell me your thoughts and correct me if I am wrong.

Thanks,
Sandeep
-- 
View this message in context: 
http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25798925.html



Sorting by insertion time

2009-10-08 Thread tarjei

Hi,

Quite often I want a set of documents ordered by the time they were
inserted, i.e. give me the 5 latest items that match query foo. I
usually solve this by sorting on a date field.

I had a chat with Erik Hatcher when he visited Javazone 2009 and he said
that Solr places documents on disk in insertion order.

This would make it possible for me to save a sorting step by not sorting
by a specific field, but by insertion time in reverse.


AFAIK Lucene knows how to do this, but which request parameters should I 
use in Solr?


Kind regards,
Tarjei


--
Tarjei Huse
Mobil: 920 63 413


Re: Scoring for specific field queries

2009-10-08 Thread Sandeep Tagore

Yes it can be done but it needs some customization. Search for custom sort
implementations/discussions.
You can check...
http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html.
Let us know if you have any issues.

Sandeep


R. Tan wrote:
 
 This might work and I also have a single value field which makes it
 cleaner.
 Can sort be customized (with indexOf()) from the solr parameters alone?
 

-- 
View this message in context: 
http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html



Re: Facet query pb

2009-10-08 Thread clico



clico wrote:
 
 That's not a pb
 I want to use that in order to drill down a tree
 
 
 Christian Zambrano wrote:
 
 Clico,
 
 Because you are doing a wildcard query, the token 'AMERICA' will not be 
 analyzed at all. This means that 'AMERICA*' will NOT match 'america'.
 
 On 10/07/2009 12:30 PM, Avlesh Singh wrote:
  I have no idea what pb means, but this is what you probably want -
 fq=(location_field:(NORTH AMERICA*))

 Cheers
 Avlesh

 On Wed, Oct 7, 2009 at 10:40 PM, clicocl...@mairie-marseille.fr 
 wrote:


 Hello
 I have a pb trying to retrieve a tree with facet use

 I 've got a field location_field
 Each doc in my index has a location_field

 Location field can be
 continent/country/city


 I have 2 queries:

  http://server/solr//select?fq=(location_field:NORTH*) :
  ok, retrieve docs

  http://server/solr//select?fq=(location_field:NORTH AMERICA*)
  : not ok


  I think with NORTH AMERICA I have a pb with the space character

  Could you help me



 --
 View this message in context:
 http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html
 Sent from the Solr - User mailing list archive at Nabble.com.


  

 
 
 
 

I'm sorry, this syntax does not work anymore
-- 
View this message in context: 
http://www.nabble.com/Facet-query-pb-tp25790667p25799911.html



Re: Scoring for specific field queries

2009-10-08 Thread R. Tan
I will have to pass on this and try your suggestion first. So, how does your
suggestion (#1 and #2) boost my startswith query? Is it because of the
n-gram filter?


On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.comwrote:


 Yes it can be done but it needs some customization. Search for custom sort
 implementations/discussions.
 You can check...

 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
 .
 Let us know if you have any issues.

 Sandeep


 R. Tan wrote:
 
  This might work and I also have a single value field which makes it
  cleaner.
  Can sort be customized (with indexOf()) from the solr parameters alone?
 

 --
 View this message in context:
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-08 Thread Chantal Ackermann

Now you've got me wondering - which one should I like better?
I didn't even know there is an alternative. :-)

Chantal

Koji Sekiguchi schrieb:

No, ISOLatin1AccentFilterFactory is not deprecated.
You can use either MappingCharFilterFactory+mapping-ISOLatin1Accent.txt
or ISOLatin1AccentFilterFactory whichever you'd like.

Koji


Jay Hill wrote:

Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory
deprecated in favor of:
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

in 1.4?

-Jay
http://www.lucidimagination.com





Re: how to rename a schema field, whose values are indexed already?

2009-10-08 Thread Sandeep Tagore

I guess you can't do it. I tried it before. I had a field named 'KEYWORD'
and I changed it to 'keyword', and it didn't work. Everything else was
normal. When I searched with 'KEYWORD' I got an exception saying "undefined
field", and when I searched with 'keyword' I got 0 results. It didn't work
even after optimizing. I re-indexed the data and it worked.

Regards,
Sandeep


M.Noor wrote:
 
  how to rename a schema field, if its values are indexed already ??
 
-- 
View this message in context: 
http://www.nabble.com/how-to-rename-a-schema-field%2C-whose-values-are-indexed-already--tp25800631p25801695.html



how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Pravin Karne
Hi,
I am new to Solr. I am able to index, search and update with small sizes
(around 500 MB).
But if I try to index a file of 5 to 10 GB or more, it gives a memory
heap exception.
While investigating I found that post.jar and post.sh load the whole file
into memory.

I use one workaround, dividing the large file into small files... and it's working.

Is there any other way to post a large file, as the above workaround is not
feasible for a 1 TB file?

Thanks
-Pravin


DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


Re: how to rename a schema field, whose values are indexed already?

2009-10-08 Thread noor

Without re-indexing the data,
how do I rename any one of the schema fields?

Sandeep Tagore wrote:

I guess you can't do it. I tried it before. I had a field named 'KEYWORD'
and I changed it to 'keyword', and it didn't work. Everything else was
normal. When I searched with 'KEYWORD' I got an exception saying "undefined
field", and when I searched with 'keyword' I got 0 results. It didn't work
even after optimizing. I re-indexed the data and it worked.


Regards,
Sandeep


M.Noor wrote:
  

 how to rename a schema field, if its values are indexed already ??







Re: Ranking of search results

2009-10-08 Thread bhaskar chandrasekar
Hi Amith,
 
 
I tried with the options you gave and added debug=true at the end of the URL.
I am getting this output:
 
<lst name="debug">
  <str name="rawquerystring">channel</str>
  <str name="querystring">channel</str>
  <str name="parsedquery">text:channel</str>
  <str name="parsedquery_toString">text:channel</str>
  <lst name="explain">
    <str name="http://hotmail">1.2682627 = (MATCH) fieldWeight(text:channel in
3), product of: 2.828427 = tf(termFreq(text:channel)=8) 2.049822 =
idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=3)</str>
    <str name="http://share">1.0026497 = (MATCH) fieldWeight(text:channel in 19),
product of: 2.236068 = tf(termFreq(text:channel)=5) 2.049822 = idf(docFreq=6,
numDocs=20) 0.21875 = fieldNorm(field=text, doc=19)</str>
    <str name="http://metacreek">0.6341314 = (MATCH) fieldWeight(text:channel in
10), product of: 1.4142135 = tf(termFreq(text:channel)=2) 2.049822 =
idf(docFreq=6, numDocs=20) 0.21875 = fieldNorm(field=text, doc=10)</str>
    <str name="http://yahoo">0.5124555 = (MATCH) fieldWeight(text:channel in 0),
product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6,
numDocs=20) 0.25 = fieldNorm(field=text, doc=0)</str>
    <str name="http://sharemarket">0.4483986 = (MATCH) fieldWeight(text:channel
in 1), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6,
numDocs=20) 0.21875 = fieldNorm(field=text, doc=1)</str>
    <str name="http://Altavista">0.4483986 = (MATCH) fieldWeight(text:channel in
5), product of: 1.0 = tf(termFreq(text:channel)=1) 2.049822 = idf(docFreq=6,
numDocs=20) 0.21875 = fieldNorm(field=text, doc=5)</str>
  </lst>
</lst>

What do the numeric terms denote? With these numeric values, will I be able
to set a preference for my search links? If so, how?
 
Regards
Bhaskar

- On Thu, 10/1/09, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:


From: bhaskar chandrasekar bas_s...@yahoo.co.in
Subject: Re: Ranking of search results
To: solr-user@lucene.apache.org
Date: Thursday, October 1, 2009, 7:34 PM








--- On Wed, 9/23/09, Amit Nithian anith...@gmail.com wrote:


Hi Amith,
 
Thanks for your reply. How do I set a preference for the links, i.e. which
should appear first and second in the search results?
Which configuration file in Solr needs to be modified to achieve this?
 
Regards
Bhaskar
From: Amit Nithian anith...@gmail.com
Subject: Re: Ranking of search results
To: solr-user@lucene.apache.org
Date: Wednesday, September 23, 2009, 11:33 AM


It depends on several things:
1) The query handler that you are using
2) The fields that you are searching on and the default fields specified

For the default handler, it will issue a query for the default field and
return results accordingly. To see what is going on, pass debugQuery=true at
the end of the URL to see detailed output. If you are using the
DisMaxRequestHandler (Disjunction Max), then you will have qf, pf and bf
(query fields, phrase fields, boost functions). I would start by looking at
http://wiki.apache.org/solr/DisMaxRequestHandler

- Amit
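For reference, a dismax handler of that era might be configured in solrconfig.xml roughly as below; the field names and boost values are made-up placeholders, not taken from this thread:

```xml
<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- qf: query fields with per-field boosts (hypothetical fields) -->
    <str name="qf">title^10.0 text^1.0</str>
    <!-- pf: phrase fields, boosting docs where the terms occur as a phrase -->
    <str name="pf">title^20.0</str>
    <!-- bf: a boost function, e.g. favoring recent documents -->
    <str name="bf">recip(rord(date),1,1000,1000)</str>
  </lst>
</requestHandler>
```

Raising or lowering the qf/pf boosts is the usual lever for controlling which links rank first.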

On Wed, Sep 23, 2009 at 10:25 AM, bhaskar chandrasekar bas_s...@yahoo.co.in
 wrote:

 Hi,

 When I give an input string for search in Solr, it displays me the
 corresponding results for the given input string.

 How are the results ranked and displayed? On what basis are the search
 results displayed?
 Is there any algorithm for ordering the results, with the first
 result and so on?


 Regards
 Bhaskar


Re: Scoring for specific field queries

2009-10-08 Thread R. Tan
Hi Avlesh,

I can't seem to get the scores right.

I now have these types for the fields I'm targeting,

<fieldType name="autoComplete" class="solr.TextField"
positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1"
maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="autoComplete2" class="solr.TextField"
positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="1"
maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

My query is this,
q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

What should I tweak from the above config and query?

Thanks,
Rih


On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote:

 I will have to pass on this and try your suggestion first. So, how does
 your suggestion (#1 and #2) boost my startswith query? Is it because of
 the n-gram filter?



 On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore 
 sandeep.tag...@gmail.comwrote:


 Yes it can be done but it needs some customization. Search for custom sort
 implementations/discussions.
 You can check...

 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
 .
 Let us know if you have any issues.

 Sandeep


 R. Tan wrote:
 
  This might work and I also have a single value field which makes it
  cleaner.
  Can sort be customized (with indexOf()) from the solr parameters alone?
 

 --
 View this message in context:
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html
 Sent from the Solr - User mailing list archive at Nabble.com.





Re: Scoring for specific field queries

2009-10-08 Thread Avlesh Singh
Filters? I did not mean filters at all.
I am in a mad rush right now, but on the face of it your field definitions
look right.

This is what I asked for -
q=(autoComplete2:cha^10 autoComplete:cha)

Lemme know if this does not work for you.

Cheers
Avlesh

On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote:

 Hi Avlesh,

 I can't seem to get the scores right.

 I now have these types for the fields I'm targeting,

 <fieldType name="autoComplete" class="solr.TextField"
 positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.NGramFilterFactory" minGramSize="1"
 maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 <fieldType name="autoComplete2" class="solr.TextField"
 positionIncrementGap="1">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.NGramFilterFactory" minGramSize="1"
 maxGramSize="20"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 My query is this,

 q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0

 What should I tweak from the above config and query?

 Thanks,
 Rih


 On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote:

  I will have to pass on this and try your suggestion first. So, how does
  your suggestion (#1 and #2) boost my startswith query? Is it because of
  the n-gram filter?
 
 
 
  On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore sandeep.tag...@gmail.com
 wrote:
 
 
  Yes it can be done but it needs some customization. Search for custom
 sort
  implementations/discussions.
  You can check...
 
 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
  .
  Let us know if you have any issues.
 
  Sandeep
 
 
  R. Tan wrote:
  
   This might work and I also have a single value field which makes it
   cleaner.
   Can sort be customized (with indexOf()) from the solr parameters
 alone?
  
 
  --
  View this message in context:
 
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 



Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-08 Thread Koji Sekiguchi

In this particular case, I don't think one is better than the other...

In general, MappingCharFilter is more flexible than specific
TokenFilters such as ISOLatin1AccentFilter.
For example, if you want your own character mapping rules,
you can add them to the mapping file. It should be easier than
modifying TokenFilters, as you don't need programming.
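For example, custom rules in the mapping file follow the same "source" => "target" form as the shipped mapping-ISOLatin1Accent.txt; the lines below are illustrative additions, not part of the stock file:

```
# each rule maps a source character (or sequence) to a replacement
"é" => "e"
"œ" => "oe"
"ß" => "ss"
```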

Koji

Chantal Ackermann wrote:

Now you've got me wondering - which one should I like better?
I didn't even know there is an alternative. :-)

Chantal

Koji Sekiguchi schrieb:

No, ISOLatin1AccentFilterFactory is not deprecated.
You can use either MappingCharFilterFactory+mapping-ISOLatin1Accent.txt
or ISOLatin1AccentFilterFactory whichever you'd like.

Koji


Jay Hill wrote:

Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory
deprecated in favor of:
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>

in 1.4?

-Jay
http://www.lucidimagination.com









Re: how to rename a schema field, whose values are indexed already?

2009-10-08 Thread Shalin Shekhar Mangar
On Thu, Oct 8, 2009 at 4:32 PM, noor noo...@opentechindia.com wrote:

 Without re-indexing the data,
 how to rename, any one of the schema field ??


Solr does not support renaming without re-indexing.

Re-indexing is your best bet. If you cannot re-index for some reason, and if
you have all fields stored, then you can write a program to read all
documents and write new ones with the same values.
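A sketch of that copy-and-rename pass — plain maps stand in for SolrDocument/SolrInputDocument here, and a real version would page through the index with a *:* SolrJ query and re-post the rewritten documents:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RenameField {
    // Copy a stored document, moving the value of oldName to newName.
    // In a real migration this would run over every document returned
    // by a paged *:* query, posting the copies to the new schema.
    static Map<String, Object> rename(Map<String, Object> doc,
                                      String oldName, String newName) {
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        if (copy.containsKey(oldName)) {
            copy.put(newName, copy.remove(oldName));
        }
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("KEYWORD", "solr");
        System.out.println(rename(doc, "KEYWORD", "keyword"));
        // prints {id=1, keyword=solr}
    }
}
```

Note this only works when every field is stored; indexed-only field values cannot be read back this way.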

-- 
Regards,
Shalin Shekhar Mangar.


Re: solr reporting tool adapter

2009-10-08 Thread Rakhi Khatwani
Hi Lance,
thnx a ton... will look into BIRT
Regards,
Raakhi

On Thu, Oct 8, 2009 at 1:22 AM, Lance Norskog goks...@gmail.com wrote:

 The BIRT project can do what you want. It has a nice form creator and
 you can configure http XML input formats.

 It includes very complete Eclipse plugins and there is a book about it.


 On 10/7/09, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
  On Wed, Oct 7, 2009 at 2:51 PM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:
 
  we basically wanna generate PDF reports which contain, tag clouds, bar
  charts, pie charts etc.
 
 
  Faceting on a field will give you top terms and frequency information
 which
  can be used to create tag clouds. What do you want to plot on a bar
 chart?
 
  I don't know of a reporting tool which can hook into Solr for creating
 such
  things.
 
  --
  Regards,
  Shalin Shekhar Mangar.
 


 --
 Lance Norskog
 goks...@gmail.com



Re: Facet query pb

2009-10-08 Thread clico



clico wrote:
 
 
 
 clico wrote:
 
 That's not a pb
 I want to use that in order to drill down a tree
 
 
 Christian Zambrano wrote:
 
 Clico,
 
 Because you are doing a wildcard query, the token 'AMERICA' will not be 
 analyzed at all. This means that 'AMERICA*' will NOT match 'america'.
 
 On 10/07/2009 12:30 PM, Avlesh Singh wrote:
  I have no idea what pb means, but this is what you probably want -
 fq=(location_field:(NORTH AMERICA*))

 Cheers
 Avlesh

 On Wed, Oct 7, 2009 at 10:40 PM, clicocl...@mairie-marseille.fr 
 wrote:


 Hello
 I have a pb trying to retrieve a tree with facet use

 I 've got a field location_field
 Each doc in my index has a location_field

 Location field can be
 continent/country/city


 I have 2 queries:

  http://server/solr//select?fq=(location_field:NORTH*) :
  ok, retrieve docs

  http://server/solr//select?fq=(location_field:NORTH AMERICA*)
  : not ok


  I think with NORTH AMERICA I have a pb with the space character

  Could you help me



 --
 View this message in context:
 http://www.nabble.com/Facet-query-pb-tp25790667p25790667.html
 Sent from the Solr - User mailing list archive at Nabble.com.


  

 
 
 
 
 
 I'm sorry, this syntax does not work anymore
 


When I try debug mode, here is the result:

<arr name="parsed_filter_queries">
  <str>+location_field:NORTH +location_field:AMERICA*</str>
</arr>

My location_field is of type string, containing
NORTH AMERICA/NY/NYC

Thanks for helping me
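Since location_field is an untokenized string field, one hedged workaround (not from the thread itself) is to escape the space so the query parser sees a single prefix term, or to drill down with faceting instead of a filter query; the field name is the one above, everything else is illustrative:

```
# escape the space so the parser sees one prefix term
fq=location_field:NORTH\ AMERICA*

# or drill down the tree with faceting rather than a filter query
q=*:*&facet=true&facet.field=location_field&facet.prefix=NORTH AMERICA/
```

In a real URL the backslash and space would also need percent-encoding (%5C%20).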



-- 
View this message in context: 
http://www.nabble.com/Facet-query-pb-tp25790667p25802964.html



issue in adding data to a multivalued field

2009-10-08 Thread Rakhi Khatwani
Hi,
  I have a small schema with some of the fields defined as:
<field name="id" type="string" indexed="true" stored="true"
multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true"
multivalued="false"/>
<field name="author_name" type="text" indexed="true" stored="false"
multivalued="true"/>

where the field author_name is multivalued.
However, in the UI (schema browser), the following are the details of the
author_name field; it's nowhere mentioned that it's multivalued:
Field: author_name
Field Type: text

Properties: Indexed, Tokenized
When I try creating and adding a document into Solr, I get an exception:
ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal
Here's my code snippet:
  solrDoc17.addField("id", "id1");
  solrDoc17.addField("content", "SOLR");
  solrDoc17.addField("author_name", "ninad");
  solrDoc17.addField("author_name", "raakhi");
  solrDoc17.addField("author_name", "goureya");
  solrDoc17.addField("author_name", "sheetal");
  server.add(solrDoc17);
  server.commit();
Any pointers??
regards,
Raakhi


Re: How to retrieve the index of a string within a field?

2009-10-08 Thread Elaine Li
Sandeep,

When I submit a query, I actually make sure the searched phrase is
wrapped in double quotes. When I do that, it will only return
sentences with 'get what you'. If it does not have double quotes, it
will return all the sentences as described in your email, because
without double quotes it is a 'get OR what OR you' query. I don't
know too much about the concepts behind search; I just make use of
whatever works for me. Do you think I am still OK using text as my
sentence field type?

If the search returns hundreds of thousands of results, will Solrj's
http call hang on it?

Thanks a lot.

Elaine

On Thu, Oct 8, 2009 at 1:31 AM, Sandeep Tagore sandeep.tag...@gmail.com wrote:

 Elaine,
 The field type "text" contains <tokenizer
 class="solr.WhitespaceTokenizerFactory"/> in its definition, so all the
 sentences that are indexed/queried will be split into words. When you
 search for 'get what you', you will get sentences containing get, what, you,
 get what, get you, what you, get what you. So when you try to find the
 indexOf of the keyword in that sentence (from the results), you may not get
 it every time.

 Solrj can give the results in one shot, but it uses an http call; you can't
 avoid it. You don't need to query multiple times with Solrj: query once, get
 the results, store them in java beans, process them, and display the results.

 Regards,
 Sandeep


 Elaine Li wrote:

 Sandeep, I do get results when I search for get what you, not 0 results.
 What in my schema makes this difference?
 I need to learn Solrj. I am currently using javascript as a client and
 invoke http calls to get results to display in the browser. Can Solrj
 get all the results at one shot w/o the http call? I need to do some
 postprocessing against all the results and then display the processed
 data. Submitting multiple http queries and post-process after each
 query does not seem to be the right way.

 --
 View this message in context: 
 http://www.nabble.com/How-to-retrieve-the-index-of-a-string-within-a-field--tp25771821p25798586.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Elaine Li
You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml
Or split the file if it is too big.

Elaine

On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
pravin_ka...@persistent.co.in wrote:
 Hi,
 I am new to solr. I am able to index, search and update with small 
 size(around 500mb)
 But if I try to index file with 5 to 10 or more that (500mb) it gives memory 
 heap exception.
 While investigation I found that post jar or post.sh load whole file in 
 memory.

 I use one work around with dividing small file in small files..and it's 
 working

 Is there any other way to post large file as above work around is not 
 feasible for 1 TB file

 Thanks
 -Pravin


 DISCLAIMER
 ==
 This e-mail may contain privileged and confidential information which is the 
 property of Persistent Systems Ltd. It is intended only for the use of the 
 individual or entity to which it is addressed. If you are not the intended 
 recipient, you are not authorized to read, retain, copy, print, distribute or 
 use this message. If you have received this communication in error, please 
 notify the sender and delete all copies of this message. Persistent Systems 
 Ltd. does not accept any liability for virus infected mails.



Re: how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
you can write a simple program which streams the file from the disk to
post it to Solr


On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com wrote:
 You can increase the Java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml
 Or split the file if it is too big.

 Elaine

 On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
 pravin_ka...@persistent.co.in wrote:
 Hi,
 I am new to solr. I am able to index, search and update with small 
 size(around 500mb)
 But if I try to index file with 5 to 10 or more that (500mb) it gives memory 
 heap exception.
 While investigation I found that post jar or post.sh load whole file in 
 memory.

 I use one work around with dividing small file in small files..and it's 
 working

 Is there any other way to post large file as above work around is not 
 feasible for 1 TB file

 Thanks
 -Pravin


 DISCLAIMER
 ==
 This e-mail may contain privileged and confidential information which is the 
 property of Persistent Systems Ltd. It is intended only for the use of the 
 individual or entity to which it is addressed. If you are not the intended 
 recipient, you are not authorized to read, retain, copy, print, distribute 
 or use this message. If you have received this communication in error, 
 please notify the sender and delete all copies of this message. Persistent 
 Systems Ltd. does not accept any liability for virus infected mails.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Walter Underwood
Are you indexing multiple documents? If so, split them into multiple files.
A single XML file with all documents is not a good idea. Solr is designed to
use batches for indexing.

It will be extremely hard to index a 1 TB XML file. I would guess that it
would need a JVM heap of well over 1 TB.

wunder

On Oct 8, 2009, at 6:56 AM, Noble Paul നോബിള്‍  
नोब्ळ् wrote:



you can write a simple program which streams the file from the disk to
post it to Solr


On Thu, Oct 8, 2009 at 7:10 PM, Elaine Li elaine.bing...@gmail.com  
wrote:
You can increase the java heap size, e.g. java -Xms128m -Xmx8192m -jar post.jar *.xml

Or i split the file if it is too big.

Elaine

On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
pravin_ka...@persistent.co.in wrote:

Hi,
I am new to solr. I am able to index, search and update with small  
size(around 500mb)
But if I try to index file with 5 to 10 or more that (500mb) it  
gives memory heap exception.
While investigation I found that post jar or post.sh load whole  
file in memory.


I use one work around with dividing small file in small files..and  
it's working


Is there any other way to post large file as above work around is  
not feasible for 1 TB file


Thanks
-Pravin


DISCLAIMER
==
This e-mail may contain privileged and confidential information  
which is the property of Persistent Systems Ltd. It is intended  
only for the use of the individual or entity to which it is  
addressed. If you are not the intended recipient, you are not  
authorized to read, retain, copy, print, distribute or use this  
message. If you have received this communication in error, please  
notify the sender and delete all copies of this message.  
Persistent Systems Ltd. does not accept any liability for virus  
infected mails.








--
-
Noble Paul | Principal Engineer| AOL | http://aol.com





correct syntax for boolean search

2009-10-08 Thread Elaine Li
Hi,

What is the correct syntax for the following boolean search from a field?

fieldname1:(word_a1 or word_b1) && (word_a2 or word_b2) && (word_a3 or
word_b3) && fieldname2:.

Thanks.

Elaine


Re: Default query parameter for one core

2009-10-08 Thread Michael
On Wed, Oct 7, 2009 at 1:46 PM, Michael solrco...@gmail.com wrote:
 Is there a way to not have the shards param at all for most cores, and for 
 core0 to specify it?

E.g. core0 requests always get a shards=foo appended, while other
cores don't have a shards param at all.

Or, barring that, is there a way to tell one core "use this chunk of
XML for your defaults tag", and tell the other cores "use this other
chunk of XML for your defaults tag"?


Re: how to post(index) large file of 5 GB or greater than this

2009-10-08 Thread Yonik Seeley
What is this huge file?  Solr XML? CSV?

Anyway, if it's a local file, you can get Solr to directly read/stream
it via stream.file
Examples in http://wiki.apache.org/solr/UpdateCSV
but it should work for any update format, not just CSV.

-Yonik
http://www.lucidimagination.com



On Thu, Oct 8, 2009 at 6:47 AM, Pravin Karne
pravin_ka...@persistent.co.in wrote:
 Hi,
 I am new to Solr. I am able to index, search and update with small
 files (around 500MB).
 But if I try to index a file of 5 to 10 GB or more, it gives a memory
 heap exception.
 While investigating I found that post.jar and post.sh load the whole
 file into memory.

 I use one workaround: dividing the large file into smaller files, and
 it's working.

 Is there any other way to post a large file? The above workaround is
 not feasible for a 1 TB file.

 Thanks
 -Pravin





Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-08 Thread gdeconto

I did check the other posts, as well as whatever I could find on the net, but
didn't find anything.

Has anyone encountered this type of issue, or is what I am doing (extending
QParserPlugin) that unusual??



gdeconto wrote:
 
 ...
 one thing I noticed is that if I append debugQuery=true to a query that
 includes the virtual function, I get a NullPointerException, likely
 because the debugging code looks at the query passed in and not the
 expanded query that my code generates and that gets used by solr for
 retrieving data.
 ...
 

-- 
View this message in context: 
http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html
Sent from the Solr - User mailing list archive at Nabble.com.



UTF-8 and latin accents

2009-10-08 Thread Claudio Martella
Hello list,

I'm trying to index documents with latin accents (italian documents). I
extract the text from .doc documents with Tika directly into .xml files.
If i open up the XML document with my Dashcode (i run mac os x) i can
see the characters correctly. My XML document has the
<?xml version="1.0" encoding="UTF-8"?>
<add><doc>
...
headers.

When i search and retrieve documents in solr the accented characters are
replaced by an '?'. What is the problem?
I guess the problem could be in (1) the schema (2) the xml document file
coding itself (i don't see the characters correctly if i open it up with
vim in terminal).

Any suggestions? thanks

-- 
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it

Short information regarding use of personal data. According to Section 13 of 
Italian Legislative Decree no. 196 of 30 June 2003, we inform you that we 
process your personal data in order to fulfil contractual and fiscal 
obligations and also to send you information regarding our services and events. 
Your personal data are processed with and without electronic means and by 
respecting data subjects' rights, fundamental freedoms and dignity, 
particularly with regard to confidentiality, personal identity and the right to 
personal data protection. At any time and without formalities you can write an 
e-mail to priv...@tis.bz.it in order to object the processing of your personal 
data for the purpose of sending advertising materials and also to exercise the 
right to access personal data and other rights referred to in Section 7 of 
Decree 196/2003. The data controller is TIS Techno Innovation Alto Adige, 
Siemens Street n. 19, Bolzano. You can find the complete information on the web 
site www.tis.bz.it.




Re: UTF-8 and latin accents

2009-10-08 Thread Yonik Seeley
On Thu, Oct 8, 2009 at 12:48 PM, Claudio Martella
claudio.marte...@tis.bz.it wrote:
 I'm trying to index documents with latin accents (italian documents). I
 extract the text from .doc documents with Tika directly into .xml files.
 If i open up the XML document with my Dashcode (i run mac os x) i can
 see the characters correctly. my xml document is an xml document with the
 <?xml version="1.0" encoding="UTF-8"?>
 <add><doc>
 ...
 headers.

Maybe those documents aren't actually in UTF8.
Why don't you try Solr's example/exampledocs/utf8-example.xml

 When i search and retrieve documents in solr the accented characters are
 replaced by an '?'. What is the problem?
 I guess the problem could be in (1) the schema (2) the xml document file
 coding itself (i don't see the characters correctly if i open it up with
 vim in terminal).

in vim/gvim try
:set encoding=utf8

-Yonik
http://www.lucidimagination.com
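Yonik's suspicion that the files are not really UTF-8 can be checked mechanically: accented Latin-1 bytes that were never transcoded will fail a strict UTF-8 decode. A small illustrative sketch (the `is_utf8` helper name is an assumption):

```python
# Sketch: test whether raw bytes are valid UTF-8. Latin-1 encoded
# accented text (e.g. from an untranscoded .doc extraction) is not.
def is_utf8(data: bytes) -> bool:
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

utf8_bytes = "perché".encode("utf-8")      # genuine UTF-8
latin1_bytes = "perché".encode("latin-1")  # same text, wrong encoding
```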


Re: ISOLatin1AccentFilter before or after Snowball?

2009-10-08 Thread Claudio Martella
Hello,

i'm following the thread but i think it still hasn't been answered whether
the isolatinfilter goes before or after the stemmer.

any direct answer?


Koji Sekiguchi wrote:
 In this particular case, I don't think one is better than the other...

 In general, MappingCharFilter is more flexible than specific
 TokenFilters, such as ISOLatin1AccentFilter.
 For example, if you want your own character mapping rules,
 you can add them to mapping.txt. It should be easier than
 modifying TokenFilters, as you don't need programming.

 Koji

 Chantal Ackermann wrote:
 Now, you got me wondering - which one should I like better?
 I didn't even know there is an alternative. :-)

 Chantal

 Koji Sekiguchi schrieb:
 No, ISOLatin1AccentFilterFactory is not deprecated.
 You can use either MappingCharFilterFactory+mapping-ISOLatin1Accent.txt
 or ISOLatin1AccentFilterFactory whichever you'd like.

 Koji


 Jay Hill wrote:
 Correct me if I'm wrong, but wasn't the ISOLatin1AccentFilterFactory
 deprecated in favor of:
 <charFilter class="solr.MappingCharFilterFactory"
 mapping="mapping-ISOLatin1Accent.txt"/>

 in 1.4?

 -Jay
 http://www.lucidimagination.com







-- 
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it





Re: correct syntax for boolean search

2009-10-08 Thread Avlesh Singh
q=+fieldname1:(+(word_a1 word_b1) +(word_a2 word_b2) +(word_a3 word_b3))
+fieldname2:...

Cheers
Avlesh
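Avlesh's rewrite can also be generated programmatically when the word groups come from user input. A minimal sketch (the `boolean_query` helper name is an assumption, and the trailing `+fieldname2:...` clause from the answer is left out):

```python
# Sketch: build the Lucene-syntax equivalent of
# "(a1 OR b1) AND (a2 OR b2) AND ..." for a single field, using the
# +(...) required-group form shown in the answer above.
def boolean_query(field, groups):
    clauses = " ".join("+(%s)" % " ".join(words) for words in groups)
    return "+%s:(%s)" % (field, clauses)

q = boolean_query("fieldname1",
                  [["word_a1", "word_b1"],
                   ["word_a2", "word_b2"],
                   ["word_a3", "word_b3"]])
```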

On Thu, Oct 8, 2009 at 7:40 PM, Elaine Li elaine.bing...@gmail.com wrote:

 Hi,

 What is the correct syntax for the following boolean search from a field?

 fieldname1:(word_a1 or word_b1)  (word_a2 or word_b2)  (word_a3 or
 word_b3)  fieldname2:.

 Thanks.

 Elaine



Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-08 Thread Yonik Seeley
On Thu, Oct 8, 2009 at 12:14 PM, gdeconto
gerald.deco...@topproducer.com wrote:
 I did check the other posts, as well as whatever I could find on the net but
 didnt find anything.

 Has anyone encountered this type of issue, or is what I am doing (extending
 QParserPlugin) that unusual??


I think you need to provide some more information such as a stack
trace for the NPE, or a more elaborate description of what you think
the problem is with the debug component.
You said "because the debugging code looks at the query passed in and
not the expanded query", but I don't understand that.  The debug
component is passed the actual Query object that the QParserPlugin
created.

-Yonik
http://www.lucidimagination.com





 gdeconto wrote:

 ...
 one thing I noticed is that if I append debugQuery=true to a query that
 includes the virtual function, I get a NullPointerException, likely
 because the debugging code looks at the query passed in and not the
 expanded query that my code generates and that gets used by solr for
 retrieving data.
 ...


 --
 View this message in context: 
 http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25803277.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: IndexWriter InfoStream in solrconfig not working

2009-10-08 Thread Yonik Seeley
I can't get it to work either, so I reopened
https://issues.apache.org/jira/browse/SOLR-1145

-Yonik
http://www.lucidimagination.com

On Wed, Oct 7, 2009 at 1:45 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
 I had the same problem. I'd be very interested to know how to get this 
 working...

 -Gio.

 -Original Message-
 From: Burton-West, Tom [mailto:tburt...@umich.edu]
 Sent: Wednesday, October 07, 2009 12:13 PM
 To: solr-user@lucene.apache.org
 Subject: IndexWriter InfoStream in solrconfig not working

 Hello,

 We are trying to debug an indexing/optimizing problem and have tried setting 
 the infoStream file in solrconfig.xml so that the SolrIndexWriter will write a 
 log file.  Here is our setting:

 <!--
         To aid in advanced debugging, you may turn on IndexWriter debug
         logging. Uncommenting this and setting to true will set the file
         that the underlying Lucene IndexWriter will write its debug
         infostream to.
 -->
    <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>

 After making that change to solrconfig.xml, restarting Solr, we see a message 
 in the tomcat logs saying that the log is enabled:

 build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is enabled: 
 /tmp/LuceneIndexWriterDebug.log

 However, if we then run an optimize we can't see any log file being written.

 I also looked at the patch for  
 http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test 
 that I might try to run in our system.


 Do others have this logging working successfully ?
 Is there something else that needs to be set up?

 Tom




Re: IndexWriter InfoStream in solrconfig not working

2009-10-08 Thread Yonik Seeley
OK, move the infoStream part in solrconfig.xml from <indexDefaults> into
<mainIndex> and it should work.

-Yonik
http://www.lucidimagination.com


On Thu, Oct 8, 2009 at 2:40 PM, Yonik Seeley
yonik.see...@lucidimagination.com wrote:
 I can't get it to work either, so I reopened
 https://issues.apache.org/jira/browse/SOLR-1145

 -Yonik
 http://www.lucidimagination.com

 On Wed, Oct 7, 2009 at 1:45 PM, Giovanni Fernandez-Kincade
 gfernandez-kinc...@capitaliq.com wrote:
 I had the same problem. I'd be very interested to know how to get this 
 working...

 -Gio.

 -Original Message-
 From: Burton-West, Tom [mailto:tburt...@umich.edu]
 Sent: Wednesday, October 07, 2009 12:13 PM
 To: solr-user@lucene.apache.org
 Subject: IndexWriter InfoStream in solrconfig not working

 Hello,

 We are trying to debug an indexing/optimizing problem and have tried setting 
 the infoStream  file in solrconf.xml so that the SolrIndexWriter will write 
 a log file.  Here is our setting:

 <!--
         To aid in advanced debugging, you may turn on IndexWriter debug
         logging. Uncommenting this and setting to true will set the file
         that the underlying Lucene IndexWriter will write its debug
         infostream to.
 -->
    <infoStream file="/tmp/LuceneIndexWriterDebug.log">true</infoStream>

 After making that change to solrconfig.xml, restarting Solr, we see a 
 message in the tomcat logs saying that the log is enabled:

 build-2_log.2009-10-06.txt:INFO: IndexWriter infoStream debug log is 
 enabled: /tmp/LuceneIndexWriterDebug.log

 However, if we then run an optimize we can't see any log file being written.

 I also looked at the patch for  
 http://issues.apache.org/jira/browse/SOLR-1145, but did not see a unit test 
 that I might try to run in our system.


 Do others have this logging working successfully ?
 Is there something else that needs to be set up?

 Tom





releasing memory?

2009-10-08 Thread Ryan McKinley

Hello-

I have an application that can run in the background on a user desktop --
it will go through phases of being used and not being used.  I want to
free as many system resources as possible when it is not in use.


Currently I have a timer that waits for 10 mins of inactivity and
releases a bunch of memory (unrelated to lucene/solr).  Any
suggestion on the best way to do this in lucene/solr?  perhaps reload
a core?


thanks for any pointers
ryan


Re: Scoring for specific field queries

2009-10-08 Thread R. Tan
Hmm... I don't quite get the desired results. Those starting with cha are
now randomly ordered. Is there something wrong with the filters I applied?


On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote:

 Filters? I did not mean filters at all.
 I am in a mad rush right now, but on the face of it your field definitions
 look right.

 This is what I asked for -
 q=(autoComplete2:cha^10 autoComplete:cha)

 Lemme know if this does not work for you.

 Cheers
 Avlesh

 On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote:

  Hi Avlesh,
 
  I can't seem to get the scores right.
 
  I now have these types for the fields I'm targeting,
 
   <fieldType name="autoComplete" class="solr.TextField"
   positionIncrementGap="1">
     <analyzer type="index">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.NGramFilterFactory" minGramSize="1"
       maxGramSize="20"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
   <fieldType name="autoComplete2" class="solr.TextField"
   positionIncrementGap="1">
     <analyzer type="index">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
       <filter class="solr.NGramFilterFactory" minGramSize="1"
       maxGramSize="20"/>
     </analyzer>
     <analyzer type="query">
       <tokenizer class="solr.KeywordTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>
 
  My query is this,
 
 
  q=*:*&fq=autoCompleteHelper:cha+autoCompleteHelper2:cha&qf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0
 
  What should I tweak from the above config and query?
 
  Thanks,
  Rih
 
 
  On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote:
 
    I will have to pass on this and try your suggestion first. So, how does
    your suggestion (1 and 2) boost my startswith query? Is it because of
    the n-gram filter?
  
  
  
   On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore 
 sandeep.tag...@gmail.com
  wrote:
  
  
   Yes it can be done but it needs some customization. Search for custom
  sort
   implementations/discussions.
   You can check...
  
  
 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
   .
   Let us know if you have any issues.
  
   Sandeep
  
  
   R. Tan wrote:
   
This might work and I also have a single value field which makes it
cleaner.
Can sort be customized (with indexOf()) from the solr parameters
  alone?
   
  
   --
   View this message in context:
  
 
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  
  
  
 

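To answer the recurring "is it because of the n-gram filter?" question with a sketch: an n-gram filter emits every substring of each token up to the max gram size, so the query "cha" matches "champion" at index time. With the whitespace-tokenized field the grams come from individual words, while with the keyword-tokenized field they come from the whole title string; boosting the keyword field over the whitespace field is what favors titles whose full text begins with the typed prefix. The `ngrams` function below is an illustrative model of the filter, not Solr's implementation:

```python
# Sketch: substrings produced by an n-gram filter for one token,
# modeling NGramFilterFactory with minGramSize=1, maxGramSize=20.
def ngrams(token, min_size=1, max_size=20):
    out = set()
    for n in range(min_size, min(max_size, len(token)) + 1):
        for i in range(len(token) - n + 1):
            out.add(token[i:i + n])
    return out

grams = ngrams("champion")
```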


Re: delay while adding document to solr index

2009-10-08 Thread Yonik Seeley
On Thu, Oct 8, 2009 at 1:58 AM, swapna_here swapna.here...@gmail.com wrote:
 i don't understand why my solr index is increasing daily
 when i am adding and deleting the same number of documents daily

A delete is just a bit flip, and does not reclaim disk space immediately.
Deleted documents are squeezed out when segment merges happen
(including an optimize which merges all segments).
If you have large segments that documents are deleted from, those
segments may not be involved in a merge and hence the deleted docs can
hang around for quite some time.

-Yonik
http://www.lucidimagination.com




 i run org.apache.solr.client.solrj.SolrServer.optimize() manually four times
 a day

 is it not the right way to run optimize, if yes what is the procedure to run
 optimize?

 thanks in advance :)
 --
 View this message in context: 
 http://www.nabble.com/delay-while-adding-document-to-solr-index-tp25676777p25798789.html
 Sent from the Solr - User mailing list archive at Nabble.com.




indexing frequently-changing fields

2009-10-08 Thread didier deshommes
I am using Solr to index data in a SQL database.  Most of the data
doesn't change after initial commit, except for a single boolean field
that indicates whether an item is flagged as 'needing attention'.  So
I have a need_attention field in the database that I update whenever a
user marks an item as needing attention in my UI.  The problem I have
is that I want to offer the ability to include need_attention in my
user's queries, but do not want to incur the expense of having to
reindex whenever this flag changes on an individual document.

I have thought about different solutions to this problem, including
using multi-core and having a smaller core for recently-marked items
that I am willing to do 'near-real-time' commits on.  Are there are
any common solutions to this problem, which I have to imagine is
common in this community?


Re: indexing frequently-changing fields

2009-10-08 Thread Yonik Seeley
It's a bit round-about but you might be able to use ExternalFileField
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

The fieldType definition would look like
<fieldType name="file" keyField="id" defVal="1" stored="false"
indexed="false" class="solr.ExternalFileField" valType="float"/>

Then you can use frange to include/exclude certain values:

http://www.lucidimagination.com/blog/tag/frange/

-Yonik
http://www.lucidimagination.com
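ExternalFileField reads its per-document values from a plain text file of `keyFieldValue=floatValue` lines kept alongside the index, so the flag can be updated by rewriting that file instead of reindexing. The writer below is an illustrative sketch of generating those lines (the helper name and the doc ids are assumptions):

```python
# Sketch: produce "id=value" lines for an ExternalFileField data file,
# mapping the need_attention flag to a float usable with frange.
def external_file_lines(values):
    """values: mapping of document id -> float used for frange filtering."""
    return ["%s=%s" % (doc_id, val) for doc_id, val in sorted(values.items())]

lines = external_file_lines({"doc2": 1.0, "doc1": 0.0})
```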


On Thu, Oct 8, 2009 at 4:59 PM, didier deshommes dfdes...@gmail.com wrote:
 I am using Solr to index data in a SQL database.  Most of the data
 doesn't change after initial commit, except for a single boolean field
 that indicates whether an item is flagged as 'needing attention'.  So
 I have a need_attention field in the database that I update whenever a
 user marks an item as needing attention in my UI.  The problem I have
 is that I want to offer the ability to include need_attention in my
 user's queries, but do not want to incur the expense of having to
 reindex whenever this flag changes on an individual document.

 I have thought about different solutions to this problem, including
 using multi-core and having a smaller core for recently-marked items
 that I am willing to do 'near-real-time' commits on.  Are there are
 any common solutions to this problem, which I have to imagine is
 common in this community?



RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Here's the query and the error - 

Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization AND 
status_i:(2)) 
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
Oct 09 08:20:17  [error] Error on searching: 400 Status: 
org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
Civilization AND status_i:(2)) ': Encount

Bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com] 
Sent: Thursday, 8 October 2009 12:48 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested on the solr query. In other words, the query that your  
system sends to solr.

Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au 
  wrote:

 Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

 Either scroll down and click one of the television broadcasting --  
 asia links, or type it in the Quick Search box.


 TIA

 bern

 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com]
 Sent: Thursday, 8 October 2009 9:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory

 Could you please provide the exact URL of a query where you are
 experiencing this problem?
 eg (not URL encoded): q=fieldName:"hot and cold: temperatures"

 On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
 We are having some issues with our solr parent application not  
 retrieving records as expected.

 For example, if the input query includes a colon (e.g. hot and  
 cold: temperatures), the relevant record (which contains a colon in  
 the same place) does not get retrieved; if the input query does not  
 include the colon, all is fine.  Ditto if the user searches for a  
 query containing hyphens, e.g. asia - civilization, although with  
 the qualifier that something like asia-civilization (no spaces  
 either side of the hyphen) works fine, whereas asia -  
 civilization (spaces either side of hyphen) doesn't work.

 Our schema.xml contains the following -

 <fieldType name="text" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
     synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="1"
     catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.SynonymFilterFactory"
     synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="0"
     catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 Bernadette Houghton, Library Business Applications Developer
 Deakin University Geelong Victoria 3217 Australia.
 Phone: 03 5227 8230 International: +61 3 5227 8230
 Fax: 03 5227 8000 International: +61 3 5227 8000
 MSN: bern_hough...@hotmail.com
 Email: 
 bernadette.hough...@deakin.edu.aumailto:bernadette.hough...@deakin.edu.au 
 
 Website: http://www.deakin.edu.au
 http://www.deakin.edu.au/Deakin University CRICOS Provider Code  
 00113B (Vic)

 Important Notice: The contents of this email are intended solely  
 for the named addressee and are confidential; any unauthorised use,  
 reproduction or storage of the contents is expressly prohibited. If  
 you have received this email in error, please delete it and any  
 attachments immediately and advise the sender by return email or  
 telephone.
 Deakin University does not warrant that this email and any  
 attachments are error or virus free





RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
'(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 
7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> 
... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...

-Original Message-
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
Sent: Friday, 9 October 2009 8:22 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Problems with WordDelimiterFilterFactory

Here's the query and the error - 

Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization AND 
status_i:(2)) 
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
Oct 09 08:20:17  [error] Error on searching: 400 Status: 
org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
Civilization AND status_i:(2)) ': Encount

Bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com] 
Sent: Thursday, 8 October 2009 12:48 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested on the solr query. In other words, the query that your  
system sends to solr.

Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au 
  wrote:

 Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

 Either scroll down and click one of the television broadcasting --  
 asia links, or type it in the Quick Search box.


 TIA

 bern

 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com]
 Sent: Thursday, 8 October 2009 9:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory

 Could you please provide the exact URL of a query where you are
 experiencing this problem?
 eg(Not URL encoded): q=fieldName:hot and cold: temperatures

 On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
 We are having some issues with our solr parent application not  
 retrieving records as expected.

 For example, if the input query includes a colon (e.g. hot and  
 cold: temperatures), the relevant record (which contains a colon in  
 the same place) does not get retrieved; if the input query does not  
 include the colon, all is fine.  Ditto if the user searches for a  
 query containing hyphens, e.g. asia - civilization, although with  
 the qualifier that something like asia-civilization (no spaces  
 either side of the hyphen) works fine, whereas asia -  
 civilization (spaces either side of hyphen) doesn't work.

 Our schema.xml contains the following -

 <fieldType name="text" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <!-- in this example, we will only use synonyms at query time
     <filter class="solr.SynonymFilterFactory"
     synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
     -->
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="1"
     catenateNumbers="1" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.ISOLatin1AccentFilterFactory"/>
     <filter class="solr.SynonymFilterFactory"
     synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.StopFilterFactory" ignoreCase="true"
     words="stopwords.txt"/>
     <filter class="solr.WordDelimiterFilterFactory"
     generateWordParts="1" generateNumberParts="1" catenateWords="0"
     catenateNumbers="0" catenateAll="0"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.EnglishPorterFilterFactory"
     protected="protwords.txt"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

 Bernadette Houghton, Library Business Applications Developer
 Deakin University Geelong Victoria 3217 Australia.
 Phone: 03 5227 8230 International: +61 3 5227 8230
 Fax: 03 5227 8000 International: +61 3 5227 8000
 MSN: bern_hough...@hotmail.com
 Email: 
 bernadette.hough...@deakin.edu.aumailto:bernadette.hough...@deakin.edu.au 
 
 Website: http://www.deakin.edu.au
 http://www.deakin.edu.au/Deakin University CRICOS Provider Code  
 00113B (Vic)


Re: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Patrick Jungermann
Hi Bern,

the problem is the character sequence "--". A query is not allowed to
contain minus characters that immediately follow one another. Remove one
minus character and the query will be parsed without problems.

Because of this parsing problem, I'd recommend cleaning up the query
before submitting it to the Solr server, replacing each run of minus
characters with a single one.


Regards, Patrick
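The suggested cleanup is a one-line substitution. A minimal sketch (the function name is an illustrative assumption):

```python
import re

# Sketch: collapse any run of consecutive minus characters into a
# single one before sending the query to Solr, avoiding the
# QueryParser "Encountered -" error.
def collapse_minus_runs(query):
    return re.sub(r"-{2,}", "-", query)

cleaned = collapse_minus_runs("(Asia -- Civilization AND status_i:(2))")
```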



Bernadette Houghton schrieb:
 Sorry, the last line was truncated -
 
 HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
 '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 
 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> 
 ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
 
 -Original Message-
 From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
 Sent: Friday, 9 October 2009 8:22 AM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: Problems with WordDelimiterFilterFactory
 
 Here's the query and the error - 
 
 Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization 
 AND status_i:(2)) 
 Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
 Oct 09 08:20:17  [error] Error on searching: 400 Status: 
 org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
 Civilization AND status_i:(2)) ': Encount
 
 Bern
 
 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com] 
 Sent: Thursday, 8 October 2009 12:48 PM
 To: solr-user@lucene.apache.org
 Cc: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory
 
 Bern,
 
 I am interested on the solr query. In other words, the query that your  
 system sends to solr.
 
 Thanks,
 
 
 Christian
 
 On Oct 7, 2009, at 5:56 PM, Bernadette Houghton 
 bernadette.hough...@deakin.edu.au 
   wrote:
 
 Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

 Either scroll down and click one of the "television broadcasting --  
 asia" links, or type it in the Quick Search box.


 TIA

 bern

 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com]
 Sent: Thursday, 8 October 2009 9:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory

 Could you please provide the exact URL of a query where you are
 experiencing this problem?
 eg(Not URL encoded): q=fieldName:hot and cold: temperatures

 On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
 We are having some issues with our solr parent application not  
 retrieving records as expected.

 For example, if the input query includes a colon (e.g. "hot and  
 cold: temperatures"), the relevant record (which contains a colon in  
 the same place) does not get retrieved; if the input query does not  
 include the colon, all is fine.  Ditto if the user searches for a  
 query containing hyphens, e.g. "asia - civilization", although with  
 the qualifier that something like "asia-civilization" (no spaces  
 either side of the hyphen) works fine, whereas "asia -  
 civilization" (spaces either side of the hyphen) doesn't work.

 Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.SynonymFilterFactory"
        synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="0"
        catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

 Bernadette Houghton, Library Business Applications Developer
 Deakin University Geelong Victoria 3217 Australia.
 Phone: 03 5227 8230 International: +61 3 5227 8230
 Fax: 03 5227 8000 International: +61 3 5227 8000
 MSN: bern_hough...@hotmail.com
 Email: 

RE: Sorting by insertion time

2009-10-08 Thread Steven A Rowe
Hi Tarjei,

See https://issues.apache.org/jira/browse/SOLR-1478 - with trunk Solr (and 
soon, 1.4), you can use pseudo-field _docid_ for this purpose.

Steve

 -Original Message-
 From: tarjei [mailto:tar...@nu.no]
 Sent: Thursday, October 08, 2009 2:18 AM
 To: solr-user@lucene.apache.org
 Subject: Sorting by insertion time
 
 Hi,
 
 Quite often I want a set of documents ordered by the time they were
 inserted, i.e. give me the 5 latest items that match query foo. I
 usually solve this by sorting on a date field.
 
 I had a chat with Erik Hatcher when he visited JavaZone 2009 and he
 said that Solr places documents on disk in insertion order.
 
 This would make it possible for me to save a sorting step by not
 sorting
 by a specific field, but by insertion time in reverse.
 
 AFAIK Lucene knows how to do this, but which request parameters should
 I
 use in Solr?
 
 Kind regards,
 Tarjei
 
 
 --
 Tarjei Huse
 Mobil: 920 63 413


RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Thanks for this, marklo; it is a *very* useful page.
bern

-Original Message-
From: marklo [mailto:mar...@pcmall.com] 
Sent: Thursday, 8 October 2009 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory


Use http://solr-url/solr/admin/analysis.jsp to see how your data is
indexed/queried

-- 
View this message in context: 
http://www.nabble.com/Problems-with-WordDelimiterFilterFactory-tp25795589p25797377.html
Sent from the Solr - User mailing list archive at Nabble.com.



[slightly off topic] Jetty and NIO

2009-10-08 Thread Grant Ingersoll
In the Solr example jetty.xml, there is the following setup and  
comments:

<!-- Use this connector for many frequently idle connections
     and for threadless continuations.
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.nio.SelectChannelConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">30000</Set>
      <Set name="Acceptors">2</Set>
      <Set name="confidentialPort">8443</Set>
    </New>
  </Arg>
</Call>
-->

<!-- Use this connector if NIO is not available. -->
<!-- This connector is currently being used for Solr because the
     nio.SelectChannelConnector showed poor performance under WindowsXP
     from a single client with non-persistent connections (35s vs ~3min)
     to complete 10,000 requests)
-->
<Call name="addConnector">
  <Arg>
    <New class="org.mortbay.jetty.bio.SocketConnector">
      <Set name="port"><SystemProperty name="jetty.port" default="8983"/></Set>
      <Set name="maxIdleTime">50000</Set>
      <Set name="lowResourceMaxIdleTime">1500</Set>
    </New>
  </Arg>
</Call>

So, if I'm on Centos 2.6 (64 bit), what connector should I be using?   
Based on the comments, I'm not sure the top one is the right thing  
either, but it also sounds like it is my only other choice.


The other thing I'm noticing is if I profile my app and I am  
retrieving something like 50 rows at a time, 30-60% of the time is  
spent in org.mortbay.jetty.bio.SocketConnector$Connection.fill().   I  
realize the answer may just be to get fewer results, but I was  
wondering if there are other tuning parameters that can make this more  
efficient b/c the 50 rows thing is a biz. reqt and I may not be able  
to get that changed.


Thanks,
Grant


RE: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Bernadette Houghton
Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up 
the error, but still doesn't find the right record. I see from marklo's 
analysis page that solr is still parsing it with a hyphen. Changing this part 
of our schema.xml -

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement="" replace="all"
/>

To 

<filter class="solr.PatternReplaceFilterFactory"
        pattern="([^a-z])" replacement=" " replace="all"
/>

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect. 

Regards
Bern
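As a quick sanity check of Bern's changed pattern, the same regex can be run in plain Java (an illustrative sketch only; inside Solr the PatternReplace filter is applied per token during analysis, after lowercasing):

```java
public class PatternCheck {
    public static void main(String[] args) {
        String token = "asia--civilization";
        // replacement="" deletes the non-alpha chars, gluing the words together
        System.out.println(token.replaceAll("[^a-z]", ""));   // prints "asiacivilization"
        // replacement=" " keeps them apart as separate words (one space per hyphen)
        System.out.println(token.replaceAll("[^a-z]", " "));  // prints "asia  civilization"
    }
}
```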

-Original Message-
From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com] 
Sent: Friday, 9 October 2009 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence "--". A query is not allowed to
contain consecutive minus characters. Remove one minus character and
the query will be parsed without problems.

Because of this parsing problem, I'd recommend a query cleanup, before
submitting to the Solr server, that replaces each sequence of minus
characters with a single one.


Regards, Patrick



Bernadette Houghton schrieb:
 Sorry, the last line was truncated -
 
 HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse 
 '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, column 
 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> 
 ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...
 
 -Original Message-
 From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au] 
 Sent: Friday, 9 October 2009 8:22 AM
 To: 'solr-user@lucene.apache.org'
 Subject: RE: Problems with WordDelimiterFilterFactory
 
 Here's the query and the error - 
 
 Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization 
 AND status_i:(2)) 
 Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc 
 Oct 09 08:20:17  [error] Error on searching: 400 Status: 
 org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- 
 Civilization AND status_i:(2)) ': Encount
 
 Bern
 
 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com] 
 Sent: Thursday, 8 October 2009 12:48 PM
 To: solr-user@lucene.apache.org
 Cc: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory
 
 Bern,
 
 I am interested on the solr query. In other words, the query that your  
 system sends to solr.
 
 Thanks,
 
 
 Christian
 
 On Oct 7, 2009, at 5:56 PM, Bernadette Houghton 
 bernadette.hough...@deakin.edu.au 
   wrote:
 
 Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

 Either scroll down and click one of the television broadcasting --  
 asia links, or type it in the Quick Search box.


 TIA

 bern

 -Original Message-
 From: Christian Zambrano [mailto:czamb...@gmail.com]
 Sent: Thursday, 8 October 2009 9:43 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Problems with WordDelimiterFilterFactory

 Could you please provide the exact URL of a query where you are
 experiencing this problem?
 eg(Not URL encoded): q=fieldName:hot and cold: temperatures

 On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
 We are having some issues with our solr parent application not  
 retrieving records as expected.

 For example, if the input query includes a colon (e.g. hot and  
 cold: temperatures), the relevant record (which contains a colon in  
 the same place) does not get retrieved; if the input query does not  
 include the colon, all is fine.  Ditto if the user searches for a  
 query containing hyphens, e.g. asia - civilization, although with  
 the qualifier that something like asia-civilization (no spaces  
 either side of the hyphen) works fine, whereas asia -  
 civilization (spaces either side of hyphen) doesn't work.

 Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>

Re: [slightly off topic] Jetty and NIO

2009-10-08 Thread Yonik Seeley
On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll gsing...@apache.org wrote:
 So, if I'm on Centos 2.6 (64 bit), what connector should I be using?  Based
 on the comments, I'm not sure the top one is the right thing either, but it
 also sounds like it is my only other choice.

Right - the connector that Solr uses in the example is fine for
typical Solr uses - NIO won't help.

 The other thing I'm noticing is if I profile my app and I am retrieving
 something like 50 rows at a time, 30-60% of the time is spent in
 org.mortbay.jetty.bio.SocketConnector$Connection.fill().

On the Solr server side?  That's code that *reads* a request from the
client... so if a lot of time is being spent there, it's probably
blocking waiting for the rest of the request?  The tests could be
network bound, or the test client may not be fast enough?

If we are saturating the network connection: use SolrJ w/ the binary
response format if you're not already, or use something like JSON
format otherwise.  If you end up using a text response format, you
could try enabling compression for responses (not sure how with jetty).

-Yonik
http://www.lucidimagination.com

   I realize the
 answer may just be to get less results, but I was wondering if there are
 other tuning parameters that can make this more efficient b/c the 50 rows
 thing is a biz. reqt and I may not be able to get that changed.

 Thanks,
 Grant



Re: how can I use debugQuery if I have extended QParserPlugin?

2009-10-08 Thread gdeconto

Hi Yonik;

My original post (
http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tt25789546.html
) has the stack trace.  =^D

I am having trouble reproducing this issue consistently (I sometimes don't
get the NPE) so will have to track this down a bit more.  Luckily, someone
just showed me how to debug the core solr files with Eclipse.  Hopefully I
can now figure this out on my own.

thx


Yonik Seeley-2 wrote:
 
 I think you need to provide some more information such as a stack
 trace for the NPE, or a more elaborate description of what you think
 the problem is with the debug component.
 You said because the debugging code looks at the query passed in and
 not the expanded query, but I don't understand that.  The debug
 component is passed the actual Query object that the QParserPlugin
 created.
 

-- 
View this message in context: 
http://www.nabble.com/how-can-I-use-debugQuery-if-I-have-extended-QParserPlugin--tp25789546p25812899.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Problems with WordDelimiterFilterFactory

2009-10-08 Thread Christian Zambrano

Bern,

The only way that could be happening is if you are not using the field 
type you described in your original e-mail. The TokenFilter 
WordDelimiterFilterFactory should take care of the hyphen.


On 10/08/2009 05:30 PM, Bernadette Houghton wrote:

Thanks for this Patrick. If I remove one of the hyphens, solr doesn't throw up 
the error, but still doesn't find the right record. I see from marklo's 
analysis page that solr is still parsing it with a hyphen. Changing this part 
of our schema.xml -

 <filter class="solr.PatternReplaceFilterFactory"
         pattern="([^a-z])" replacement="" replace="all"
 />

To

 <filter class="solr.PatternReplaceFilterFactory"
         pattern="([^a-z])" replacement=" " replace="all"
 />

i.e. replacing non-alpha chars with a space, looks like it may handle that 
aspect.

Regards
Bern

-Original Message-
From: Patrick Jungermann [mailto:patrick.jungerm...@googlemail.com]
Sent: Friday, 9 October 2009 9:03 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Hi Bern,

the problem is the character sequence "--". A query is not allowed to
contain consecutive minus characters. Remove one minus character and
the query will be parsed without problems.

Because of this parsing problem, I'd recommend a query cleanup, before
submitting to the Solr server, that replaces each sequence of minus
characters with a single one.


Regards, Patrick



Bernadette Houghton schrieb:
   

Sorry, the last line was truncated -

HTTP Status 400 - org.apache.lucene.queryParser.ParseException: Cannot parse '(Asia -- Civilization AND status_i:(2)) ': Encountered "-" at line 1, 
column 7. Was expecting one of: "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ...

-Original Message-
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au]
Sent: Friday, 9 October 2009 8:22 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: Problems with WordDelimiterFilterFactory

Here's the query and the error -

Oct 09 08:20:17  [debug] [196] Solr query string:(Asia -- Civilization AND 
status_i:(2))
Oct 09 08:20:17  [debug] [196] Solr sort by:  score desc
Oct 09 08:20:17  [error] Error on searching: 400 Status: 
org.apache.lucene.queryParser.ParseException: Cannot parse '   (Asia -- Civilization AND 
status_i:(2)) ': Encount

Bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com]
Sent: Thursday, 8 October 2009 12:48 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Bern,

I am interested on the solr query. In other words, the query that your
system sends to solr.

Thanks,


Christian

On Oct 7, 2009, at 5:56 PM, Bernadette 
Houghtonbernadette.hough...@deakin.edu.au
wrote:

 

Hi Christian, try this one - http://www.deakin.edu.au/dro/view/DU:3601

Either scroll down and click one of the television broadcasting --
asia links, or type it in the Quick Search box.


TIA

bern

-Original Message-
From: Christian Zambrano [mailto:czamb...@gmail.com]
Sent: Thursday, 8 October 2009 9:43 AM
To: solr-user@lucene.apache.org
Subject: Re: Problems with WordDelimiterFilterFactory

Could you please provide the exact URL of a query where you are
experiencing this problem?
eg(Not URL encoded): q=fieldName:hot and cold: temperatures

On 10/07/2009 05:32 PM, Bernadette Houghton wrote:
   

We are having some issues with our solr parent application not
retrieving records as expected.

For example, if the input query includes a colon (e.g. hot and
cold: temperatures), the relevant record (which contains a colon in
the same place) does not get retrieved; if the input query does not
include the colon, all is fine.  Ditto if the user searches for a
query containing hyphens, e.g. asia - civilization, although with
the qualifier that something like asia-civilization (no spaces
either side of the hyphen) works fine, whereas asia -
civilization (spaces either side of hyphen) doesn't work.

Our schema.xml contains the following -

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
        synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1" catenateWords="1"
        catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
        protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

concatenating tokens

2009-10-08 Thread Joe Calderon
hello *, I'm using a combination of tokenizers and filters that gives me
the desired tokens. However, for a particular field I want to
concatenate these tokens back into a single string. Is there a filter to
do that? If not, what are the steps needed to make my own filter to
concatenate tokens?

for example, I start with "Sprocket (widget) - Blue"; the analyzers
churn out the tokens [sprocket, widget, blue], and I want to end up with
the string "sprocket widget blue". This is a simple example, and in the
general case lowercasing and punctuation removal alone do not work,
hence why I'm looking to concatenate tokens.

--joe
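A custom filter would buffer the tokens coming out of the analysis chain and emit one joined token. As an illustrative sketch of the desired end result in plain Java (not a real Solr TokenFilter; the tokenization here only mimics the example above):

```java
import java.util.ArrayList;
import java.util.List;

public class ConcatTokens {
    /** Mimic the example analysis: lowercase, strip punctuation, re-join tokens. */
    static String analyzeAndJoin(String input) {
        List<String> tokens = new ArrayList<>();
        for (String t : input.toLowerCase().split("\\s+")) {
            // drop punctuation like "(" and "-", keeping letters/digits only
            String cleaned = t.replaceAll("[^a-z0-9]", "");
            if (!cleaned.isEmpty()) tokens.add(cleaned);
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // prints "sprocket widget blue"
        System.out.println(analyzeAndJoin("Sprocket (widget) - Blue"));
    }
}
```

As Joe notes, this simple lowercase-and-strip approach does not cover the general case; a real solution would reuse the field's actual analyzer and only concatenate at the end.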


Re: [slightly off topic] Jetty and NIO

2009-10-08 Thread Grant Ingersoll


On Oct 8, 2009, at 7:37 PM, Yonik Seeley wrote:

On Thu, Oct 8, 2009 at 6:24 PM, Grant Ingersoll  
gsing...@apache.org wrote:
So, if I'm on Centos 2.6 (64 bit), what connector should I be  
using?  Based
on the comments, I'm not sure the top one is the right thing  
either, but it

also sounds like it is my only other choice.


Right - the connector that Solr uses in the example is fine for
typical Solr uses - NIO won't help.

The other thing I'm noticing is if I profile my app and I am  
retrieving

something like 50 rows at a time, 30-60% of the time is spent in
org.mortbay.jetty.bio.SocketConnector$Connection.fill().


On the Solr server side?


Yes.


That's code that *reads* a request from the
client...


If I change nothing else and set rows=10, the time spent in .fill() is  
< 5%.  I'll double check everything on my end.




so if a lot of time is being spent there, it's probably
blocking waiting for the rest of the request?  The tests could be
network bound, or the test client may not be fast enough?

If we are saturating the network connection, then use SolrJ if you're
not, w/ the binary response format, or use something like JSON format
otherwise.  If you end up using a text response format, you could try
enabling compression for responses (not sure how with jetty).





multi-word synonyms and analysis.jsp vs real field analysis (query, index)

2009-10-08 Thread Patrick Jungermann
Hi list,

I worked on a field type and its analyzing chain, at which I want to use
the SynonymFilter with entries similar to:

foo bar => foo_bar

During the analysis phase, I used the /admin/analysis.jsp view to test
the analyzing results produced by the created field type. The output
shows that a query "foo bar" will first be separated by the
WhitespaceTokenizer into the two tokens "foo" and "bar", and that the
SynonymFilter will replace both tokens with "foo_bar". But when I
tried this at real query time with the request handler "standard" and
also with "dismax", the tokens "foo" and "bar" were not replaced. The
parsedQueryString was something similar to "field:foo field:bar". At
index time, it works as expected.

Has anybody experienced this and/or knows a workaround, a solution for it?


Thanks, Patrick







Re: issue in adding data to a multivalued field

2009-10-08 Thread Koji Sekiguchi

Hi Rakhi,

Use multiValued (capital V), not multivalued. :)

Koji
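For reference, a sketch of the quoted field definitions with the fix Koji describes applied (capital V on multiValued; quotes and angle brackets restored, since the archive stripped them):

```xml
<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true"
       multiValued="false"/>
<field name="author_name" type="text" indexed="true" stored="false"
       multiValued="true"/>
```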


Rakhi Khatwani wrote:

Hi,
  I have a small schema with some of the fields defined as:
<field name="id" type="string" indexed="true" stored="true"
       multiValued="false" required="true"/>
<field name="content" type="text" indexed="true" stored="true"
       multivalued="false" />
<field name="author_name" type="text" indexed="true" stored="false"
       multivalued="true"/>

where the field author_name is multivalued.
However, in the UI (schema browser), the following are the details of the
author_name field; it's nowhere mentioned that it's multivalued:
Field: author_name
Field Type: text

Properties: Indexed, Tokenized
When I try creating and adding a document into Solr, I get an exception:
ERROR_id1_multiple_values_encountered_for_non_multiValued_field_author_name_ninad_raakhi_goureya_sheetal
Here's my code snippet:
  solrDoc17.addField("id", "id1");
  solrDoc17.addField("content", "SOLR");
  solrDoc17.addField("author_name", "ninad");
  solrDoc17.addField("author_name", "raakhi");
  solrDoc17.addField("author_name", "goureya");
  solrDoc17.addField("author_name", "sheetal");
  server.add(solrDoc17);
  server.commit();
Any pointers??
regards,
Raakhi

  




DIH: Setting rows= on full-import has no effect

2009-10-08 Thread Jay Hill
In the past setting rows=n with the full-import command has stopped the DIH
importing at the number I passed in, but now this doesn't seem to be
working. Here is the command I'm using:
curl '
http://localhost:8983/solr/indexer/mediawiki?command=full-importrows=100'

But when 100 docs are imported the process keeps running. Here's the log
output:

Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
INFO: Indexing stopped at docCount = 100
Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
INFO: Indexing stopped at docCount = 200
Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
INFO: Indexing stopped at docCount = 300
Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
INFO: Indexing stopped at docCount = 400
Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder
buildDocument
INFO: Indexing stopped at docCount = 500

and so on.

Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08
17:31:22

I've used that exact url in the past and the indexing stopped at the rows
number as expected, but I haven't run the command for about two months on a
build from back in early July.

Here's the dih config:

<dataConfig>
  <dataSource
      name="dsFiles"
      type="FileDataSource"
      encoding="UTF-8"/>
  <document>
    <entity
        name="f"
        processor="FileListEntityProcessor"
        baseDir="/path/to/files"
        fileName=".*xml"
        recursive="true"
        rootEntity="false"
        dataSource="null">

      <entity
          name="wikixml"
          processor="XPathEntityProcessor"
          forEach="/mediawiki/page"
          url="${f.fileAbsolutePath}"
          dataSource="dsFiles"
          onError="skip">

        <field column="id" xpath="/mediawiki/page/id"/>
        <field column="title" xpath="/mediawiki/page/title"/>
        <field column="contributor"
            xpath="/mediawiki/page/revision/contributor/username"/>
        <field column="comment" xpath="/mediawiki/page/revision/comment"/>
        <field column="text" xpath="/mediawiki/page/revision/text"/>

      </entity>
    </entity>
  </document>
</dataConfig>


-Jay


Re: multi-word synonyms and analysis.jsp vs real field analysis (query, index)

2009-10-08 Thread Koji Sekiguchi

Patrick,

 parsedQueryString was something similar to field:foo field:bar. At
 index time, it works like expected.

I guess it's because you are searching q=foo bar, which causes an OR query.
Use q="foo bar" (a quoted phrase) instead.

Koji


Patrick Jungermann wrote:

Hi list,

I worked on a field type and its analyzing chain, at which I want to use
the SynonymFilter with entries similar to:

foo bar => foo_bar

During the analysis phase, I used the /admin/analysis.jsp view to test
the analyzing results produced by the created field type. The output
shows that a query foo bar will first be separated by the
WhitespaceTokenizer to the two tokens foo and bar, and that the
SynonymFilter will replace the both tokens with foo_bar. But as I
tried this at real query time with the request handler standard and
also with dismax, the tokens foo and bar were not replaced. The
parsedQueryString was something similar to field:foo field:bar. At
index time, it works like expected.

Has anybody experienced this and/or knows a workaround, a solution for it?


Thanks, Patrick






  




DIH Error in latest Nightly Builds

2009-10-08 Thread jayakeerthi s
Hi All,

I tried indexing data and got the following error. I used the Solr nightly
builds from Oct 5th and Oct 8th; the same configuration/query works in an
older version (May nightly build).

The db-data-config.xml has the simple Select query


SEVERE: Full Import failed
org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME,
SEGMENTATION_
TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE,
ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA
TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION,
LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME,
NEW_STYLE,
SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY,
SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT
_ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE,
SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT,
CARRYOVER_PROD
UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE,
PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD
as UNIQ from
prodsearch_atlasatgcombine Processing Document # 1
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Caused by: java.sql.SQLException: Unsupported feature
at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269)
at
oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689)
at
oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191)
at
org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
at
org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
at
org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
... 11 more
Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: start rollback
Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback
INFO: end_rollback
2009-10-08 18:31:12.149::INFO:  Shutdown hook executing
2009-10-08 18:31:12.149::INFO:  Shutdown hook complete

Thanks and regards,
JK


Re: Scoring for specific field queries

2009-10-08 Thread Avlesh Singh
Use the field analysis tool to see how the data is being analyzed in both
the fields.

Cheers
Avlesh

On Fri, Oct 9, 2009 at 12:56 AM, R. Tan tanrihae...@gmail.com wrote:

 Hmm... I don't quite get the desired results. Those starting with cha are
 now randomly ordered. Is there something wrong with the filters I applied?


 On Thu, Oct 8, 2009 at 7:38 PM, Avlesh Singh avl...@gmail.com wrote:

  Filters? I did not mean filters at all.
  I am in a mad rush right now, but on the face of it your field
 definitions
  look right.
 
  This is what I asked for -
  q=(autoComplete2:cha^10 autoComplete:cha)
 
  Lemme know if this does not work for you.
 
  Cheers
  Avlesh
 
  On Thu, Oct 8, 2009 at 4:58 PM, R. Tan tanrihae...@gmail.com wrote:
 
   Hi Avlesh,
  
   I can't seem to get the scores right.
  
   I now have these types for the fields I'm targeting,
  
<fieldType name="autoComplete" class="solr.TextField"
    positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.NGramFilterFactory" minGramSize="1"
        maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
<fieldType name="autoComplete2" class="solr.TextField"
    positionIncrementGap="1">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.NGramFilterFactory" minGramSize="1"
        maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
  
   My query is this,
  
  
 
 q=*:*fq=autoCompleteHelper:cha+autoCompleteHelper2:chaqf=autoCompleteHelper^10.0+autoCompleteHelper2^1.0
  
   What should I tweak from the above config and query?
  
   Thanks,
   Rih
  
  
   On Thu, Oct 8, 2009 at 4:38 PM, R. Tan tanrihae...@gmail.com wrote:
  
I will have to pass on this and try your suggestion first. So, how
 does
your suggestion (1 and 2) boost the my startswith query? Is it
 because
  of
the n-gram filter?
   
   
   
On Thu, Oct 8, 2009 at 2:27 PM, Sandeep Tagore 
  sandeep.tag...@gmail.com
   wrote:
   
   
Yes it can be done but it needs some customization. Search for
 custom
   sort
implementations/discussions.
You can check...
   
   
  
 
 http://lucene.apache.org/solr/api/org/apache/solr/schema/RandomSortField.html
.
Let us know if you have any issues.
   
Sandeep
   
   
R. Tan wrote:

 This might work and I also have a single value field which makes
 it
 cleaner.
 Can sort be customized (with indexOf()) from the solr parameters
   alone?
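Sandeep's indexOf-based custom sort can be simulated outside Solr to see the intended ranking. A minimal Python sketch (the titles and keyword are made-up examples; inside Solr this logic would live in a custom sort field or function query):

```python
def position_sort_key(title, keyword):
    """indexOf-style sort key: position of the keyword in the title
    (case-insensitive); titles without the keyword sort last."""
    pos = title.lower().find(keyword.lower())
    return pos if pos >= 0 else float("inf")

titles = [
    "We Are the Champions",
    "Champion of the World",
    "A Champion's Breakfast",
]

# Titles that start with the keyword rank first (lowest index).
ranked = sorted(titles, key=lambda t: position_sort_key(t, "champion"))
print(ranked[0])  # "Champion of the World" -- keyword at position 0
```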

   
--
View this message in context:
   
  
 
 http://www.nabble.com/Scoring-for-specific-field-queries-tp25798390p25799055.html
Sent from the Solr - User mailing list archive at Nabble.com.
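As for why the two-field setup favors startswith matches: the effect is easiest to see with edge n-grams. Note the schema in this thread uses NGramFilterFactory; EdgeNGramFilterFactory, assumed in the sketch below, is the common autosuggest variant and makes the prefix behavior explicit. A rough Python simulation:

```python
def edge_ngrams(token, min_size=1, max_size=20):
    """Leading-edge n-grams of one token (EdgeNGramFilter-style)."""
    return [token[:n] for n in range(min_size, min(len(token), max_size) + 1)]

def keyword_field_grams(title):
    # KeywordTokenizer: the whole lowercased title is a single token.
    return set(edge_ngrams(title.lower()))

def whitespace_field_grams(title):
    # WhitespaceTokenizer: every word is a token with its own grams.
    grams = set()
    for word in title.lower().split():
        grams.update(edge_ngrams(word))
    return grams

prefix = "cha"
# The keyword-tokenized field only matches titles that START with the prefix...
print(prefix in keyword_field_grams("Champion of the World"))   # True
print(prefix in keyword_field_grams("We Are the Champions"))    # False
# ...while the whitespace-tokenized field matches any word starting with it.
print(prefix in whitespace_field_grams("We Are the Champions")) # True
```

Boosting the keyword-tokenized field therefore pushes titles that begin with the typed prefix above titles that merely contain a word with that prefix.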
   
   
   
  
 



Re: DIH Error in latest Nightly Builds

2009-10-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
raised an issue
https://issues.apache.org/jira/browse/SOLR-1500

On Fri, Oct 9, 2009 at 7:10 AM, jayakeerthi s mail2keer...@gmail.com wrote:
 Hi All,

 I tried Indexing data and got the following error., Used Solr nightly Oct5th
 and nightly 8th, The same Configuration/query  is working in Older
 version(May nightly Build)

 The db-data-config.xml has the simple Select query


 SEVERE: Full Import failed
 org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to
 execute query: select CATALOG_ID, CATALOG_NUMBER, CATALOG_NAME,
 SEGMENTATION_
 TYPE, BEGIN_OFFER_DATE, END_OFFER_DATE, FUTURE_BEGIN_DATE, FUTURE_END_DATE,
 ATONCE_BEGIN_DATE, ATONCE_END_DATE, REFERENCE_BEGIN_DATE, REFERENCE_END_DA
 TE, BEGIN_SEASON, LANGUAGE, COUNTRY, SIZE_TYPE, CURRENCY, DIVISION,
 LIFECYCLE, PRODUCT_CD, STYLE_CD, GLOBAL_STYLE_NAME, REGION_STYLE_NAME,
 NEW_STYLE,
 SIZE_RUN, COLOR_NBR, GLOBAL_COLOR_DESC, REGION_COLOR_DESC, WIDTH, CATEGORY,
 SUB_CATEGORY, CATEGORY_SUMMARY, CATEGORY_CORE_FOCUS, SPORT_ACTIVITY, SPORT
 _ACTIVITY_SUMMARY, GENDER_AGE, GENDER_AGE_SUMMARY, SILO, SILHOUETTE,
 SILHOUETTE_SUMMARY, SEGMENTATION_TIER, PRIMARY_COLOR, NEW_PRODUCT,
 CARRYOVER_PROD
 UCT, WHOLESALE_AMOUNT, RETAIL_AMOUNT, CATALOG_LAST_MOD_DATE,
 PRODUCT_LAST_MOD_DATE, STYLE_LAST_MOD_DATE, CATALOG_ID || '-' || PRODUCT_CD
 as UNIQ from
 prodsearch_atlasatgcombine Processing Document # 1
        at
 org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:72)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:253)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:210)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:39)
        at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:58)
        at
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
        at
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
        at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:356)
        at
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
        at
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
        at
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
        at
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
        at
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
 Caused by: java.sql.SQLException: Unsupported feature
        at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:134)
        at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:179)
        at oracle.jdbc.dbaccess.DBError.throwSqlException(DBError.java:269)
        at
 oracle.jdbc.dbaccess.DBError.throwUnsupportedFeatureSqlException(DBError.java:689)
        at
 oracle.jdbc.driver.OracleConnection.setHoldability(OracleConnection.java:3065)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:191)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$1.call(JdbcDataSource.java:128)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.getConnection(JdbcDataSource.java:363)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource.access$300(JdbcDataSource.java:39)
        at
 org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.init(JdbcDataSource.java:240)
        ... 11 more
 Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback
 INFO: start rollback
 Oct 8, 2009 6:30:23 PM org.apache.solr.update.DirectUpdateHandler2 rollback
 INFO: end_rollback
 2009-10-08 18:31:12.149::INFO:  Shutdown hook executing
 2009-10-08 18:31:12.149::INFO:  Shutdown hook complete

 Thanks and regards,
 JK




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: DIH: Setting rows= on full-import has no effect

2009-10-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have raised an issue http://issues.apache.org/jira/browse/SOLR-1501

On Fri, Oct 9, 2009 at 6:10 AM, Jay Hill jayallenh...@gmail.com wrote:
 In the past, setting rows=n with the full-import command stopped the DIH
 import at the number I passed in, but now this doesn't seem to be
 working. Here is the command I'm using:
 curl '
 http://localhost:8983/solr/indexer/mediawiki?command=full-import&rows=100'

 But when 100 docs are imported the process keeps running. Here's the log
 output:

 Oct 8, 2009 5:23:32 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 INFO: Indexing stopped at docCount = 100
 Oct 8, 2009 5:23:33 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 INFO: Indexing stopped at docCount = 200
 Oct 8, 2009 5:23:35 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 INFO: Indexing stopped at docCount = 300
 Oct 8, 2009 5:23:36 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 INFO: Indexing stopped at docCount = 400
 Oct 8, 2009 5:23:38 PM org.apache.solr.handler.dataimport.DocBuilder
 buildDocument
 INFO: Indexing stopped at docCount = 500

 and so on.

 Running on the most recent nightly: 1.4-dev 823366M - jayhill - 2009-10-08
 17:31:22

 I've used that exact url in the past and the indexing stopped at the rows
 number as expected, but I haven't run the command for about two months on a
 build from back in early July.

 Here's the dih config:

  <dataConfig>
    <dataSource
       name="dsFiles"
       type="FileDataSource"
       encoding="UTF-8"/>
    <document>
      <entity
         name="f"
         processor="FileListEntityProcessor"
         baseDir="/path/to/files"
         fileName=".*xml"
         recursive="true"
         rootEntity="false"
         dataSource="null">
        <entity
           name="wikixml"
           processor="XPathEntityProcessor"
           forEach="/mediawiki/page"
           url="${f.fileAbsolutePath}"
           dataSource="dsFiles"
           onError="skip">
          <field column="id" xpath="/mediawiki/page/id"/>
          <field column="title" xpath="/mediawiki/page/title"/>
          <field column="contributor" xpath="/mediawiki/page/revision/contributor/username"/>
          <field column="comment" xpath="/mediawiki/page/revision/comment"/>
          <field column="text" xpath="/mediawiki/page/revision/text"/>
        </entity>
      </entity>
    </document>
  </dataConfig>


 -Jay




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


RE: Solr Quries

2009-10-08 Thread Pravin Karne
Thanks for your help.
Can you please provide the detailed configuration for a Solr distributed environment?
How do I set up master and slave? In which file(s) do I have to make changes for this?
What are the shard parameters?

Can we integrate ZooKeeper with this?

Please provide details for this.

Thanks in advance.
-Pravin

-Original Message-
From: Sandeep Tagore [mailto:sandeep.tag...@gmail.com]
Sent: Wednesday, October 07, 2009 4:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Quries


Hi Pravin,

1. Does Solr work in a distributed environment? If yes, how do I configure it?
Yep. You can achieve this with sharding.
For example: install and configure Solr on two machines and declare one of
them as master. Include the shard parameters when you index and search your
data.
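A distributed search simply lists the participating shard URLs in the shards parameter of the request. A hedged sketch of building such a request URL (the hostnames and core paths are made-up; substitute your own machines):

```python
from urllib.parse import urlencode

# Hypothetical shard locations -- replace with your own hosts.
shards = ["solr1.example.com:8983/solr", "solr2.example.com:8983/solr"]

params = {
    "q": "name:pravin",           # the query runs on every shard
    "shards": ",".join(shards),   # Solr merges the per-shard results
}
url = "http://" + shards[0] + "/select?" + urlencode(params)
print(url)
```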

2. Does Solr have Hadoop support? If yes, how do I set it up with Hadoop/HDFS?
(Note: I am familiar with Hadoop)
Sorry. No idea.

3. I have 1 TB of employee information (id, name, address, cell no, personal
info). To post (index) this data to the Solr server, do I have to create an
XML file with this data and then post it to the server? Or is there another,
more optimal way? In future my data will grow up to 10 TB; how can I index
that much data? (Creating XML for it is a headache.)
I think XML is not the best way; I don't suggest it. If you have that 1 TB of
data in a database you can achieve this simply with the full-import command.
Configure your DB details in solrconfig.xml and data-config.xml, and add your
DB driver jar to the Solr lib directory. Then import the data in slices (say,
department-wise, or by some other category). In future you can import the
data from a DB, or index the data directly using the client API with simple
Java beans.

Hope this info helps you.

Regards,
Sandeep Tagore
--
View this message in context: 
http://www.nabble.com/Solr-Quries-tp25780371p25783891.html
Sent from the Solr - User mailing list archive at Nabble.com.


DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.