RE: Faceted search not working?

2010-05-25 Thread Birger Lie
Hi,
try

http://localhost:8080/solr/select/?q=YOUR-QUERY&facet=true&facet.field=title


I don't think the boolean fields are mapped to on and off :)


-birger

-Original Message-
From: Ilya Sterin [mailto:ster...@gmail.com] 
Sent: 24. mai 2010 23:11
To: solr-user@lucene.apache.org
Subject: Faceted search not working?

I'm trying to perform a faceted search without any luck.  Result set doesn't 
return any facet information...

http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title

I'm getting the result set, but no facet information is present. Is there 
something else that needs to happen to turn faceting on?

I'm using the latest Solr 1.4 release.  Data is indexed from the database 
using the dataimporter.

Thanks.

Ilya Sterin


Tagging and excluding Filters

2010-05-25 Thread Lukas Kahwe Smith
Hi,

I am using the following solution:
http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters

However, when I implemented this I found that I cannot combine different 
filter types:
http://search.un-informed.org/search?q=&t[23]=malaria&tm=any&s=Search

The above request would generate the following Solr query:
facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&rows=21

Now when I deselect one of the checkboxes I add an fq parameter:
facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)&rows=21

{!tag=dt}organisation_id:(-8)

Now where I am at a loss is when I want to filter in multiple different 
sections (like filtering on both organisations and clause information type).

I tried various ways of constructing the fq parameter but I always get a parse 
error:
{!tag=dt}(organisation_id:(-8) AND information_type_id:(-1))
{!tag=dt}organisation_id:(-8) AND {!tag=dt}information_type_id:(-1)

For example:
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 
'organisation_id:(-9) AND {!tag=dt}information_type_id:(-1)': Encountered 
"}" at line 1, column 35.

When running:
facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)+AND+{!tag%3Ddt}information_type_id:(-1)&rows=21

Can someone give me a hint?

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
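[Editor's note: a sketch of the approach usually suggested for this case, using the field names from Lukas's query above. Solr only parses local params such as {!tag=dt} at the very start of a parameter value, so rather than combining several fields with AND inside one fq, send one fq per filter, all sharing the same tag:]

```python
from urllib.parse import urlencode

def build_solr_query(q, facet_fields, filters, tag="dt"):
    # One fq parameter per filter; each carries {!tag=dt} so the
    # matching {!ex=dt} on every facet.field can exclude all of them.
    params = [("q", q), ("facet", "true"), ("wt", "json")]
    params += [("facet.field", "{!ex=%s}%s" % (tag, f)) for f in facet_fields]
    params += [("fq", "{!tag=%s}%s" % (tag, f)) for f in filters]
    return urlencode(params)

qs = build_solr_query(
    "(tag_ids:(23))",
    ["organisation_id", "information_type_id"],
    ["organisation_id:(-9)", "information_type_id:(-1)"],
)
print(qs)
```

This avoids the parse error because each fq value starts with its own local-params block instead of embedding one mid-query.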





How well does Solr scale over large number of facet values?

2010-05-25 Thread Andy
I want to facet over a field group.

Since group is created by users, potentially there can be a huge number of 
values for group.

- Would Solr be able to handle a use case like this? Or is Solr not really 
appropriate for facet fields with a large number of values?

- I understand that I can set facet.limit to restrict the number of values 
returned for a facet field. Would this help in my case? 
Say there are 100,000 matching values for group in a search, if I set  
facet.limit to 50. would that speed up the query, or would the query still be 
slow because Solr still needs to process and sort through all the facet values 
and return the top 50 ones?

-Any tips on  how to tune Solr for large number of facet values?

Thanks.
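[Editor's note: on the facet.limit question, a toy model (not Solr's actual code) of why the limit trims response size rather than counting work — a count is still tallied for every distinct value in the matching documents before the top-N selection:]

```python
from collections import Counter

# Hypothetical data: 100,000 matching docs spread over 1,000 groups.
groups_of_matching_docs = ["group-%d" % (i % 1000) for i in range(100_000)]

counts = Counter(groups_of_matching_docs)  # one increment per matching doc
top = counts.most_common(50)               # top facet.limit picked afterwards

print(len(counts), len(top))  # every distinct value counted; only 50 returned
```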


  


Re: Apache or Nginx In front of SOLR?

2010-05-25 Thread Paul Dhaliwal
It depends on what kind of load you are talking about and what your
expertise is.

NGINX does perform better than apache for most people; however, fewer people
know about NGINX than apache. If you have more than 100K searchers a day
doing a few searches each, you will benefit from NGINX. If your traffic is
lower and you know apache better, apache will do just fine.

2010/5/25 Kranti™ K K Parisa kranti.par...@gmail.com

 Dear All,

 Which is the best implementation in front of SOLR between Apache and NGINX?

 The main aspects would be
 1. Ability to handle high loads

They are both known to handle high loads just fine.

2. Resource utilizations

Apache uses more resources than NGINX under heavy load, but I am sure apache
can be tuned.

3. Caching (can we have caching implemented in front of solr, I did
 implement SOLR caching but to the extent possible i would still reduce the
 calls to SOLR by having some caching implemented in front of SOLR to serve

You probably want to look at a reverse proxy like varnish or squid.


 the static pages whose data actually comes from SOLR)
 4. Ability to record the statistics like AWSTATS available for Apache.

This shouldn't be a concern. You can even configure tomcat or jetty to log
in apache format.



 Please suggest your thoughts/ideas.


 Best Regards,
 Kranti K K Parisa


Hope that helps,
Paul Dhaliwal


RE: Highlighting is not happening

2010-05-25 Thread Doddamani, Prakash
Hey,

I thought the highlights would happen in the fields of the documents
returned from Solr, but it gives a new list of highlighting below. Sorry 
for the confusion.

I was wondering: is there a way that the returned fields themselves contain
bold characters?

Eg : if searched for query

<doc>
<str name="one">returned response which contains
<b>query</b> should be bold</str>
</doc>


Regards
Prakash

-Original Message-
From: Sascha Szott [mailto:sz...@zib.de] 
Sent: Monday, May 24, 2010 10:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening

Hi Prakash,

can you provide

1. the definition of the relevant field
2. your query
3. the definition of the relevant request handler 4. a field value that
is stored in your index and should be highlighted

-Sascha

Doddamani, Prakash wrote:
 Thanks Sascha,

 The type of the fields I am searching on is "text", and I 
 am using solr.TextField


  <fieldType name="text" class="solr.TextField"
       positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory"
          synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <!-- Case insensitive stop word removal.
           enablePositionIncrements="true" ensures that a 'gap' is
           left to allow for accurate phrase queries.
      -->
      <filter class="solr.StopFilterFactory"
              ignoreCase="true"
              words="stopwords.txt"
              enablePositionIncrements="true"
              />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

 Regards
 Prakash


 -Original Message-
 From: Sascha Szott [mailto:sz...@zib.de]
 Sent: Monday, May 24, 2010 10:29 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Highlighting is not happening

 Hi Prakash,

 more importantly, check the field type and its associated analyzer. In

 case you use a non-tokenized type (e.g., string), highlighting will 
 not appear if only a partial field match exists (only exact matches, 
 i.e. the query coincides with the field value, will be highlighted). 
 If that's not your intent, you should at least define a tokenizer for

 the field type.

 Best,
 Sascha

 Doddamani, Prakash wrote:
 Hey Daren,
 Yes, the fields I am searching on are stored and indexed, and they are 
 returned by the query. Also, it is not coming even if the entire 
 search keyword is part of the field.

 Thanks
 Prakash

 -Original Message-
 From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
 Sent: Monday, May 24, 2010 9:32 PM
 To: solr-user@lucene.apache.org
 Subject: Re: Highlighting is not happening

 Check that the field you are highlighting on is stored. It won't 
 work otherwise.


 Now, this also means that the field is returned from the query. For 
 large text fields to be highlighted only, this means the entire text 
 is returned for each result.


 There is a pending feature to address this, that allows you to tell 
 Solr to NOT return a specific field (to avoid unnecessary transfer of 
 large text fields in this scenario).

 Darren

 Hi



 I am using dismax request handler, I wanted to highlight the search 
 field,

 So added

 <str name="hl">true</str>

 I was expecting that if I search for the keyword Akon, the resultant docs 
 would show Akon in bold wherever it appears.



 But I am not seeing them getting bold. Could someone tell me the real 
 path where I should tune?

 If I pass hl=true explicitly, it does not work



 I have added the request handler



 <requestHandler name="dismax" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        name^20.0 coming^5 playing^4 keywords^0.1
     </str>
     <str name="bf">
        rord(isclassic)^0.5 ord(listeners)^0.3
     </str>
     <str 

Re: Faceted search not working?

2010-05-25 Thread Sascha Szott

Hi Birger,

Birger Lie wrote:

I don't think the boolean fields are mapped to on and off :)

You can use true and on interchangeably.

-Sascha




-birger

-Original Message-
From: Ilya Sterin [mailto:ster...@gmail.com]
Sent: 24. mai 2010 23:11
To: solr-user@lucene.apache.org
Subject: Faceted search not working?

I'm trying to perform a faceted search without any luck.  Result set doesn't 
return any facet information...

http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title

I'm getting the result set, but no facet information is present. Is there 
something else that needs to happen to turn faceting on?

I'm using the latest Solr 1.4 release.  Data is indexed from the database 
using the dataimporter.

Thanks.

Ilya Sterin




Re: sort by field length

2010-05-25 Thread Sascha Szott

Hi Erick,

Erick Erickson wrote:

Are you sure you want to recompute the length when sorting?
It's the classic time/space tradeoff, but I'd suggest that when
your index is big enough to make taking up some more space
a problem, it's far too big to spend the cycles calculating each
term length for sorting purposes considering you may be
sorting all the terms in your index worst-case.
Good point, thank you for the clarification. I thought that Lucene 
internally stores the field length (e.g., in order to compute the 
relevance) and getting this information at query time requires only a 
simple lookup.


-Sascha



But you could consider payloads for storing the length, although
that would still be redundant...

Best
Erick

On Mon, May 24, 2010 at 8:30 AM, Sascha Szottsz...@zib.de  wrote:


Hi folks,

is it possible to sort by field length without having to (redundantly) save
the length information in a seperate index field? At first, I thought to
accomplish this using a function query, but I couldn't find an appropriate
one.

Thanks in advance,
Sascha
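[Editor's note: the workaround implied in this thread is to pay the cost once at index time: compute the length into a redundant field and sort on that. A minimal sketch; the field names are hypothetical:]

```python
def add_length_field(doc, field="body"):
    # Store the token count redundantly at index time, so sorting becomes
    # a plain field sort at query time rather than a per-term computation.
    doc = dict(doc)
    doc[field + "_len"] = len(doc[field].split())
    return doc

d = add_length_field({"id": "1", "body": "sort by field length"})
print(d["body_len"])
```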






Re: Highlighting is not happening

2010-05-25 Thread Sascha Szott

Hi,

to accomplish that, use the highlighting parameters hl.simple.pre and 
hl.simple.post.
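[Editor's note: a request sketch using those parameters; the highlighted field name here is hypothetical:]

```python
from urllib.parse import urlencode

params = {
    "q": "Akon",
    "hl": "true",
    "hl.fl": "name",            # field(s) to generate highlighted snippets for
    "hl.simple.pre": "<b>",     # inserted before each matched term
    "hl.simple.post": "</b>",   # inserted after each matched term
    "wt": "json",
}
print(urlencode(params))
```

Note the highlights still arrive in the separate highlighting section of the response; these parameters only control the markup wrapped around each match.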


By the way, there are plenty of other parameters that affect 
highlighting. Take a look at:


http://wiki.apache.org/solr/HighlightingParameters

-Sascha

Doddamani, Prakash wrote:

Hey,

I thought the highlights would happen in the fields of the documents
returned from Solr, but it gives a new list of highlighting below. Sorry
for the confusion.

I was wondering: is there a way that the returned fields themselves contain
bold characters?

Eg : if searched for query

<doc>
<str name="one">returned response which contains
<b>query</b> should be bold</str>
</doc>


Regards
Prakash

-Original Message-
From: Sascha Szott [mailto:sz...@zib.de]
Sent: Monday, May 24, 2010 10:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening

Hi Prakash,

can you provide

1. the definition of the relevant field
2. your query
3. the definition of the relevant request handler 4. a field value that
is stored in your index and should be highlighted

-Sascha

Doddamani, Prakash wrote:

Thanks Sascha,

The type of the fields I am searching on is "text", and I
am using solr.TextField


  <fieldType name="text" class="solr.TextField"
       positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- in this example, we will only use synonyms at query time
      <filter class="solr.SynonymFilterFactory"
          synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
      -->
      <!-- Case insensitive stop word removal.
           enablePositionIncrements="true" ensures that a 'gap' is
           left to allow for accurate phrase queries.
      -->
      <filter class="solr.StopFilterFactory"
              ignoreCase="true"
              words="stopwords.txt"
              enablePositionIncrements="true"
              />
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="1"
          catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1" catenateWords="0"
          catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
          protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Regards
Prakash


-Original Message-
From: Sascha Szott [mailto:sz...@zib.de]
Sent: Monday, May 24, 2010 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening

Hi Prakash,

more importantly, check the field type and its associated analyzer. In



case you use a non-tokenized type (e.g., string), highlighting will
not appear if only a partial field match exists (only exact matches,
i.e. the query coincides with the field value, will be highlighted).
If that's not your intent, you should at least define a tokenizer for



the field type.

Best,
Sascha

Doddamani, Prakash wrote:

Hey Daren,
Yes, the fields I am searching on are stored and indexed, and they are
returned by the query. Also, it is not coming even if the entire
search keyword is part of the field.

Thanks
Prakash

-Original Message-
From: dar...@ontrenet.com [mailto:dar...@ontrenet.com]
Sent: Monday, May 24, 2010 9:32 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening

Check that the field you are highlighting on is stored. It won't
work otherwise.


Now, this also means that the field is returned from the query. For
large text fields to be highlighted only, this means the entire text
is returned for each result.


There is a pending feature to address this, that allows you to tell
Solr to NOT return a specific field (to avoid unnecessary transfer of
large text fields in this scenario).

Darren


Hi



I am using dismax request handler, I wanted to highlight the search
field,

So added

<str name="hl">true</str>

I was expecting that if I search for the keyword Akon, the resultant docs
would show Akon in bold wherever it appears.



But I am not seeing them getting bold. Could someone tell me the real
path where I should tune?

If I pass hl=true explicitly, it does not work



I have added the request handler



<requestHandler name="dismax" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
  

Re: Apache or Nginx In front of SOLR?

2010-05-25 Thread Kranti™ K K Parisa
Thanks Paul, I shall continue doing some more R&D with your inputs.

Best Regards,
Kranti K K Parisa



On Tue, May 25, 2010 at 12:54 PM, Paul Dhaliwal subp...@gmail.com wrote:

 It depends on what kind of load you are talking about and what your
 expertise is.

 NGINX does perform better than apache for most people; however, fewer people
 know about NGINX than apache. If you have more than 100K searchers a day
 doing a few searches each, you will benefit from NGINX. If your traffic is
 lower and you know apache better, apache will do just fine.

 2010/5/25 Kranti™ K K Parisa kranti.par...@gmail.com

  Dear All,
 
  Which is the best implementation in front of SOLR between Apache and
 NGINX?
 
  The main aspects would be
  1. Ability to handle high loads
 
 They are both known to handle high loads just fine.

 2. Resource utilizations
 
 Apache uses more resources than NGINX under heavy load, but I am sure apache
 can be tuned.

 3. Caching (can we have caching implemented in front of solr, I did
  implement SOLR caching but to the extent possible i would still reduce
 the
  calls to SOLR by having some caching implemented in front of SOLR to
 serve
 
 You probably want to look at a reverse proxy like varnish or squid.


  the static pages whose data actually comes from SOLR)
  4. Ability to record the statistics like AWSTATS available for Apache.
 
 This shouldn't be a concern. You can even configure tomcat or jetty to log
 in apache format.


 
  Please suggest your thoughts/ideas.
 
 
  Best Regards,
  Kranti K K Parisa
 

 Hope that helps,
 Paul Dhaliwal



Re: How well does Solr scale over large number of facet values?

2010-05-25 Thread Marc Sturlese

With the uninverted algorithm it will be very fast whatever the number of
unique terms. But be careful with the memory, because it uses quite a lot.
Using the older facet algorithm, if you have a lot of different terms it
will be slow.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-well-does-Solr-scale-over-large-number-of-facet-values-tp841508p841613.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with extended dismax, minus prefix (to mean NOT) and interaction with mm?

2010-05-25 Thread Erik Hatcher
This looks like a case where the extended dismax parser is creating a  
Lucene QueryParser parsed query rather than a disjunction maximum query.


A case of too much magic maybe?   Looks like this one should be  
parsed quite differently.  Try dismax and see what you get, it'll be  
quite different.


Erik

On May 24, 2010, at 11:33 AM, Bill Dueber wrote:

I'm running edismax (on both a 1.4 with patch and a branch_3x  
version) and

I'm seeing something I don't expect.

We have our mm set such that 2/2 must match and 2/3 must match
(mm=2<-1 5<67%).

A query of
  dog cat

...gets interpreted as
 dog AND cat

But a query of
 dog cat -mouse

...gets interpreted as

 (dog AND cat) OR (dog AND NOT mouse) OR (cat AND NOT mouse)

In other words, the -mouse is being interpreted as a single token  
(NOT

mouse) to be counted for mm.

I would expect the query to interpret as:

 (dog AND cat) AND (NOT mouse)

Are my expectations out of whack? Or is this unexpected behavior?

[I've pasted the debugQuery info for a similar search below, though  
I freely

admit to not knowing how to read it]

Any thoughts on what I'm seeing here?


-Bill-
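[Editor's note: a sketch of the mm-spec arithmetic at play here, not Solr's parser. With mm=2<-1, queries of up to two optional clauses require all of them, and longer queries may drop one; Bill's surprise is that -mouse is counted as a third optional clause rather than treated as a prohibited clause outside the mm calculation:]

```python
def min_should_match(num_optional_clauses, mm="2<-1"):
    # "N<-M": for more than N optional clauses, M of them may be missing;
    # at or below N clauses, all of them are required.
    threshold, adjust = mm.split("<")
    if num_optional_clauses <= int(threshold):
        return num_optional_clauses
    return num_optional_clauses + int(adjust)

print(min_should_match(2))  # "dog cat" -> both required
print(min_should_match(3))  # if "-mouse" counts as a clause -> any 2 of 3
```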

<lst name="debug">
<str name="rawquerystring">dog cat -trilogy</str>
<str name="querystring">dog cat -trilogy</str>
<str name="parsedquery">allfields:dog allfields:cat
-allfields:trilogi</str>
<str name="parsedquery_toString">allfields:dog allfields:cat
-allfields:trilogi</str>
<lst name="explain">
 <str name="000107098">
2.1741915 = (MATCH) sum of:
 1.2620605 = (MATCH) weight(allfields:dog in 3187), product of:
   0.7618881 = queryWeight(allfields:dog), product of:
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.08713264 = queryNorm
   1.6564907 = (MATCH) fieldWeight(allfields:dog in 3187), product of:
 1.7320508 = tf(termFreq(allfields:dog)=3)
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.109375 = fieldNorm(field=allfields, doc=3187)
 0.912131 = (MATCH) weight(allfields:cat in 3187), product of:
   0.64770865 = queryWeight(allfields:cat), product of:
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.08713264 = queryNorm
   1.4082427 = (MATCH) fieldWeight(allfields:cat in 3187), product of:
 1.7320508 = tf(termFreq(allfields:cat)=3)
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.109375 = fieldNorm(field=allfields, doc=3187)
</str>
 <str name="36695">
2.1518915 = (MATCH) sum of:
 1.249116 = (MATCH) weight(allfields:dog in 36426), product of:
   0.7618881 = queryWeight(allfields:dog), product of:
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.08713264 = queryNorm
   1.6395006 = (MATCH) fieldWeight(allfields:dog in 36426), product  
of:

 2.0 = tf(termFreq(allfields:dog)=4)
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.09375 = fieldNorm(field=allfields, doc=36426)
 0.9027756 = (MATCH) weight(allfields:cat in 36426), product of:
   0.64770865 = queryWeight(allfields:cat), product of:
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.08713264 = queryNorm
   1.3937988 = (MATCH) fieldWeight(allfields:cat in 36426), product  
of:

 2.0 = tf(termFreq(allfields:cat)=4)
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.09375 = fieldNorm(field=allfields, doc=36426)
</str>
 <str name="38137">
1.4345944 = (MATCH) sum of:
 0.832744 = (MATCH) weight(allfields:dog in 37852), product of:
   0.7618881 = queryWeight(allfields:dog), product of:
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.08713264 = queryNorm
   1.0930004 = (MATCH) fieldWeight(allfields:dog in 37852), product  
of:

 1.0 = tf(termFreq(allfields:dog)=1)
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.125 = fieldNorm(field=allfields, doc=37852)
 0.6018504 = (MATCH) weight(allfields:cat in 37852), product of:
   0.64770865 = queryWeight(allfields:cat), product of:
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.08713264 = queryNorm
   0.9291992 = (MATCH) fieldWeight(allfields:cat in 37852), product  
of:

 1.0 = tf(termFreq(allfields:cat)=1)
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.125 = fieldNorm(field=allfields, doc=37852)
</str>
 <str name="000134898">
1.2629167 = (MATCH) sum of:
 0.624558 = (MATCH) weight(allfields:dog in 30673), product of:
   0.7618881 = queryWeight(allfields:dog), product of:
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.08713264 = queryNorm
   0.8197503 = (MATCH) fieldWeight(allfields:dog in 30673), product  
of:

 1.0 = tf(termFreq(allfields:dog)=1)
 8.744003 = idf(docFreq=64, maxDocs=15)
 0.09375 = fieldNorm(field=allfields, doc=30673)
 0.6383587 = (MATCH) weight(allfields:cat in 30673), product of:
   0.64770865 = queryWeight(allfields:cat), product of:
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.08713264 = queryNorm
   0.9855646 = (MATCH) fieldWeight(allfields:cat in 30673), product  
of:

 1.4142135 = tf(termFreq(allfields:cat)=2)
 7.4335938 = idf(docFreq=240, maxDocs=15)
 0.09375 = fieldNorm(field=allfields, doc=30673)
</str>
 <str name="29964">
1.25527 = (MATCH) sum 

Re: How well does Solr scale over large number of facet values?

2010-05-25 Thread Marc Sturlese

Since Solr 1.4 I think the uninverted method is on by default. Anyway, you
can choose which to use with the method param:
facet.method=fc/enum (where fc is the uninverted one)
http://wiki.apache.org/solr/SimpleFacetParameters
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/How-well-does-Solr-scale-over-large-number-of-facet-values-tp841508p841683.html
Sent from the Solr - User mailing list archive at Nabble.com.
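[Editor's note: a toy contrast of the two algorithms Marc names — my understanding, not actual Solr code. "enum" walks every distinct term and intersects its posting set with the matching docs, so cost grows with the number of unique terms; "fc" walks the matching docs once via an uninverted view, incrementing a counter per value:]

```python
matching_docs = {1, 2, 3, 5, 8}
postings = {"a": {1, 2}, "b": {3, 4}, "c": {5, 8, 9}}  # term -> doc ids

# facet.method=enum: one set intersection per unique term
enum_counts = {t: len(docs & matching_docs) for t, docs in postings.items()}

# facet.method=fc: one pass over the matching docs (uninverted: doc -> term)
doc_to_term = {d: t for t, docs in postings.items() for d in docs}
fc_counts = {}
for d in matching_docs:
    t = doc_to_term[d]
    fc_counts[t] = fc_counts.get(t, 0) + 1

print(enum_counts, fc_counts)  # same counts, different cost profiles
```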


Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread Rakhi Khatwani
Hi,
   Is there any way to get all the fields (irrespective of whether
it contains a value or null) in solrDocument.
or
Is there any way to get all the fields in schema.xml of the url link (
http://localhost:8983/solr/core0/)??

Regards,
Raakhi


Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
To retrieve all documents, you need to use the query/filter FieldName:*:*
Regards
Aditya
www.findbestopensource.com
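[Editor's note: if the goal is the list of fields defined in schema.xml, rather than the fields present in matching documents, the Luke request handler can report them, e.g. http://localhost:8983/solr/core0/admin/luke?show=schema&wt=json. A sketch of pulling field names out of such a response; the payload below is a made-up sample, not a real Solr reply:]

```python
import json

# Hypothetical (abbreviated) shape of a /admin/luke?show=schema response.
luke_response = json.loads("""
{"fields": {
  "id":    {"type": "string"},
  "title": {"type": "text"}
}}
""")
field_names = sorted(luke_response["fields"])
print(field_names)
```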


On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
   Is there any way to get all the fields (irrespective of whether
 it contains a value or null) in solrDocument.
 or
 Is there any way to get all the fields in schema.xml of the url link (
 http://localhost:8983/solr/core0/)??

 Regards,
 Raakhi



Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Grant Ingersoll
How many docs are in the batch you are pulling down?  How many docs/second do 
you expect on the index size?  How big are the docs?  What do you expect in 
terms of queries per second?  How fast do new documents need to be available on 
the local server?  How much analysis do you have to do?  Also, define Real 
Time.  You'd be surprised at the number of people I talk to who think they need 
Real Time, but then when you ask them questions like I just did, they don't 
really need it.  I've seen Solr turn around new docs in as little as 30 seconds 
on commodity hardware w/o any special engineering effort and I've seen it 
faster than that with some engineering effort.  That isn't necessarily possible 
for every application, but...

Despite the other suggestions, what you describe still looks feasible to me in 
Solr, pending the questions above (and some followups).


On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:

 Thanks for the new information. It's really great to see so many options for 
 Lucene.
 
 In my scenario there are the following pieces:
 
 1 - A local Java client with an embedded Solr instance and its own local 
 index/s.
 2 - A remote server running Solr with index/s that are more like a repository 
 that local clients query for extra goodies.
 3 - The client is also a JXTA node so it can share indexes or documents too.
 4 - There is no browser involved whatsoever.
 
 My music composing application is a local client that uses configurations 
 which would become many different document types. A subset of these 
 configurations will be bundled with the application and then many more would 
 be made available via a server/s running Solr.
 
 I would not expect the queries which would be made from within the local 
 client to be returned in real-time. I would only expect such queries to be 
 made in reasonable time and returned to the client. The client would have its 
 local Lucene index system (embedded Solr using SolrJ) which would be updated 
 with the results of the query made to the Solr instance running on the remote 
 server.
 
 Then the user on the client would issue queries to the local Lucene index/s 
 to obtain results which are used to setup contexts for different aspects of 
 the client. For example: an activated context for musical scales and rhythms 
 used for creating musical notes, an activated context for rendering with 
 layout and style information for different music symbol renderer types.
 
 I'm not yet sure but it may be best to make queries against the local Lucene 
 index/s and then convert the results into some context objects, maybe an 
 array or map (I'd like to learn more about how query results can be returned 
 as arrays or maps as well). Then the tools and renderers which require the 
 information in the contexts would do any real-time lookup directly from the 
 context objects not the local or remote Lucene or Solr index/s. The local 
 client is also a JXTA node so it can share its own index/s with fellow peers.
 
 This is how I envision this happening with my limited knowledge of 
 Lucene/Solr at this time. What are your thoughts on the feasibility of such a 
 scenario?
 
 I'm just reading through the Solr reference PDF now and looking over the Solr 
 admin application. Looking at the Schema.xml it seems to be field not 
 document oriented. From my point of view I think in terms of configuration 
 types which would be documents. In the schema it seems like only fields are 
 defined and it does not matter which configuration/document they belong to? I 
 guess this is fine as long as the indexing takes into account my unique 
 document types and I can search for them as a whole as well, not only for 
 specific values across a set of indexed documents. 
 
 Also, does the schema allow me to index certain documents into specific 
 indexes or are they all just bunched together? I'd rather have unique indexes 
 for specific document types. I've just read about multiple cores running 
 under one Solr instance, is this the only way to support multiple indexes?
 
 I'm thinking of ordering the Lucene in Action v2 book which is due this month 
 and also the Solr 1.4 book. Before I do I just need to understand a few 
 things which is why I'm writing such a long message :-)
 
 Thom
 
 
 On 2010-05-21, at 2:12 AM, Ben Eliott wrote:
 
 Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
 backs onto,  is 'eventually consistent',  so given your real-time 
 requirements,  you may want to review this in the first instance, if 
 Lucandra is of interest.
 
 On 21 May 2010, at 06:12, Walter Underwood wrote:
 
 Solr is a very good engine, but it is not real-time. You can turn off the 
 caches and reduce the delays, but it is fundamentally not real-time.
 
 I work at MarkLogic, and we have a real-time transactional search engine 
 (and respository). If you are curious, contact me directly.
 
 I do like Solr for lots of applications -- I chose it when I was at 

Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
To retrieve all documents, you need to use the query/filter FieldName:*:*
Regards
Aditya
www.findbestopensource.com
On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi,
   Is there any way to get all the fields (irrespective of whether
 it contains a value or null) in solrDocument.
 or
 Is there any way to get all the fields in schema.xml of the url link (
 http://localhost:8983/solr/core0/)??

 Regards,
 Raakhi



Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
Resending, as there was a typo in my previous message.

To retrieve all documents, you need to use the query/filter FieldName:*:* .


Regards
Aditya
www.findbestopensource.com


On Tue, May 25, 2010 at 4:29 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 To retrieve all documents, you need to use the query/filter FieldName:*:*
 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.comwrote:

 Hi,
   Is there any way to get all the fields (irrespective of whether
 it contains a value or null) in solrDocument.
 or
 Is there any way to get all the fields in schema.xml of the url link (
 http://localhost:8983/solr/core0/)??

 Regards,
 Raakhi





Re: Machine utilization while indexing

2010-05-25 Thread Thijs

Hi all,

I did some further investigation and (after turning off some filters in 
YourKit) found that it was actually the machine sending the files to 
Solr that was slowing things down.


At first I couldn't find this, as it turned out that YourKit hides 
org.apache.* classes. When I removed this filter, it turned out that 
at least 50% of the CPU time was taken by 
org.apache.solr.client.solrj.util.ClientUtils.writeXML(SolrInputDocument, Writer)
This was taking so much time that the commit queues where filling up on 
the client side instead of the solr server.


I have now switched back to my custom BlockingQueue with multiple 
CommonsHttpSolrServers that use the BinaryRequestWriter. And I'm now 
able to index 80 documents in 8 minutes (including optimize), and 
2.9 million documents in 32 minutes (incl. optimize).

As the StreamingUpdateSolrServer only supports XML, I can't use that.

So now I wonder why BinaryRequestWriter (and BinaryUpdateRequestHandler) 
aren't turned on by default (esp. considering some threads on the 
dev-list some time ago about setting a default schema for optimum 
performance).
Also, finding out about this performance enhancement wasn't easy, as it's 
hardly mentioned on the Wiki. I'll see if I can update this.


Thanks for all the advice and esp. the great work on Solr/Lucene.
Thijs


On 20-5-2010 21:34, Chris Hostetter wrote:


: StreamingUpdateSolrServer already has multiple threads and uses multiple
: connections under the covers. At least the api says ' Uses an internal

Hmmm... i think one of us misunderstands the point behind
StreamingUpdateSolrServer and its internal threads/queues.  (it's very
possible that it's me)

my understanding is that this allows it to manage the batching of multiple
operations for you, reusing connections as it goes -- so the
queueSize is how many individual requests it buffers before sending the
batch to Solr, and the threadCount controls how many batches it can send
in parallel (in the event that one thread is still waiting for the
response when the queue next fills up)

But if you are only using a single thread to feed SolrRequests to a single
instance of StreamingUpdateSolrServer then there can still be lots of
opportunities for Solr itself to be idle -- as i said, it's not clear to
me if you are using multiple threads to write to your
StreamingUpdateSolrServer ... even if you reuse the same
StreamingUpdateSolrServer instance, multiple threads in your client code
may increase the throughput (assuming that at the moment the threads in
StreamingUpdateSolrServer are largely idle)

But as i said ... this is all mostly a guess.  I'm not intimately
familiar with solrj.


-Hoss





Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread Rakhi Khatwani
Hi Aditya,
   I can retrieve all documents, but cannot retrieve all the fields
in a document (if it does not have any value).

For example, I get a list of documents; some of the documents have some value
for the title field, and others might not contain a value for the title field. In
any case I need to get the entry for title in getFieldNames().

How do i go about that?

Regards,
Raakhi


On Tue, May 25, 2010 at 5:07 PM, findbestopensource 
findbestopensou...@gmail.com wrote:

 Resending it as there is a typo error.

 To reterive all documents, You need to use the query/filter FieldName:*:* .


 Regards
 Aditya
 www.findbestopensource.com


 On Tue, May 25, 2010 at 4:29 PM, findbestopensource 
 findbestopensou...@gmail.com wrote:

  To reterive all documents, You need to use the query/filter
 *FieldName:*:*
  *
  Regards
  Aditya
  www.findbestopensource.com
 
 
  On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com
 wrote:
 
  Hi,
Is there any way to get all the fields (irrespective of
 whether
  it contains a value or null) in solrDocument.
  or
  Is there any way to get all the fields in schema.xml of the url link (
  http://localhost:8983/solr/core0/)??
 
  Regards,
  Raakhi
 
 
 



Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread findbestopensource
If a field doesn't have a value, you will get NULL on retrieving it. How
could you expect a value for a field which is not provided?

You have two options; choose either one:
1. If the field value returned is NULL, then display a proper error / user
defined message. Handle the error.
2. Add a dummy value, say NO_VALUE, to the title field when it doesn't have
any value.
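In SolrJ terms, option 1 is simply a null check on the value returned for the absent field. A self-contained sketch of that fallback (a plain Map stands in for the SolrDocument here, whose getFieldValue likewise returns null for missing fields):

```java
import java.util.HashMap;
import java.util.Map;

public class MissingFieldDemo {
    // Return the stored value, or a user-defined placeholder when the
    // field is absent from the document (the lookup yields null).
    static String fieldOrDefault(Map<String, Object> doc, String field, String fallback) {
        Object value = doc.get(field);
        return value == null ? fallback : value.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<String, Object>();
        doc.put("id", "42"); // this document was indexed without a "title" field
        System.out.println(fieldOrDefault(doc, "title", "NO_VALUE"));
    }
}
```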

Regards
Aditya
www.findbestopensource.com




On Tue, May 25, 2010 at 5:20 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 Hi Aditya,
    I can retrieve all documents, but cannot retrieve all the fields
  in a document (if it does not have any value).

  For example, I get a list of documents; some of the documents have some
  value
  for the title field, and others might not contain a value for the title field. In
  any case I need to get the entry for title in getFieldNames().

 How do i go about that?

 Regards,
 Raakhi


 On Tue, May 25, 2010 at 5:07 PM, findbestopensource 
  findbestopensou...@gmail.com wrote:

  Resending it as there is a typo error.
 
  To reterive all documents, You need to use the query/filter FieldName:*:*
 .
 
 
  Regards
  Aditya
  www.findbestopensource.com
 
 
  On Tue, May 25, 2010 at 4:29 PM, findbestopensource 
  findbestopensou...@gmail.com wrote:
 
   To reterive all documents, You need to use the query/filter
  *FieldName:*:*
   *
   Regards
   Aditya
   www.findbestopensource.com
  
  
   On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com
  wrote:
  
   Hi,
 Is there any way to get all the fields (irrespective of
  whether
   it contains a value or null) in solrDocument.
   or
   Is there any way to get all the fields in schema.xml of the url link (
   http://localhost:8983/solr/core0/)??
  
   Regards,
   Raakhi
  
  
  
 



Re: Tagging and excluding Filters

2010-05-25 Thread Lukas Kahwe Smith

On 25.05.2010, at 08:55, Lukas Kahwe Smith wrote:

 Now when I deselect one of the checkboxes I add an fq parameters:
 facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)&rows=21
 
 {!tag=dt}organisation_id:(-8)
 
 Now where I am at a loss is when I want to filter in multiple different 
 sections (like filter both organisations as well as clause information type.
 
 I tried various ways of constructing the fq parameter but I always get a parse 
 error:
 {!tag=dt}(organisation_id:(-8) AND information_type_id:(-1))
 {!tag=dt}organisation_id:(-8) AND {!tag=dt}information_type_id:(-1)
 
 For example:
 Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 
 'organisation_id:(-9) AND {!tag=dt}information_type_id:(-1)': Encountered 
 "}" at line 1, column 35.
 
 When running:
 facet=true&fl=*,score&sort=score+desc&start=0&q=(tag_ids:(23))&facet.field={!ex%3Ddt}organisation_id&facet.field={!ex%3Ddt}tag_ids&facet.field={!ex%3Ddt}addressee_ids&facet.field={!ex%3Ddt}operative_phrase_id&facet.field={!ex%3Ddt}documenttype_id&facet.field={!ex%3Ddt}information_type_id&facet.field={!ex%3Ddt}legal_value&json.nl=map&wt=json&fq={!tag%3Ddt}organisation_id:(-9)+AND+{!tag%3Ddt}information_type_id:(-1)&rows=21


The following syntax seems to do what I want:
{!tag=dt}!(organisation_id:(8) OR information_type_id:(2))
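For anyone assembling these URLs in code, the {!tag=...} local-params prefix is just part of the fq value and must be URL-encoded along with the rest of it. A minimal plain-Java sketch (the helper and the field names are illustrative, not part of any Solr API):

```java
import java.net.URLEncoder;

public class TaggedFilterDemo {
    // Build an "fq" parameter that carries a {!tag=...} local param,
    // URL-encoded the way it must appear on the request line.
    static String taggedFq(String tag, String rawFilter) throws Exception {
        return "fq=" + URLEncoder.encode("{!tag=" + tag + "}" + rawFilter, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // The working filter from the message above:
        System.out.println(taggedFq("dt", "!(organisation_id:(8) OR information_type_id:(2))"));
        // The matching facet.field that excludes documents removed by that tag:
        System.out.println("facet.field=" + URLEncoder.encode("{!ex=dt}organisation_id", "UTF-8"));
    }
}
```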

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: sort by field length

2010-05-25 Thread Erick Erickson
Ah, I may have misunderstood, I somehow got it in my mind
you were talking about the length of each term (as in string length).

But if you're looking at the field length as the count of terms, that's
another question, sorry for the confusion...

I have to ask, though, why you want to sort this way? The relevance
calculations already factor in both term frequency and field length. What's
the use-case for sorting by field length given the above?

Best
Erick

On Tue, May 25, 2010 at 3:40 AM, Sascha Szott sz...@zib.de wrote:

 Hi Erick,


 Erick Erickson wrote:

 Are you sure you want to recompute the length when sorting?
 It's the classic time/space tradeoff, but I'd suggest that when
 your index is big enough to make taking up some more space
 a problem, it's far too big to spend the cycles calculating each
 term length for sorting purposes considering you may be
 sorting all the terms in your index worst-case.

 Good point, thank you for the clarification. I thought that Lucene
 internally stores the field length (e.g., in order to compute the relevance)
 and getting this information at query time requires only a simple lookup.

 -Sascha



 But you could consider payloads for storing the length, although
 that would still be redundant...

 Best
 Erick

 On Mon, May 24, 2010 at 8:30 AM, Sascha Szottsz...@zib.de  wrote:

  Hi folks,

 is it possible to sort by field length without having to (redundantly)
 save
 the length information in a seperate index field? At first, I thought to
 accomplish this using a function query, but I couldn't find an
 appropriate
 one.

 Thanks in advance,
 Sascha






Re: Faceted search not working?

2010-05-25 Thread Jean-Sebastien Vachon
Is the FacetComponent loaded at all? 

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>
</requestHandler>


On 2010-05-25, at 3:32 AM, Sascha Szott wrote:

 Hi Birger,
 
 Birger Lie wrote:
 I don't think the boolean fields are mapped to on and off :)
 You can use "true" and "on" interchangeably.
 
 -Sascha
 
 
 
 -birger
 
 -Original Message-
 From: Ilya Sterin [mailto:ster...@gmail.com]
 Sent: 24. mai 2010 23:11
 To: solr-user@lucene.apache.org
 Subject: Faceted search not working?
 
 I'm trying to perform a faceted search without any luck.  Result set doesn't 
 return any facet information...
 
 http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title
 
 I'm getting the result set, but no facet information present?  Is there 
 something else that needs to happen to turn faceting on?
 
 I'm using latest Solr 1.4 release.  Data is indexed from the database using 
 dataimporter.
 
 Thanks.
 
 Ilya Sterin
 




question about indexing...

2010-05-25 Thread Jörg Agatz
I have a task:
I must index a lot of e-mails, so I will create a script to generate
an XML file of the mails.

Now the question is: what happens when I create a field "body" and this
field contains a lot of < or > characters, like this:
Confidentiality Caution: This message and all its included content and

assets are confidential and for the individual use of the entity to
whom it is send to only. If you, the reader of this message, have
recieved this communication by error please notify me about this
immediately, by return address, and delete the message and its assets.
Thank you.
 Apropos: In eurem Footer scheint ein r zu fehlen (Headqua_r_ter).

 Snapt Pty Ltd: Stephan Plesnik schrieb:

 Headquaters:



 
  Diese E-Mail, einschließlich sämtlicher mit ihr übertragenen Dateien,
 ist vertraulich und ist für die ausschließliche Verwendung durch die
 Person oder das Unternehmen vorgesehen, an die/das sie adressiert ist.
 Sollten Sie diese E-Mail fälschlicherweise erhalten haben,
 benachrichtigen Sie bitte unseren Systemverwalter
 (serv...@plesnik.de).
 Diese E-Mail wurde auf die Abwesenheit von Computerviren überprüft.
 ---
 

Or

hallo Mr. xy
thanks for greats
dear Mr. xyz


I think it doesn't work!
How can I make it so that any such content can be input into Solr?


Re: IndexSearcher and Caches

2010-05-25 Thread Rahul R
Chris,
I am using SolrIndexSearcher to get a handle to the total number of records
in the index. I am doing it like this :
int num =
Integer.parseInt((String)solrSearcher.getStatistics().get("numDocs").toString());
Please let me know if there is a better way to do this.

Mark,
I can tell you what I do in my application. We provide a tool to do the
index update and assume that the user will always use it to create/update
the index. Whenever an update happens, we notify the querying application
and it creates a new instance of SolrCore, SolrServer etc. These continue to
be shared across multiple users (as statics) till the next update happens.

Thank you.

Regards
Rahul

On Tue, May 25, 2010 at 4:18 AM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : Thank you I found the API to get the existing SolrIndexSearcher to be
 : present in SolrCore:
 : SolrCore.getSearcher().get()

 I think perhaps you need to take 5 big steps back and explain what your
 goal is.  99.999% of all solr users should never care about that method --
 even the 99.9% of the folks writing java code and using EmbeddedSolr
 should never ever have a need to call those -- so what exactly is it you
 are doing, and how did you get along the path you find yourself on?

 this thread started with some fairly innocuous questions about how caches
 worked in regard to new searchers -- which is all fine and dandy; those are
 concepts that solr users should be aware of ... in the abstract.  you
 should almost never be instantiating those IndexSearchers or Caches
 yourself.

 Stick with the SolrServer abstraction provided by SolrJ...

 http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

 http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html


 -Hoss




Re: Faceted search not working?

2010-05-25 Thread Sascha Szott

Hi,

please note that the FacetComponent is one of the six search components 
that are automatically associated with solr.SearchHandler (this holds 
also for the QueryComponent).


Another note: by using name="components", all default components will be 
replaced by the components you explicitly mention (i.e., 
QueryComponent and FacetComponent in your example). To avoid this, use 
name="last-components" instead.
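For reference, a handler that keeps all six defaults and merely appends an extra component would look roughly like this (the component name is hypothetical):

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- "last-components" appends to the built-in defaults
       (query, facet, mlt, highlight, stats, debug) instead of replacing them -->
  <arr name="last-components">
    <str>myCustomComponent</str>
  </arr>
</requestHandler>
```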


-Sascha

Jean-Sebastien Vachon wrote:

Is the FacetComponent loaded at all?

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
  </arr>
</requestHandler>


On 2010-05-25, at 3:32 AM, Sascha Szott wrote:


Hi Birger,

Birger Lie wrote:

I don't think the boolean fields are mapped to on and off :)

You can use "true" and "on" interchangeably.

-Sascha




-birger

-Original Message-
From: Ilya Sterin [mailto:ster...@gmail.com]
Sent: 24. mai 2010 23:11
To: solr-user@lucene.apache.org
Subject: Faceted search not working?

I'm trying to perform a faceted search without any luck.  Result set doesn't 
return any facet information...

http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title

I'm getting the result set, but no facet information present?  Is there 
something else that needs to happen to turn faceting on?

I'm using latest Solr 1.4 release.  Data is indexed from the database using 
dataimporter.

Thanks.

Ilya Sterin









Re: question about indexing...

2010-05-25 Thread Erik Hatcher
Well, you'll just have to create valid XML, either encoding some  
characters or using CDATA sections.


Erik
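
If you take the encoding route instead of CDATA, only five characters need replacing in each field value. A minimal plain-Java sketch of that idea (the helper is hand-rolled here for illustration; libraries such as commons-lang's StringEscapeUtils offer the same thing):

```java
public class XmlEscapeDemo {
    // Escape the five characters that are special in XML so an arbitrary
    // e-mail body can be placed inside a Solr <field> element.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")   // must run first, before new entities appear
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String body = "dear Mr. <xyz> & \"friends\"";
        System.out.println("<field name=\"body\">" + escapeXml(body) + "</field>");
    }
}
```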

On May 25, 2010, at 10:06 AM, Jörg Agatz wrote:


I have a task:
I must index a lot of e-mails, so I will create a script to generate
an XML file of the mails.

Now the question is: what happens when I create a field "body" and this
field contains a lot of < or > characters, like this:
Confidentiality Caution: This message and all its included content and

assets are confidential and for the individual use of the entity to
whom it is send to only. If you, the reader of this message, have
recieved this communication by error please notify me about this
immediately, by return address, and delete the message and its assets.
Thank you.

Apropos: In eurem Footer scheint ein r zu fehlen (Headqua_r_ter).

Snapt Pty Ltd: Stephan Plesnik schrieb:


Headquaters:








Diese E-Mail, einschließlich sämtlicher mit ihr übertragenen Dateien,
ist vertraulich und ist für die ausschließliche Verwendung durch die
Person oder das Unternehmen vorgesehen, an die/das sie adressiert  
ist.

Sollten Sie diese E-Mail fälschlicherweise erhalten haben,
benachrichtigen Sie bitte unseren Systemverwalter
(serv...@plesnik.de).
Diese E-Mail wurde auf die Abwesenheit von Computerviren überprüft.
---



Or

hallo Mr. xy
thanks for greats
dear Mr. xyz


I think it dosen´t Work!
howcan i make it, so that each content inpuit in the Solr




Re: question about indexing...

2010-05-25 Thread Jörg Agatz
ok, done..

But now I can't find any words from the CDATA field.
I did:

<field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[
Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere
Ruhe haben.
<b>ich du er sie es</b>
Ha ha Ha ha ha ha ha ha ha ha
]]></field>

It is a string field, multivalued.

King


Re: question about indexing...

2010-05-25 Thread Erik Hatcher
You have to provide more details than that.  We need to know the field  
definition for that named field, the corresponding field type  
definition, and the exact request you're making to Solr that you think  
should find this document.


And most importantly, did you <commit/>?  :)

Erik

On May 25, 2010, at 11:22 AM, Jörg Agatz wrote:


ok, done..

But now I can't find any words from the CDATA field.
I did:

<field name="P_CONTENT_ITEMS_COMMENT"><![CDATA[
Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere
Ruhe haben.
<b>ich du er sie es</b>
Ha ha Ha ha ha ha ha ha ha ha
]]></field>

It is a string field, multivalued.

King




Re: question about indexing...

2010-05-25 Thread Jörg Agatz
I created a new index, but nothing changed.

 <field name="COMMENT" type="string" indexed="true" stored="true"
multiValued="true"/>

<field name="COMMENT">
<![CDATA[
Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere
Ruhe haben.
<b>ich du er sie es</b>
Ha ha Ha ha ha ha ha ha ha ha
]]></field>

I search for:

 *:* 
I find it.

I search for "hallo", "Hallo", "hallo*", "Hallo*", or some other content from the
CDATA field, and I don't find it.


Re: caching on unique queries

2010-05-25 Thread Chris Hostetter

: Pretty much every one of my queries is going to be unique. However, the 
: query is fairly complex and also contains both unique and non-unique 
: data. In the query, some fields will be unique (e.g description), but 
: other fields will be fairly common (e.g. category). If we could use 
: those common fields as filters, it would be easy to use the filter 
: cache. I could just separate the filters and let the filter cache do its 
: thing. Unfortunately, due to the nature of our application, pretty much 
: every field is just a boost.
...
: Is there anyway to cache part of the query? Or basically cache 
: subqueries? I have my own request handler, so I am willing to write the 
: necessary code. I am fearful that the best performance may be to just 
: turn off caching.

One thing a custom plugin could possibly do in cases like this is to use 
the filterCache to cache the DocSets corresponding to the re-used 
portions of your queries (on a granular level) and then wrap those 
DocSets in a Query facade to build them up into a big BooleanQuery -- 
which you should explicitly make sure Solr does not cache (because the 
Query objects will be *huge* if they contain all those DocSets wrapped up).

Note: i did this once a *long* time ago and it worked out ok, but this may 
be a lot harder now that we have per-segment searching/scoring -- i'm not 
sure that the Query/Scorer API gives you everything you need to be able to 
return segment-based docIds from a global DocSet.



-Hoss



Help me understand query syntax of subqueries

2010-05-25 Thread Tigi Scramble
Any idea why this query returns 0 records:
"sexual assault" AND (-obama)
while this one returns 1400?
"sexual assault" AND -(obama)


Some debug info:

"sexual assault" AND (-obama), translates to:  +text:"sexual assault"
+(-text:obama), returns 0 records

"sexual assault" AND -(obama), translates to:  +text:"sexual assault"
-text:obama, returns 1400 records

"sexual assault" AND obama, translates to:  +text:"sexual assault"
+text:obama, returns 53 records

(-obama), translates to: -text:obama, returns 716295 records
-(obama), translates to: -text:obama, returns 716295 records


I am using Solr 1.4
qt:standard
QParser:LuceneQParser

Thanks,
Mis


Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Jason Rutherglen
The main issue is if you're using facets, which are currently
inefficient for the realtime use case because they're created on the
entire set of segment/readers.  Field caches in Lucene are per segment
and so don't have this problem.

On Tue, May 25, 2010 at 4:09 AM, Grant Ingersoll gsing...@apache.org wrote:
 How many docs are in the batch you are pulling down?  How many docs/second do 
 you expect on the index size?  How big are the docs?  What do you expect in 
 terms of queries per second?  How fast do new documents need to be available 
 on the local server?  How much analysis do you have to do?  Also, define Real 
 Time.  You'd be surprised at the number of people I talk to who think they 
 need Real Time, but then when you ask them questions like I just did, they 
 don't really need it.  I've seen Solr turn around new docs in as little as 30 
 seconds on commodity hardware w/o any special engineering effort and I've 
 seen it faster than that with some engineering effort.  That isn't 
 necessarily possible for every application, but...

 Despite the other suggestions, what you describe still looks feasible to me 
 in Solr, pending the questions above (and some followups).


 On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:

 Thanks for the new information. It's really great to see so many options for 
 Lucene.

 In my scenario there are the following pieces:

 1 - A local Java client with an embedded Solr instance and its own local 
 index/s.
 2 - A remote server running Solr with index/s that are more like a 
 repository that local clients query for extra goodies.
 3 - The client is also a JXTA node so it can share indexes or documents too.
 4 - There is no browser involved what so ever.

 My music composing application is a local client that uses configurations 
 which would become many different document types. A subset of these 
 configurations will be bundled with the application and then many more would 
 be made available via a server/s running Solr.

 I would not expect the queries which would be made from within the local 
 client to be returned in real-time. I would only expect such queries to be 
 made in reasonable time and returned to the client. The client would have 
 its local Lucene index system (embedded Solr using SolrJ) which would be 
 updated with the results of the query made to the Solr instance running on 
 the remote server.

 Then the user on the client would issue queries to the local Lucene index/s 
 to obtain results which are used to setup contexts for different aspects of 
 the client. For example: an activated context for musical scales and rhythms 
 used for creating musical notes, an activated context for rendering with 
 layout and style information for different music symbol renderer types.

 I'm not yet sure but it may be best to make queries against the local Lucene 
 index/s and then convert the results into some context objects, maybe an 
 array or map (I'd like to learn more about how query results can be returned 
 as arrays or maps as well). Then the tools and renderers which require the 
 information in the contexts would do any real-time lookup directly from the 
 context objects not the local or remote Lucene or Solr index/s. The local 
 client is also a JXTA node so it can share its own index/s with fellow peers.

 This is how I envision this happening with my limited knowledge of 
 Lucene/Solr at this time. What are your thoughts on the feasibility of such 
 a scenario?

 I'm just reading through the Solr reference PDF now and looking over the 
 Solr admin application. Looking at the Schema.xml it seems to be field not 
 document oriented. From my point of view I think in terms of configuration 
 types which would be documents. In the schema it seems like only fields are 
 defined and it does not matter which configuration/document they belong to? 
 I guess this is fine as long as the indexing takes into account my unique 
 document types and I can search for them as a whole as well, not only for 
 specific values across a set of indexed documents.

 Also, does the schema allow me to index certain documents into specific 
 indexes or are they all just bunched together? I'd rather have unique 
 indexes for specific document types. I've just read about multiple cores 
 running under one Solr instance, is this the only way to support multiple 
 indexes?

 I'm thinking of ordering the Lucene in Action v2 book which is due this 
 month and also the Solr 1.4 book. Before I do I just need to understand a 
 few things which is why I'm writing such a long message :-)

 Thom


 On 2010-05-21, at 2:12 AM, Ben Eliott wrote:

 Further to earlier note re Lucandra.  I note that Cassandra, which Lucandra 
 backs onto,  is 'eventually consistent',  so given your real-time 
 requirements,  you may want to review this in the first instance, if 
 Lucandra is of interest.

 On 21 May 2010, at 06:12, Walter Underwood wrote:

 Solr is a very good engine, but it is not 

Re: Problem with extended dismax, minus prefix (to mean NOT) and interaction with mm?

2010-05-25 Thread Chris Hostetter
: I'm running edismax (on both a 1.4 with patch and a branch_3x version) and
: I'm seeing something I don't expect.
...
:  str name=rawquerystringdog cat -trilogy/str
:  str name=querystringdog cat -trilogy/str
:  str name=parsedqueryallfields:dog allfields:cat
: -allfields:trilogi/str
:  str name=parsedquery_toStringallfields:dog allfields:cat
: -allfields:trilogi/str

Hmmm... something is really odd here -- are you sure you are using edismax 
as your query parser? ... because with Solr 1.4 and edismax you should 
still be seeing DisjunctionMaxQuery show up in your parsedquery 
output.

that said: i can definitely confirm that i see a discrepancy between 
dismax and edismax in how they deal with the mm param (on trunk and in 
1.4)...

MM is ignored...
http://localhost:8983/solr/select?debugQuery=true&defType=edismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2

MM is used...
http://localhost:8983/solr/select?debugQuery=true&defType=dismax&qf=text&q=xxx+yyy+zzz+-1234&mm=2

...the negative clause definitely seems to be what triggers it.

I've added this to SOLR-1553 (edismax is still considered an open issue, 
it was only experimental in Solr 1.4)






-Hoss



Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread JohnRodey

I would like to leverage whatever SOLR provides to properly url-encode a
search string.

For example a user enters:
mr. bill oh no

The URL submitted by the admin page is:
http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

Since the admin page uses it I would imagine that this functionality is there,
but I'm having some trouble finding it.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842660.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread Sean Timm

Java provides one.  You probably want to use utf-8 as the encoding scheme.

http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html

Note you also will want to strip or escape characters that are meaningful 
in the Solr/Lucene query syntax.

http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Escaping%20Special%20Characters

-Sean
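
Putting Sean's two steps together: backslash-escape the Lucene query operators, then URL-encode the result for the HTTP request. A plain-JDK sketch (the escaping helper is hand-rolled for illustration; Lucene ships the equivalent as QueryParser.escape()):

```java
import java.net.URLEncoder;

public class QueryEncodeDemo {
    // Backslash-escape characters that are operators in the Lucene query syntax.
    static String escapeLucene(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("+-!(){}[]^\"~*?:\\&|".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String userInput = "mr. bill (oh no)";
        // Escape for the query parser first, then encode for the URL.
        System.out.println("q=" + URLEncoder.encode(escapeLucene(userInput), "UTF-8"));
    }
}
```

Note that escaping the double quote also disables user-typed phrase queries, so whether to escape that particular character depends on the application.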

On 5/25/2010 1:20 PM, JohnRodey wrote:

I would like to leverage whatever SOLR provides to properly url-encode a
search string.

For example a user enters:
mr. bill oh no

The URL submitted by the admin page is:
http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

Since the admin page uses it I would imagine that this functionality is there,
but I'm having some trouble finding it.
   


Re: How real-time are Solr/Lucene queries?

2010-05-25 Thread Thomas J. Buhr
My documents are all quite small, if not downright tiny; there is not much 
analysis to do. I plan to mainly use Solr for indexing application 
configuration data, of which there is a lot and which I have all pre-formatted. Since 
it is a music application there are many score templates, scale and rhythm 
strings, notation symbol skins, etc. Then there are slightly more usual things 
to index, like application help pages and tutorials.

In terms of queries per second there will be a lot being fired by our painter. 
In our application data is flowing into a painter who in turn delegates 
specific painting tasks to renderer objects. These renderer objects then make 
many queries extremely fast to the embedded Solr indexes for data they need, 
such as layout and style values. 

Believe me there is a lot of detailed data involved in music notation and 
abstracting it into configurations in the form of index documents is a good way 
to manage such data. Further, the data in the form of documents work as a form 
of plugins so that alternate configurations for different notation types can be 
added to the index. Then via simple search it is possible to dialup a certain 
set of documents which contain all the details of a given notation. Mean while 
the renderer objects remain generic and are just reconfigured with the 
different indexed configuration documents.

Will making many fast queries from renderers to an embedded local Solr index 
slow my painting down?

Thom


On 2010-05-25, at 6:09 AM, Grant Ingersoll wrote:

 How many docs are in the batch you are pulling down?  How many docs/second do 
 you expect on the index size?  How big are the docs?  What do you expect in 
 terms of queries per second?  How fast do new documents need to be available 
 on the local server?  How much analysis do you have to do?  Also, define Real 
 Time.  You'd be surprised at the number of people I talk to who think they 
 need Real Time, but then when you ask them questions like I just did, they 
 don't really need it.  I've seen Solr turn around new docs in as little as 30 
 seconds on commodity hardware w/o any special engineering effort and I've 
 seen it faster than that with some engineering effort.  That isn't 
 necessarily possible for every application, but...
 
 Despite the other suggestions, what you describe still looks feasible to me 
 in Solr, pending the questions above (and some followups).
 
 
 On May 21, 2010, at 4:05 AM, Thomas J. Buhr wrote:
 
 Thanks for the new information. It's really great to see so many options for 
 Lucene.
 
 In my scenario there are the following pieces:
 
 1 - A local Java client with an embedded Solr instance and its own local 
 index/s.
 2 - A remote server running Solr with index/s that are more like a 
 repository that local clients query for extra goodies.
 3 - The client is also a JXTA node so it can share indexes or documents too.
 4 - There is no browser involved whatsoever.
 
 My music composing application is a local client that uses configurations 
 which would become many different document types. A subset of these 
 configurations will be bundled with the application and then many more would 
 be made available via a server/s running Solr.
 
 I would not expect the queries which would be made from within the local 
 client to be returned in real-time. I would only expect such queries to be 
 made in reasonable time and returned to the client. The client would have 
 its local Lucene index system (embedded Solr using SolrJ) which would be 
 updated with the results of the query made to the Solr instance running on 
 the remote server.
 
 Then the user on the client would issue queries to the local Lucene index/s 
 to obtain results which are used to setup contexts for different aspects of 
 the client. For example: an activated context for musical scales and rhythms 
 used for creating musical notes, an activated context for rendering with 
 layout and style information for different music symbol renderer types.
 
 I'm not yet sure but it may be best to make queries against the local Lucene 
 index/s and then convert the results into some context objects, maybe an 
 array or map (I'd like to learn more about how query results can be returned 
 as arrays or maps as well). Then the tools and renderers which require the 
 information in the contexts would do any real-time lookup directly from the 
 context objects not the local or remote Lucene or Solr index/s. The local 
 client is also a JXTA node so it can share its own index/s with fellow peers.
 
 This is how I envision this happening with my limited knowledge of 
 Lucene/Solr at this time. What are your thoughts on the feasibility of such 
 a scenario?
 
 I'm just reading through the Solr reference PDF now and looking over the 
Solr admin application. Looking at the schema.xml, it seems to be field- not 
document-oriented. From my point of view I think in terms of configuration 
types, which would be documents. In the schema 

Re: Using solrJ to get all fields in a particular schema/index

2010-05-25 Thread Chris Hostetter

:Is there any way to get all the fields (irrespective of whether
: it contains a value or null) in solrDocument.

no.  a document only has Field instances for the fields which it has 
values for.  it's also not a feature that would even be theoretically 
possible to add, because of dynamicFields.  If you have even one single 
dynamicField declaration, then there is an infinite number of possible 
fields.

: Is there any way to get all the fields in schema.xml of the url link (
: http://localhost:8983/solr/core0/)??

Take a look at http://localhost:8983/solr/core0/admin/luke?show=schema ... 
it can programmatically return details about the schema (including all the 
fields and dynamicFields) to your application.



-Hoss



Re: Faceted search not working?

2010-05-25 Thread Ilya Sterin
Sascha thanks for the response, here is the output...

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="wt">xml</str>
      <str name="q">title:*</str>
      <str name="fl">title</str>
    </lst>
  </lst>
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="title">Baseball game</str>
    </doc>
    <doc>
      <str name="title">Soccer game</str>
    </doc>
    <doc>
      <str name="title">Football game</str>
    </doc>
  </result>
</response>


On Mon, May 24, 2010 at 5:39 PM, Sascha Szott sz...@zib.de wrote:
 Hi Ilya,

 Ilya Sterin wrote:

 I'm trying to perform a faceted search without any luck.  Result set
 doesn't return any facet information...

 http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title

 I'm getting the result set, but no facet information is present.  Is there
 something else that needs to happen to turn faceting on?

 No.

 What does http://localhost:8080/solr/select/?q=title:*&fl=title&wt=xml
 return?

 -Sascha




Re: Faceted search not working? (RESOLVED)

2010-05-25 Thread Ilya Sterin
Ah, the issue was explicitly specifying components...

<arr name="components">
  <str>query</str>
</arr>

I don't remember changing this during default install, commenting this
out enabled faceted search component.

Thanks all for the help.

Ilya

On Tue, May 25, 2010 at 10:38 AM, Sascha Szott sz...@zib.de wrote:
 Hi,

 please note that the FacetComponent is one of the six search components
 that are automatically associated with solr.SearchHandler (this also holds
 for the QueryComponent).

 Another note: by using name="components", all default components will be
 replaced by the components you explicitly mention (i.e., QueryComponent
 and FacetComponent in your example). To avoid this, use
 name="last-components" instead.
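
 As an illustrative sketch (the component name myComponent below is
 hypothetical), a handler that keeps all six defaults and appends one
 extra component would look like:

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- last-components appends to the default chain (query, facet, mlt,
       highlight, stats, debug) instead of replacing it -->
  <arr name="last-components">
    <str>myComponent</str>
  </arr>
</requestHandler>
```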

 -Sascha

 Jean-Sebastien Vachon wrote:

 Is the FacetComponent loaded at all?

 <requestHandler name="standard" class="solr.SearchHandler" default="true">
   <arr name="components">
     <str>query</str>
     <str>facet</str>
   </arr>
 </requestHandler>


 On 2010-05-25, at 3:32 AM, Sascha Szott wrote:

 Hi Birger,

 Birger Lie wrote:

 I don't think the boolean fields are mapped to "on" and "off" :)

 You can use "true" and "on" interchangeably.

 -Sascha



 -birger

 -Original Message-
 From: Ilya Sterin [mailto:ster...@gmail.com]
 Sent: 24. mai 2010 23:11
 To: solr-user@lucene.apache.org
 Subject: Faceted search not working?

 I'm trying to perform a faceted search without any luck.  Result set
 doesn't return any facet information...

 http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title

 I'm getting the result set, but no facet information is present.  Is there
 something else that needs to happen to turn faceting on?

 I'm using latest Solr 1.4 release.  Data is indexed from the database
 using dataimporter.

 Thanks.

 Ilya Sterin







Re: Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread JohnRodey

Thanks Sean, that was exactly what I needed.  One question though...

How do I correctly retain the Solr-specific characters?
I tried adding escape chars, but URLEncoder doesn't seem to care about that:
Example: 
String s1 = "\"mr. bill\" oh n?";
String s2 = "\\\"mr. bill\\\" oh n\\?";
String encoded1 = URLEncoder.encode(s1, "UTF-8");
String encoded2 = URLEncoder.encode(s2, "UTF-8");
System.out.println(encoded1);
System.out.println(encoded2);
Output:
%22mr.+bill%22+oh+n%3F
%5C%22mr.+bill%5C%22+oh+n%5C%3F

Should I allow the URLEncoder to translate s1, then replace %22 with ", %3F
with ?, and so on?
Or is there a better way?
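
A minimal, self-contained sketch of the two options: URL-encode once when the
quotes and ? should act as query syntax, and backslash-escape first when they
should match literally. The escapeQueryChars helper below is hand-rolled for
the example -- if SolrJ is on the classpath, its ClientUtils.escapeQueryChars
does the same job.

```java
import java.net.URLEncoder;

public class SolrQueryEncode {
    // Backslash-escape the characters Lucene/Solr treat as query syntax,
    // so the query parser matches them literally instead of parsing them.
    public static String escapeQueryChars(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&;".indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String raw = "\"mr. bill\" oh n?";

        // Keep " and ? as query syntax (phrase quotes, wildcard):
        // URL-encode once and do NOT backslash-escape.
        System.out.println(URLEncoder.encode(raw, "UTF-8"));
        // -> %22mr.+bill%22+oh+n%3F

        // Treat " and ? as literal characters: escape first, then encode.
        System.out.println(URLEncoder.encode(escapeQueryChars(raw), "UTF-8"));
        // -> %5C%22mr.+bill%5C%22+oh+n%5C%3F
    }
}
```

So there is no need to post-process the encoded string: decide before encoding
whether each special character is syntax (leave it) or data (escape it), and
URL-encode exactly once.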
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Does-SOLR-provide-a-java-class-to-perform-url-encoding-tp842660p842744.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SOLR-343 date facet mincount patch

2010-05-25 Thread Umesh_

Hoss,

I was able to successfully apply the patch SOLR-343, but even after applying
the patch, date facet mincount does not work. The relevant parts of the
response are given below:


 [responseHeader] => object(SolrObject)#107 (3) {
    [status] => int(0)
    [QTime] => int(4)
    [params] => object(SolrObject)#108 (18) {
      [facet.date.start] => string(17) "NOW/YEAR-200YEARS"
      [facet] => string(4) "true"
      [indent] => string(2) "on"
      [facet.date] => string(11) "r_take_date"
      [wt] => string(3) "xml"
      [f.Instrument.facet.mincount] => string(1) "1"
      [version] => string(3) "2.2"
      [rows] => string(2) "20"
      [f.r_take_date.facet.mincount] => string(1) "1"
      [f.Target_Audience.facet.mincount] => string(1) "1"
      [start] => string(1) "0"
      [q] => string(3) "*:*"
      [f.Language.facet.mincount] => string(1) "1"
      [f.Location.facet.mincount] => string(1) "1"
      [facet.field] => array(5) {
        [0] => string(4) "Type"
        [1] => string(10) "Instrument"
        [2] => string(8) "Language"
        [3] => string(8) "Location"
        [4] => string(15) "Target_Audience"
      }
      [facet.date.gap] => string(6) "+8YEAR"
      [f.Type.facet.mincount] => string(1) "1"
      [facet.date.end] => string(8) "NOW/YEAR"
    }
  }

Facet info-

 [facet_counts] => object(SolrObject)#130 (3) {
    [facet_queries] => object(SolrObject)#131 (0) {
    }
    [facet_fields] => object(SolrObject)#132 (5) {
      [Type] => object(SolrObject)#133 (26) {
        [Instrumental] => int(1673)
        [Vocal] => int(977)
        [Spoken] => int(38)
        [tenor vocal] => int(6)
        [baritone vocal] => int(4)
        [cornet] => int(4)
        [soprano vocal] => int(3)
        [bass vocal] => int(2)
        [flute] => int(2)
        [whistling] => int(2)
        [bagpipes] => int(1)
        [barrel piano] => int(1)
        [bass trombone] => int(1)
        [chimes] => int(1)
        [clarinet] => int(1)
        [contralto] => int(1)
        [euphonium] => int(1)
        [mandolin] => int(1)
        [piano] => int(1)
        [piccolo] => int(1)
        [saxophone] => int(1)
        [trombone] => int(1)
        [trumpet] => int(1)
        [violin] => int(1)
        [violoncello] => int(1)
        [xylophone] => int(1)
      }
      [Instrument] => object(SolrObject)#134 (13) {
        [ baritone horn] => int(54)
        [ bassoon] => int(54)
        [ cornets] => int(54)
        [ piccolo] => int(54)
        [ tubas] => int(54)
        [clarinets] => int(54)
        [ traps ] => int(39)
        [ trombones] => int(39)
        [ French horns] => int(33)
        [ horns] => int(21)
        [ oboe] => int(18)
        [ traps] => int(15)
        [ trombones ] => int(15)
      }
      [Language] => object(SolrObject)#135 (4) {
        [Italian] => int(43)
        [French] => int(13)
        [Polish] => int(8)
        [Spanish] => int(8)
      }
      [Location] => object(SolrObject)#136 (6) {
        [Camden, New Jersey [unconfirmed]] => int(1555)
        [Philadelphia, Pennsylvania [unconfirmed]] => int(979)
        [Camden, New Jersey] => int(101)
        [New York, New York] => int(29)
        [Camden, New Jersey. Church Bldg.] => int(19)
        [Philadelphia, Pennsylvania] => int(1)
      }
      [Target_Audience] => object(SolrObject)#137 (5) {
        [Scottish] => int(3)
        [Jewish] => int(2)
        [Bohemian (Czech)] => int(1)
        [Polish] => int(1)
        [Swedish] => int(1)
      }
    }
    [facet_dates] => object(SolrObject)#138 (1) {
      [r_take_date] => object(SolrObject)#139 (27) {
        [1810-01-01T00:00:00Z] => int(0)
        [1818-01-01T00:00:00Z] => int(0)
        [1826-01-01T00:00:00Z] => int(0)
        [1834-01-01T00:00:00Z] => int(0)
        [1842-01-01T00:00:00Z] => int(0)
        [1850-01-01T00:00:00Z] => int(0)
        [1858-01-01T00:00:00Z] => int(0)
        [1866-01-01T00:00:00Z] => int(0)
        [1874-01-01T00:00:00Z] => int(0)
        [1882-01-01T00:00:00Z] => int(0)
        [1890-01-01T00:00:00Z] => int(0)
        [1898-01-01T00:00:00Z] => int(0)
   

Re: SOLR-343 date facet mincount patch

2010-05-25 Thread Umesh_

Chris,

Please ignore the repeated response header due to typo in the previous
message.

~Umesh
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Re-SOLR-343-date-facet-mincount-patch-tp789556p842863.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Indexing stalls reads

2010-05-25 Thread Lance Norskog
This sounds like you have the same solrconfig for the slave and the
master? You should turn off autoCommit on the slave. Only the master
should autoCommit.

You should set up the ReplicationHandler. This moves index updates
from the indexer to the query server.

http://www.lucidimagination.com/search/document/CDRG_ch10_10.3.3.5?q=replication

http://www.lucidimagination.com/search/out?u=http%3A%2F%2Fwiki.apache.org%2Fsolr%2FSolrReplication
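
A sketch of the Solr 1.4 ReplicationHandler setup (the master host name and
the 20-minute poll interval below are placeholders to match the schedule
described):

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- publish a new index version after each commit -->
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- poll the master every 20 minutes (HH:mm:ss) -->
    <str name="pollInterval">00:20:00</str>
  </lst>
</requestHandler>
```

With this in place the slave only copies finished index files from the master
instead of building its own.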

On Mon, May 24, 2010 at 2:33 AM, Manish N m1n...@live.com wrote:

 Hey,

 I'm using Solr 1.4 & I have a master/slave setup. I use the slave for all my 
 read operations & commits are scheduled every 20 mins or every 1 docs. 
 Now I think the slave shouldn't build indexes but fetch the ones created on 
 the master, but I see it creating indexes, during which all reads stall.

 Now I don't think that's common behavior -- is there any other way to stop 
 this?

 Also, how do I stop the slave from removing the old indexes till autowarming 
 is done? Is there a way to achieve this?

 Thnx n Regards,

 - Manish

 _
 The amazing world in sharp snaps
 http://news.in.msn.com/gallery/archive.aspx



-- 
Lance Norskog
goks...@gmail.com


Solr read-only core

2010-05-25 Thread Yao

Is there a way to open a Solr index/core in read-only mode? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr read-only core

2010-05-25 Thread Markus Jelsma
Hi,

 

I'd guess there are two ways of doing this, but I've never seen any 
solrconfig.xml file having directives that explicitly disallow updates.

You'd either put a proxy in front that simply won't allow any HTTP method 
other than GET and HEAD, or you could remove the update request handler from 
your solrconfig.xml file. I've never tried the latter, but I'd figure that 
without a request handler to accommodate updates, no updates can be made.
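
A sketch of that second (untested, as noted above) option -- handler names
follow the stock 1.4 example solrconfig.xml:

```xml
<!-- A read-only core: keep the search handler... -->
<requestHandler name="standard" class="solr.SearchHandler" default="true"/>

<!-- ...and comment out the update handlers so writes are rejected:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
<requestHandler name="/update/csv" class="solr.CSVRequestHandler"/>
-->
```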

 

Cheers,
 
-Original message-
From: Yao y...@ford.com
Sent: Tue 25-05-2010 21:49
To: solr-user@lucene.apache.org; 
Subject: Solr read-only core


Is there a way to open a Solr index/core in read-only mode? 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: IndexSearcher and Caches

2010-05-25 Thread Lance Norskog
The stats.jsp page walks the internal JMX beans. It prints out the
number of documents among other things. I would look at how that
works instead of writing your own thing for the internal APIs.

They may have changed from Solr 1.3 to 1.4 and will change further for
1.5 (4.0 is the new name?).

On Tue, May 25, 2010 at 7:11 AM, Rahul R rahul.s...@gmail.com wrote:
 Chris,
 I am using SolrIndexSearcher to get a handle to the total number of records
 in the index. I am doing it like this :
 int num =
 Integer.parseInt((String)solrSearcher.getStatistics().get("numDocs").toString());
 Please let me know if there is a better way to do this.

 Mark,
 I can tell you what I do in my application. We provide a tool to do the
 index update and assume that the user will always use it to create/update
 the index. Whenever an update happens, we notify the querying application
 and it creates a new instance of SolrCore, SolrServer etc. These continue to
 be shared across multiple users (as statics) till the next update happens.

 Thank you.

 Regards
 Rahul

 On Tue, May 25, 2010 at 4:18 AM, Chris Hostetter
 hossman_luc...@fucit.orgwrote:


 : Thank you I found the API to get the existing SolrIndexSearcher to be
 : present in SolrCore:
 : SolrCore.getSearcher().get()

 I think perhaps you need to take 5 big steps back and explain what your
 goal is.  99.999% of all solr users should never care about that method --
 even the 99.9% of the folks writing java code and using EmbeddedSolr
 should never ever have a need to call those -- so what exactly is it you
 are doing, and how did you get along the path you find yourself on?

 this thread started with some fairly innocuous questions about how caches
 work in regard to new searchers -- which is all fine and dandy; those are
 concepts that solr users should be aware of ... in the abstract.  you
 should almost never be instantiating those IndexSearchers or Caches
 yourself.

 Stick with the SolrServer abstraction provided by SolrJ...

 http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

 http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html


 -Hoss






-- 
Lance Norskog
goks...@gmail.com


Re: question about indexing...

2010-05-25 Thread Lance Norskog
Change type="string" to type="text". This causes the field to be
analyzed and then searching on words finds the document.
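
A sketch of the adjusted definition (assuming the "text" field type from the
stock example schema, which tokenizes and lowercases):

```xml
<field name="COMMENT" type="text" indexed="true" stored="true" multiValued="true"/>
```

Existing documents must be re-indexed before searches on individual words
will match.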



On Tue, May 25, 2010 at 8:34 AM, Jörg Agatz joerg.ag...@googlemail.com wrote:
 I created a new index, but nothing changed.

  <field name="COMMENT" type="string" indexed="true" stored="true"
 multiValued="true"/>





 <field name="COMMENT"><![CDATA[
 Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere
 Ruhe haben.
 <b>ich du er sie es</b>
 Ha ha Ha ha ha ha ha ha ha ha
 ]]></field>

 I search for:

  "*:*"
 I find it.

 I search for "hallo", "Hallo", "hallo*", "Hallo*" or some other content from
 the CDATA field and I don't find it.




-- 
Lance Norskog
goks...@gmail.com


Enhancing Solr relevance functions through predefined constants

2010-05-25 Thread Prasanna R
Hi all,

I have a suggestion for improving relevance functions in Solr by way of
providing access to a set of pre-defined constants in Solr queries.
Specifically, the number of documents indexed, the number of unique terms in
a field, the total number of terms in a field, etc. are some of the
query-time constants that I believe can be made use of in function queries
as well as boosted queries to aid in the relevance calculations.

One of the tips provided in the "Solr 1.4 Enterprise Search Server" book
relating to using function queries is this: "If your data changes in ways
causing you to alter the constants in your function queries, then consider
implementing a periodic automated test of your Solr data to ensure that the
data fits within expected bounds."

I believe that having access to some of the constants mentioned above will
help in coming up with dynamic boost values that adapt as the underlying
data changes. I think this makes sense given that one of the basic relevancy
scoring metrics - idf - is directly influenced by the number of documents
indexed.
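
For reference, a sketch of why numDocs matters, assuming Lucene's
DefaultSimilarity (Solr's default) computes the idf term as:

```
idf(t) = 1 + log( numDocs / (docFreq(t) + 1) )
```

so any change in the total document count shifts the idf contribution of
every term in every scored query.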

I can imagine some of these constants being useful in Function queries and
Boosted Queries but am not able to think of a neat little usage example.

I request you all to provide feedback, comments on this idea to help
evaluate if it is worth creating an enhancement jira item for the same.

Thanks,

Prasanna


Re: Debugging - DIH Delta Queries-

2010-05-25 Thread Chris Hostetter

: Subject: Debugging - DIH Delta Queries- 
: References:
: 1659766275.5213.1274376509278.javamail.r...@vicenza.dmz.lexum.pri
: In-Reply-To:
: 1659766275.5213.1274376509278.javamail.r...@vicenza.dmz.lexum.pri

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking




-Hoss



Re: Solr Cell and encrypted pdf files

2010-05-25 Thread Chris Hostetter

: I can't seem to get solr cell to index password protected pdf files.
: I can't figure out how to pass the password to tika and looking at
: ExtractingDocumentLoader,
: it doesn't seem to pass any pdf password related metadata to the tika parser.

I suspect you are correct, i don't think anyone has ever submitted a patch 
to enable Solr take advantage of this functionality in Tika.

if you have suggestions on how to implement it, please open a Jira issue 
(even if you don't have a patch to contribute, suggestions about how it 
might make sense to implement it from someone who has looked at the code 
and has some ideas may help inspire someone else to work on a patch)


-Hoss



Re: question about indexing...

2010-05-25 Thread Erick Erickson
Don't forget to re-index after you make the change Lance suggested...

Erick

On Tue, May 25, 2010 at 4:51 PM, Lance Norskog goks...@gmail.com wrote:

  Change type="string" to type="text". This causes the field to be
 analyzed and then searching on words finds the document.



 On Tue, May 25, 2010 at 8:34 AM, Jörg Agatz joerg.ag...@googlemail.com
 wrote:
  i create a new Index, but nothing Change.
 
    <field name="COMMENT" type="string" indexed="true" stored="true"
   multiValued="true"/>
 
 
 
 
 
  <field name="COMMENT"><![CDATA[
  Hallo leute. mein name ist dein name und wir wollen eigentlich nur unsere
  Ruhe haben.
  <b>ich du er sie es</b>
  Ha ha Ha ha ha ha ha ha ha ha
  ]]></field>
 
  I search for :
 
   *:* 
  I fond it
 
  i search vor hallo Hallo hallo* Hallo*or some other content from
 the
  CDATA field i dosent.
 



 --
 Lance Norskog
 goks...@gmail.com



Re: Solr Delta Queries

2010-05-25 Thread Chris Hostetter

: <field name="indexed_timestamp" type="date" indexed="true" stored="true" 
default="NOW" multiValued="false"/>

: For some reason when doing delta indexing via DIH, this field is not being 
updated.
: 
: Are timestamp fields updated during DELTA updates?

timestamp fields aren't treated any differently than any other field -- as 
far as Solr is concerned this is just a date field that happens to have a 
default value specified in case the client adding documents doesn't 
specify a value for this field -- in your case the client is DIH.

One thing that isn't clear from the way you worded your question is 
whether you realize that when DIH does a delta-import only new documents 
matching your deltaQuery are updated in the index -- all other existing 
documents are left alone (with their old value for the 
indexed_timestamp field) ... however you should be able to see that any 
*new* documents have a value for the indexed_timestamp field.

Perhaps the documents you are looking at where this field is not being 
updated weren't actually updated as part of the deltaQuery?  if you look 
at the output from loading DIH in your browser, it will tell you how many 
documents were processed as a result of your last delta-import, and the 
log files will show you the uniqueKey of each doc so you can see exactly 
what was updated.
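
As a sketch of how the pieces fit together (the table name "item" and column
"last_modified" below are hypothetical), a DIH entity wired for delta imports
looks roughly like this in data-config.xml -- only rows matched by deltaQuery
get re-indexed, and therefore only those documents get a fresh
indexed_timestamp:

```xml
<entity name="item" pk="id"
        query="SELECT * FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified > '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT * FROM item
                          WHERE id = '${dataimporter.delta.id}'"/>
```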

-Hoss



Re: solr caches from external caching system like memcached

2010-05-25 Thread Chris Hostetter

:   Is it possible to use solr caches such as query cache , filter cache
: and document cache from external caching system   like memcached as it
: has several advantages such as centralized caching system  and reducing the
: pause time  of JVM 's garbage collection  as we can assign less memory to
: jvm .

No.

The purpose of those solr caches is to micro-cache those objects that 
are used at a very low level in memory.  In an external cache system 
network overhead and object serialization become a factor in performance.  
using them in a micro cache aspect doesn't make sense -- at that point 
you're probably better off using something like an HTTP proxy cache to do 
macro caching at the level of the entire HTTP request/response.


-Hoss



Re: Solr highlighter and custom queries?

2010-05-25 Thread Chris Hostetter

: Actually, it's not so much a Solr problem as a Lucene one; as it turns 
: out, the WeightedSpanTermExtractor is in Lucene and not Solr.
: 
: Why they decided to only highlight queries that are in Lucene I don't 
: know, but what I did to solve this problem was simply to make my queries 
: extend a Lucene query instead of just Query.

I am not very well informed on highlighting, but as i understand it the 
Span based Highlighter is specifically designed to deal with position 
based information that depends on dealing either with SpanQueries or with 
well known query types where that information can be faked.  

However, i believe the more traditional highlighter (using 
QueryTermExtractor) was able to deal with highlighting any query that 
implemented extractTerms(Set) so perhaps something about the way you are 
using the highlighter is triggering the use of WeightedSpanTermExtractor 
instead of QueryTermExtractor?


-Hoss



Re: Full Import failed

2010-05-25 Thread Chris Hostetter

: yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5

Solr 1.4 works just fine with Java 1.5 -- even when Using the 
DataImportHandler.

there are some features of DIH like the  ScriptTransformer that requires 
java 1.6, but that's not your issue...

:  Last I encountered that exception was with the usage of String.isEmpty
:  which is a 1.6 novelty.

...the line in question in the stack trace provided has nothing to do with 
String.isEmpty.

 Caused by: java.lang.NoSuchMethodError: isEmpty
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391)

the object in question is a DocWrapper which inherits from 
SolrInputDocument which defines isEmpty.  if you are getting this error it 
suggests that something is wonky with your classpath, and you probably 
have multiple versions of some solr jars getting included by mistake -- in 
particular an old copy of the solr-common jar where SolrInputDocument is 
defined.



-Hoss



Re: Full Import failed

2010-05-25 Thread Mohamed Parvez
I am just using the solr.war file that came with the Solr 1.4 download on
WebLogic. I did not add or remove any jars.



On Tue, May 25, 2010 at 9:54 PM, Chris Hostetter
hossman_luc...@fucit.orgwrote:


 : yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5

 Solr 1.4 works just fine with Java 1.5 -- even when Using the
 DataImportHandler.

 there are some features of DIH like the  ScriptTransformer that requires
 java 1.6, but that's not your issue...

 :  Last I encountered that exception was with the usage of String.isEmpty
 :  which is a 1.6 novelty.

 ...the line in question in the stack trace provided has nothing to do with
 String.isEmpty.

  Caused by: java.lang.NoSuchMethodError: isEmpty
  at
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391)

 the object in question is a DocWrapper which inherits from
 SolrInputDocument which defines isEmpty.  if you are getting this error it
 suggests that something is wonky with your classpath, and you probably
 have multiple versions of some solr jars getting included by mistake -- in
 particular an old copy of the solr-common jar where SolrInputDocument is
 defined.



 -Hoss