Re: how to selectively sort records keeping some at the bottom always.. ?

2009-08-30 Thread Preetam Rao
Thanks Yonik. It was very useful.

On Sat, Aug 29, 2009 at 3:11 AM, Yonik Seeley <yo...@lucidimagination.com> wrote:

 On Thu, Aug 27, 2009 at 10:29 AM, Preetam Rao <blogathan@gmail.com>
 wrote:
  Hi,
  If I have documents of type a, b and c but when I sort by some criteria,
  lets say date,
  can I make documents of kind c always appear at the bottom ?

 One way is to simply use sorting.
 You could have a string field called "type_c" with
 sortMissingFirst="true" (see the example schema).
 Index a value (e.g. "yes") in that field for all documents of type c.
 Then to sort by date, use sort=type_c desc, date desc

 If one needed to put type c docs at the top or bottom, then index "1"
 for type c and "2" for the other types, and then sort asc or desc as
 needed.

 -Yonik
 http://www.lucidimagination.com
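
A minimal sketch of what this might look like, assuming the example schema's
"string" field type and a "date" field (the type_c name comes from the reply
above; the exact attribute placement is an assumption):

  <!-- schema.xml: only type-c documents get a value indexed in this field -->
  <field name="type_c" type="string" indexed="true" stored="false"
         sortMissingFirst="true"/>

  <!-- at query time -->
  sort=type_c desc, date desc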


  So effectively I want one kind of records always appear at the bottom
 since
  they don't have valid data,
  whether sort is ascending or descending;
 
  Would a function query help here ? Or is it even possible ?
 
  Thanks
  Preetam
 



how to selectively sort records keeping some at the bottom always.. ?

2009-08-27 Thread Preetam Rao
Hi,
If I have documents of type a, b and c, can I make documents of kind c
always appear at the bottom when I sort by some criterion, let's say
date?

So effectively I want one kind of record to always appear at the bottom,
whether the sort is ascending or descending, since those records don't
have valid data.

Would a function query help here? Or is it even possible?

Thanks
Preetam


Re: performance implications on using lots of values in fq

2008-07-24 Thread Preetam Rao
I don't have much insight into the performance of that many fq values, since I
have usually used a very small number of fqs. But I'm passing along my thoughts
hoping they might help (since I did not see any other response :-).

a) The filter cache needs to be large enough for the fqs to be cached. If an
fq is not in the cache, AFAIK each fq produces one Lucene query.
b) If the fqs are in the cache, the work reduces to intersecting the N bit
sets, where N is the number of fqs.
In the worst case, N fqs boil down to N Lucene queries plus N bitset
intersections.
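
For reference, the filter cache mentioned in (a) is the filterCache entry in
solrconfig.xml; a minimal sketch (the sizes are placeholders, not
recommendations):

  <filterCache
      class="solr.LRUCache"
      size="512"
      initialSize="512"
      autowarmCount="128"/>

Each distinct fq string is cached as its own entry, so the size should
generally be at least the number of distinct fq values you expect to reuse.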

Just a wild guess - if you are doing something with radius search or similar
search involving lat/longs, you can try using LocalSolr, which takes care of
all the details for you.

--
Preetam

On Wed, Jul 23, 2008 at 11:58 PM, briand [EMAIL PROTECTED] wrote:


 I have documents in SOLR such that each document contains one to many
 points
 (latitude and longitudes).   Currently we store the multiple points for a
 given document in the db and query the db to find all of the document ids
 around a given point first.   Once we have the list of ids, we populate the
 fq with those ids and the q value and send that off to SOLR to do a search.
 In the longest query to SOLR we're populating about 450 ids into the fq
 parameter at this time.   I was wondering if anyone knows the performance
 implications of passing so many ids into the fq and when it would
 potentially be a problem for SOLR?   Currently the query passing in 450 ids
 is not a problem at all and returns in less than a second.   Thanks.
 --
 View this message in context:
 http://www.nabble.com/performance-implications-on-using-lots-of-values-in-fq-tp18617397p18617397.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Solr/Lucene search term stats

2008-07-22 Thread Preetam Rao
hi,

try using faceted search,
http://wiki.apache.org/solr/SimpleFacetParameters

something like facet=true&facet.query=title:(web2.0 OR ajax)

facet.query gives the number of matching documents for a query.
You can run the examples in the above link and see how it works.

You can also try facet.field, which enumerates all the terms found in
a given field and tells how many documents contained each term.

For both of the above, the set of documents it acts on is the result set of q.
So if you want to get the facets for all documents, try q=*:*
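
A sketch of a single request combining both, with the title/description field
names taken from the question and rows=0 added to suppress the document list:

  http://localhost:8983/solr/select?q=*:*&rows=0&facet=true
      &facet.query=title:web2.0&facet.query=title:ajax
      &facet.query=description:web2.0&facet.query=description:ajax
      &facet.field=title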

On Tue, Jul 22, 2008 at 1:43 PM, Sunil [EMAIL PROTECTED] wrote:

 Hi All,

 I am working on a module using Solr, where I want to get the stats of
 each keyword found in each field.

 If my search term is: (title:(web2.0 OR ajax) OR
 description:(web2.0 OR ajax))

 Then I want to know how many times web2.0/ajax were found in title or
 description.

 Any suggestion on how to get this information (apart from  hl=true
 variable).


 Thanks,
 Sunil





Re: Filter by Type increases search results.

2008-07-18 Thread Preetam Rao
I have used fq as-is with dismax and it works fine.
fq is a standard parameter and is not specific to dismax,
so fq=type:idea should work correctly.
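
For example, fq applies the same way with the standard handler and with dismax
(assuming the dismax handler registered in the example solrconfig as qt=dismax):

  http://localhost:8983/solr/select?q=fish&fq=type:idea&debugQuery=true
  http://localhost:8983/solr/select?qt=dismax&q=fish&fq=type:idea&debugQuery=true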

-
Preetam

On Fri, Jul 18, 2008 at 11:30 AM, chris sleeman [EMAIL PROTECTED]
wrote:

  btw, this *seems* to only work for me with standard search handler.
 dismax
 and fq: dont' seem to get along nicely...

 Wouldn't the dismax parser consider the filter query parameter as the type
 "idea" and not the value "idea" for the Solr field "type"?
 I guess that's the reason this query doesn't work with dismax the way it
 works with the standard search handler. You can add a debugQuery=true
 parameter to check the actual parsed query.

 -Chris

 On Tue, Jul 15, 2008 at 10:47 PM, Yonik Seeley [EMAIL PROTECTED] wrote:

  On Tue, Jul 15, 2008 at 11:10 AM, Norberto Meijome [EMAIL PROTECTED]
  wrote:
   On Tue, 15 Jul 2008 18:07:43 +0530
   Preetam Rao [EMAIL PROTECTED] wrote:
  
   When I say filter, I meant q=fish&fq=type:idea
  
   btw, this *seems* to only work for me with standard search handler.
  dismax and fq: dont' seem to get along nicely... but maybe, it is just
 late
  and i'm not testing it properly..
 
  It should work the same... the only thing dismax does differently now
  is change the type of the base query to dismax.
 
  -Yonik
 



 --
 Bill Cosby  - Advertising is the most fun you can have with your clothes
 on.



Re: Multiple query fields in DisMax handler

2008-07-17 Thread Preetam Rao
If I understand the question correctly, you can provide init params, default
params and invariant params in the appropriate request handler section in
solrconfig.xml.
So you can create a standard request handler named "dismaxL" whose defType is
dismax, and set all the parameters in its defaults section.
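
A rough sketch of such a handler section in solrconfig.xml (the name dismaxL
comes from the thread; the qf/pf/tie values are made-up placeholders):

  <requestHandler name="dismaxL" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="qf">name^2.0 description</str>
      <str name="pf">name</str>
      <str name="tie">0.1</str>
    </lst>
  </requestHandler>

As noted later in this thread, this only sets defaults for requests sent to
that handler; it does not create a named parser that {!dismaxL} in an fq could
refer to.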


Preetam

On Thu, Jul 17, 2008 at 4:35 PM, chris sleeman [EMAIL PROTECTED]
wrote:

 Thanks a lot..this is, more or less, what i was looking for.

 However, is there a way to pre-configure the dismax query parser, with
 parameters like qf, pf, boost etc., in solr-config.xml, rather than doing
 so
 at query time. So my actual query would look like - 
  http://localhost:8983/solr/select?q=<query>&fq={!dismaxL}CA&debugQuery=true,
 where dismaxL refers to a query parser defined in solrconfig, with all the
 necessary parameters. The q parameter would then use the default dismax
 parser defined for the handler and fq would use dismaxL.

 Regards,
 Chris

 On Thu, Jul 17, 2008 at 5:15 AM, Erik Hatcher [EMAIL PROTECTED]
 wrote:

  On Jul 16, 2008, at 7:38 PM, Ryan McKinley wrote:
 
  (assuming you are using 1.3-dev), you could use the dismax query parser
  syntax for the fq param.  I think it is something like:
   fq=<!dismax>your query
 
 
  The latest committed syntax is:
 
{!dismax qf=}your query
 
  For example, with the sample data: 
 
  http://localhost:8983/solr/select?q=*:*&fq={!dismax%20qf=%22name%22}ipod&debugQuery=true
 
  
 
   I can't find the syntax now (Yonik?)
 
  but I don't know how you could pull out the qf,pf,etc fields for the fq
  portion vs the q portion.
 
 
  You can add parameters like the qf above, within the {!dismax ... } area.
 
 Erik
 
 


 --
 Bill Cosby  - Advertising is the most fun you can have with your clothes
 on.



Re: Multiple query fields in DisMax handler

2008-07-17 Thread Preetam Rao
I see that a QParser takes local params (those given via {!...}) as well as
request params. It sets the lookup chain as local params followed by request
params. AFAIK, the request param lookup chain is set up as:
invariants, then the params given explicitly in the URL, then the defaults
given in solrconfig for the corresponding request handler.

Since you are not using dismax params for the main query and just want them
to be available for the dismax parser, there are no conflicts, and I think
you can set qf, bf, etc. in the named standard request handler that you
are configuring in solrconfig, and the dismax query parser will automatically
pick them up.

--
Preetam


On Thu, Jul 17, 2008 at 5:48 PM, Erik Hatcher [EMAIL PROTECTED]
wrote:

 A custom QParserPlugin could be created and implement an #init(NamedList)
 which you could parameterize via its solrconfig.xml configuration.   That
 would be one way.   Another trick, I think, would be to use request
 parameter substitution.  The javadocs here might lead you to what you're
 after:

 
 http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html
 

 I've not tinkered with this stuff myself other than a bit of trying to grok
 the capabilities Yonik built into this stuff, so having folks post back
 their experiences would be helpful to us all :)

Erik



 On Jul 17, 2008, at 8:11 AM, chris sleeman wrote:

 What I actually meant was whether or not I could create a configuration
 for
 a dismax query parser and then refer to it in my filter query. I already
 have a standard request handler with a dismax deftype for my query
 field.
 I wanted to use another dismax parser for the fq param, on the lines of
 what
 Ryan and Erik had suggested. Just dont want to specify all the params for
 this dismax at query time.

 My actual query would then simply look like - 
 http://localhost:8983/solr/select?q=*:*&fq={!dismaxL}CA,
 instead of
 specifying all the qf, pf, etc fields as part of the dismax syntax within
 the query.

 Regards,
 Chris

 On Thu, Jul 17, 2008 at 5:18 PM, Preetam Rao [EMAIL PROTECTED]
 wrote:

  If I understand the question correctly, you can provide init params,
 default
 params and invariant params in the appropriate request handler section in
 solrconfig.xml.
 So you can create a standard request handler with name dismaxL, whose
 defType is dismax and set all parameters in defaults section.

 
 Preetam

 On Thu, Jul 17, 2008 at 4:35 PM, chris sleeman [EMAIL PROTECTED]
 wrote:

  Thanks a lot..this is, more or less, what i was looking for.

 However, is there a way to pre-configure the dismax query parser, with
 parameters like qf, pf, boost etc., in solr-config.xml, rather than
 doing
 so
 at query time. So my actual query would look like - 
  http://localhost:8983/solr/select?q=<query>&fq={!dismaxL}CA&debugQuery=true,

 where dismaxL refers to a query parser defined in solrconfig, with all

 the

 necessary parameters. The q parameter would then use the default dismax
 parser defined for the handler and fq would use dismaxL.

 Regards,
 Chris

 On Thu, Jul 17, 2008 at 5:15 AM, Erik Hatcher 

 [EMAIL PROTECTED]

 wrote:

  On Jul 16, 2008, at 7:38 PM, Ryan McKinley wrote:

  (assuming you are using 1.3-dev), you could use the dismax query

 parser

 syntax for the fq param.  I think it is something like:
  fq=<!dismax>your query


 The latest committed syntax is:

  {!dismax qf=}your query

 For example, with the sample data: 


  
  http://localhost:8983/solr/select?q=*:*&fq={!dismax%20qf=%22name%22}ipod&debugQuery=true




 I can't find the syntax now (Yonik?)


 but I don't know how you could pull out the qf,pf,etc fields for the

 fq

 portion vs the q portion.


 You can add parameters like the qf above, within the {!dismax ... }

 area.


  Erik




 --
 Bill Cosby  - Advertising is the most fun you can have with your
 clothes
 on.





 --
 Yogi Berra  - A nickel ain't worth a dime anymore.





Re: Multiple query fields in DisMax handler

2008-07-17 Thread Preetam Rao
Oops.. this will only help you configure the defaults common to the
main dismax query as well as the fq dismax query.

For creating two named sets of dismax parsers that read their params from
solrconfig, I think one could extend the DisMaxQParser's currently empty
init() method to set up the param hierarchy at query time such that the
lookup chain becomes: local params, then qparser init params, and then
request params. This is along similar lines to Erik's suggestion, but I am
not sure whether we want to create a custom parser for this or make it
general behavior.

---
preetam

On Thu, Jul 17, 2008 at 6:21 PM, Preetam Rao [EMAIL PROTECTED]
wrote:

 I see that a QParser takes local params (those given via {!...} )as well as
 request params. It sets the lookup chain as local followed be request
 params. AFAIK, the request param lookup chain  is set up as -
 those given in the url explicitly, then invariants, then defaults gievn in
 solrconfig for the corresponding request handler.

 Since you are not using dismax params for the main query and just want them
 to be available for the Dismax parser, there are no conflicts and I think
 you can set the qf, bf etc in the named standard request handler that you
 are configuring in solrconfig and dismax query parser will automatically
 pick it up.

 --
 Preetam



 On Thu, Jul 17, 2008 at 5:48 PM, Erik Hatcher [EMAIL PROTECTED]
 wrote:

 A custom QParserPlugin could be created and implement an #init(NamedList)
 which you could parameterize via it's solrconfig.xml configuration.   That
 would be one way.   Another trick, I think, would be to use request
 parameter substitution.  The javadocs here might lead you to what you're
 after:

 
 http://lucene.apache.org/solr/api/org/apache/solr/search/NestedQParserPlugin.html
 

 I've not tinkered with this stuff myself other than a bit of trying to
 grok the capabilities Yonik built into this stuff, so having folks post back
 their experiences would be helpful to us all :)

Erik



 On Jul 17, 2008, at 8:11 AM, chris sleeman wrote:

 What I actually meant was whether or not I could create a configuration
 for
 a dismax query parser and then refer to it in my filter query. I already
 have a standard request handler with a dismax deftype for my query
 field.
 I wanted to use another dismax parser for the fq param, on the lines of
 what
 Ryan and Erik had suggested. Just dont want to specify all the params for
 this dismax at query time.

 My actual query would then simply look like - 
 http://localhost:8983/solr/select?q=*:*&fq={!dismaxL}CA,
 instead of
 specifying all the qf, pf, etc fields as part of the dismax syntax within
 the query.

 Regards,
 Chris

 On Thu, Jul 17, 2008 at 5:18 PM, Preetam Rao [EMAIL PROTECTED]
 wrote:

  If I understand the question correctly, you can provide init params,
 default
 params and invariant params in the appropriate request handler section
 in
 solrconfig.xml.
 So you can create a standard request handler with name dismaxL, whose
 defType is dismax and set all parameters in defaults section.

 
 Preetam

 On Thu, Jul 17, 2008 at 4:35 PM, chris sleeman [EMAIL PROTECTED]
 
 wrote:

  Thanks a lot..this is, more or less, what i was looking for.

 However, is there a way to pre-configure the dismax query parser, with
 parameters like qf, pf, boost etc., in solr-config.xml, rather than
 doing
 so
 at query time. So my actual query would look like - 
  http://localhost:8983/solr/select?q=<query>&fq={!dismaxL}CA&debugQuery=true,

 where dismaxL refers to a query parser defined in solrconfig, with all

 the

 necessary parameters. The q parameter would then use the default dismax
 parser defined for the handler and fq would use dismaxL.

 Regards,
 Chris

 On Thu, Jul 17, 2008 at 5:15 AM, Erik Hatcher 

 [EMAIL PROTECTED]

 wrote:

  On Jul 16, 2008, at 7:38 PM, Ryan McKinley wrote:

  (assuming you are using 1.3-dev), you could use the dismax query

 parser

  syntax for the fq param.  I think it is something like:
  fq=<!dismax>your query


 The latest committed syntax is:

  {!dismax qf=}your query

 For example, with the sample data: 


  
  http://localhost:8983/solr/select?q=*:*&fq={!dismax%20qf=%22name%22}ipod&debugQuery=true




 I can't find the syntax now (Yonik?)


 but I don't know how you could pull out the qf,pf,etc

Dismax request handler and sub phrase matches... suggestion for another handler..

2008-07-15 Thread Preetam Rao
Hi,

Apologies if you are receiving this a second time... I'm having a tough time
with the mail server.

I take a user-entered query as-is and run it with the dismax query handler.
The document fields have been filled from structured data, where different
fields hold different attributes like number of beds, number of baths, city
name, etc. A sample user query would look like "3 bed homes in new york". I
would like this to match against city:"new york" and beds:"3 beds". When I use
the dismax handler with boosts and the tie parameter, I do not always get the
most relevant top 10 results, because there seem to be many factors in play:
one is not being able to recognize the presence of sub-phrases, and another
is not being able to ignore unwanted matches in unwanted fields.

What are your thoughts on having one more request handler like dismax, but
which uses a sub-phrase query instead of a dismax query?
It would also provide the parameters below, on a per-field basis, to help
customize the behavior of the request handler and give more flexibility in
different scenarios.

phraseBoost - how much better a 3-word sub-phrase match is than a 2-word
sub-phrase match.
useOnlyMaxMatch - if many sub-phrases match in the field, only the best
score is used.
ignoreDuplicates - if a field has duplicate matches, pick only one match for
scoring.
matchOnlyOneField - if a match is found in the first field, remove the matched
terms when querying the other fields. For example, for me a city match is
more important than matches in other fields, so I do not want the "new" in
"new york" to match all the other fields and skew the results, which is what
I am seeing with dismax irrespective of the high boosts.
ignoreSomeLuceneScoreFactors - ignore the Lucene tf, idf, query norm or any
such criteria that are not needed for this field, since if I want exact
matches only, they are really not important. They also seem to play a big
role in my not being able to get the most relevant top 10 results.
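
Purely as an illustration of the proposal (none of these parameters exist in
Solr), the per-field knobs might be expressed in a handler's defaults roughly
like this, with hypothetical names:

  <requestHandler name="subphrase" class="solr.StandardRequestHandler">
    <lst name="defaults">
      <!-- hypothetical parameters from the proposal above -->
      <str name="qf">city^4.0 beds^2.0 description</str>
      <float name="city.phraseBoost">2.0</float>
      <bool name="city.useOnlyMaxMatch">true</bool>
      <bool name="city.ignoreDuplicates">true</bool>
      <bool name="city.matchOnlyOneField">true</bool>
      <bool name="city.ignoreSomeLuceneScoreFactors">true</bool>
    </lst>
  </requestHandler>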

I see this handler might be useful in the below use cases:
a) the data is mostly exact, in that I am not trying to search free text
like mails, reviews, articles, web pages, etc.
b) numbers and their bindings are important
c) exact phrase or sub-phrase matches are more important than rankings
derived from tf, idf, query norm, etc.
d) I need to make sure that in some cases certain fields affect the scoring
and in other cases they don't. I found this the most difficult task:
separating the noise matches from the required ones for my use case.

Your thoughts and suggestions on alternatives are welcome.

I have also posted a question on sub-phrase matching to lucene-user; that one
is about the matching itself rather than about a Solr handler with additional
features like sub-phrase matching for user-entered queries.

Thanks
Preetam


Re: Dismax request handler and sub phrase matches... suggestion for another handler..

2008-07-15 Thread Preetam Rao
I agree. If we do decide to implement another kind of request handler, it
should be done through the StandardRequestHandler's defType attribute, which
selects the registered QParser that generates the appropriate queries for Lucene.


Preetam

On Tue, Jul 15, 2008 at 3:59 PM, Erik Hatcher [EMAIL PROTECTED]
wrote:


 On Jul 15, 2008, at 4:45 AM, Preetam Rao wrote:

 What are your thoughts on having one more request handler like dismax, but
 which uses a sub-phrase query instead of dismax query ?


 It'd be better to just implement a QParser(Plugin) such that the
 StandardRequestHandler can use it (defType=dismax, for example).

 No need to have additional actual request handlers just to swap out query
 parsing logic anymore.

Erik




Re: Filter by Type increases search results.

2008-07-15 Thread Preetam Rao
Hi Matt,

Other than applying one more fq, does everything else remain the same between
the two queries, like q and all the other parameters?

My understanding is that fq is an intersection with the set of results
returned by q, so it should always produce a subset of the results returned
by q. So if one request uses just q, and the other uses q and fq, for the
same q, the second will return an equal or smaller number of documents.


Preetam

On Tue, Jul 15, 2008 at 4:10 PM, matt connolly [EMAIL PROTECTED] wrote:


 I'm using Solr with a Drupal site, and one of the fields in the schema is
 type.

 In my example development site, searching for the word fish returns 2
 documents, one type='story', and the other type='idea'.

 If I filter by type:idea then I get 9 results, the correct first result,
 followed by 8 results that are of type='idea' but do not use the word
 fish
 at all. I have completely disabled synonyms (and rebuilt indexes) and this
 makes no difference.

 Any ideas why filtering the type results in more search documents matched?
 --
 View this message in context:
 http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18462188.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Filter by Type increases search results.

2008-07-15 Thread Preetam Rao
Hi Matt,

When I say filter, I meant q=fish&fq=type:idea

What you are trying is a boolean OR of defaultSearchField:fish OR
type:idea.

It's not a filter, it's an OR, so obviously you will get a union of results...
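
To make the difference concrete, with the queries from this thread:

  filter (intersection; at most the 2 documents matching fish):
    http://localhost:8983/solr/select?q=fish&fq=type:idea

  boolean OR (union of the fish matches and all type:idea documents):
    http://localhost:8983/solr/select?q=fish+type:idea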

--
Preetam

On Tue, Jul 15, 2008 at 5:37 PM, matt connolly [EMAIL PROTECTED] wrote:


 Yes, the same, except for the filter.

 For example:

 http://localhost:8983/solr/select?q=fish
 returns:
  <result name="response" numFound="2" start="0"> etc. (followed by 2
 docs)

 http://localhost:8983/solr/select?q=fish+type:idea
 returns:
  <result name="response" numFound="9" start="0"> ... (followed by 9
 docs)


 -Matt


 Preetam Rao wrote:
 
  Hi Matt,
 
  Other than applying one more fq, is everything else remains same between
  the
  two queries, like q and all other parameters ?
 
  My understanding is that, fq is an intersection on the set of results
  returned from q. So it should always be a subset of results returned from
  q.
  So if one uses just q, and other uses q and fq, for the same q, the
 second
  will have equal or less number of documents.
 
  
  Preetam
 
 

 --
 View this message in context:
 http://www.nabble.com/Filter-by-Type-increases-search-results.-tp18462188p18463448.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: estimating memory needed for solr instances...

2008-07-10 Thread Preetam Rao
Oops. Sorry for the typo. I will be careful next time.
Thanks a lot for digging out the old thread :-)
It was helpful.

Should we remove the option useFilterForSortedQuery altogether if it's not
being used anymore?

---
Preetam

On Fri, Jul 11, 2008 at 2:10 AM, Chris Harris [EMAIL PROTECTED] wrote:

 I didn't know what option was being referred to here, but I eventually
 figured it out. If anyone else was confused, the option is called
 useFilterForSortedQuery, you can set it via solrconfig.xml, and, at
 least according to Yonik in late 2006, you probably won't want to
 enable it even if you *do* sort by something other than score:


 http://www.nabble.com/try-setting-useFilterForSortedQuery-to-false-td7822871.html#a7822871
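
For reference, the option is a flag in solrconfig.xml; a minimal sketch:

  <!-- inside the <query> section of solrconfig.xml -->
  <useFilterForSortedQuery>false</useFilterForSortedQuery>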

 Cheers,
 Chris

 On Wed, Jul 9, 2008 at 12:00 AM, Preetam Rao [EMAIL PROTECTED]
 wrote:
 
  Since we do not sort the results, the sort will be by score which
 eliminates
  the option userFiterFprSortedQuerries.



estimating memory needed for solr instances...

2008-07-09 Thread Preetam Rao
Hi,

Since we plan to share the same box among multiple Solr instances (a 16 GB
RAM multi-core box), I need to estimate how much memory we need for our
application.

The index size on disk is 2.4 GB, with close to 3 million documents. The plan
is to use dismax queries with some fqs.
Since we do not sort the results, the sort will be by score, which eliminates
the option userFiterFprSortedQuerries.
Thus, assuming all q's will use the query result cache and all fqs will use
the filter cache, the below is what I am thinking.

I would like to know how to relate the index size on disk to its memory size.
Would it be safe to assume, given the disk size of 2.4 GB, that we can allow
RAM for the whole index plus 1 GB for any other overhead plus the cache size,
which comes to about 150 MB (calculation below), thus making it around 4 GB?

cache size calculation -

query result cache - size = 50K.
Since we paginate the results, each page has 10 items, and assuming each
user will at most see 3 pages per query,
we will set queryResultWindowSize to 30. Assuming this, for 50K queries we
will use up 50K * 30 bits ~ 187 KB, assuming results are stored in a bitset.

We use a few common fqs, let's say 200. Assuming each returns around 30K
documents, that adds up to 200 * 30K bits ~ 750 KB.

If we use a document cache of size 20K, assuming each document is around
5 KB at the most, it will take up 20K * 5 KB = 100 MB.

Thus we can increase the caches much more drastically and they will still use
up only about 150 MB or less.
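
A sketch of how these estimates might map onto solrconfig.xml cache settings
(the sizes come from the numbers above; the cache class and autowarm counts
are placeholders):

  <query>
    <filterCache      class="solr.LRUCache" size="200"   initialSize="200"   autowarmCount="100"/>
    <queryResultCache class="solr.LRUCache" size="50000" initialSize="5000"  autowarmCount="1000"/>
    <documentCache    class="solr.LRUCache" size="20000" initialSize="20000" autowarmCount="0"/>
    <queryResultWindowSize>30</queryResultWindowSize>
  </query>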

Is this reasoning on the caches correct?

Thanks
Preetam


Re: estimating memory needed for solr instances...

2008-07-09 Thread Preetam Rao
Thanks for the responses, Ian, Jacob.

While I could not locate the previous thread, this is what I understand:

while we can fine-tune the cache parameters and the other things we can
directly control, with respect to the index files the key is to give the box
enough RAM and let the OS do its best at keeping the index files in
memory.

--
Preetam

On Thu, Jul 10, 2008 at 7:12 AM, Ian Connor [EMAIL PROTECTED] wrote:

 I would guess so also to a point. After you run out of RAM, indexing
 also takes a hit. I have noticed on a 2Gb machine when the index gets
 over 2Gb, my indexing rate went down from 100/s to 40/s. After
 reaching 4Gb it was down to 10/s. I am trying now with a 8Gb machine
 to see how far I get through my data before slowing down.

 On Wed, Jul 9, 2008 at 7:56 PM, Jacob Singh [EMAIL PROTECTED] wrote:
  My total guess is that indexing is CPU bound, and searching is RAM bound.
 
  Best,
  Jacob
  Ian Connor wrote:
  There was a thread a while ago, that suggested just need to factor in
  the index's total size (Mike Klaas I think was the author). It was
  suggested having the RAM is enough and the OS will cache the files as
  needed to give you the performance boost needed.
 
  If I misread the thread, please chime in - but it seems having enough
  RAM is the key to performance.
 
  On Wed, Jul 9, 2008 at 3:00 AM, Preetam Rao [EMAIL PROTECTED]
 wrote:
  Hi,
 
  Since we plan to share the same box among multiple solr instances on a
 16gb
  RAM multi core box, Need to estimate how much memory we need for our
  application.
 
  The index size is on disk  2.4G with close to 3 million documents. The
 plan
  is to use dismax query with some fqs.
  Since we do not sort the results, the sort will be by score which
 eliminates
  the option userFiterFprSortedQuerries.
  Thus assuming all q's will use query result cache and all fqs will use
  filter caches the below is what i am thinking.
 
  I would like to know how to relate the index size on disk to its memory
 size
  ?
  Would it be safe to assume gven the disk size of 2.4g, that we can have
 ram
  size for whole index plus 1g for any other overhead plus the cache size
  which comes to 150MB  (calculation below). Thus making it around 4g.
 
  cache size calculation -
  
  query result cache - size = 50K;
  since we paginate the results and each page has 10 items and assuming
 each
  user will at the max see 3 pages, per query
  we will set queryResultWindowSize to 30. Assuming this, for 50k
 querries we
  will use up 50K * 30 bits ~ 187 KB, assuming results are stored in a bitset.
 
  we use few common fqs, lets say 200. Assuming each returns around 30k
  documents, it adds up to 200 * 30K bits ~ 750 KB.
 
  If we use document cache of size 20K, assuming each document size is
 around
  5 KB at the most, it will take up 20K * 5 KB = 100 MB.
 
  Thus we can increase the cache more drastically and still it will use
 up
  only 150MB or less.
 
  Is this reasoning on cache's correct ?
 
  Thanks
  Preetam
 
 
 
 
 
 



Re: Integrate Solr with Tomcat in Linux

2008-07-08 Thread Preetam Rao
Set the Solr home folder such that:

If you are using a JNDI name for solr.home or the command-line argument for
solr.home, then Solr will look for the conf and lib folders under that folder.

If you are not using a JNDI name, then it looks for solr/conf and solr/lib
folders under the current directory, which is the directory you started
Tomcat from.
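
A minimal sketch of the JNDI route, assuming Tomcat picks up a context
fragment such as conf/Catalina/localhost/solr.xml (all paths are placeholders):

  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String"
                 value="/path/to/solr/home" override="true"/>
  </Context>

The command-line alternative is to pass -Dsolr.solr.home=/path/to/solr/home
as a system property to the JVM that runs Tomcat.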

You can also copy the conf and lib folders from the distribution's example
folder.

Hope this helps

Thanks
Preetam

On Wed, Jul 9, 2008 at 9:28 AM, Noble Paul നോബിള്‍ नोब्ळ् 
[EMAIL PROTECTED] wrote:

 The context 'solr' is not initialized. The most likely reason is that
 you have not set solr.home correctly.
 --Noble

 On Wed, Jul 9, 2008 at 3:24 AM, sandeep kaur [EMAIL PROTECTED]
 wrote:
 
  Hi,
 
  As I am running Tomcat after copying the Solr files to the appropriate Tomcat
 directories, I am getting the following error in the catalina log:
 
  Jul 8, 2008 10:30:02 PM org.apache.catalina.core.AprLifecycleListener
 init
  INFO: The Apache Tomcat Native library which allows optimal performance
 in production environments was not found on the java.library.path:
 /usr/java/jdk1.6.0_06
 
 /jre/lib/i386/client:/usr/java/jdk1.6.0_06/jre/lib/i386:/usr/java/jdk1.6.0_06/jr
  e/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
  Jul 8, 2008 10:30:02 PM org.apache.coyote.http11.Http11Protocol init
  INFO: Initializing Coyote HTTP/1.1 on http-8080
  Jul 8, 2008 10:30:02 PM org.apache.catalina.startup.Catalina load
  INFO: Initialization processed in 285 ms
  Jul 8, 2008 10:30:02 PM org.apache.catalina.core.StandardService start
  INFO: Starting service Catalina
  Jul 8, 2008 10:30:02 PM org.apache.catalina.core.StandardEngine start
  INFO: Starting Servlet Engine: Apache Tomcat/6.0.9
  Jul 8, 2008 10:30:02 PM org.apache.solr.servlet.SolrDispatchFilter init
  INFO: SolrDispatchFilter.init()
  Jul 8, 2008 10:30:02 PM org.apache.solr.core.Config getInstanceDir
  INFO: Using JNDI solr.home: /home/user_name/softwares
  Jul 8, 2008 10:30:02 PM org.apache.solr.core.Config setInstanceDir
  INFO: Solr home set to '/home/user_name/softwares/'
  Jul 8, 2008 10:30:02 PM org.apache.catalina.core.StandardContext start
  SEVERE: Error filterStart
  Jul 8, 2008 10:30:02 PM org.apache.catalina.core.StandardContext start
  SEVERE: Context [/solr] startup failed due to previous errors
  Jul 8, 2008 10:30:03 PM org.apache.coyote.http11.Http11Protocol start
  INFO: Starting Coyote HTTP/1.1 on http-8080
  Jul 8, 2008 10:30:03 PM org.apache.jk.common.ChannelSocket init
  INFO: JK: ajp13 listening on /0.0.0.0:8009
  Jul 8, 2008 10:30:03 PM org.apache.jk.server.JkMain start
  INFO: Jk running ID=0 time=0/30  config=null
  Jul 8, 2008 10:30:03 PM org.apache.catalina.startup.Catalina start
  INFO: Server startup in 589 ms
 
  In the browser while typing http://localhost:8080/solr/admin
 
  i am getting the following error
 
  HTTP Status 404 - /solr/admin
 
  type Status report
 
  message /solr/admin
 
  description The requested resource (/solr/admin) is not available.
  Apache Tomcat/6.0.9
 
  Could anyone please suggest how to resolve this error.
 
  Thanks,
  Sandip
 
 
  --- On Tue, 8/7/08, Shalin Shekhar Mangar [EMAIL PROTECTED]
 wrote:
 
  From: Shalin Shekhar Mangar [EMAIL PROTECTED]
  Subject: Re: Integrate Solr with Tomcat in Linux
  To: solr-user@lucene.apache.org, [EMAIL PROTECTED]
  Date: Tuesday, 8 July, 2008, 4:40 PM
  Take a look at http://wiki.apache.org/solr/SolrTomcat
 
  Please avoid replying to an older message when you're
  starting a new topic.
 
  On Tue, Jul 8, 2008 at 4:36 PM, sandeep kaur
  [EMAIL PROTECTED]
  wrote:
 
   Hi,
  
I have solr with jetty as server application running
  on Linux.
  
   Could anyone please tell me the changes i need to make
  to integrate Tomcat
   with solr on Linux.
  
   Thanks,
   Sandip
  
   --- On Mon, 7/7/08, Benson Margulies
  [EMAIL PROTECTED] wrote:
  
From: Benson Margulies
  [EMAIL PROTECTED]
Subject: Re: js client
To: [EMAIL PROTECTED], solr-user
  solr-user@lucene.apache.org
Date: Monday, 7 July, 2008, 11:43 PM
The Javascript should have the right URL
  automatically if
you get it from
the ?js URL.
   
Anyway, I think I was the first person to say
'stupid' about that WSDL in
the sample.
   
I'm not at all clear on what you are doing at
  this
point.
   
Please send along  the URL that works for you in
  soapUI and
the URL that
works for you in the
   <script>...</script>
element.
   
   
   
   
On Mon, Jul 7, 2008 at 5:54 AM, Christine Karman
[EMAIL PROTECTED]
wrote:
   
 On Sun, 2008-07-06 at 10:25 -0400, Benson
  Margulies
wrote:
  In the sample, it is a relative URL to
  the web
service endpoint. The
  sample starts from a stupid WSDL with
  silly names
for the service and
  the port.

 I'm sorry about using the word
  stupid.

 
  Take your endpoint deployment URL, the
  very URL
that is logged when
  your