Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Otis,

 We are not running a master-slave configuration. We get very few
searches (admin only) in a day, so we didn't see the need for
replication/snapshots. This problem is with one Solr instance managing
4 cores (each core holding 200 million records). Both indexing and
searching are performed by the same Solr instance.

What are .tii files used for? I see this file under only one core.

I'm still looking for what gets loaded into the heap by Solr (at startup,
during indexing, and during searching) and stays there. I see most of these
are tenured objects that are not getting released by GC - I will post
profiling results tomorrow.

Thanks,
-vivek





On Wed, May 13, 2009 at 6:34 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 There is constant mixing of indexing concepts and searching concepts in this 
 thread.  Are you having problems on the master (indexing) or on the slave 
 (searching)?


 That .tii is only 20K and you said this is a large index?  That doesn't smell 
 right...

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: vivek sar vivex...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Wednesday, May 13, 2009 5:12:00 PM
 Subject: Re: Solr memory requirements?

 Otis,

 In that case, I'm not sure why Solr is taking up so much memory as
 soon as we start it up. I checked for .tii files and there is only one:

 -rw-r--r--  1 search  staff  20306 May 11 21:47 
 ./20090510_1/data/index/_3au.tii

 I have all the caches disabled - so that shouldn't be a problem either. My
 ramBuffer size is only 64MB.

 I read the note on sorting,
 http://wiki.apache.org/solr/SchemaDesign?highlight=(sort), and see
 something related to FieldCache. I don't see this as a parameter defined
 in either solrconfig.xml or schema.xml. Could this be something that
 loads things into memory at startup? How can we disable it?

 I'm trying to find out if there is a way to tell how much memory Solr
 will consume and a way to cap it.

 Thanks,
 -vivek




 On Wed, May 13, 2009 at 1:50 PM, Otis Gospodnetic
 wrote:
 
  Hi,
 
  Sorting is triggered by the sort parameter in the URL, not a
  characteristic of a field. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 4:42:16 PM
  Subject: Re: Solr memory requirements?
 
  Thanks Otis.
 
  Our use case doesn't require any sorting or faceting. I'm wondering if
  I've configured something wrong.
 
  I've got a total of 25 fields (15 are indexed and stored, the other 10 are
  just stored). All my fields are basic data types - which I thought are not
  sorted. My id field is the unique key.
 
  Is there any field here that might be getting sorted?
 
 
  [schema.xml field list omitted: ~25 field definitions whose opening tags
  were stripped by the mail archiver; only attribute fragments survive,
  e.g. required="true", omitNorms="true", compressed="false",
  default="NOW/HOUR", multiValued="true"]
 
  Thanks,
  -vivek
 
  On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
  wrote:
  
   Hi,
   Some answers:
    1) .tii files in the Lucene index.  When you sort, all distinct values for
    the field(s) used for sorting are loaded into memory.  Similarly for facet
    fields.  Plus whatever Solr caches.
    2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
    consume during indexing.  There is no need to commit every 50K docs unless
    you want to trigger snapshot creation.
    3) see 1) above
   
    1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
    going to fly. :)
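
    To make 1) concrete, here is a rough Lucene 2.x sketch of what a sort pulls
    into memory via FieldCache (index path and field name are placeholders, not
    taken from this thread):

      import org.apache.lucene.index.IndexReader;
      import org.apache.lucene.search.FieldCache;

      public class FieldCacheDemo {
        public static void main(String[] args) throws Exception {
          IndexReader reader = IndexReader.open("/path/to/core/data/index");
          // One entry per document in the index (maxDoc values), kept referenced
          // for the life of the reader -- this is the long-lived memory that shows
          // up after the first sorted or faceted query on a field.
          String[] values = FieldCache.DEFAULT.getStrings(reader, "site_id");
          System.out.println("cached " + values.length + " values for sorting");
          reader.close();
        }
      }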
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
  
   - Original Message 
   From: vivek sar
   To: solr-user@lucene.apache.org
   Sent: Wednesday, May 13, 2009 3:04:46 PM
   Subject: Solr memory requirements?
  
   Hi,
  
     I'm pretty sure this has been asked before, but I couldn't find a
   complete answer in the forum archive. Here are my questions,
  
    1) When Solr starts up, what does it load into memory? Let's say
    I've 4 cores, each 50G in size. When Solr comes up, how much
    of that would be loaded into memory?
   
    2) How much memory is required at index time? If I'm committing
    50K records at a time (1 record = 1KB) using solrj, how much memory do
    I need to give to Solr?
   
    3) Is there a minimum memory requirement for Solr to maintain a certain
    size of index? Is there any benchmark on this?
  
   Here are some of my 

Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
I want to know the maximum number of cores supported by Solr - thousands, or
maybe millions, all under one Solr instance?
Also I want to know how to direct a particular query to a particular core.
Actually I'm querying Solr from Ajax, so I think there must be some request
parameter that says which core we want to query, right? Can someone tell me
how to do this? Any good pointers on the same would be helpful as well.
Thank you.

--kk


Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Shishir Jain
http://wiki.apache.org/solr/CoreAdmin
Best regards,
Shishir

On Thu, May 14, 2009 at 1:58 PM, KK dioxide.softw...@gmail.com wrote:

 I want to know the maximum no of cores supported by Solr. 1000s or may be
 millions all under one solr instance ?
 Also I want to know how to redirect a particular query to a particular
 core.
 Actually I'm querying solr from Ajax, so I think there must be some request
 parameter that says which core we want to query, right? Can some one tell
 me
 how to do this, any good pointers on the same will be helpful as well.
 Thank you.

 --kk



Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
There is no hard limit on the number of cores; it is limited by your
system's ability to open files and by available resources.
Queries are automatically sent to the appropriate core if your URL is

http://host:port/corename/select
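
A minimal SolrJ sketch of the same idea - the client just points at one core's
URL; host, port, and core name below are placeholders:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class PerCoreQuery {
    public static void main(String[] args) throws Exception {
      // Each core is addressed by its own URL; no extra request parameter is needed.
      CommonsHttpSolrServer core1 = new CommonsHttpSolrServer("http://localhost:8983/solr/core1");
      QueryResponse rsp = core1.query(new SolrQuery("*:*"));
      System.out.println("core1 has " + rsp.getResults().getNumFound() + " docs");
    }
  }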

On Thu, May 14, 2009 at 1:58 PM, KK dioxide.softw...@gmail.com wrote:
 I want to know the maximum no of cores supported by Solr. 1000s or may be
 millions all under one solr instance ?
 Also I want to know how to redirect a particular query to a particular core.
 Actually I'm querying solr from Ajax, so I think there must be some request
 parameter that says which core we want to query, right? Can some one tell me
 how to do this, any good pointers on the same will be helpful as well.
 Thank you.

 --kk




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Delete documents from index with dataimport

2009-05-14 Thread Andrew McCombe
Hi

Yes I'd like the document deleted from Solr and yes, there is a unique
document id field in Solr.

Regards
Andrew

Andrew

2009/5/13 Fergus McMenemie fer...@twig.me.uk:
Hi

Is it possible, through dataimport handler to remove an existing
document from the Solr index?

I import/update from my database where the active field is true.
However, if the client then set's active to false, the document stays
in the Solr index and doesn't get removed.

Regards
Andrew

 Yes but only in the latest trunk. If your active field is false
 do you want to see the document deleted? Do you have another field
 which is a unique ID for the document?

 Fergus
 --

 ===
 Fergus McMenemie               Email:fer...@twig.me.uk
 Techmore Ltd                   Phone:(UK) 07721 376021

 Unix/Mac/Intranets             Analyst Programmer
 ===



UK Solr users meeting?

2009-05-14 Thread Colin Hammond
I was wondering if there is any interest in a UK (South East) Solr user
group meeting.


Please let me know if you are interested.  I am happy to organize.

Regards,

Colin


Re: UK Solr users meeting?

2009-05-14 Thread Fergus McMenemie
I was wondering if there is an interest in a UK (South East) solr user 
group meeting

Please let me know if you are interested.  I am happy to organize.

Regards,

Colin

Yes, very interested. I am in Lincolnshire.
-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Delete documents from index with dataimport

2009-05-14 Thread Fergus McMenemie
Hi

Yes I'd like the document deleted from Solr and yes, there is a unique
document id field in Solr.


In that case try the following. Create a field in the entity:

  <field column="$deleteDocById"
         regex="^false$"
         replaceWith="${jc.id}" sourceColName="active"/>

Notes.
1) the entity is assumed to have name="jc".
2) the uniqueKey field is assumed to be called "id".
3) the entity needs to have transformer="RegexTransformer".



2009/5/13 Fergus McMenemie fer...@twig.me.uk:
Hi

Is it possible, through dataimport handler to remove an existing
document from the Solr index?

I import/update from my database where the active field is true.
However, if the client then set's active to false, the document stays
in the Solr index and doesn't get removed.

Regards
Andrew

 Yes but only in the latest trunk. If your active field is false
 do you want to see the document deleted? Do you have another field
 which is a unique ID for the document?

 Fergus

-- 

===
Fergus McMenemie   Email:fer...@twig.me.uk
Techmore Ltd   Phone:(UK) 07721 376021

Unix/Mac/Intranets Analyst Programmer
===


Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
Thank you very much. Got the point.
One off-track question: can we automate the creation of new cores? [As far as
I know it requires manually editing the solr.xml file - and what about the
location of the core's index directory, do we need to point to that manually
as well?]
After going through the wiki, what I found is that we have to mention the
names of the cores in solr.xml. I want to automate the process in such a way
that when a user registers [on, say, my site for the service], we'll create a
corresponding core for that user with a specific core id [unique for this
user only], so that the user will be given a search interface that redirects
all searches for this user to http://host:port/<unique core name for this
user>/select
I will appreciate any ideas on this.

Thanks,
KK.

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 there is no hard limit on the no:of cores. it is limited by your
 system's ability to open files and the resources.
 the queries are automatically sent to appropriate core if your url is

 htt://host:port/corename/select

 On Thu, May 14, 2009 at 1:58 PM, KK dioxide.softw...@gmail.com wrote:
  I want to know the maximum no of cores supported by Solr. 1000s or may be
  millions all under one solr instance ?
  Also I want to know how to redirect a particular query to a particular
 core.
  Actually I'm querying solr from Ajax, so I think there must be some
 request
  parameter that says which core we want to query, right? Can some one tell
 me
  how to do this, any good pointers on the same will be helpful as well.
  Thank you.
 
  --kk
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



RE: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread Gargate, Siddharth
Hi all,
I am also facing the same issue where autocommit blocks all
other requests. I have around 1,00,000 documents with an average size of
100K each. It took more than 20 hours to index.
I have currently set autocommit maxTime to 7 seconds and mergeFactor to 25.
Do I need more configuration changes?
Also I see that memory usage goes to the peak level of the heap specified (6 GB
in my case). It looks like Solr spends most of the time in GC.
According to my understanding, the fix for SOLR-1155 would be that commit
will run in the background and new documents will be queued in memory.
But I am afraid of the memory consumption of this queue if the commit takes
much longer to complete.

Thanks,
Siddharth

-Original Message-
From: jayson.minard [mailto:jayson.min...@gmail.com] 
Sent: Saturday, May 09, 2009 10:45 AM
To: solr-user@lucene.apache.org
Subject: Re: Autocommit blocking adds? AutoCommit Speedup?


First cut of updated handler now in:
https://issues.apache.org/jira/browse/SOLR-1155

Needs review from those that know Lucene better, and double check for
errors
in locking or other areas of the code.  Thanks.

--j


jayson.minard wrote:
 
 Can we move this to patch files within the JIRA issue please.  Will make
 it easier to review and help out as a patch to current trunk.
 
 --j
 
 
 Jim Murphy wrote:
 
 
 
 Yonik Seeley-2 wrote:
 
 ...your code snippit elided and edited below ...
 
 
 
 
 Don't take this code as correct (or even compiling) but is this the
 essence?  I moved shared access to the writer inside the read lock and
 kept the other non-commit bits to the write lock.  I'd need to rethink
 the locking in a more fundamental way but is this close to the idea?
 
 
 
  public void commit(CommitUpdateCommand cmd) throws IOException {

    if (cmd.optimize) {
      optimizeCommands.incrementAndGet();
    } else {
      commitCommands.incrementAndGet();
    }

    Future[] waitSearcher = null;
    if (cmd.waitSearcher) {
      waitSearcher = new Future[1];
    }

    boolean error = true;

    // write lock: only the segment-reorganising work (optimize) is done here
    iwCommit.lock();
    try {
      log.info("start " + cmd);
      if (cmd.optimize) {
        closeSearcher();
        openWriter();
        writer.optimize(cmd.maxOptimizeSegments);
      }
    } finally {
      iwCommit.unlock();
    }

    // read (shared) lock: the actual Lucene commit, so adds can proceed concurrently
    iwAccess.lock();
    try {
      writer.commit();
    } finally {
      iwAccess.unlock();
    }

    // write lock again: callbacks, new searcher, and commit tracking
    iwCommit.lock();
    try {
      callPostCommitCallbacks();
      if (cmd.optimize) {
        callPostOptimizeCallbacks();
      }
      // open a new searcher in the sync block to avoid opening it
      // after a deleteByQuery changed the index, or in between deletes
      // and adds of another commit being done.
      core.getSearcher(true, false, waitSearcher);

      // reset commit tracking
      tracker.didCommit();

      log.info("end_commit_flush");

      error = false;
    } finally {
      iwCommit.unlock();
      addCommands.set(0);
      deleteByIdCommands.set(0);
      deleteByQueryCommands.set(0);
      numErrors.set(error ? 1 : 0);
    }

    // if we are supposed to wait for the searcher to be registered, then we should
    // do it outside of the synchronized block so that other update operations can proceed.
    if (waitSearcher != null && waitSearcher[0] != null) {
      try {
        waitSearcher[0].get();
      } catch (InterruptedException e) {
        SolrException.log(log, e);
      } catch (ExecutionException e) {
        SolrException.log(log, e);
      }
    }
  }
 
 
 
 
 
 




Re: master/slave failure scenario

2009-05-14 Thread nk 11
Ok, so the VIP will point to the new master. But what makes a slave get promoted
to master? Only the fact that it will receive add/update requests?
And I suppose that this hot promotion is possible only if the slave is
configured as a master also...

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Ideally, we don't do that.
 You can just keep the master host behind a VIP, so if you wish to
 change the master, make the VIP point to the new host.

 On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com wrote:
  This is more interesting.Such a procedure would involve taking down and
  reconfiguring the slave?
 
  On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot btal...@aeriagames.com
 wrote:
 
  Or ...
 
  1. Promote existing slave to new master
  2. Add new slave to cluster
 
 
 
 
  -Bryan
 
 
 
 
 
  On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
 
   - Migrate configuration files from old master (or backup) to new
 master.
  - Replicate from a slave to the new master.
  - Resume indexing to new master.
 
  -Jay
 
  On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com wrote:
 
   Nice.
  What if the master fails permanently (like a disk crash...) and the
 new
  master is a clean machine?
  2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
   On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com
 wrote:
 
  Hello
 
  I'm kind of new to Solr and I've read about replication, and the
 fact
 
  that a
 
  node can act as both master and slave.
  I a replica fails and then comes back on line I suppose that it will
 
  resyncs
 
  with the master.
 
  right
 
 
  But what happnes if the master fails? A slave that is configured as
 
  master
 
  will kick in? What if that slave is not yes fully sync'ed with the
 
  failed
 
  master and has old data?
 
  if the master fails you can't index the data. but the slaves will
  continue serving the requests with the last index. You an bring back
  the master up and resume indexing.
 
 
  What happens when the original master comes back on line? He will
 
  remain
 
  a
 
  slave because there is another node with the master role?
 
  Thank you!
 
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 
 
 



 --
  -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll gsing...@apache.org wrote:
 I've contacted
 others in the past who have done comparisons and after one round of
 emailing it was almost always clear that they didn't know what best
 practices are for any given product and thus were doing things
 sub-optimally.

While I agree, one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the best
practices as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).

We can't expect the other camp to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike


Re: Solr vs Sphinx

2009-05-14 Thread Andrey Klochkov


 My most recent example of this is BooleanQuery's performance.  It
 turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
 performance gain (27% on my most recent test) for OR queries.


Mike,

Can you please point me to some information concerning allowDocsOutOfOrder?
What is this, exactly?


-- 
Andrew Klochkov


Query syntax

2009-05-14 Thread Radha C.
Hello List,

 

I need to search for multiple values in the same field. I have the
following syntax options.

I am thinking of the first option. Can anyone tell me which one is the correct
syntax?

 

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 +site_id=:3000676

 

Thanks,

Radha.C

 



Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread Jack Godwin
20+ hours? I index 3 million records in 3 hours.  Is your autocommit
causing a snapshot?  What do you have listed in the events?

Jack

On 5/14/09, Gargate, Siddharth sgarg...@ptc.com wrote:
 Hi all,
   I am also facing the same issue where autocommit blocks all
 other requests. I having around 1,00,000 documents with average size of
 100K each. It took more than 20 hours to index.
 I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
 Do I need more configuration changes?
 Also I see that memory usage goes to peak level of heap specified(6 GB
 in my case). Looks like Solr spends most of the time in GC.
 According to my understanding, fix for Solr-1155 would be that commit
 will run in background and new documents will be queued in the memory.
 But I am afraid of the memory consumption by this queue if commit takes
 much longer to complete.

 Thanks,
 Siddharth

 -Original Message-
 From: jayson.minard [mailto:jayson.min...@gmail.com]
 Sent: Saturday, May 09, 2009 10:45 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Autocommit blocking adds? AutoCommit Speedup?


 First cut of updated handler now in:
 https://issues.apache.org/jira/browse/SOLR-1155

 Needs review from those that know Lucene better, and double check for
 errors
 in locking or other areas of the code.  Thanks.

 --j


 jayson.minard wrote:

 Can we move this to patch files within the JIRA issue please.  Will
 make
 it easier to review and help out a as a patch to current trunk.

 --j


 Jim Murphy wrote:



 Yonik Seeley-2 wrote:

 ...your code snippit elided and edited below ...




 Don't take this code as correct (or even compiling) but is this the
 essence?  I moved shared access to the writer inside the read lock and
 kept the other non-commit bits to the write lock.  I'd need to rethink
 the locking in a more fundamental way but is this close to the idea?



  public void commit(CommitUpdateCommand cmd) throws IOException {

    if (cmd.optimize) {
      optimizeCommands.incrementAndGet();
    } else {
      commitCommands.incrementAndGet();
    }

    Future[] waitSearcher = null;
    if (cmd.waitSearcher) {
      waitSearcher = new Future[1];
    }

    boolean error = true;

    // write lock: only the segment-reorganising work (optimize) is done here
    iwCommit.lock();
    try {
      log.info("start " + cmd);
      if (cmd.optimize) {
        closeSearcher();
        openWriter();
        writer.optimize(cmd.maxOptimizeSegments);
      }
    } finally {
      iwCommit.unlock();
    }

    // read (shared) lock: the actual Lucene commit, so adds can proceed concurrently
    iwAccess.lock();
    try {
      writer.commit();
    } finally {
      iwAccess.unlock();
    }

    // write lock again: callbacks, new searcher, and commit tracking
    iwCommit.lock();
    try {
      callPostCommitCallbacks();
      if (cmd.optimize) {
        callPostOptimizeCallbacks();
      }
      // open a new searcher in the sync block to avoid opening it
      // after a deleteByQuery changed the index, or in between deletes
      // and adds of another commit being done.
      core.getSearcher(true, false, waitSearcher);

      // reset commit tracking
      tracker.didCommit();

      log.info("end_commit_flush");

      error = false;
    } finally {
      iwCommit.unlock();
      addCommands.set(0);
      deleteByIdCommands.set(0);
      deleteByQueryCommands.set(0);
      numErrors.set(error ? 1 : 0);
    }

    // if we are supposed to wait for the searcher to be registered, then we should
    // do it outside of the synchronized block so that other update operations can proceed.
    if (waitSearcher != null && waitSearcher[0] != null) {
      try {
        waitSearcher[0].get();
      } catch (InterruptedException e) {
        SolrException.log(log, e);
      } catch (ExecutionException e) {
        SolrException.log(log, e);
      }
    }
  }










-- 
Sent from my mobile device


Date field

2009-05-14 Thread Jack Godwin
Does anyone know if there is still a bug in date fields?  I'm having a
problem boosting documents by date in solr 1.3

Thanks,
Jack

-- 
Sent from my mobile device


Re: Query syntax

2009-05-14 Thread Shalin Shekhar Mangar
On Thu, May 14, 2009 at 5:20 PM, Radha C. cra...@ceiindia.com wrote:

 I need to search the multiple values from the same field. I am having the
 following syntax

 I am thinking of the first option. Can anyone tell me which one is correct
 syntax?

  Q=+title:=test +site_id:=22 3000676 566644

  Q=+title:=test +site_id:=22 3000676 566644

  Q=+title:=test +site_id:=22 +site_id=:3000676


None of the above. ":=" is not valid syntax. The request parameter
should be a lower-cased "q". The "+" character signifies "must occur",
similar to a boolean AND.

Should title:test be a required match? Should all of 22, 3000676, etc. be
present in site_id, or is just one match alright?
-- 
Regards,
Shalin Shekhar Mangar.


RE: Query syntax

2009-05-14 Thread Radha C.
Thanks for your reply. 

 

Yes, by mistake I added := in place of :. The title should match, and the
site_id should match any of these: 23243455, 245, 3457676.

 

 

 

  _  

From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] 
Sent: Thursday, May 14, 2009 5:43 PM
To: solr-user@lucene.apache.org; cra...@ceiindia.com
Subject: Re: Query syntax

 

On Thu, May 14, 2009 at 5:20 PM, Radha C. cra...@ceiindia.com wrote:

I need to search the multiple values from the same field. I am having the
following syntax

I am thinking of the first option. Can anyone tell me which one is correct
syntax?

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 3000676 566644

 Q=+title:=test +site_id:=22 +site_id=:3000676




None of the above. That := is not a valid syntax. The request parameter
should be a lower cased q. The + character signifies must occur
similar to a boolean AND.

Should title:test must match? Should all of 22, 3000676 etc be
present in site_id or just one match is alright?
-- 
Regards,
Shalin Shekhar Mangar.



Re: Query syntax

2009-05-14 Thread Shalin Shekhar Mangar
In that case, the following will work:

q=+title:test +site_id:(23243455 245 3457676)

On Thu, May 14, 2009 at 5:35 PM, Radha C. cra...@ceiindia.com wrote:

 Thanks for your reply.



 Yes by mistaken I added := in place of : . The title should match and the
 site_id should match any of these 23243455 , 245, 3457676 .







  _

 From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
 Sent: Thursday, May 14, 2009 5:43 PM
 To: solr-user@lucene.apache.org; cra...@ceiindia.com
 Subject: Re: Query syntax



 On Thu, May 14, 2009 at 5:20 PM, Radha C. cra...@ceiindia.com wrote:

 I need to search the multiple values from the same field. I am having the
 following syntax

 I am thinking of the first option. Can anyone tell me which one is correct
 syntax?

  Q=+title:=test +site_id:=22 3000676 566644

  Q=+title:=test +site_id:=22 3000676 566644

  Q=+title:=test +site_id:=22 +site_id=:3000676




 None of the above. That := is not a valid syntax. The request parameter
 should be a lower cased q. The + character signifies must occur
 similar to a boolean AND.

 Should title:test must match? Should all of 22, 3000676 etc be
 present in site_id or just one match is alright?
 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.


Re: master/slave failure scenario

2009-05-14 Thread nk 11
Oh, so the configuration must be manually changed?
Can't something be passed at (re)start time?


   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com wrote:
  Ok so the VIP will point to the new master. but what makes a slave
 promoted
  to a master? Only the fact that it will receive add/update requests?
  And I suppose that this hot promotion is possible only if the slave is
  convigured as master also...
 right.. By default you can setup all slaves to be master also. It does
 not cost anything if it is not serving any requests.

 so , if you have such a setting you will have to disable that slave to
 be a slave and restart it and you will have to make the VIP point to
 this new slave as master.

 so hot promotion is still not possible.
  
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  ideally , we don't do that.
  you can just keep the master host behind a VIP so if you wish to
  change the master make the VIP point to the new host
 
  On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com
 wrote:
   This is more interesting.Such a procedure would involve taking down
 and
   reconfiguring the slave?
  
   On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
   btal...@aeriagames.comwrote:
  
   Or ...
  
   1. Promote existing slave to new master
   2. Add new slave to cluster
  
  
  
  
   -Bryan
  
  
  
  
  
   On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
  
- Migrate configuration files from old master (or backup) to new
   master.
   - Replicate from a slave to the new master.
   - Resume indexing to new master.
  
   -Jay
  
   On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com
 wrote:
  
Nice.
   What if the master fails permanently (like a disk crash...) and
 the
   new
   master is a clean machine?
   2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com
   wrote:
  
   Hello
  
   I'm kind of new to Solr and I've read about replication, and the
   fact
  
   that a
  
   node can act as both master and slave.
   I a replica fails and then comes back on line I suppose that it
   will
  
   resyncs
  
   with the master.
  
   right
  
  
   But what happnes if the master fails? A slave that is configured
 as
  
   master
  
   will kick in? What if that slave is not yes fully sync'ed with
 the
  
   failed
  
   master and has old data?
  
   if the master fails you can't index the data. but the slaves will
   continue serving the requests with the last index. You an bring
 back
   the master up and resume indexing.
  
  
   What happens when the original master comes back on line? He
 will
  
   remain
  
   a
  
   slave because there is another node with the master role?
  
   Thank you!
  
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
  
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
  -
 Noble Paul | Principal Engineer| AOL | http://aol.com





Re: Solr vs Sphinx

2009-05-14 Thread Grant Ingersoll
Totally agree on optimizing the out-of-the-box experience, it's just never
a one-size-fits-all thing.  And we have to be very careful about
micro-benchmarks driving these settings.  Currently, many of us use
Wikipedia, but that's just one doc set and I'd venture to say most
Solr users do not have docs that look anything like Wikipedia.  One of
the things the Open Relevance project
(http://wiki.apache.org/lucene-java/OpenRelevance, see the discussion on
gene...@lucene.a.o) should aim to do is bring in a variety of test
collections, from lots of different genres.  This will help both with
relevance and with speed testing.


-Grant

On May 14, 2009, at 6:47 AM, Michael McCandless wrote:

On Wed, May 13, 2009 at 12:33 PM, Grant Ingersoll  
gsing...@apache.org wrote:

I've contacted
others in the past who have done comparisons and after one round of
emailing it was almost always clear that they didn't know what best
practices are for any given product and thus were doing things
sub-optimally.


While I agree, one should properly match & tune all apps they are
testing (for a fair comparison), we in turn must set out-of-the-box
defaults (in Lucene and Solr) that get you as close to the best
practices as possible.

We don't always do that, and I think we should do better.

My most recent example of this is BooleanQuery's performance.  It
turns out, if you setAllowDocsOutOfOrder(true), it yields a sizable
performance gain (27% on my most recent test) for OR queries.

So why haven't we enabled this by default, already?  (As far as I can
tell it's functionally equivalent, as long as the Collector can accept
out-of-order docs, which our core collectors can).

We can't expect the other camp to discover that this obscure setting
must be set, to maximize Lucene's OR query performance.

Mike


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: master/slave failure scenario

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
yeah there is a hack
https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316

On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:
 sorry for the mail. I wanted to hit reply :(

 On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:

 oh, so the configuration must be manualy changed?
 Can't something be passed at (re)start time?

 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com wrote:
  Ok so the VIP will point to the new master. but what makes a slave
  promoted
  to a master? Only the fact that it will receive add/update requests?
  And I suppose that this hot promotion is possible only if the slave
  is
  convigured as master also...
 right.. By default you can setup all slaves to be master also. It does
 not cost anything if it is not serving any requests.

 so , if you have such a setting you will have to disable that slave to
 be a slave and restart it and you will have to make the VIP point to
 this new slave as master.

 so hot promotion is still not possible.
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  ideally , we don't do that.
  you can just keep the master host behind a VIP so if you wish to
  change the master make the VIP point to the new host
 
  On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com
  wrote:
   This is more interesting.Such a procedure would involve taking down
   and
   reconfiguring the slave?
  
   On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
   btal...@aeriagames.comwrote:
  
   Or ...
  
   1. Promote existing slave to new master
   2. Add new slave to cluster
  
  
  
  
   -Bryan
  
  
  
  
  
   On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
  
    - Migrate configuration files from old master (or backup) to new
   master.
   - Replicate from a slave to the new master.
   - Resume indexing to new master.
  
   -Jay
  
   On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com
   wrote:
  
    Nice.
   What if the master fails permanently (like a disk crash...) and
   the
   new
   master is a clean machine?
   2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
    On Wed, May 13, 2009 at 12:10 PM, nk 11 nick.cass...@gmail.com
   wrote:
  
   Hello
  
   I'm kind of new to Solr and I've read about replication, and
   the
   fact
  
   that a
  
   node can act as both master and slave.
   I a replica fails and then comes back on line I suppose that it
   will
  
   resyncs
  
   with the master.
  
   right
  
  
   But what happnes if the master fails? A slave that is
   configured as
  
   master
  
   will kick in? What if that slave is not yes fully sync'ed with
   the
  
   failed
  
   master and has old data?
  
   if the master fails you can't index the data. but the slaves
   will
   continue serving the requests with the last index. You an bring
   back
   the master up and resume indexing.
  
  
   What happens when the original master comes back on line? He
   will
  
   remain
  
   a
  
   slave because there is another node with the master role?
  
   Thank you!
  
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
  
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com






-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-14 Thread Erik Hatcher

I like Grant's suggestion as the simplest solution.

As for XML merging and XSLT, I really wouldn't want to go that route  
personally, but one solution that comes close to that is to template  
web.xml with some substitution tags and use Ant's ability to replace  
tokens.  So we could put in @FILTER@ and @FILTER_MAPPING@ placeholders  
in web.xml and pull in the replacements from fragment files.  But even  
with all of these fancy options available, I'd still just use the  
alternate web.xml technique that Grant proposed.


Erik


On May 13, 2009, at 10:55 PM, Jacob Singh wrote:


Hi Grant,

That's not a bad idea... I could try that.  I was also looking at cactus:

http://jakarta.apache.org/cactus/integration/ant/index.html

It has an ant task to merge XML.  Could this be a contrib-crawl add-on?


Alternately, do you know of any xslt templates built for this?  Could
write one, but that's a fair bit of work to support everything.
Perhaps an xslt task combined with a contrib-crawl would do the trick?

Best,
-J

On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll  
gsing...@apache.org wrote:
Hmmm, maybe we need to think about someway to hook this into the  
build
process or make it easier to just drop it into the conf or lib  
dirs.  I'm no
web.xml expert, but I'm sure you're not the first one to want to do  
this

kind of thing.

The easiest way _might_ be to patch build.xml to take a property  
for the
location of the web.xml, defaulting to the current Solr one.  Then,  
people
who want to use their own version could just pass in - 
Dweb.xml=path to my
web.xml.  The downside to this is that it may cause problems for  
us devs
when users ask questions about strange behavior and it turns out  
they have

mucked up the web.xml

FYI: dist-war is in build.xml, not common-build.xml.

-Grant

On May 12, 2009, at 5:52 AM, Jacob Singh wrote:


Hi folks,

I just wrote a Servlet Filter to handle authentication for our
service.  Here's what I did:

1. Created a dir in contrib
2. Put my project in there, I took the dataimporthandler build.xml  
as

an example and modified it to suit my needs.  Worked great!
3. ant dist now builds my jar and includes it

I now need to modify web.xml to add my filter-mapping, init params,
etc.  How can I do this cleanly?  Or do I need to manually open up  
the

archive and edit it and then re-war it?

In common-build I don't see a target for dist-war, so don't see  
how it

is possible...

Thanks!
Jacob

--

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using

Solr/Lucene:
http://www.lucidimagination.com/search






--

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com




Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread Noble Paul നോബിള്‍ नोब्ळ्
Solr already supports this.
Please refer to this:
http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08

Ensure that your solr.xml is persistent:
http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8
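
On the automation part of the question, a hedged SolrJ sketch of issuing the
CoreAdmin CREATE call from code instead of hand-editing solr.xml (core name,
instance dir, and URL are placeholders; check the CoreAdminRequest class in
your SolrJ version for the exact method signatures):

  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.request.CoreAdminRequest;

  public class CreateCoreForUser {
    public static void main(String[] args) throws Exception {
      // Talk to the CoreAdminHandler at the top-level Solr URL, not a core URL.
      CommonsHttpSolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");
      // Registers a new core; with persist="true" in solr.xml it is also saved there.
      CoreAdminRequest.createCore("user_12345", "user_12345", admin);
      System.out.println("created core user_12345");
    }
  }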

On Thu, May 14, 2009 at 3:43 PM, KK dioxide.softw...@gmail.com wrote:
 Thank you very much. Got the point.
 One off the track question, can we automate the creation of new cores[it
 requires manually editing the solr.xml file as I know, and what about the
 location of core index directory, do we need to point that manually as
 well].
 After going through the wiki what I found is we've to mention the names of
 cores in solr.xml. I want to automate the process in such a way that when a
 user registers[ on say my site for the service], we'll create a coresponding
 core for the same user and with a specific core id[unique for this user
 only] so that the user will be given a search interface that will redirect
 all searches for this user to http://host:port/unique core name for this
 user/select
 Will apprecite any ideas on this.

 Thanks,
 KK.

 2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 there is no hard limit on the no:of cores. it is limited by your
 system's ability to open files and the resources.
 the queries are automatically sent to appropriate core if your url is

 htt://host:port/corename/select

 On Thu, May 14, 2009 at 1:58 PM, KK dioxide.softw...@gmail.com wrote:
  I want to know the maximum no of cores supported by Solr. 1000s or may be
  millions all under one solr instance ?
  Also I want to know how to redirect a particular query to a particular
 core.
  Actually I'm querying solr from Ajax, so I think there must be some
 request
  parameter that says which core we want to query, right? Can some one tell
 me
  how to do this, any good pointers on the same will be helpful as well.
  Thank you.
 
  --kk
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Solr vs Sphinx

2009-05-14 Thread Marvin Humphrey
On Thu, May 14, 2009 at 06:47:01AM -0400, Michael McCandless wrote:
 While I agree, one should properly match & tune all apps they are
 testing (for a fair comparison), we in turn must set out-of-the-box
 defaults (in Lucene and Solr) that get you as close to the best
 practices as possible.

So, should Lucene use the non-compound file format by default because some
idiot's sloppy benchmarks might run a smidge faster, even though that will
cause many users to run out of file descriptors?

Anyone doing comparative benchmarking who doesn't submit their code to the
support list for the software under review is either a dolt or a propagandist.

Good benchmarking is extremely difficult, like all experimental science.  If
there isn't ample evidence that the benchmarker appreciates that, their tests
aren't worth a second thought.  If you don't avail yourself of the help of
experts when assembling your experiment, you are unserious.

Richard Feynman:

...if you're doing an experiment, you should report everything that you
think might make it invalid - not only what you think is right about it:
other causes that could possibly explain your results; and things you
thought of that you've eliminated by some other experiment, and how they
worked - to make sure the other fellow can tell they have been eliminated.

Marvin Humphrey



Re: master/slave failure scenario

2009-05-14 Thread nk 11
wow! that was just a couple of days old!
thanks a lot!
2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 yeah there is a hack

 https://issues.apache.org/jira/browse/SOLR-1154?focusedCommentId=12708316page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12708316

 On Thu, May 14, 2009 at 6:07 PM, nk 11 nick.cass...@gmail.com wrote:
  sorry for the mail. I wanted to hit reply :(
 
  On Thu, May 14, 2009 at 3:37 PM, nk 11 nick.cass...@gmail.com wrote:
 
  oh, so the configuration must be manualy changed?
  Can't something be passed at (re)start time?
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  On Thu, May 14, 2009 at 4:07 PM, nk 11 nick.cass...@gmail.com wrote:
   Ok so the VIP will point to the new master. but what makes a slave
   promoted
   to a master? Only the fact that it will receive add/update requests?
   And I suppose that this hot promotion is possible only if the slave
   is
   convigured as master also...
  right.. By default you can setup all slaves to be master also. It does
  not cost anything if it is not serving any requests.
 
  so , if you have such a setting you will have to disable that slave to
  be a slave and restart it and you will have to make the VIP point to
  this new slave as master.
 
  so hot promotion is still not possible.
  
   2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
  
   ideally , we don't do that.
   you can just keep the master host behind a VIP so if you wish to
   change the master make the VIP point to the new host
  
   On Wed, May 13, 2009 at 10:52 PM, nk 11 nick.cass...@gmail.com
   wrote:
This is more interesting.Such a procedure would involve taking
 down
and
reconfiguring the slave?
   
On Wed, May 13, 2009 at 7:55 PM, Bryan Talbot
btal...@aeriagames.comwrote:
   
Or ...
   
1. Promote existing slave to new master
2. Add new slave to cluster
   
   
   
   
-Bryan
   
   
   
   
   
On May 13, 2009, at May 13, 9:48 AM, Jay Hill wrote:
   
 - Migrate configuration files from old master (or backup) to new
master.
- Replicate from a slave to the new master.
- Resume indexing to new master.
   
-Jay
   
On Wed, May 13, 2009 at 4:26 AM, nk 11 nick.cass...@gmail.com
wrote:
   
 Nice.
What if the master fails permanently (like a disk crash...) and
the
new
master is a clean machine?
2009/5/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
   
 On Wed, May 13, 2009 at 12:10 PM, nk 11 
 nick.cass...@gmail.com
wrote:
   
Hello
   
I'm kind of new to Solr and I've read about replication, and
the
fact
   
that a
   
node can act as both master and slave.
I a replica fails and then comes back on line I suppose that
 it
will
   
resyncs
   
with the master.
   
right
   
   
But what happnes if the master fails? A slave that is
configured as
   
master
   
will kick in? What if that slave is not yes fully sync'ed
 with
the
   
failed
   
master and has old data?
   
if the master fails you can't index the data. but the slaves
will
continue serving the requests with the last index. You an
 bring
back
the master up and resume indexing.
   
   
What happens when the original master comes back on line? He
will
   
remain
   
a
   
slave because there is another node with the master role?
   
Thank you!
   
   
   
   
--
-
Noble Paul | Principal Engineer| AOL | http://aol.com
   
   
   
   
   
  
  
  
   --
   -
   Noble Paul | Principal Engineer| AOL | http://aol.com
  
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 
 



 --
  -
 Noble Paul | Principal Engineer| AOL | http://aol.com



RE: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread jayson.minard

Siddharth,

The settings you have in your solrconfig for ramBufferSizeMB and
maxBufferedDocs control how much memory may be used during indexing besides
any overhead with the documents being in-flight at a given moment
(deserialized into memory but not yet handed to lucene).  There are
streaming versions of the client/server that help with that as well by
trying to process them as they arrive.

The patch SOLR-1155 does not add more memory use, but rather lets the
threads proceed through to Lucene without blocking within Solr as often.  So
instead of a stuck thread holding the documents in memory they will be
moving threads doing the same.

So the buffer sizes mentioned above along with the amount of documents you
send at a time will push your memory footprint.  Send smaller batches (less
efficient) or stream; or make sure you have enough memory for the amount of
docs you send at a time.  

For indexing I slow my commits down if there is no need for the documents to
become available for query right away.  For pure indexing, a long autoCommit
time and a large max document count before auto-committing helps.  Committing
isn't what flushes documents out of memory; it is what makes the on-disk version
part of the overall index.  Over-committing will slow you way down.
Especially if you have any listeners on the commits doing a lot of work
(i.e. Solr distribution).
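
As an illustration of that advice, a minimal SolrJ indexing loop (URL, field
names, batch size, and document count are made-up values) that sends documents
in modest batches and commits once at the end, leaving intermediate flushing to
ramBufferSizeMB and a long autoCommit:

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.common.SolrInputDocument;

  public class BulkIndexer {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 100000; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc" + i);
        doc.addField("title", "document " + i);
        batch.add(doc);
        if (batch.size() == 1000) {   // modest batches bound the client-side memory
          server.add(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        server.add(batch);
      }
      server.commit();                // one explicit commit instead of many small ones
    }
  }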

Also, if you are querying on the indexer, that can eat memory and compete
with the memory you are trying to reserve for indexing.  So a split model of
indexing and querying on different instances lets you tune each one best,
but then you have a gap in time from indexing to querying as the trade-off.

It is hard to say what is going on with GC without knowing what garbage
collection settings you are passing to the VM, and what version of the Java
VM you are using.  Which garbage collector are you using and what tuning
parameters?

I tend to use Parallel GC on my indexers with GC Overhead limit turned off
allowing for some pauses (which users don't see on a back-end indexer) but
good GC with lower heap fragmentation.  I tend to use concurrent mark and
sweep GC on my query slaves with tuned incremental mode and pacing which is
a low pause collector taking advantage of the cores on my servers and can
incrementally keep up with the needs of a query slave.

-- Jayson


Gargate, Siddharth wrote:
 
 Hi all,
   I am also facing the same issue where autocommit blocks all
 other requests. I having around 1,00,000 documents with average size of
 100K each. It took more than 20 hours to index. 
 I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
 Do I need more configuration changes?
 Also I see that memory usage goes to peak level of heap specified(6 GB
 in my case). Looks like Solr spends most of the time in GC. 
 According to my understanding, fix for Solr-1155 would be that commit
 will run in background and new documents will be queued in the memory.
 But I am afraid of the memory consumption by this queue if commit takes
 much longer to complete.
 
 Thanks,
 Siddharth
 
 



Re: Autocommit blocking adds? AutoCommit Speedup?

2009-05-14 Thread jayson.minard

Indexing speed comes down to a lot of factors: the settings discussed
above, VM settings, the size of the documents, how many are sent at a time,
how active you can keep the indexer (i.e. one thread sending documents lets
the indexer relax whereas N threads keep pressure on the indexer), how
often you commit, and of course the hardware you are running on.  Disk I/O is
a big factor, along with having enough cores and memory to buffer and process
the documents.

Comparing two sets of numbers is tough.  We have indexes that range from
indexing a few million an hour up through 18-20M per hour in a indexing
cluster for distributed search.

--j


Jack Godwin wrote:
 
 20+ hours? I index 3 million records in 3 hours.  Is your auto commit
 causing a snapshot?  What do you have listed in the events.
 
 Jack
 
 On 5/14/09, Gargate, Siddharth sgarg...@ptc.com wrote:
 Hi all,
  I am also facing the same issue where autocommit blocks all
 other requests. I having around 1,00,000 documents with average size of
 100K each. It took more than 20 hours to index.
 I have currently set autocommit maxtime to 7 seconds, mergeFactor to 25.
 Do I need more configuration changes?
 Also I see that memory usage goes to peak level of heap specified(6 GB
 in my case). Looks like Solr spends most of the time in GC.
 According to my understanding, fix for Solr-1155 would be that commit
 will run in background and new documents will be queued in the memory.
 But I am afraid of the memory consumption by this queue if commit takes
 much longer to complete.

 Thanks,
 Siddharth

 
 




Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Thu, May 14, 2009 at 6:51 AM, Andrey Klochkov
akloch...@griddynamics.com wrote:

 Can you please point me to some information concerning allowDocsOutOfOrder?
 What's this at all?

There is this cryptic static setter (in Lucene):

  BooleanQuery.setAllowDocsOutOfOrder(boolean)

It defaults to false, which means BooleanScorer2 will always be used
to compute hits for a BooleanQuery.  When set to true, BooleanScorer
will instead be used, when possible.  BooleanScorer gets better
performance, but it collects docs out of order, which for some
external collectors might cause a problem.

All of Lucene's core collectors work fine with out-of-order collection
(but I'm not sure about Solr's collectors).

If you experiment with this, please post back with your results!

Mike
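
For anyone who wants to try it, a small Lucene 2.x sketch of flipping the flag
for a two-term OR query (index path and field/term values are made up):

  import org.apache.lucene.index.Term;
  import org.apache.lucene.search.BooleanClause.Occur;
  import org.apache.lucene.search.BooleanQuery;
  import org.apache.lucene.search.IndexSearcher;
  import org.apache.lucene.search.TermQuery;
  import org.apache.lucene.search.TopDocs;

  public class OutOfOrderDemo {
    public static void main(String[] args) throws Exception {
      // Global switch: allows BooleanScorer (out-of-order collection) for OR queries.
      BooleanQuery.setAllowDocsOutOfOrder(true);

      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("body", "lucene")), Occur.SHOULD);
      q.add(new TermQuery(new Term("body", "solr")), Occur.SHOULD);

      IndexSearcher searcher = new IndexSearcher("/path/to/index");
      TopDocs hits = searcher.search(q, null, 10);  // core collectors accept out-of-order docs
      System.out.println(hits.totalHits + " hits");
      searcher.close();
    }
  }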


Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross



Re: Max no of solr cores supported and how to restrict a query to a particular core?

2009-05-14 Thread KK
Thank you very much. LOL, it's in the same wiki I was told to go through.
I have a question regarding creating Solr cores on the fly. The wiki says,

"Creates a new core and registers it. If persistence is enabled
(persist=true), the configuration for this new core will be saved in
'solr.xml'. If a core with the same name exists, then while the newly created
core is initializing, the old one will continue to accept requests. Once
it has finished, all new requests will go to the new core, and the old
core will be unloaded."

So I have to wait for some time [say a couple of seconds, maybe less than that]
before I start adding pages to that core. I think this is the way to handle
it, otherwise some content which should have been indexed by the new core
will get indexed by the existing core [as the wiki says], which I don't want
to happen. Any other ideas for handling the same?


Thanks,
KK.
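
One way to avoid guessing at the delay is to poll the new core with a trivial
query and only start indexing once it answers. A rough SolrJ sketch (URL, core
name, and retry limits are placeholders):

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

  public class WaitForCore {
    public static void main(String[] args) throws Exception {
      CommonsHttpSolrServer core = new CommonsHttpSolrServer("http://localhost:8983/solr/user_12345");
      // Retry a trivial query until the freshly created core starts answering.
      for (int attempt = 0; attempt < 30; attempt++) {
        try {
          core.query(new SolrQuery("*:*"));
          System.out.println("core is up, safe to start indexing");
          return;
        } catch (Exception e) {
          Thread.sleep(1000);  // not registered yet -- wait and try again
        }
      }
      System.out.println("core never came up; check the CREATE response");
    }
  }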

2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Solr already supports this .
 please refer this

 http://wiki.apache.org/solr/CoreAdmin#head-7ca1b98a9df8b8ca0dcfbfc49940ed5ac98c4a08

 ensure that your solr.xml is persistent

 http://wiki.apache.org/solr/CoreAdmin#head-7508c24c6e2dadad2dfea39b2fba045062481da8

 On Thu, May 14, 2009 at 3:43 PM, KK dioxide.softw...@gmail.com wrote:
  Thank you very much. Got the point.
  One off the track question, can we automate the creation of new cores[it
  requires manually editing the solr.xml file as I know, and what about the
  location of core index directory, do we need to point that manually as
  well].
  After going through the wiki what I found is we've to mention the names
 of
  cores in solr.xml. I want to automate the process in such a way that when
 a
  user registers[ on say my site for the service], we'll create a
 coresponding
  core for the same user and with a specific core id[unique for this user
  only] so that the user will be given a search interface that will
 redirect
  all searches for this user to http://host:port/unique core name for
 this
  user/select
  Will apprecite any ideas on this.
 
  Thanks,
  KK.
 
  2009/5/14 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
 
  there is no hard limit on the no:of cores. it is limited by your
  system's ability to open files and the resources.
  the queries are automatically sent to appropriate core if your url is
 
  htt://host:port/corename/select
 
  On Thu, May 14, 2009 at 1:58 PM, KK dioxide.softw...@gmail.com wrote:
   I want to know the maximum no of cores supported by Solr. 1000s or may
 be
   millions all under one solr instance ?
   Also I want to know how to redirect a particular query to a particular
  core.
   Actually I'm querying solr from Ajax, so I think there must be some
  request
   parameter that says which core we want to query, right? Can some one
 tell
  me
   how to do this, any good pointers on the same will be helpful as well.
   Thank you.
  
   --kk
  
 
 
 
  --
  -
  Noble Paul | Principal Engineer| AOL | http://aol.com
 
 



 --
 -
 Noble Paul | Principal Engineer| AOL | http://aol.com



Re: Custom Servlet Filter, Where to put filter-mappings

2009-05-14 Thread Jacob Singh
I found a very elegant (I think) solution to this.

I'll post a patch today or tomorrow.

Best,
-Jacob

On Thu, May 14, 2009 at 6:22 PM, Erik Hatcher
e...@ehatchersolutions.com wrote:
 I like Grant's suggestion as the simplest solution.

 As for XML merging and XSLT, I really wouldn't want to go that route
 personally, but one solution that comes close to that is to template web.xml
 with some substitution tags and use Ant's ability to replace tokens.  So we
 could put in @FILTER@ and @FILTER_MAPPING@ placeholders in web.xml and pull
 in the replacements from fragment files.  But even with all of these fancy
 options available, I'd still just use the alternate web.xml technique that
 Grant proposed.

        Erik


 On May 13, 2009, at 10:55 PM, Jacob Singh wrote:

 HI Grant,

 That's not a bad idea... I could try that.  I was also looking at cactus:
 http://jakarta.apache.org/cactus/integration/ant/index.html

 It has an ant task to merge XML.  Could this be a contrib-crawl add-on?

 Alternately, do you know of any xslt templates built for this?  Could
 write one, but that's a fair bit of work to support everything.
 Perhaps an xslt task combined with a contrib-crawl would do the trick?

 Best,
 -J

 On Wed, May 13, 2009 at 6:07 PM, Grant Ingersoll gsing...@apache.org
 wrote:

 Hmmm, maybe we need to think about some way to hook this into the build
 process or make it easier to just drop it into the conf or lib dirs.  I'm no
 web.xml expert, but I'm sure you're not the first one to want to do this
 kind of thing.

 The easiest way _might_ be to patch build.xml to take a property for the
 location of the web.xml, defaulting to the current Solr one.  Then, people
 who want to use their own version could just pass in -Dweb.xml=path to my
 web.xml.  The downside to this is that it may cause problems for us devs
 when users ask questions about strange behavior and it turns out they have
 mucked up the web.xml.
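
 A rough sketch of what that could look like in build.xml (the property name,
 default path, and war task details below are assumptions, not the actual Solr
 build file):

 <property name="web.xml" value="src/webapp/web/WEB-INF/web.xml"/>

 <target name="dist-war" depends="compile">
   <!-- webxml points at the default web.xml unless overridden on the command line -->
   <war destfile="${dist}/solr.war" webxml="${web.xml}">
     <!-- webapp resources, libs, etc. -->
   </war>
 </target>

 Users who want their own version would then run:
 ant dist-war -Dweb.xml=/path/to/my/web.xml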

 FYI: dist-war is in build.xml, not common-build.xml.

 -Grant

 On May 12, 2009, at 5:52 AM, Jacob Singh wrote:

 Hi folks,

 I just wrote a Servlet Filter to handle authentication for our
 service.  Here's what I did:

 1. Created a dir in contrib
 2. Put my project in there, I took the dataimporthandler build.xml as
 an example and modified it to suit my needs.  Worked great!
 3. ant dist now builds my jar and includes it

 I now need to modify web.xml to add my filter-mapping, init params,
 etc.  How can I do this cleanly?  Or do I need to manually open up the
 archive and edit it and then re-war it?
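
 (For reference, the kind of fragment that needs to end up in web.xml; the
 filter name, class, and init params below are placeholders, not the actual
 filter:)

 <filter>
   <filter-name>authFilter</filter-name>
   <filter-class>com.example.AuthFilter</filter-class>
   <init-param>
     <param-name>sharedSecret</param-name>
     <param-value>changeme</param-value>
   </init-param>
 </filter>
 <filter-mapping>
   <filter-name>authFilter</filter-name>
   <url-pattern>/*</url-pattern>
 </filter-mapping>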

 In common-build I don't see a target for dist-war, so don't see how it
 is possible...

 Thanks!
 Jacob

 --

 +1 510 277-0891 (o)
 +91  33 7458 (m)

 web: http://pajamadesign.com

 Skype: pajamadesign
 Yahoo: jacobsingh
 AIM: jacobsingh
 gTalk: jacobsi...@gmail.com

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
 Solr/Lucene:
 http://www.lucidimagination.com/search





 --

 +1 510 277-0891 (o)
 +91  33 7458 (m)

 web: http://pajamadesign.com

 Skype: pajamadesign
 Yahoo: jacobsingh
 AIM: jacobsingh
 gTalk: jacobsi...@gmail.com





-- 

+1 510 277-0891 (o)
+91  33 7458 (m)

web: http://pajamadesign.com

Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com


Re: Additional metadata when using Solr Cell

2009-05-14 Thread Grant Ingersoll

what does /admin/luke show for fields and terms in the fields?

On May 14, 2009, at 10:03 AM, rossputin wrote:



Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody


Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}


I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=


I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this  
field?


Thanks in advance for your help,

-- Ross
--
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Additional metadata when using Solr Cell

2009-05-14 Thread rossputin

There is no reference to the author field I am trying to set. I am using the
latest nightly download.

 -- Ross


Grant Ingersoll-6 wrote:
 
 what does /admin/luke show for fields and terms in the fields?
 
 On May 14, 2009, at 10:03 AM, rossputin wrote:
 

 Hi.

 I am indexing a PDF document with the ExtractingRequestHandler.  My curl
 post has a URL like:

 ../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

 Sure enough I see in the server logs:

 params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}

 I am trying to get my field back in the results from a query:

 ../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

 I see the score in the results 'doc' but no reference to author.

 Can anyone advise on what I am forgetting to do, to get hold of this  
 field?

 Thanks in advance for your help,

 -- Ross
 -- 
 View this message in context:
 http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 --
 Grant Ingersoll
 http://www.lucidimagination.com/
 
 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
 using Solr/Lucene:
 http://www.lucidimagination.com/search
 
 
 

-- 
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Additional metadata when using Solr Cell

2009-05-14 Thread Grant Ingersoll

Do you have an author field in your schema?

On May 14, 2009, at 10:31 AM, rossputin wrote:



There is no reference to the author field I am trying to set.. I am  
using the

latest nightly download.

-- Ross


Grant Ingersoll-6 wrote:


what does /admin/luke show for fields and terms in the fields?

On May 14, 2009, at 10:03 AM, rossputin wrote:



Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=true&ext.def.fl=text&ext.literal.id=123&ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=text&ext.literal.id=123&ext.idx.attr=true&ext.literal.author=Somebody}


I am trying to get my field back in the results from a query:

../solr/select?indent=on&version=2.2&q=hello&start=0&rows=10&fl=author%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this
field?

Thanks in advance for your help,

-- Ross
--
View this message in context:
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541256.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search





--
View this message in context: 
http://www.nabble.com/Additional-metadata-when-using-Solr-Cell-tp23541256p23541857.html
Sent from the Solr - User mailing list archive at Nabble.com.



--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



AW: AW: Geographical search based on latitude and longitude

2009-05-14 Thread Norman Leutner
Hi Grant, thanks for the reply.

Is the logic for a function query that calculates distances that Yonik
mentioned (gdist(position,101.2,234.3))
already implemented?

This could be either very inaccurate or load-intensive.

If the logic isn't done yet, maybe I can prepare it.

Norman


-Ursprüngliche Nachricht-
Von: Grant Ingersoll [mailto:gsing...@apache.org] 
Gesendet: Dienstag, 12. Mai 2009 19:43
An: solr-user@lucene.apache.org
Betreff: Re: AW: Geographical search based on latitude and longitude

Yes, that is part of it, but there is more to it.  See Yonik's comment  
about needs further down.


On May 12, 2009, at 7:36 AM, Norman Leutner wrote:

 So are you using boundary box to find results within a given range(km)
 like mentioned here: 
 http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html 
  ?


 Best regards

 Norman Leutner
 all2e GmbH

 -Ursprüngliche Nachricht-
 Von: Grant Ingersoll [mailto:gsing...@apache.org]
 Gesendet: Dienstag, 12. Mai 2009 13:18
 An: solr-user@lucene.apache.org
 Betreff: Re: Geographical search based on latitude and longitude

 See https://issues.apache.org/jira/browse/SOLR-773.  In other words,
 we're working on it and would love some help!

 -Grant

 On May 12, 2009, at 7:12 AM, Norman Leutner wrote:

 Hi together,

 I'm new to Solr and want to port a geographical range search from
 MySQL to Solr.

 Currently I'm using some mathematical functions (based on the GRS80
 model) directly within MySQL to calculate
 the actual distance from the locations within the database to a
 current location (lat and long are known):

 $query = "SELECT street, zip, city, state, country, "
   .$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")"
   ."+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) AS Distance "
   ."FROM ezgis_position "
   ."WHERE ".$radius."*ACOS(cos(RADIANS(latitude))*cos(".$theta.")*(sin(RADIANS(longitude))*sin(".$phi.")"
   ."+cos(RADIANS(longitude))*cos(".$phi."))+sin(RADIANS(latitude))*sin(".$theta.")) <= ".$range." ORDER BY Distance";

 This works pretty well and fast. Since we want to include this
 within our Solr search results, I would like to have an attribute like
 actual_distance within the result. Is there a way to use those
 functions (radians, sin, acos, ...) directly within Solr?

 Thanks in advance for any feedback
 Norman Leutner

 --
 Grant Ingersoll
 http://www.lucidimagination.com/

 Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
 using Solr/Lucene:
 http://www.lucidimagination.com/search


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search



Re: Solr vs Sphinx

2009-05-14 Thread gdeconto



Yonik Seeley-2 wrote:
 
 It's probably the case that every search engine out there is faster
 than Solr at one thing or another, and that Solr is faster or better
 at some other things.
 
 I prefer to spend my time improving Solr rather than engage in
 benchmarking wars... and Solr 1.4 will have a ton of speed
 improvements over Solr 1.3.
 
 -Yonik
 http://www.lucidimagination.com
 
 

Solr is very fast even with 1.3 and the developers have done an incredible
job.

However, maybe the next Solr improvement should be the creation of a
configuration manager and/or automated tuning tool.  I know that optimizing
Solr performance can be time consuming and sometimes frustrating.


-- 
View this message in context: 
http://www.nabble.com/Solr-vs-Sphinx-tp23524676p23544492.html
Sent from the Solr - User mailing list archive at Nabble.com.



CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread sachin78

What is the difference between EmbeddedSolrServer and CommonsHttpSolrServer?
Which is the preferred server to use?

In some blog I read that EmbeddedSolrServer is 50% faster than
CommonsHttpSolrServer, so then why do we need to use CommonsHttpSolrServer?

Can anyone please guide me to the right path/way so that I pick the right
implementation?

Thanks in advance.

--Sachin
-- 
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Replication master+slave

2009-05-14 Thread Bryan Talbot

https://issues.apache.org/jira/browse/SOLR-1167



-Bryan




On May 13, 2009, at May 13, 7:20 PM, Otis Gospodnetic wrote:



Bryan, maybe it's time to stick this in JIRA?
http://wiki.apache.org/solr/HowToContribute

Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot btal...@aeriagames.com
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 10:11:21 PM
Subject: Re: Replication master+slave

I think the patch I included earlier covers solr core, but it looks  
like at
least some other extensions (DIH) create and use their own XML  
parser.  So, if
this functionality is to extend to all XML files, those will need  
similar

patches.

Here's one for DIH:

--- src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (revision 774137)
+++ src/main/java/org/apache/solr/handler/dataimport/DataImporter.java  (working copy)
@@ -148,8 +148,10 @@
   void loadDataConfig(String configFile) {

     try {
-      DocumentBuilder builder = DocumentBuilderFactory.newInstance()
-          .newDocumentBuilder();
+      DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+      dbf.setNamespaceAware(true);
+      dbf.setXIncludeAware(true);
+      DocumentBuilder builder = dbf.newDocumentBuilder();
       Document document = builder.parse(new InputSource(new StringReader(
           configFile)));



 The only downside I can see to this is that it doesn't offer very expressive
 conditional inclusion: the file is included if it's present; otherwise fallback
 inclusions can be used.  It's also specific to XML files and obviously won't
 work for other types of configuration files.  However, it is simple and
 effective.


-Bryan




On May 13, 2009, at May 13, 6:36 PM, Otis Gospodnetic wrote:



Coincidentally, from
http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ 
 :


Hadoop configuration files now support XInclude elements for  
including
portions of another configuration file (HADOOP-4944). This  
mechanism allows you

to make configuration files more modular and reusable.


So others are doing it, too.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Bryan Talbot
To: solr-user@lucene.apache.org
Sent: Wednesday, May 13, 2009 11:26:41 AM
Subject: Re: Replication master+slave

I see that Nobel's final comment in SOLR-1154 is that config  
files need to be
able to include snippets from external files.  In my limited  
testing, a

simple

patch to enable XInclude support seems to work.



--- src/java/org/apache/solr/core/Config.java   (revision 774137)
+++ src/java/org/apache/solr/core/Config.java   (working copy)
@@ -100,8 +100,10 @@
if (lis == null) {
  lis = loader.openConfig(name);
}
-      javax.xml.parsers.DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
-      doc = builder.parse(lis);
+      javax.xml.parsers.DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
+  dbf.setNamespaceAware(true);
+  dbf.setXIncludeAware(true);
+  doc = dbf.newDocumentBuilder().parse(lis);

  DOMUtil.substituteProperties(doc, loader.getCoreProperties());
} catch (ParserConfigurationException e)  {



This allows a clause like this to include the contents of  
replication.xml if

it

exists.  If it's not found an exception will be thrown.



<xi:include href="http://localhost:8983/solr/corename/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude"/>



If the file is optional and no exception should be thrown if the  
file is
missing, simply include a fallback action: in this case the  
fallback is empty

and does nothing.



<xi:include href="http://localhost:8983/solr/forum_en/admin/file/?file=replication.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback/>
</xi:include>




-Bryan




On May 12, 2009, at May 12, 8:05 PM, Jian Han Guo wrote:

I was looking at the same problem, and had a discussion with  
Noble. You can

use a hack to achieve what you want, see

https://issues.apache.org/jira/browse/SOLR-1154

Thanks,

Jianhan


On Tue, May 12, 2009 at 5:13 PM, Bryan Talbot wrote:

So how are people managing solrconfig.xml files which are  
largely the same

other than differences for replication?

I don't think it's a good thing to maintain two copies of the  
same file
and I'd like to avoid that.  Maybe enabling the XInclude  
feature in
DocumentBuilders would make it possible to modularize  
configuration files

to

make this possible?






http://java.sun.com/j2se/1.5.0/docs/api/javax/xml/parsers/DocumentBuilderFactory.html#setXIncludeAware(boolean)



-Bryan





On May 12, 2009, at May 12, 11:43 AM, Shalin Shekhar Mangar  
wrote:


On Tue, May 12, 2009 at 10:42 PM, Bryan Talbot

wrote:


For replication in 1.4, the wiki at
http://wiki.apache.org/solr/SolrReplication says that a node  
can be both

the master and a slave:

A node can act as both master and slave. In that case 

Powered by Solr

2009-05-14 Thread Terence Gannon
I was intending to make an entry to the 'Powered by Solr' page, so I
created a Wiki account and logged in.  When I go to that page, it
shows it as being 'immutable', which I take as meaning I can't edit
it.  Is there someone I can send the information to who can do the
edit?  Or perhaps there is some sort of trick to editing that page?
Thanks for your help, and apologies in advance if this is a silly
question...

Terence


Re: Powered by Solr

2009-05-14 Thread Yonik Seeley
On Thu, May 14, 2009 at 1:54 PM, Terence Gannon porfa...@gmail.com wrote:
 I was intending to make an entry to the 'Powered by Solr' page, so I
 created a Wiki account and logged in.  When I go to that page, it
 shows it as being 'immutable', which I take as meaning I can't edit
 it.

Did you try hitting refresh on your browser after you logged in?

-Yonik
http://www.lucidimagination.com


Re: Solr vs Sphinx

2009-05-14 Thread Michael McCandless
On Thu, May 14, 2009 at 9:07 AM, Marvin Humphrey mar...@rectangular.com wrote:
 Richard Feynman:

...if you're doing an experiment, you should report everything that you
think might make it invalid - not only what you think is right about it:
other causes that could possibly explain your results; and things you
thought of that you've eliminated by some other experiment, and how they
worked - to make sure the other fellow can tell they have been eliminated.

Excellent quote!

 So, should Lucene use the non-compound file format by default because some
 idiot's sloppy benchmarks might run a smidge faster, even though that will
 cause many users to run out of file descriptors?

No, I don't think we should change that default.

Nor (for example) can we switch to SweetSpotSimilarity by default,
even though it seems to improve relevance, because it requires
app-dependent configuration.

Nor should we set IndexWriter's RAM buffer to 1 GB.  Etc.

But when there is a choice that has near zero downside and improves
performance (like my example), we should make the switch.

Making IndexReader.open return a readOnly reader is another example
(... which we plan to do in 3.0).

Every time Lucene or Solr has a default built-in setting, we should
think carefully about how to set it.
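
(For example, the kinds of defaults being discussed, written out against the
Lucene 2.4-era API; the index path is just a placeholder:)

Directory dir = FSDirectory.getDirectory("/path/to/index");
IndexWriter writer = new IndexWriter(dir, new WhitespaceAnalyzer(),
    IndexWriter.MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false);  // faster, but uses many more file descriptors
writer.setRAMBufferSizeMB(64);     // larger indexing RAM buffer (default is 16)
writer.close();
IndexReader reader = IndexReader.open(dir, true);  // read-only reader, the planned 3.0 default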

 Anyone doing comparative benchmarking who doesn't submit their code to the
 support list for the software under review is either a dolt or a propagandist.

 Good benchmarking is extremely difficult, like all experimental science.  If
 there isn't ample evidence that the benchmarker appreciates that, their tests
 aren't worth a second thought.  If you don't avail yourself of the help of
 experts when assembling your experiment, you are unserious.

Agreed.

Mike


Re: Solr memory requirements?

2009-05-14 Thread vivek sar
I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com wrote:
 Warning: I'm wy out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 this is at all relevant? Offered in the spirit that sometimes there are
 things
 so basic that only an amateur can see them G

 Best
 Erick

 On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

   <field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>

   <field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
   <field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
   <field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
   <field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
   <field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>

   <!-- catchall field, containing all other searchable text fields (implemented
        via copyField further on in this schema) -->
   <field name="all" type="text_ws" indexed="true" stored="false" omitNorms="true" multiValued="true"/>

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
 otis_gospodne...@yahoo.com wrote:
 
  Hi,
  Some answers:
  1) .tii files in the Lucene index.  When you sort, all distinct values
 for the field(s) used for sorting.  Similarly for facet fields.  Solr
 caches.
  2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
 consume during indexing.  There is no need to commit every 50K docs unless
 you want to trigger snapshot creation.
  3) see 1) above
 
  1.5 billion docs per instance where each doc is cca 1KB?  I doubt that's
 going to fly. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 
  - Original Message 
  From: vivek sar vivex...@gmail.com
  To: solr-user@lucene.apache.org
  Sent: Wednesday, May 13, 2009 3:04:46 PM
  Subject: Solr memory requirements?
 
  Hi,
 
    I'm pretty sure this has been asked before, but I couldn't find a
  complete answer in the forum archive. Here are my questions,
 
  1) When solr starts up what does it loads up in the memory? Let's say
  I've 4 cores with each core 50G in size. When Solr comes up how much
  of it would be loaded in memory?
 
  2) How much memory is required during index time? If I'm committing
  50K records at a time (1 record = 1KB) using solrj, how much memory do
  I need to give to Solr.
 
  3) Is there a minimum memory requirement by Solr to maintain a certain
  size index? Is there any benchmark on this?
 
  Here are some of my configuration from solrconfig.xml,
 
  1) 64
  2) All the caches (under query tag) are commented out
  3) Few others,
        a)  true    ==
  would this require 

Re: Powered by Solr

2009-05-14 Thread Terence Gannon
 Did you try hitting refresh on your browser after you logged in?

Wow, I really should have known that...thank you for your patient reply, Yonik.

Regards...Terence


replication of lucene-write.lock file

2009-05-14 Thread Bryan Talbot


When using solr 1.4 replication, I see that the lucene-write.lock file  
is being replicated to slaves.  I'm importing data from a db every 5  
minutes using cron to trigger a DIH delta-import.  Replication polls  
every 60 seconds and the master is configured to take a snapshot  
(replicateAfter) commit.
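
(For context, a master configured to replicate after commit looks roughly like
this in solrconfig.xml; the confFiles value below is only illustrative:)

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

with the slave polling via <str name="pollInterval">00:00:60</str>.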


Why should the lock file be replicated to slaves?

The lock file isn't stale on the master and is absent unless the delta- 
import is in process.  I've not tried it yet, but with the lock file  
replicated, it seems like promotion of a slave to a master in a  
failure recovery scenario requires the manual removal of the lock file.




-Bryan






Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread Eric Pugh
CommonsHttpSolrServer is how you access Solr from a Java client via
HTTP.  You can connect to a Solr instance running anywhere.
EmbeddedSolrServer starts up Solr internally and connects directly,
all in a single JVM...  Embedded may be faster (the jury is out), but
you have to have your Solr server and your Solr client on the same
box...  Unless you really need it, I would start with
CommonsHttpSolrServer; it's easier to configure and get going with, and
more flexible.


Eric


On May 14, 2009, at 1:30 PM, sachin78 wrote:



What is the difference between EmbeddedSolrServer and  
CommonsHttpSolrServer.

Which is the preferred server to use?

In some blog i read that EmbeddedSolrServer  is 50% faster than
CommonsHttpSolrServer,then why do we need to use  
CommonsHttpSolrServer.


Can anyone please guide me the right path/way.So that i pick the right
implementation.

Thanks in advance.

--Sachin
--
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal






Re: CommonsHttpSolrServer vs EmbeddedSolrServer

2009-05-14 Thread Ryan McKinley
right -- which one you pick will depend more on your runtime
environment than anything else.


If you need to hit a server (on a different machine)  
CommonsHttpSolrServer is your only option.


If you are running an embedded application -- where your custom code  
lives in the same JVM as solr -- you can use EmbeddedSolrServer.  The  
nice thing is that since they are the same interface, you can change  
later.


The performance comments on the wiki can be a bit misleading -- yes,  
in some cases embedded could be faster, but that may depend on how you  
are sending things -- are you sending 1000s of single document  
requests really fast?  If so, try sending a bunch of documents  
together in one request.


Also consider using the StreamingHttpSolrServer (https://issues.apache.org/jira/browse/SOLR-906 
) -- it has a few quirks, but can be much faster.


In any case, as long as you program against the SolrServer interface,  
then you could swap the implementation as needed.
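
A quick sketch of that with SolrJ (the URL and field names are just placeholders):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexAndQuery {
  public static void main(String[] args) throws Exception {
    // program against the SolrServer interface so the impl can be swapped later
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    // batch documents into one request instead of many single-doc requests
    List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    for (int i = 0; i < 1000; i++) {
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(i));
      docs.add(doc);
    }
    server.add(docs);
    server.commit();

    System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());
  }
}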


ryan


On May 14, 2009, at 3:35 PM, Eric Pugh wrote:

CommonsHttpSolrServer is how you access Solr from a Java client via  
HTTP.  You can connect to a Solr running anywhere   
EmbeddedSolrServer starts up Solr internally, and connects directly,  
all in a single JVM...  Embedded may be faster, the jury is out, but  
you have to have your Solr server and your Solr client on the same  
box...   Unless you really need it, I would start with  
CommonsHttpSolrServer, it's easier to configure and get going with  
and more flexible.


Eric


On May 14, 2009, at 1:30 PM, sachin78 wrote:



What is the difference between EmbeddedSolrServer and  
CommonsHttpSolrServer.

Which is the preferred server to use?

In some blog i read that EmbeddedSolrServer  is 50% faster than
CommonsHttpSolrServer,then why do we need to use  
CommonsHttpSolrServer.


Can anyone please guide me the right path/way.So that i pick the  
right

implementation.

Thanks in advance.

--Sachin
--
View this message in context: 
http://www.nabble.com/CommonsHttpSolrServer-vs-EmbeddedSolrServer-tp23545281p23545281.html
Sent from the Solr - User mailing list archive at Nabble.com.



-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Free/Busy: http://tinyurl.com/eric-cal








Re: Solr vs Sphinx

2009-05-14 Thread Mike Klaas


On 14-May-09, at 9:46 AM, gdeconto wrote:


Solr is very fast even with 1.3 and the developers have done an  
incredible

job.

However, maybe the next Solr improvement should be the creation of a
configuration manager and/or automated tuning tool.  I know that  
optimizing

Solr performance can be time consuming and sometimes frustrating.


Making Solr more self-service has been a recurring theme and something we
should keep striving toward.  In some respects, extreme
configurability is a liability if considerable tweaking and
experimentation is needed to achieve optimum results.  You can't
expect everyone to put in the investment to develop the expertise.


That said, it is very difficult to come up with appropriate auto- 
tuning heuristics that don't fail.  It almost calls for a level higher  
than Solr that you could hint what you want to do with the field  
(sort, facet, etc.), and it makes the field definitions  
appropriately.  The problem with such abstractions is that they are  
invariably leaky, and thus diagnosing problems requires similar  
expertise as omitting the abstraction step in the first place.


Getting this trade-off right is one of the central problems of  
computer science.


-Mike


Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increase the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching) becomes slow.

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400 million
matches and returns 100 rows.

Does the Solr searcher hold references to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing the heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what the first one ([C) is. I couldn't profile it to find out
what all the Strings are being allocated by - any ideas?

Any ideas on what the Searcher might be holding on to, and how we can change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar vivex...@gmail.com wrote:
 I don't know if field type has any impact on the memory usage - does it?

 Our use cases require complete matches, thus there is no need of any
 analysis in most cases - does it matter in terms of memory usage?

 Also, is there any default caching used by Solr if I comment out all
 the caches under query in solrconfig.xml? I also don't have any
 auto-warming queries.

 Thanks,
 -vivek

 On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 Warning: I'm wy out of my competency range when I comment
 on SOLR, but I've seen the statement that string fields are NOT
 tokenized while text fields are, and I notice that almost all of your fields
 are string type.

 Would someone more knowledgeable than me care to comment on whether
 this is at all relevant? Offered in the spirit that sometimes there are
 things
 so basic that only an amateur can see them G

 Best
 Erick

 On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

 Thanks Otis.

 Our use case doesn't require any sorting or faceting. I'm wondering if
 I've configured anything wrong.

 I got total of 25 fields (15 are indexed and stored, other 10 are just
 stored). All my fields are basic data type - which I thought are not
 sorted. My id field is unique key.

 Is there any field here that might be getting sorted?

  field name=id type=long indexed=true stored=true
 required=true omitNorms=true compressed=false/

   field name=atmps type=integer indexed=false stored=true
 compressed=false/
   field name=bcid type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=cmpcd type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=ctry type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=dlt type=date indexed=false stored=true
 default=NOW/HOUR  compressed=false/
   field name=dmn type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=eaddr type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=emsg type=string indexed=false stored=true
 compressed=false/
   field name=erc type=string indexed=false stored=true
 compressed=false/
   field name=evt type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=from type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=lfid type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=lsid type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=prsid type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=rc type=string indexed=false stored=true
 compressed=false/
   field name=rmcd type=string indexed=false stored=true
 compressed=false/
   field name=rmscd type=string indexed=false stored=true
 compressed=false/
   field name=scd type=string indexed=true stored=true
 omitNorms=true compressed=false/
   field name=sip type=string indexed=false stored=true
 compressed=false/
   field 

Re: Solr vs Sphinx

2009-05-14 Thread Mark Miller

Michael McCandless wrote:
So why haven't we enabled this by default, already? 

Why isn't Lucene done already :)

- Mark




Search Query Questions

2009-05-14 Thread Chris Miller

I have two questions:

1) How do I search for ALL items? For example, I provide a sort query  
parameter of updated and a rows query parameter of 10 to limit the  
query results. I still have to provide a search query, of course. What  
if I want to provide a list of ALL results that match this? Or, in  
this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data? For  
example, I have a field foo that is optional and multi-valued. How  
do I search for documents that have this field set to anything.


Thanks,

Chris Miller
ServerMotion
www.servermotion.com





Re: Search Query Questions

2009-05-14 Thread Chris Miller

Oh, one more question

3) Is there a way to effectively do a GROUP BY? For example, if I have  
a document that has a photoID attached to it, is there a way to return  
a set of results that does not duplicate the photoID field?


Thanks,

Chris Miller
ServerMotion
www.servermotion.com



On May 14, 2009, at 7:46 PM, Chris Miller wrote:


I have two questions:

1) How do I search for ALL items? For example, I provide a sort  
query parameter of updated and a rows query parameter of 10 to  
limit the query results. I still have to provide a search query, of  
course. What if I want to provide a list of ALL results that match  
this? Or, in this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data? For  
example, I have a field foo that is optional and multi-valued. How  
do I search for documents that have this field set to anything.


Thanks,

Chris Miller
ServerMotion
www.servermotion.com







Re: Additional metadata when using Solr Cell

2009-05-14 Thread Mark Miller

rossputin wrote:

Hi.

I am indexing a PDF document with the ExtractingRequestHandler.  My curl
post has a URL like:

../solr/update/extract?ext.idx.attr=trueext.def.fl=textext.literal.id=123ext.literal.author=Somebody

Sure enough I see in the server logs:

params={ext.def.fl=textext.literal.id=123ext.idx.attr=trueext.literal.author=Somebody}

I am trying to get my field back in the results from a query:

../solr/select?indent=onversion=2.2q=hellostart=0rows=10fl=author%2Cscoreqt=standardwt=standardexplainOther=hl.fl=

I see the score in the results 'doc' but no reference to author.

Can anyone advise on what I am forgetting to do, to get hold of this field?

Thanks in advance for your help,

 -- Ross
  
Have you added author to the schema? If not, and if you are using the 
example config (that uses ext.ignore.und.fl=true), the field could just 
be ignored. Define it and it should be filled.
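
For example, a stored field along these lines (a sketch; the type name assumes
the example schema's "string" type):

<field name="author" type="string" indexed="true" stored="true"/>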


--
- Mark

http://www.lucidimagination.com





Re: Solr memory requirements?

2009-05-14 Thread Mark Miller

800 million docs is on the high side for modern hardware.

If even one field has norms on, you're talking almost 800 MB right there.
And then if another Searcher is brought up while the old one is serving
(which happens when you update)? Doubled.


Your best bet is to distribute across a couple of machines.

To minimize memory you would want to turn off or turn down caching, don't facet,
don't sort, turn off all norms, and possibly get at the Lucene term interval
and raise it. Drop the on-deck searchers setting. Even then, 800
million... time to distribute, I'd think.
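
(The term interval mentioned above is a Lucene-level knob; a sketch of it,
without claiming Solr 1.3 exposes it directly, with dir and analyzer assumed
to exist:)

// index every 256th term instead of the default 128, roughly halving the
// RAM held by the in-memory term index (the .tii data)
IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
writer.setTermIndexInterval(256);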


vivek sar wrote:

Some update on this issue,

1) I attached jconsole to my app and monitored the memory usage.
During indexing the memory usage goes up and down, which I think is
normal. The memory remains around the min heap size (4 G) for
indexing, but as soon as I run a search the tenured heap usage jumps
up to 6G and remains there. Subsequent searches increases the heap
usage even more until it reaches the max (8G) - after which everything
(indexing and searching becomes slow).

The search query is a very generic one in this case which goes through
all the cores (4 of them - 800 million records), finds 400million
matches and returns 100 rows.

Does the Solr searcher holds up the reference to objects in memory? I
couldn't find any settings that would tell me it does, but every
search causing heap to go up is definitely suspicious.

2) I ran the jmap histo to get the top objects (this is on a smaller
instance with 2 G memory, this is before running search - after
running search I wasn't able to run jmap),

 num #instances #bytes  class name
--
   1:   3890855  222608992  [C
   2:   3891673  155666920  java.lang.String
   3:   3284341  131373640  org.apache.lucene.index.TermInfo
   4:   3334198  106694336  org.apache.lucene.index.Term
   5:   271   26286496  [J
   6:16   26273936  [Lorg.apache.lucene.index.Term;
   7:16   26273936  [Lorg.apache.lucene.index.TermInfo;
   8:320512   15384576
org.apache.lucene.index.FreqProxTermsWriter$PostingList
   9: 10335   11554136  [I

I'm not sure what's the first one (C)? I couldn't profile it to know
what all the Strings are being allocated by - any ideas?

Any ideas on what Searcher might be holding on and how can we change
that behavior?

Thanks,
-vivek


On Thu, May 14, 2009 at 11:33 AM, vivek sar vivex...@gmail.com wrote:
  

I don't know if field type has any impact on the memory usage - does it?

Our use cases require complete matches, thus there is no need of any
analysis in most cases - does it matter in terms of memory usage?

Also, is there any default caching used by Solr if I comment out all
the caches under query in solrconfig.xml? I also don't have any
auto-warming queries.

Thanks,
-vivek

On Wed, May 13, 2009 at 4:24 PM, Erick Erickson erickerick...@gmail.com wrote:


Warning: I'm wy out of my competency range when I comment
on SOLR, but I've seen the statement that string fields are NOT
tokenized while text fields are, and I notice that almost all of your fields
are string type.

Would someone more knowledgeable than me care to comment on whether
this is at all relevant? Offered in the spirit that sometimes there are
things
so basic that only an amateur can see them G

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar vivex...@gmail.com wrote:

  

Thanks Otis.

Our use case doesn't require any sorting or faceting. I'm wondering if
I've configured anything wrong.

I got total of 25 fields (15 are indexed and stored, other 10 are just
stored). All my fields are basic data type - which I thought are not
sorted. My id field is unique key.

Is there any field here that might be getting sorted?

 field name=id type=long indexed=true stored=true
required=true omitNorms=true compressed=false/

  field name=atmps type=integer indexed=false stored=true
compressed=false/
  field name=bcid type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=cmpcd type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=ctry type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=dlt type=date indexed=false stored=true
default=NOW/HOUR  compressed=false/
  field name=dmn type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=eaddr type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=emsg type=string indexed=false stored=true
compressed=false/
  field name=erc type=string indexed=false stored=true
compressed=false/
  field name=evt type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=from type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=lfid type=string indexed=true stored=true
omitNorms=true compressed=false/
  field name=lsid type=string indexed=true 

Re: Search Query Questions

2009-05-14 Thread Matt Weber
I think you will want to look at the Field Collapsing patch for this.  http://issues.apache.org/jira/browse/SOLR-236 
.


Thanks,

Matt Weber
eSr Technologies
http://www.esr-technologies.com




On May 14, 2009, at 5:52 PM, Chris Miller wrote:


Oh, one more question

3) Is there a way to effectively do a GROUP BY? For example, if I  
have a document that has a photoID attached to it, is there a way to  
return a set of results that does not duplicate the photoID field?


Thanks,

Chris Miller
ServerMotion
www.servermotion.com



On May 14, 2009, at 7:46 PM, Chris Miller wrote:


I have two questions:

1) How do I search for ALL items? For example, I provide a sort  
query parameter of updated and a rows query parameter of 10 to  
limit the query results. I still have to provide a search query, of  
course. What if I want to provide a list of ALL results that match  
this? Or, in this case, the most recent 10 updated documents?


2) How do I search for all documents with a field that has data?  
For example, I have a field foo that is optional and multi- 
valued. How do I search for documents that have this field set to  
anything.


Thanks,

Chris Miller
ServerMotion
www.servermotion.com









Re: Solr memory requirements?

2009-05-14 Thread vivek sar
Thanks Mark.

I checked all the items you mentioned,

1) I've omitNorms=true for all my indexed fields (stored-only fields I
guess don't matter)
2) I've tried commenting out all caches in the solrconfig.xml, but
that doesn't help much
3) I've tried commenting out the first and new searcher listener
settings in the solrconfig.xml - the only way that helps is that at
startup time the memory usage doesn't spike up, and that's only because
there is no auto-warming query to run. But I noticed that commenting out the
searchers slows down any other queries to Solr.
4) I don't have any sort or facet in my queries
5) I'm not sure how to change the Lucene term interval from Solr -
is there a way to do that?

I've been playing around with this memory thing the whole day and have
found that it's the search that's hogging the memory. Any time there
is a search on all the records (800 million) the heap consumption
jumps by 5G. This makes me think there has to be some configuration in
Solr that's causing some terms per document to be loaded in memory.

I've posted my settings several times on this forum, but no one has
been able to pinpoint what configuration might be causing this. If
someone is interested I can attach the solrconfig and schema files as
well. Here are the settings again, under the query tag,

<query>
  <maxBooleanClauses>1024</maxBooleanClauses>
  <enableLazyFieldLoading>true</enableLazyFieldLoading>
  <queryResultWindowSize>50</queryResultWindowSize>
  <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
  <HashDocSet maxSize="3000" loadFactor="0.75"/>
  <useColdSearcher>false</useColdSearcher>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

and schema,

  <field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>

  <field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
  <field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
  <field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
  <field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
  <field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>

  <!-- catchall field, containing all other searchable text fields (implemented
       via copyField further on in this schema) -->
  <field name="all" type="text_ws" indexed="true" stored="false" omitNorms="true" multiValued="true"/>

Any help is greatly appreciated.

Thanks,
-vivek

On Thu, May 14, 2009 at 6:22 PM, Mark Miller markrmil...@gmail.com wrote:
 800 million docs is on the high side for modern hardware.

 If even one field has norms on, you're talking almost 800 MB right there. And
 then if another Searcher is brought up while the old one is serving (which
 happens when you update)? Doubled.

 Your best bet is to distribute across a couple of machines.

 To minimize memory you would want to turn off or turn down caching, don't facet, don't
 sort, turn off all norms, and possibly get at the Lucene term interval and raise
 it. Drop the on-deck searchers setting. Even then, 800 million... time to
 distribute, I'd think.

 vivek sar wrote:

 Some update on this issue,

 1) I attached jconsole to my app and monitored the memory usage.
 During indexing the memory usage goes up and down, which I think is
 normal. The memory remains around the min heap size (4 G) for
 indexing, but as soon as I run a search the tenured heap usage jumps
 up to 6G and remains there. Subsequent searches increases the heap
 usage even more until it reaches the max (8G) - after which everything
 (indexing and searching becomes slow).

 The search query is a very generic one in this case which goes through
 all the cores (4 of them - 800 million records), finds 400million
 matches and returns 100 rows.

 Does the Solr