Re: Num docs, block join, and dupes?

2015-03-10 Thread Jessica Mallet
We've seen this as well. Before we understood the cause, it seemed very
bizarre that hitting different nodes would yield different numFound, as
would using different rows=N (since the proxying node only de-dupes the
documents that are returned in the response).

I think consistency and correctness should be clearly delineated. Of
course we'd rather have consistently correct results, but failing that, I'd
rather have consistently incorrect results than inconsistent results,
because otherwise it's even harder to debug, as was the case here.

I think either the node hosting the shard should also do the de-duping, or
no one should. It's strange that the proxying node decides to do a
sketchy de-dupe over only the limited result set it sees.
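To make the failure mode concrete, here is a minimal, hypothetical sketch (not Solr's actual code) of a proxy that de-dupes only the per-shard rows it fetched; duplicates outside that window are never seen, which is why the merged result depends on rows=N:

```python
# Hypothetical sketch of proxy-side de-duplication: the aggregator only
# sees the top-`rows` docs from each shard, so duplicate ids outside that
# fetched window are never detected.

def merge_shard_responses(shard_results, rows):
    """Merge per-shard top-`rows` results, de-duping only what was fetched."""
    fetched = []
    for docs in shard_results:
        fetched.extend(docs[:rows])      # each shard only returns `rows` docs
    seen, merged = set(), []
    for doc in fetched:
        if doc["id"] not in seen:        # dupes outside the window are missed
            seen.add(doc["id"])
            merged.append(doc)
    return merged[:rows]

shard1 = [{"id": "1"}, {"id": "1-1"}, {"id": "1-2"}]
shard2 = [{"id": "2"}, {"id": "2-1"}, {"id": "2-1-1"},
          {"id": "2-1-1"}, {"id": "2-1-1"}]   # dup'd ids on one shard

# Per-shard numFound would report 3 + 5 = 8, yet the proxy hands back
# 6 unique docs once the dupes fall inside the fetched window:
print(len(merge_shard_responses([shard1, shard2], rows=10)))  # 6
```

With a smaller rows value some duplicates never reach the proxy at all, so the de-dupe is silently partial.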

On Tue, Mar 10, 2015 at 9:09 AM, Timothy Potter thelabd...@gmail.com
wrote:

 Before I open a JIRA, I wanted to put this out to solicit feedback on what
 I'm seeing and what Solr should be doing. So I've indexed the following 8
 docs into a 2-shard collection (Solr 4.8'ish - internal custom branch
 roughly based on 4.8) ... notice that the 3 grand-children of 2-1 have
 dup'd keys:

 [
   {
     "id": "1",
     "name": "parent",
     "_childDocuments_": [
       {
         "id": "1-1",
         "name": "child"
       },
       {
         "id": "1-2",
         "name": "child"
       }
     ]
   },
   {
     "id": "2",
     "name": "parent",
     "_childDocuments_": [
       {
         "id": "2-1",
         "name": "child",
         "_childDocuments_": [
           {
             "id": "2-1-1",
             "name": "grandchild"
           },
           {
             "id": "2-1-1",
             "name": "grandchild2"
           },
           {
             "id": "2-1-1",
             "name": "grandchild3"
           }
         ]
       }
     ]
   }
 ]

 When I query this collection, using:


http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10

 I get:

 {
   "responseHeader": {
     "status": 0,
     "QTime": 9,
     "params": {
       "indent": "true",
       "q": "*:*",
       "shards.info": "true",
       "wt": "json",
       "rows": "10"}},
   "shards.info": {
     "http://localhost:8984/solr/blockjoin2_shard1_replica1/|http://localhost:8985/solr/blockjoin2_shard1_replica2/": {
       "numFound": 3,
       "maxScore": 1.0,
       "shardAddress": "http://localhost:8984/solr/blockjoin2_shard1_replica1",
       "time": 4},
     "http://localhost:8984/solr/blockjoin2_shard2_replica1/|http://localhost:8985/solr/blockjoin2_shard2_replica2/": {
       "numFound": 5,
       "maxScore": 1.0,
       "shardAddress": "http://localhost:8985/solr/blockjoin2_shard2_replica2",
       "time": 4}},
   "response": {"numFound": 6, "start": 0, "maxScore": 1.0, "docs": [
     {
       "id": "1-1",
       "name": "child"},
     {
       "id": "1-2",
       "name": "child"},
     {
       "id": "1",
       "name": "parent",
       "_version_": 1495272401329455104},
     {
       "id": "2-1-1",
       "name": "grandchild"},
     {
       "id": "2-1",
       "name": "child"},
     {
       "id": "2",
       "name": "parent",
       "_version_": 1495272401361960960}]
   }}


 So Solr has de-duped the results.

 If I execute this query against the shard that has the dupes
(distrib=false):


http://localhost:8984/solr/blockjoin2_shard2_replica1/select?q=*%3A*&wt=json&indent=true&shards.info=true&rows=10&distrib=false

 Then the dupes are returned:

 {
   "responseHeader": {
     "status": 0,
     "QTime": 0,
     "params": {
       "indent": "true",
       "q": "*:*",
       "shards.info": "true",
       "distrib": "false",
       "wt": "json",
       "rows": "10"}},
   "response": {"numFound": 5, "start": 0, "docs": [
     {
       "id": "2-1-1",
       "name": "grandchild"},
     {
       "id": "2-1-1",
       "name": "grandchild2"},
     {
       "id": "2-1-1",
       "name": "grandchild3"},
     {
       "id": "2-1",
       "name": "child"},
     {
       "id": "2",
       "name": "parent",
       "_version_": 1495272401361960960}]
   }}

 So I guess my question is why doesn't the non-distrib query do
 de-duping? Mainly confirming this is how it's supposed to work and
 this behavior doesn't strike anyone else as odd ;-)

 Cheers,

 Tim


Re: Num docs, block join, and dupes?

2015-03-10 Thread Mikhail Khludnev
On Tue, Mar 10, 2015 at 7:09 PM, Timothy Potter thelabd...@gmail.com
wrote:

 So I guess my question is why doesn't the non-distrib query do
 de-duping?


Tim,
that's by-design behavior. The special _root_ field is used as the delete
term when a block update is applied, i.e. in the case of a block, uniqueKey
is not used. See
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L224
I agree that's one of the issues with the current block update
implementation, but frankly speaking, I didn't consider it an oddity. Do
you? What do you want to achieve?
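The behavior can be illustrated with a toy model (hypothetical sketch, not the DirectUpdateHandler2 code): the delete term for a block update is the block's _root_ value, so re-sending a block replaces the old block wholesale, but nothing within the block is reconciled by uniqueKey:

```python
# Hypothetical sketch (not Solr's code) of why duplicate child ids survive:
# the delete term for a block update is _root_, not each doc's uniqueKey,
# so dup'd ids *within* one block are never de-duped on the shard.

def apply_block_update(index, block_docs, root):
    """index: list of {'id', '_root_'} dicts. Replaces the whole block."""
    # The delete term is _root_: any previous version of the block goes away...
    index = [d for d in index if d["_root_"] != root]
    # ...but the new block is appended as-is; no per-doc uniqueKey check.
    for d in block_docs:
        index.append({**d, "_root_": root})
    return index

index = []
block = [{"id": "2-1-1"}, {"id": "2-1-1"}, {"id": "2-1-1"},
         {"id": "2-1"}, {"id": "2"}]
index = apply_block_update(index, block, root="2")
print(len(index))  # 5: all three "2-1-1" docs coexist

# Re-sending the block replaces it wholesale rather than growing the index:
index = apply_block_update(index, block, root="2")
print(len(index))  # still 5
```

This is why the distrib=false query above reports numFound:5 including three docs with id 2-1-1.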

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

http://www.griddynamics.com
mkhlud...@griddynamics.com


Re: Num docs

2008-06-14 Thread Marcus Herou
Hmmm distributed BDB brrr :)

On Fri, Jun 13, 2008 at 3:21 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 Or, if you want to go with something older/more stable, go with BDB. :)


 Otis --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Marcus Herou [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Thursday, June 12, 2008 3:17:52 PM
  Subject: Re: Num docs
 
  Cacti, Nagios you name it already in use :)
 
  Well I'm the CTO so the one really really interested in estimating perf.
 
  The id's come from a db initially and are later used for retrieval from a
  distributed on-disk caching system which I have written.
  I'm in the process of moving from MySQL to HBase or Hypertable.
 
  /M
 
  On Tue, Jun 10, 2008 at 10:03 PM, Otis Gospodnetic 
  [EMAIL PROTECTED] wrote:
 
   Marcus,
  
   It sounds like you may just want to use a good server monitoring
 package
   that collects server data and prints out pretty charts.  Then you can
 show
   them to your IT/budget people when the charts start showing increased
 query
   latency times, very little available RAM, swapping, high CPU usage and
 such.
Nagios, Ganglia, any of those things will do.
  
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
   - Original Message 
From: Marcus Herou
To: solr-user@lucene.apache.org
Sent: Tuesday, June 10, 2008 3:29:40 PM
Subject: Re: Num docs
   
Well guys you are right... Still I want to have a clue about how much
   each
machine stores to predict when we need more machines (measure
 performance
degradation per new document). But it's harder to collect that kind
 of
   data.
It sure is doable no doubt and is a normal sharding algo for MySQL.
   
The best approach I think is to have some bg threads run X number of
   queries
and collect the response times, throw away the n lowest/highest
 response
times and calc an avg time which is used for in sharding and query
   lb'ing.
   
Little off topic but interesting
What would you guys say about a good correlation between the index
 size
   on
disk (no stored text content) and available RAM and having good
 response
times.
   
How long is a rope would you perhaps say...but I think some rule of
 thumb
could be established...
   
     One of the schemas of concern

     <fields>
       <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
       <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
       <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
       <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
       <field name="uid" type="string" indexed="true" stored="false" required="true" />
       <field name="link" type="string" indexed="true" stored="false" required="true" />
       <field name="description" type="text" indexed="true" stored="false" required="false" />
       <field name="title" type="text" indexed="true" stored="false" required="true" />
       <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
       <field name="author" type="string" indexed="true" stored="false" required="false" />
       <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
       <field name="category" type="integer" indexed="true" stored="false" required="false" />
       <field name="language" type="integer" indexed="true" stored="false" required="false" />
       <field name="country" type="integer" indexed="true" stored="false" required="false" />
       <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
     </fields>

     and a normal solr query (taken from the log):
     /select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
//Marcus
   
   
   
   
   
On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:
   
 Exactly.  I think I mentioned this once before several months ago.
  One
   can
 take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
 performance numbers, etc. and come up with a number for each
 server's
 overall capacity.


 As a matter of fact, I think this would be useful to have right in
   Solr,
 primarily for use when allocating and sizing shards for Distributed
   Search.
  JIRA enhancement/feature issue?
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Alexander Ramos Jardim
  To: solr-user@lucene.apache.org
  Sent: Monday, June 9, 2008 6:42:17 PM
  Subject: Re: Num docs
 
  I even think that such a decision should be based on the overall
   machine
  performance at a given time, and not the index size. Unless you
 are
 talking
  solely about HD space and not having any performance issues.
 
  2008/6/7 Otis Gospodnetic :
 
   Marcus,
  
  
   For that you can rely on du, vmstat, iostat, top and such, too.
 :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
   - Original Message 
From: Marcus Herou
To: solr-user@lucene.apache.org
Sent: Saturday, June 7, 2008 12:33:10 PM
Subject: Re: Num docs
   
Thanks, I wanna ask the indices how much more each shard can
   handle
   before
they're considered full and scream for a budget to get a
 new
 machine :)
   
/M
   
On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
wrote:
   
 Marcus, check out the Luke request handler.  You can get it
   from
 its
 output.  It may also be possible to get *just* that number,
 but
   I'm

Re: Num docs

2008-06-10 Thread Marcus Herou
Well guys, you are right... Still, I want to have a clue about how much each
machine stores, to predict when we need more machines (measure performance
degradation per new document). But it's harder to collect that kind of data.
It sure is doable, no doubt, and is a normal sharding algo for MySQL.

The best approach, I think, is to have some bg threads run X number of queries
and collect the response times, throw away the n lowest/highest response
times, and calc an avg time which is used for sharding and query lb'ing.
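The trimmed average described above might look like this (illustrative sketch only; the sample numbers and function name are made up, nothing here comes from Solr):

```python
# Sketch of the trimmed-average idea: run the queries in background
# threads, collect timings, drop the n fastest and n slowest samples,
# and average the rest so outliers don't skew the sharding/LB decision.

def trimmed_avg(times_ms, n):
    """Average after discarding the n lowest and n highest samples."""
    if len(times_ms) <= 2 * n:
        raise ValueError("need more than 2*n samples")
    kept = sorted(times_ms)[n:len(times_ms) - n]
    return sum(kept) / len(kept)

# e.g. the background threads collected these response times (ms);
# the 9 ms and 250 ms outliers are discarded before averaging:
samples = [12, 15, 14, 250, 13, 11, 16, 9]
print(trimmed_avg(samples, n=1))  # 13.5
```

The trimmed mean is a cheap way to keep a single GC pause or cold-cache query from distorting the capacity estimate.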

A little off topic, but interesting:
what would you guys say is a good correlation between the index size on
disk (no stored text content) and available RAM for getting good response
times?

"How long is a rope?" you would perhaps say... but I think some rule of
thumb could be established...

One of the schemas of concern
<fields>
  <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
  <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
  <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
  <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
  <field name="uid" type="string" indexed="true" stored="false" required="true" />
  <field name="link" type="string" indexed="true" stored="false" required="true" />
  <field name="description" type="text" indexed="true" stored="false" required="false" />
  <field name="title" type="text" indexed="true" stored="false" required="true" />
  <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
  <field name="author" type="string" indexed="true" stored="false" required="false" />
  <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
  <field name="category" type="integer" indexed="true" stored="false" required="false" />
  <field name="language" type="integer" indexed="true" stored="false" required="false" />
  <field name="country" type="integer" indexed="true" stored="false" required="false" />
  <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
</fields>

and a normal solr query (taken from the log):
/select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc


//Marcus





On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic 
[EMAIL PROTECTED] wrote:

 Exactly.  I think I mentioned this once before several months ago.  One can
 take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
 performance numbers, etc. and come up with a number for each server's
 overall capacity.


 As a matter of fact, I think this would be useful to have right in Solr,
 primarily for use when allocating and sizing shards for Distributed Search.
  JIRA enhancement/feature issue?
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Alexander Ramos Jardim [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Monday, June 9, 2008 6:42:17 PM
  Subject: Re: Num docs
 
  I even think that such a decision should be based on the overall machine
  performance at a given time, and not the index size. Unless you are
 talking
  solely about HD space and not having any performance issues.
 
  2008/6/7 Otis Gospodnetic :
 
   Marcus,
  
  
   For that you can rely on du, vmstat, iostat, top and such, too. :)
  
   Otis
   --
   Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
  
  
   - Original Message 
From: Marcus Herou
To: solr-user@lucene.apache.org
Sent: Saturday, June 7, 2008 12:33:10 PM
Subject: Re: Num docs
   
Thanks, I wanna ask the indices how much more each shard can handle
   before
they're considered full and scream for a budget to get a new
 machine :)
   
/M
   
On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
wrote:
   
 Marcus, check out the Luke request handler.  You can get it from
 its
 output.  It may also be possible to get *just* that number, but I'm
 not
 looking at docs/code right now to know for sure.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Marcus Herou
  To: solr-user@lucene.apache.org
  Sent: Saturday, June 7, 2008 5:09:20 AM
  Subject: Num docs
 
  Hi.
 
  Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
 
  Kindly
 
  //Marcus
 
  --
  Marcus Herou CTO and co-founder Tailsweep AB
  +46702561312
  [EMAIL PROTECTED]
  http://www.tailsweep.com/
  http://blogg.tailsweep.com/


   
   
--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
  
  
 
 
  --
  Alexander Ramos Jardim




-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: Num docs

2008-06-10 Thread Alexander Ramos Jardim
Marcus,

2008/6/10 Marcus Herou [EMAIL PROTECTED]:

 Well guys you are right... Still I want to have a clue about how much each
 machine stores to predict when we need more machines (measure performance
 degradation per new document). But it's harder to collect that kind of
 data.
 It sure is doable no doubt and is a normal sharding algo for MySQL.

Sorry, but I think performance degradation per new document isn't a good
metric, if not an outright false one.
What you'd be measuring is the cost in processing, memory, and I/O
read/write speed that Solr incurs, and I can't see a way to derive that
information from document quantity alone.
Just consider that the same index, under different usage policies and
overall architecture, can show drastically different system performance.


 The best approach I think is to have some bg threads run X number of
 queries
 and collect the response times, throw away the n lowest/highest response
 times and calc an avg time which is used for in sharding and query lb'ing.

Sorry? Didn't get the point...


 Little off topic but interesting
 What would you guys say about a good correlation between the index size on
 disk (no stored text content) and available RAM and having good response
 times.

I would need to benchmark a little more to answer you.


 How long is a rope would you perhaps say...but I think some rule of thumb
 could be established...

We need good metrics before we can establish good rules.


 One of the schemas of concern
 <fields>
   <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
   <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
   <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
   <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
   <field name="uid" type="string" indexed="true" stored="false" required="true" />
   <field name="link" type="string" indexed="true" stored="false" required="true" />
   <field name="description" type="text" indexed="true" stored="false" required="false" />
   <field name="title" type="text" indexed="true" stored="false" required="true" />
   <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
   <field name="author" type="string" indexed="true" stored="false" required="false" />
   <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
   <field name="category" type="integer" indexed="true" stored="false" required="false" />
   <field name="language" type="integer" indexed="true" stored="false" required="false" />
   <field name="country" type="integer" indexed="true" stored="false" required="false" />
   <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
 </fields>

Let me ask you something: where do all these id's come from? A database?
What about its access times?


 and a normal solr query (taken from the log):

 /select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc


 //Marcus





 On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic 
 [EMAIL PROTECTED] wrote:

  Exactly.  I think I mentioned this once before several months ago.  One
 can
  take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
  performance numbers, etc. and come up with a number for each server's
  overall capacity.
 
 
  As a matter of fact, I think this would be useful to have right in Solr,
  primarily for use when allocating and sizing shards for Distributed
 Search.
   JIRA enhancement/feature issue?
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  - Original Message 
   From: Alexander Ramos Jardim [EMAIL PROTECTED]
   To: solr-user@lucene.apache.org
   Sent: Monday, June 9, 2008 6:42:17 PM
   Subject: Re: Num docs
  
   I even think that such a decision should be based on the overall
 machine
   performance at a given time, and not the index size. Unless you are
  talking
   solely about HD space and not having any performance issues.
  
   2008/6/7 Otis Gospodnetic :
  
Marcus,
   
   
For that you can rely on du, vmstat, iostat, top and such, too. :)
   
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
- Original Message 
 From: Marcus Herou
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 12:33:10 PM
 Subject: Re: Num docs

 Thanks, I wanna ask the indices how much more each shard can handle
before
 they're considered full and scream for a budget to get a new
  machine :)

 /M

 On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
 wrote:

  Marcus, check out the Luke request handler.  You can get it from
  its
  output.  It may also be possible to get *just* that number, but
 I'm
  not
  looking at docs/code right now to know for sure.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Num docs

2008-06-10 Thread Otis Gospodnetic
Marcus,

It sounds like you may just want to use a good server monitoring package that 
collects server data and prints out pretty charts.  Then you can show them to 
your IT/budget people when the charts start showing increased query latency 
times, very little available RAM, swapping, high CPU usage and such.  Nagios, 
Ganglia, any of those things will do.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Marcus Herou [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Tuesday, June 10, 2008 3:29:40 PM
 Subject: Re: Num docs
 
 Well guys you are right... Still I want to have a clue about how much each
 machine stores to predict when we need more machines (measure performance
 degradation per new document). But it's harder to collect that kind of data.
 It sure is doable no doubt and is a normal sharding algo for MySQL.
 
 The best approach I think is to have some bg threads run X number of queries
 and collect the response times, throw away the n lowest/highest response
 times and calc an avg time which is used for in sharding and query lb'ing.
 
 Little off topic but interesting
 What would you guys say about a good correlation between the index size on
 disk (no stored text content) and available RAM and having good response
 times.
 
 How long is a rope would you perhaps say...but I think some rule of thumb
 could be established...
 
 One of the schemas of concern

 <fields>
   <field name="feedId" type="integer" indexed="true" stored="false" required="true" />
   <field name="feedItemId" type="long" indexed="true" stored="true" required="true" />
   <field name="siteId" type="integer" indexed="true" stored="true" required="false" />
   <field name="partnerType" type="integer" indexed="true" stored="false" required="true" />
   <field name="uid" type="string" indexed="true" stored="false" required="true" />
   <field name="link" type="string" indexed="true" stored="false" required="true" />
   <field name="description" type="text" indexed="true" stored="false" required="false" />
   <field name="title" type="text" indexed="true" stored="false" required="true" />
   <field name="publishDate" type="date" indexed="true" stored="false" required="true" />
   <field name="author" type="string" indexed="true" stored="false" required="false" />
   <field name="keyWordId" type="integer" indexed="true" stored="false" required="false" multiValued="true"/>
   <field name="category" type="integer" indexed="true" stored="false" required="false" />
   <field name="language" type="integer" indexed="true" stored="false" required="false" />
   <field name="country" type="integer" indexed="true" stored="false" required="false" />
   <field name="ngramLang" type="integer" indexed="true" stored="false" required="false" />
 </fields>

 and a normal solr query (taken from the log):
 /select?start=0&q=(title:(apple)^4+OR+description:(apple))&version=2.2&rows=15&wt=xml&sort=publishDate+desc
 
 
 //Marcus
 
 
 
 
 
 On Tue, Jun 10, 2008 at 1:15 AM, Otis Gospodnetic 
 [EMAIL PROTECTED] wrote:
 
  Exactly.  I think I mentioned this once before several months ago.  One can
  take various hardware specs (# cores, CPU speed, FSB, RAM, etc.),
  performance numbers, etc. and come up with a number for each server's
  overall capacity.
 
 
  As a matter of fact, I think this would be useful to have right in Solr,
  primarily for use when allocating and sizing shards for Distributed Search.
   JIRA enhancement/feature issue?
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  - Original Message 
   From: Alexander Ramos Jardim 
   To: solr-user@lucene.apache.org
   Sent: Monday, June 9, 2008 6:42:17 PM
   Subject: Re: Num docs
  
   I even think that such a decision should be based on the overall machine
   performance at a given time, and not the index size. Unless you are
  talking
   solely about HD space and not having any performance issues.
  
   2008/6/7 Otis Gospodnetic :
  
Marcus,
   
   
For that you can rely on du, vmstat, iostat, top and such, too. :)
   
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
- Original Message 
 From: Marcus Herou
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 12:33:10 PM
 Subject: Re: Num docs

 Thanks, I wanna ask the indices how much more each shard can handle
before
 they're considered full and scream for a budget to get a new
  machine :)

 /M

 On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
 wrote:

  Marcus, check out the Luke request handler.  You can get it from
  its
  output.  It may also be possible to get *just* that number, but I'm
  not
  looking at docs/code right now to know for sure.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  - Original Message 
   From: Marcus Herou
   To: solr-user@lucene.apache.org
   Sent: Saturday, June 7, 2008 5:09:20 AM
   Subject: Num docs
  
   Hi.
  
   Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
  
   Kindly
  
   //Marcus
  
   --
   Marcus Herou CTO and co-founder Tailsweep AB
   +46702561312
   [EMAIL PROTECTED]
   http://www.tailsweep.com/
   http://blogg.tailsweep.com/
 
 


 --
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/
   
   
  
  
   --
   Alexander Ramos Jardim
 
 
 
 
 -- 
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/



Re: Num docs

2008-06-09 Thread Otis Gospodnetic
Exactly.  I think I mentioned this once before several months ago.  One can 
take various hardware specs (# cores, CPU speed, FSB, RAM, etc.), performance 
numbers, etc. and come up with a number for each server's overall capacity.

 
As a matter of fact, I think this would be useful to have right in Solr, 
primarily for use when allocating and sizing shards for Distributed Search.  
JIRA enhancement/feature issue?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Alexander Ramos Jardim [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Monday, June 9, 2008 6:42:17 PM
 Subject: Re: Num docs
 
 I even think that such a decision should be based on the overall machine
 performance at a given time, and not the index size. Unless you are talking
 solely about HD space and not having any performance issues.
 
 2008/6/7 Otis Gospodnetic :
 
  Marcus,
 
 
  For that you can rely on du, vmstat, iostat, top and such, too. :)
 
  Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  - Original Message 
   From: Marcus Herou 
   To: solr-user@lucene.apache.org
   Sent: Saturday, June 7, 2008 12:33:10 PM
   Subject: Re: Num docs
  
   Thanks, I wanna ask the indices how much more each shard can handle
  before
   they're considered full and scream for a budget to get a new machine :)
  
   /M
  
   On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic
   wrote:
  
Marcus, check out the Luke request handler.  You can get it from its
output.  It may also be possible to get *just* that number, but I'm not
looking at docs/code right now to know for sure.
   
 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
   
   
- Original Message 
 From: Marcus Herou
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 5:09:20 AM
 Subject: Num docs

 Hi.

 Is there a way of retrieve IndexWriter.numDocs() in SOLR ?

 Kindly

 //Marcus

 --
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/
   
   
  
  
   --
   Marcus Herou CTO and co-founder Tailsweep AB
   +46702561312
   [EMAIL PROTECTED]
   http://www.tailsweep.com/
   http://blogg.tailsweep.com/
 
 
 
 
 -- 
 Alexander Ramos Jardim



Re: Num docs

2008-06-07 Thread Otis Gospodnetic
Marcus, check out the Luke request handler.  You can get it from its output.  
It may also be possible to get *just* that number, but I'm not looking at 
docs/code right now to know for sure.

 Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Marcus Herou [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 5:09:20 AM
 Subject: Num docs
 
 Hi.
 
 Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
 
 Kindly
 
 //Marcus
 
 -- 
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/



Re: Num docs

2008-06-07 Thread Marcus Herou
Thanks, I wanna ask the indices how much more each shard can handle before
they're considered full and scream for a budget to get a new machine :)

/M

On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic [EMAIL PROTECTED]
wrote:

 Marcus, check out the Luke request handler.  You can get it from its
 output.  It may also be possible to get *just* that number, but I'm not
 looking at docs/code right now to know for sure.

  Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


 - Original Message 
  From: Marcus Herou [EMAIL PROTECTED]
  To: solr-user@lucene.apache.org
  Sent: Saturday, June 7, 2008 5:09:20 AM
  Subject: Num docs
 
  Hi.
 
  Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
 
  Kindly
 
  //Marcus
 
  --
  Marcus Herou CTO and co-founder Tailsweep AB
  +46702561312
  [EMAIL PROTECTED]
  http://www.tailsweep.com/
  http://blogg.tailsweep.com/




-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: Num docs

2008-06-07 Thread Otis Gospodnetic
Marcus,


For that you can rely on du, vmstat, iostat, top and such, too. :)

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Marcus Herou [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Saturday, June 7, 2008 12:33:10 PM
 Subject: Re: Num docs
 
 Thanks, I wanna ask the indices how much more each shard can handle before
 they're considered full and scream for a budget to get a new machine :)
 
 /M
 
 On Sat, Jun 7, 2008 at 3:07 PM, Otis Gospodnetic 
 wrote:
 
  Marcus, check out the Luke request handler.  You can get it from its
  output.  It may also be possible to get *just* that number, but I'm not
  looking at docs/code right now to know for sure.
 
   Otis
  --
  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
  - Original Message 
   From: Marcus Herou 
   To: solr-user@lucene.apache.org
   Sent: Saturday, June 7, 2008 5:09:20 AM
   Subject: Num docs
  
   Hi.
  
   Is there a way of retrieve IndexWriter.numDocs() in SOLR ?
  
   Kindly
  
   //Marcus
  
   --
   Marcus Herou CTO and co-founder Tailsweep AB
   +46702561312
   [EMAIL PROTECTED]
   http://www.tailsweep.com/
   http://blogg.tailsweep.com/
 
 
 
 
 -- 
 Marcus Herou CTO and co-founder Tailsweep AB
 +46702561312
 [EMAIL PROTECTED]
 http://www.tailsweep.com/
 http://blogg.tailsweep.com/



RE: Num docs

2008-06-07 Thread Lance Norskog
This appears on the stats.jsp page: both the total number of document
'slots' and the number of live documents.
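For illustration (toy numbers, not Solr code): deleted documents keep occupying 'slots' until a segment merge reclaims them, so the live-document count is the slot total minus the pending deletions.

```python
# Illustrative only: in Lucene, deleted docs still occupy 'slots' until
# a segment merge; live docs = total slots minus pending deletions.
max_doc = 1_000_000      # total document slots (including deleted)
deleted_docs = 37_500    # docs flagged deleted but not yet merged away
num_docs = max_doc - deleted_docs
print(num_docs)  # 962500
```

The gap between the two numbers is a rough indicator of how much reclaimable space an optimize/merge would free.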

-Original Message-
From: Marcus Herou [mailto:[EMAIL PROTECTED] 
Sent: Saturday, June 07, 2008 2:09 AM
To: solr-user@lucene.apache.org
Subject: Num docs

Hi.

Is there a way of retrieving IndexWriter.numDocs() in Solr?

Kindly

//Marcus

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[EMAIL PROTECTED]
http://www.tailsweep.com/
http://blogg.tailsweep.com/