Re: Core size - distinguish between merge and deletes

2017-11-09 Thread Erick Erickson
Please don't do that ;) Unless you're willing to do it frequently. See:

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

expungeDeletes is really a variety of optimize, so the issues outlined
in that blog apply.
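
For reference, expungeDeletes is a flag on Solr's commit command; a minimal
sketch of issuing it over HTTP (the core URL is hypothetical):

    import requests

    # expungeDeletes rides along on a commit and merges away segments holding
    # deleted docs -- with the same I/O-cost caveats as an optimize.
    SOLR_CORE = "http://localhost:8983/solr/mycore"  # hypothetical core
    requests.post(SOLR_CORE + "/update",
                  data='<commit expungeDeletes="true"/>',
                  headers={"Content-Type": "text/xml"})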

Best,
Erick

On Thu, Nov 9, 2017 at 12:24 PM, Shashank Pedamallu wrote:
> Thanks for the response Erick. I’m deleting the documents with the
> expungeDeletes option set to true, so that does trigger a merge to throw
> away the deleted documents.
>
> On 11/9/17, 12:17 PM, "Erick Erickson"  wrote:
>
> bq: Is there a way to distinguish size reductions caused by a delete from
> those caused by a Lucene merge?
>
> Not sure what you're really looking for here. Size on disk is _never_
> reduced by a delete operation; the document is only 'marked as
> deleted'. Only when segments are merged are the resources reclaimed,
> i.e. the index gets smaller. You can figure out what % of your index
> consists of deleted documents from the delta between numDocs and
> maxDocs, available in the admin UI and from the Luke handler (and
> maybe JMX).
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> > Hi,
> >
> > I want to get accurate metrics for the amount of data being indexed in
> > Solr. I observe that this number sometimes decreases due to Lucene merges,
> > but I’m also deleting data at times. Is there a way to distinguish size
> > reductions caused by a delete from those caused by a Lucene merge?
> >
> > Thanks,
> > Shashank
>
>


Re: Core size - distinguish between merge and deletes

2017-11-09 Thread Shashank Pedamallu
Thanks for the response Erick. I’m deleting the documents with the
expungeDeletes option set to true, so that does trigger a merge to throw away
the deleted documents.

On 11/9/17, 12:17 PM, "Erick Erickson"  wrote:

bq: Is there a way to distinguish size reductions caused by a delete from
those caused by a Lucene merge?

Not sure what you're really looking for here. Size on disk is _never_
reduced by a delete operation; the document is only 'marked as
deleted'. Only when segments are merged are the resources reclaimed,
i.e. the index gets smaller. You can figure out what % of your index
consists of deleted documents from the delta between numDocs and
maxDocs, available in the admin UI and from the Luke handler (and
maybe JMX).

Best,
Erick

On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> Hi,
>
> I want to get accurate metrics for the amount of data being indexed in
> Solr. I observe that this number sometimes decreases due to Lucene merges,
> but I’m also deleting data at times. Is there a way to distinguish size
> reductions caused by a delete from those caused by a Lucene merge?
>
> Thanks,
> Shashank




Re: Core size - distinguish between merge and deletes

2017-11-09 Thread Erick Erickson
bq: Is there a way to distinguish size reductions caused by a delete from
those caused by a Lucene merge?

Not sure what you're really looking for here. Size on disk is _never_
reduced by a delete operation; the document is only 'marked as
deleted'. Only when segments are merged are the resources reclaimed,
i.e. the index gets smaller. You can figure out what % of your index
consists of deleted documents from the delta between numDocs and
maxDocs, available in the admin UI and from the Luke handler (and
maybe JMX).
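
A sketch of pulling that delta from the Luke handler (core name hypothetical;
verify the response field names against your Solr release):

    import requests

    # The Luke handler reports index-level stats; maxDoc - numDocs is the
    # count of docs still marked deleted but not yet merged away.
    resp = requests.get("http://localhost:8983/solr/mycore/admin/luke",
                        params={"numTerms": "0", "wt": "json"}).json()
    stats = resp["index"]
    deleted = stats["maxDoc"] - stats["numDocs"]
    pct = 100.0 * deleted / max(stats["maxDoc"], 1)
    print(deleted, "deleted docs,", round(pct, 1), "% of maxDoc")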

Best,
Erick

On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> Hi,
>
> I want to get accurate metrics for the amount of data being indexed in
> Solr. I observe that this number sometimes decreases due to Lucene merges,
> but I’m also deleting data at times. Is there a way to distinguish size
> reductions caused by a delete from those caused by a Lucene merge?
>
> Thanks,
> Shashank


Core size - distinguish between merge and deletes

2017-11-09 Thread Shashank Pedamallu
Hi,

I want to get accurate metrics for the amount of data being indexed in Solr.
I observe that this number sometimes decreases due to Lucene merges, but I’m
also deleting data at times. Is there a way to distinguish size reductions
caused by a delete from those caused by a Lucene merge?

Thanks,
Shashank


Re: optimum solr core size

2012-08-30 Thread Otis Gospodnetic
Jame,

That really depends on your hardware, query load and profile, search latency 
and other performance requirements.

That said, a 123 GB core is a pretty big core :)

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm 



- Original Message -
 From: jame vaalet jamevaa...@gmail.com
 To: solr-user@lucene.apache.org
 Cc: 
 Sent: Thursday, August 30, 2012 2:40 AM
 Subject: optimum solr core size
 
 Hi,
 I have got a single-core Solr deployment in production, and documents are
 being added daily (around 1 million entries per week). My core size has
 approached 123 GB; I would like to know the optimum size of a single core
 (not the number of docs, but the size of the index files) at which to
 start a new core.
 
 -- 
 
 -JAME



Re: optimum solr core size

2012-08-30 Thread pravesh
How many documents are there in the index? How many stored/indexed fields?
There is no magic number yet for the size of a single core (whether the
number of docs or the size of the index), but 123 GB seems to be on the
high side, so you could definitely go for sharding your indexes.

BTW, how are your searches/indexing performing over time? Is there any
impact?

Regards
Pravesh





Re: solr core size on disk

2009-12-17 Thread Matthieu Labour
Paul
Thank you for your reply
I did du -sh in /solr_env/index/data
and it shows
36G
It is distributed among 700 cores with most of them being 150M
Is that a big index that should be sharded?



2009/12/17 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com

 Look at the index dir and see the size of the files. It is typically
 in $SOLR_HOME/data/index.

 On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:
  Hi
  I am new to Solr. Here is my question:
  How can I find out the size of a Solr core on disk?
  Thank you
  matt
 



 --
 -
 Noble Paul | Systems Architect| AOL | http://aol.com



Re: solr core size on disk

2009-12-17 Thread Erik Hatcher
Sharding isn't necessarily decided by index size alone. Is your search
performance OK? Got enough free disk space to optimize? Then don't
shard.


But no, 150M is not a large index size.

700 cores, now that's a lot!

Erik

On Dec 17, 2009, at 1:27 PM, Matthieu Labour wrote:


Paul
Thank you for your reply
I did du -sh in /solr_env/index/data
and it shows
36G
It is distributed among 700 cores with most of them being 150M
Is that a big index that should be sharded?



2009/12/17 Noble Paul നോബിള്‍ नोब्ळ्  
noble.p...@corp.aol.com



Look at the index dir and see the size of the files. It is typically
in $SOLR_HOME/data/index.

On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:

Hi
I am new to Solr. Here is my question:
How can I find out the size of a Solr core on disk?
Thank you
matt





--
-
Noble Paul | Systems Architect| AOL | http://aol.com





solr core size on disk

2009-12-16 Thread Matthieu Labour
Hi
I am new to Solr. Here is my question:
How can I find out the size of a Solr core on disk?
Thank you
matt


Re: solr core size on disk

2009-12-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
Look at the index dir and see the size of the files. It is typically
in $SOLR_HOME/data/index.
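
A minimal Python stand-in for running `du -s` on that directory (the path just
mirrors the default layout above):

    import os

    # Sum the sizes of all files under the index directory -- roughly what
    # `du -s` reports, modulo filesystem block overhead.
    def index_size_bytes(path):
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total

    print(index_size_bytes(os.path.expandvars("$SOLR_HOME/data/index")))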

On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:
 Hi
 I am new to Solr. Here is my question:
 How can I find out the size of a Solr core on disk?
 Thank you
 matt




-- 
-
Noble Paul | Systems Architect| AOL | http://aol.com


Re: core size

2009-11-17 Thread Lance Norskog
Been there done that.

Indexing into the smaller cores will be faster.
You will be able to spread the load across multiple machines.

There are other advantages:
You will not have a 1/2-terabyte set of files to worry about.
You will not need 1.1T in one partition to run an optimize.
You will not need 12+ hours to run an optimize.
It will not take 1/2 hour to copy the newly optimized index to a query server.

On Mon, Nov 16, 2009 at 7:14 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
 If an index fits in memory, I am guessing you'll see the speed change roughly 
 proportionally to the size of the index.  If an index does not fit into 
 memory (i.e. disk head has to run around the disk to look for info), then the 
 improvement will be even greater.  I haven't explicitly tested this and am 
 hoping somebody will correct me if this is wrong.

 Otis
 --
 Sematext is hiring -- http://sematext.com/about/jobs.html?mls
 Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



 - Original Message 
 From: Phil Hagelberg p...@hagelb.org
 To: solr-user@lucene.apache.org
 Sent: Mon, November 16, 2009 8:42:49 PM
 Subject: core size


 I'm planning out a system with large indexes and wondering what kind
 of performance boost I'd see if I split out documents into many cores
 rather than using a single core and splitting by a field. I've got about
 500GB worth of indexes ranging from 100MB to 50GB each.

 I'm assuming if we split them out to multiple cores we would see the
 most dramatic benefit in searches on the smaller cores, but I'm just
 wondering what level of speedup I should expect. Eventually the cores
 will be split up anyway, I'm just trying to determine how to prioritize
 it.

 thanks,
 Phil





-- 
Lance Norskog
goks...@gmail.com


core size

2009-11-16 Thread Phil Hagelberg

I'm planning out a system with large indexes and wondering what kind
of performance boost I'd see if I split out documents into many cores
rather than using a single core and splitting by a field. I've got about
500GB worth of indexes ranging from 100MB to 50GB each.

I'm assuming if we split them out to multiple cores we would see the
most dramatic benefit in searches on the smaller cores, but I'm just
wondering what level of speedup I should expect. Eventually the cores
will be split up anyway, I'm just trying to determine how to prioritize
it.

thanks,
Phil


Re: core size

2009-11-16 Thread Otis Gospodnetic
If an index fits in memory, I am guessing you'll see the speed change roughly 
proportionally to the size of the index.  If an index does not fit into memory 
(i.e. disk head has to run around the disk to look for info), then the 
improvement will be even greater.  I haven't explicitly tested this and am 
hoping somebody will correct me if this is wrong.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 
 From: Phil Hagelberg p...@hagelb.org
 To: solr-user@lucene.apache.org
 Sent: Mon, November 16, 2009 8:42:49 PM
 Subject: core size
 
 
 I'm planning out a system with large indexes and wondering what kind
 of performance boost I'd see if I split out documents into many cores
 rather than using a single core and splitting by a field. I've got about
 500GB worth of indexes ranging from 100MB to 50GB each.
 
 I'm assuming if we split them out to multiple cores we would see the
 most dramatic benefit in searches on the smaller cores, but I'm just
 wondering what level of speedup I should expect. Eventually the cores
 will be split up anyway, I'm just trying to determine how to prioritize
 it.
 
 thanks,
 Phil



Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 10:25:07 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Doc ID gaps are zapped during segment merges and index optimization.
 

thanks Otis :)
b
_
{Beto|Norberto|Numard} Meijome

I didn't attend the funeral, but I sent a nice letter saying  I approved of 
it.
  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Solr Core Size limit

2008-11-12 Thread Norberto Meijome
On Tue, 11 Nov 2008 20:39:32 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

 With Distributed Search you are limited to # of shards * Integer.MAX_VALUE.

Yeah, makes sense. And I would suspect, since this is PER INDEX, that it
applies to each core only (so you could have n cores in m shards for
n * m * Integer.MAX_VALUE docs).


_
{Beto|Norberto|Numard} Meijome

The more I see the less I know for sure. 
  John Lennon

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Solr Core Size limit

2008-11-11 Thread Otis Gospodnetic
Doc ID gaps are zapped during segment merges and index optimization.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Norberto Meijome [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 6:45:01 PM
Subject: Re: Solr Core Size limit

On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

 I don't think there is a limit other than your hardware and the internal Doc
 ID which limits you to 2B docs on 32-bit machines.

Hi Otis,
Just curious: is this internal doc ID reused when an optimise happens, or are
gaps left and re-filled when 2B is reached?

cheers,
b

_
{Beto|Norberto|Numard} Meijome

Whenever you find that you are on the side of the majority, it is time to 
reform.
   Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.


Re: Solr Core Size limit

2008-11-11 Thread Matthew Runo
What happens when we use another uniqueKey in this case? I was under
the assumption that if we say <uniqueKey>styleId</uniqueKey> then our
doc IDs will be our styleIds.


Is there a secondary ID that's kept internal to Solr/Lucene in this  
case?


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Nov 11, 2008, at 10:25 AM, Otis Gospodnetic wrote:


Doc ID gaps are zapped during segment merges and index optimization.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Norberto Meijome [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 6:45:01 PM
Subject: Re: Solr Core Size limit

On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

I don't think there is a limit other than your hardware and the
internal Doc ID which limits you to 2B docs on 32-bit machines.


Hi Otis,
Just curious: is this internal doc ID reused when an optimise
happens, or are gaps left and re-filled when 2B is reached?


cheers,
b

_
{Beto|Norberto|Numard} Meijome

Whenever you find that you are on the side of the majority, it is  
time to reform.

  Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery  
when wet. Reading disclaimers makes you go blind. Writing them is  
worse. You have been Warned.




Re: Solr Core Size limit

2008-11-11 Thread Yonik Seeley
On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:
 What happens when we use another uniqueKey in this case? I was under the
 assumption that if we say <uniqueKey>styleId</uniqueKey> then our doc IDs
 will be our styleIds.

 Is there a secondary ID that's kept internal to Solr/Lucene in this case?

There is an internal document id in Lucene indexes - an integer
starting at zero.  Since it's transient (can change with segment
merges and is thus only really valid for the duration of an
IndexReader), Solr tends to hide those.  The uniqueKey has no bearing
on what the internal Lucene docid is.
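
As an aside, Solr releases from 4.0 onward (well after this thread) can surface
that transient internal id for debugging via the [docid] transformer; a sketch
with a hypothetical core:

    import requests

    # fl=[docid] asks Solr to return the transient internal Lucene id with
    # each document; it is only meaningful for the current IndexReader.
    resp = requests.get("http://localhost:8983/solr/mycore/select",
                        params={"q": "*:*", "fl": "id,[docid]", "wt": "json"}).json()
    for doc in resp["response"]["docs"]:
        print(doc["id"], doc["[docid]"])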

-Yonik


Re: Solr Core Size limit

2008-11-11 Thread Ryan McKinley


On Nov 11, 2008, at 8:03 PM, Yonik Seeley wrote:

On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:

What happens when we use another uniqueKey in this case? I was under the
assumption that if we say <uniqueKey>styleId</uniqueKey> then our doc IDs
will be our styleIds.

Is there a secondary ID that's kept internal to Solr/Lucene in this case?


There is an internal document id in Lucene indexes - an integer
starting at zero.  Since it's transient (can change with segment
merges and is thus only really valid for the duration of an
IndexReader), Solr tends to hide those.  The uniqueKey has no bearing
on what the internal Lucene docid is.

-Yonik



For the record...  with replication, solr is not limited to  
Integer.MAX_VALUE documents.  (Although each shard is limited to  
Integer.MAX_VALUE docs)


ryan



Re: Solr Core Size limit

2008-11-11 Thread Otis Gospodnetic
Right.  With Distributed Search you are limited to # of shards * 
Integer.MAX_VALUE.
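
A worked example of that ceiling (the shard count is hypothetical):

    # Lucene's internal docid is a Java int, so each shard tops out at
    # Integer.MAX_VALUE docs; total capacity scales linearly with shard count.
    MAX_DOCS_PER_SHARD = 2**31 - 1       # 2,147,483,647
    shards = 4                           # hypothetical
    print(shards * MAX_DOCS_PER_SHARD)   # 8,589,934,588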


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, November 11, 2008 9:38:34 PM
Subject: Re: Solr Core Size limit


On Nov 11, 2008, at 8:03 PM, Yonik Seeley wrote:

 On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:
 What happens when we use another uniqueKey in this case? I was under the
 assumption that if we say <uniqueKey>styleId</uniqueKey> then our doc IDs
 will be our styleIds.
 
 Is there a secondary ID that's kept internal to Solr/Lucene in this case?
 
 There is an internal document id in Lucene indexes - an integer
 starting at zero.  Since it's transient (can change with segment
 merges and is thus only really valid for the duration of an
 IndexReader), Solr tends to hide those.  The uniqueKey has no bearing
 on what the internal Lucene docid is.
 
 -Yonik


For the record...  with replication, solr is not limited to Integer.MAX_VALUE 
documents.  (Although each shard is limited to Integer.MAX_VALUE docs)

ryan

Solr Core Size limit

2008-11-10 Thread RaghavPrabhu

Hi,

 I'm using Solr multicore functionality in my app. I want to know the size
limit for the index files held in each core. How can I identify the maximum
size limit of the cores?


Thanks in advance
Prabhu.K



Re: Solr Core Size limit

2008-11-10 Thread Otis Gospodnetic
Hi,

I don't think there is a limit other than your hardware and the internal Doc ID 
which limits you to 2B docs on 32-bit machines.

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch





From: RaghavPrabhu [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 5:43:17 AM
Subject: Solr Core Size limit


Hi,

I'm using Solr multicore functionality in my app. I want to know the size
limit for the index files held in each core. How can I identify the maximum
size limit of the cores?


Thanks in advance
Prabhu.K

Re: Solr Core Size limit

2008-11-10 Thread Norberto Meijome
On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:

 I don't think there is a limit other than your hardware and the internal Doc
 ID which limits you to 2B docs on 32-bit machines.

Hi Otis,
Just curious: is this internal doc ID reused when an optimise happens, or are
gaps left and re-filled when 2B is reached?

cheers,
b

_
{Beto|Norberto|Numard} Meijome

Whenever you find that you are on the side of the majority, it is time to 
reform.
   Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.