Re: Core size - distinguish between merge and deletes
Please don't do that ;) Unless you're willing to do it frequently. See:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
expungeDeletes is really a variety of optimize, so the issues outlined in
that blog apply.

Best,
Erick

On Thu, Nov 9, 2017 at 12:24 PM, Shashank Pedamallu wrote:
> Thanks for the response, Erick. I'm deleting the documents with the
> expungeDeletes option set to true. So, that does trigger a merge to
> throw away the deleted documents.
>
> On 11/9/17, 12:17 PM, "Erick Erickson" wrote:
>
> bq: Is there a way to distinguish between when size is being reduced
> because of a delete from that of during a lucene merge.
>
> Not sure what you're really looking for here. Size on disk is _never_
> reduced by a delete operation; the document is only 'marked as
> deleted'. Only when segments are merged are the resources reclaimed,
> i.e. the index gets smaller. You can figure out what % of your index
> consists of deleted documents by the delta between numDocs and
> maxDocs, available on the admin UI and from the Luke handler (and
> maybe JMX).
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> > Hi,
> >
> > I wanted to get accurate metrics regarding the amount of data being
> > indexed in Solr. In this regard, I observe that sometimes this number
> > decreases due to Lucene merges. But I'm also deleting data at times.
> > Is there a way to distinguish between when size is being reduced
> > because of a delete and when it is reduced during a Lucene merge?
> >
> > Thanks,
> > Shashank
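For reference, the expungeDeletes operation discussed above is issued as a parameter on a commit sent to Solr's update handler. A minimal Python sketch that only builds such a request URL (the base URL and core name are placeholders; commit and expungeDeletes are the actual update-request parameters):

```python
from urllib.parse import urlencode

def expunge_commit_url(base_url: str, core: str) -> str:
    """Build the update-handler URL for a commit with expungeDeletes=true,
    i.e. the operation Erick advises against doing frequently.
    base_url and core are placeholders for your deployment."""
    params = urlencode({"commit": "true", "expungeDeletes": "true"})
    return f"{base_url}/solr/{core}/update?{params}"

print(expunge_commit_url("http://localhost:8983", "mycore"))
# -> http://localhost:8983/solr/mycore/update?commit=true&expungeDeletes=true
```

Hitting that URL (e.g. with curl or urllib) triggers the merge-like behavior described in the blog post, which is why it carries the same costs as optimize.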
Re: Core size - distinguish between merge and deletes
Thanks for the response, Erick. I'm deleting the documents with the
expungeDeletes option set to true. So, that does trigger a merge to throw
away the deleted documents.

On 11/9/17, 12:17 PM, "Erick Erickson" wrote:

> bq: Is there a way to distinguish between when size is being reduced
> because of a delete from that of during a lucene merge.
>
> Not sure what you're really looking for here. Size on disk is _never_
> reduced by a delete operation; the document is only 'marked as
> deleted'. Only when segments are merged are the resources reclaimed,
> i.e. the index gets smaller. You can figure out what % of your index
> consists of deleted documents by the delta between numDocs and
> maxDocs, available on the admin UI and from the Luke handler (and
> maybe JMX).
>
> Best,
> Erick
>
> On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> > Hi,
> >
> > I wanted to get accurate metrics regarding the amount of data being
> > indexed in Solr. In this regard, I observe that sometimes this number
> > decreases due to Lucene merges. But I'm also deleting data at times.
> > Is there a way to distinguish between when size is being reduced
> > because of a delete and when it is reduced during a Lucene merge?
> >
> > Thanks,
> > Shashank
Re: Core size - distinguish between merge and deletes
bq: Is there a way to distinguish between when size is being reduced
because of a delete from that of during a lucene merge.

Not sure what you're really looking for here. Size on disk is _never_
reduced by a delete operation; the document is only 'marked as deleted'.
Only when segments are merged are the resources reclaimed, i.e. the index
gets smaller. You can figure out what % of your index consists of deleted
documents by the delta between numDocs and maxDocs, available on the
admin UI and from the Luke handler (and maybe JMX).

Best,
Erick

On Thu, Nov 9, 2017 at 12:06 PM, Shashank Pedamallu wrote:
> Hi,
>
> I wanted to get accurate metrics regarding the amount of data being
> indexed in Solr. In this regard, I observe that sometimes this number
> decreases due to Lucene merges. But I'm also deleting data at times. Is
> there a way to distinguish between when size is being reduced because
> of a delete and when it is reduced during a Lucene merge?
>
> Thanks,
> Shashank
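The numDocs/maxDocs arithmetic Erick describes can be sketched in a few lines of Python (a hypothetical helper, not part of Solr; the two counters are the ones shown on the admin UI and returned by the Luke handler):

```python
def deleted_doc_stats(num_docs: int, max_doc: int) -> tuple:
    """Return (deleted_docs, percent_deleted) from the two counters Solr
    exposes: numDocs counts live documents, while maxDoc also counts
    documents that are marked as deleted but not yet merged away."""
    deleted = max_doc - num_docs
    pct = 100.0 * deleted / max_doc if max_doc else 0.0
    return deleted, pct

# Example: 1,000,000 live docs in an index whose segments still hold
# 1,250,000 entries -> 250,000 deleted docs, i.e. 20% of the index.
print(deleted_doc_stats(1_000_000, 1_250_000))  # -> (250000, 20.0)
```

This only tells you what fraction of the index is reclaimable; the disk size itself shrinks only when a merge actually rewrites those segments.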
Core size - distinguish between merge and deletes
Hi,

I wanted to get accurate metrics regarding the amount of data being
indexed in Solr. In this regard, I observe that sometimes this number
decreases due to Lucene merges. But I'm also deleting data at times. Is
there a way to distinguish between when size is being reduced because of
a delete and when it is reduced during a Lucene merge?

Thanks,
Shashank
Re: optimum solr core size
Jame,

That really depends on your hardware, query load and profile, search
latency and other performance requirements. That said, a 123 GB core is a
pretty big core :)

Otis
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm

----- Original Message -----
From: jame vaalet jamevaa...@gmail.com
To: solr-user@lucene.apache.org
Sent: Thursday, August 30, 2012 2:40 AM
Subject: optimum solr core size

Hi,
I have got a single-core Solr deployment in production, and documents are
getting added daily (around 1 million entries per week). My core size has
approached 123 GB. I would like to know what would be the optimum size of
a single core (not the number of docs but the size of the index file)
before starting a new core.

--
-JAME
Re: optimum solr core size
How many documents are there in the index? How many stored/indexed
fields? There is no magic number as yet for defining the size of a single
core (whether number of docs or size of index), but 123 GB seems to be on
the higher side, so you could definitely go for sharding the indexes.
BTW, how have your searches/indexing been performing over time? Is there
any impact?

Regards
Pravesh

--
View this message in context: http://lucene.472066.n3.nabble.com/optimum-solr-core-size-tp4004251p4004424.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr core size on disk
Paul,

Thank you for your reply. I did du -sh in /solr_env/index/data and it
shows 36G. It is distributed among 700 cores, with most of them being
150M. Is that a big index that should be sharded?

2009/12/17 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
> look at the index dir and see the size of the files. It is typically in
> $SOLR_HOME/data/index
>
> On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:
> > Hi
> > I am new to Solr. Here is my question: how do I find out the size of
> > a Solr core on disk?
> > Thank you
> > matt
>
> --
> -----------------------------------------------------
> Noble Paul | Systems Architect | AOL | http://aol.com
Re: solr core size on disk
Sharding isn't necessarily decided upon by index size. Is your search
performance OK? Got enough free disk space to optimize? Then don't shard.
But no, 150M is not a large index size. 700 cores, now that's a lot!

	Erik

On Dec 17, 2009, at 1:27 PM, Matthieu Labour wrote:
> Paul
> Thank you for your reply. I did du -sh in /solr_env/index/data and it
> shows 36G. It is distributed among 700 cores, with most of them being
> 150M. Is that a big index that should be sharded?
>
> 2009/12/17 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com
> > look at the index dir and see the size of the files. It is typically
> > in $SOLR_HOME/data/index
> >
> > On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:
> > > Hi
> > > I am new to Solr. Here is my question: how do I find out the size
> > > of a Solr core on disk?
> > > Thank you
> > > matt
solr core size on disk
Hi,
I am new to Solr. Here is my question: how do I find out the size of a
Solr core on disk?
Thank you
matt
Re: solr core size on disk
Look at the index dir and see the size of the files. It is typically in
$SOLR_HOME/data/index

On Thu, Dec 17, 2009 at 2:56 AM, Matthieu Labour matth...@kikin.com wrote:
> Hi
> I am new to Solr. Here is my question: how do I find out the size of a
> Solr core on disk?
> Thank you
> matt

--
-----------------------------------------------------
Noble Paul | Systems Architect | AOL | http://aol.com
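The directory check suggested above (what `du -sh` reports) can also be done programmatically. A minimal Python sketch (a hypothetical helper, not a Solr API; it sums apparent file sizes, which can differ slightly from `du`'s block-based figure):

```python
import os

def index_size_bytes(index_dir: str) -> int:
    """Sum the sizes of all files under an index directory, e.g.
    $SOLR_HOME/data/index, roughly what `du -sb` would report."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total
```

To cover the 700-core setup from this thread, you would call this once per core's data/index directory and sum the results.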
Re: core size
Been there, done that. Indexing into the smaller cores will be faster,
and you will be able to spread the load across multiple machines. There
are other advantages:
- You will not have a 1/2-terabyte set of files to worry about.
- You will not need 1.1T in one partition to run an optimize.
- You will not need 12+ hours to run an optimize.
- It will not take 1/2 hour to copy the newly optimized index to a query
  server.

On Mon, Nov 16, 2009 at 7:14 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
> If an index fits in memory, I am guessing you'll see the speed change
> roughly proportionally to the size of the index. If an index does not
> fit into memory (i.e. the disk head has to run around the disk to look
> for info), then the improvement will be even greater. I haven't
> explicitly tested this and am hoping somebody will correct me if this
> is wrong.
>
> Otis
>
> ----- Original Message ----
> From: Phil Hagelberg p...@hagelb.org
> To: solr-user@lucene.apache.org
> Sent: Mon, November 16, 2009 8:42:49 PM
> Subject: core size
>
> I'm planning out a system with large indexes and wondering what kind of
> performance boost I'd see if I split out documents into many cores
> rather than using a single core and splitting by a field. I've got
> about 500 GB worth of indexes ranging from 100 MB to 50 GB each. I'm
> assuming that if we split them out to multiple cores we would see the
> most dramatic benefit in searches on the smaller cores, but I'm just
> wondering what level of speedup I should expect. Eventually the cores
> will be split up anyway; I'm just trying to determine how to prioritize
> it.
>
> thanks,
> Phil

--
Lance Norskog
goks...@gmail.com
core size
I'm planning out a system with large indexes and wondering what kind of
performance boost I'd see if I split out documents into many cores rather
than using a single core and splitting by a field. I've got about 500 GB
worth of indexes ranging from 100 MB to 50 GB each.

I'm assuming that if we split them out to multiple cores we would see the
most dramatic benefit in searches on the smaller cores, but I'm just
wondering what level of speedup I should expect. Eventually the cores
will be split up anyway; I'm just trying to determine how to prioritize
it.

thanks,
Phil
Re: core size
If an index fits in memory, I am guessing you'll see the speed change
roughly proportionally to the size of the index. If an index does not fit
into memory (i.e. the disk head has to run around the disk to look for
info), then the improvement will be even greater. I haven't explicitly
tested this and am hoping somebody will correct me if this is wrong.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
From: Phil Hagelberg p...@hagelb.org
To: solr-user@lucene.apache.org
Sent: Mon, November 16, 2009 8:42:49 PM
Subject: core size

I'm planning out a system with large indexes and wondering what kind of
performance boost I'd see if I split out documents into many cores rather
than using a single core and splitting by a field. I've got about 500 GB
worth of indexes ranging from 100 MB to 50 GB each. I'm assuming that if
we split them out to multiple cores we would see the most dramatic
benefit in searches on the smaller cores, but I'm just wondering what
level of speedup I should expect. Eventually the cores will be split up
anyway; I'm just trying to determine how to prioritize it.

thanks,
Phil
Re: Solr Core Size limit
On Tue, 11 Nov 2008 10:25:07 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
> Doc ID gaps are zapped during segment merges and index optimization.

thanks Otis :)
b

_
{Beto|Norberto|Numard} Meijome

"I didn't attend the funeral, but I sent a nice letter saying I approved
of it." Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse. You
have been Warned.
Re: Solr Core Size limit
On Tue, 11 Nov 2008 20:39:32 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
> With Distributed Search you are limited to # of shards * Integer.MAX_VALUE.

Yeah, makes sense. And since this limit is PER INDEX, I would suspect it
applies to each core only (so you could have n cores in m shards for
n * m * Integer.MAX_VALUE docs).

_
{Beto|Norberto|Numard} Meijome

"The more I see the less I know for sure." John Lennon

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse. You
have been Warned.
Re: Solr Core Size limit
Doc ID gaps are zapped during segment merges and index optimization.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Norberto Meijome [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 6:45:01 PM
Subject: Re: Solr Core Size limit

On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
> I don't think there is a limit other than your hardware and the
> internal Doc ID, which limits you to 2B docs on 32-bit machines.

Hi Otis,
Just curious: is this internal doc ID reused when an optimise happens, or
are gaps left and re-filled when 2B is reached?

cheers,
b
Re: Solr Core Size limit
What happens when we use another uniqueKey in this case? I was under the
assumption that if we say <uniqueKey>styleId</uniqueKey> then our doc IDs
will be our styleIds. Is there a secondary ID that's kept internal to
Solr/Lucene in this case?

Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Nov 11, 2008, at 10:25 AM, Otis Gospodnetic wrote:
> Doc ID gaps are zapped during segment merges and index optimization.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Norberto Meijome [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Sent: Monday, November 10, 2008 6:45:01 PM
> Subject: Re: Solr Core Size limit
>
> On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
> Otis Gospodnetic [EMAIL PROTECTED] wrote:
> > I don't think there is a limit other than your hardware and the
> > internal Doc ID, which limits you to 2B docs on 32-bit machines.
>
> Hi Otis,
> Just curious: is this internal doc ID reused when an optimise happens,
> or are gaps left and re-filled when 2B is reached?
>
> cheers,
> b
Re: Solr Core Size limit
On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:
> What happens when we use another uniqueKey in this case? I was under
> the assumption that if we say <uniqueKey>styleId</uniqueKey> then our
> doc IDs will be our styleIds. Is there a secondary ID that's kept
> internal to Solr/Lucene in this case?

There is an internal document id in Lucene indexes - an integer starting
at zero. Since it's transient (it can change with segment merges and is
thus only really valid for the duration of an IndexReader), Solr tends to
hide those. The uniqueKey has no bearing on what the internal Lucene
docid is.

-Yonik
Re: Solr Core Size limit
On Nov 11, 2008, at 8:03 PM, Yonik Seeley wrote:
> On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:
> > What happens when we use another uniqueKey in this case? I was under
> > the assumption that if we say <uniqueKey>styleId</uniqueKey> then our
> > doc IDs will be our styleIds. Is there a secondary ID that's kept
> > internal to Solr/Lucene in this case?
>
> There is an internal document id in Lucene indexes - an integer
> starting at zero. Since it's transient (it can change with segment
> merges and is thus only really valid for the duration of an
> IndexReader), Solr tends to hide those. The uniqueKey has no bearing on
> what the internal Lucene docid is.
>
> -Yonik

For the record... with replication, Solr is not limited to
Integer.MAX_VALUE documents. (Although each shard is limited to
Integer.MAX_VALUE docs.)

ryan
Re: Solr Core Size limit
Right. With Distributed Search you are limited to # of shards *
Integer.MAX_VALUE.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Tuesday, November 11, 2008 9:38:34 PM
Subject: Re: Solr Core Size limit

On Nov 11, 2008, at 8:03 PM, Yonik Seeley wrote:
> On Tue, Nov 11, 2008 at 6:59 PM, Matthew Runo [EMAIL PROTECTED] wrote:
> > What happens when we use another uniqueKey in this case? I was under
> > the assumption that if we say <uniqueKey>styleId</uniqueKey> then our
> > doc IDs will be our styleIds. Is there a secondary ID that's kept
> > internal to Solr/Lucene in this case?
>
> There is an internal document id in Lucene indexes - an integer
> starting at zero. Since it's transient (it can change with segment
> merges and is thus only really valid for the duration of an
> IndexReader), Solr tends to hide those. The uniqueKey has no bearing on
> what the internal Lucene docid is.
>
> -Yonik

For the record... with replication, Solr is not limited to
Integer.MAX_VALUE documents. (Although each shard is limited to
Integer.MAX_VALUE docs.)

ryan
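The "# of shards * Integer.MAX_VALUE" ceiling is simple arithmetic; a sketch, assuming the per-shard limit is Lucene's internal docid range (a signed 32-bit Java int, as discussed above):

```python
MAX_DOCS_PER_SHARD = 2**31 - 1  # Java Integer.MAX_VALUE = 2147483647

def max_distributed_docs(num_shards: int) -> int:
    """Upper bound on documents reachable via Distributed Search, per the
    '# of shards * Integer.MAX_VALUE' rule of thumb in this thread."""
    return num_shards * MAX_DOCS_PER_SHARD

print(max_distributed_docs(4))  # -> 8589934588
```

In practice, hardware and query latency put you well below this theoretical bound long before the docid space runs out.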
Solr Core Size limit
Hi,

I'm using Solr multicore functionality in my app. I want to know the
size limit for holding the index files in each core. How can I identify
the maximum size limit of the cores?

Thanks in advance
Prabhu.K

--
View this message in context: http://www.nabble.com/Solr-Core-Size-limit-tp20416899p20416899.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Core Size limit
Hi,

I don't think there is a limit other than your hardware and the internal
Doc ID, which limits you to 2B docs on 32-bit machines.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: RaghavPrabhu [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, November 10, 2008 5:43:17 AM
Subject: Solr Core Size limit

Hi,

I'm using Solr multicore functionality in my app. I want to know the
size limit for holding the index files in each core. How can I identify
the maximum size limit of the cores?

Thanks in advance
Prabhu.K
Re: Solr Core Size limit
On Mon, 10 Nov 2008 10:24:47 -0800 (PST)
Otis Gospodnetic [EMAIL PROTECTED] wrote:
> I don't think there is a limit other than your hardware and the
> internal Doc ID, which limits you to 2B docs on 32-bit machines.

Hi Otis,
Just curious: is this internal doc ID reused when an optimise happens, or
are gaps left and re-filled when 2B is reached?

cheers,
b

_
{Beto|Norberto|Numard} Meijome

"Whenever you find that you are on the side of the majority, it is time
to reform." Mark Twain

I speak for myself, not my employer. Contents may be hot. Slippery when
wet. Reading disclaimers makes you go blind. Writing them is worse. You
have been Warned.