RE: Re: Terminology question: Core vs. Collection vs...
Good write up.

And what about node?

I think there needs to be an official glossary of terms that is sanctioned by the Solr team, and some terms still in use may need to be labeled deprecated. After so many years, it's still confusing.

--- Original Message ---
On 1/3/2013 08:07 AM Jack Krupansky wrote:

Collection is the more modern term and incorporates the fact that the collection may be sharded, with each shard on one or more cores, with each core being a replica of the other cores within that shard of that collection.

Instance is a general term, but is commonly used to refer to a running Solr server, each of which can service any number of cores. A sharded collection would typically require multiple instances of Solr, each with a shard of the collection.

Multiple collections can be supported on a single instance of Solr. They don't have to be sharded or replicated. But if they are, each Solr instance will have a copy or replica of the data (index) of one shard of each sharded collection - to the degree that each collection needs that many shards.

At the API level, you talk to a Solr instance, using a host and port, and giving the collection name. Some operations will refer only to the portion of a multi-shard collection on that Solr instance, but typically Solr will distribute the operation, whether it be an update or a query, to all of the shards of the named collection. In the case of update, the update will be distributed to all replicas as well, but in the case of query only one replica of each shard of the collection is needed.

Before SolrCloud, Solr had master and slave, and the slaves were replicas of the master. With SolrCloud there is no master and all the replicas of the shard are peers, although at any moment in time one of them will be considered the leader for coordination purposes, but not in the sense that it is a master of the other replicas in that shard. A SolrCloud replica is a replica of the data, in an abstract sense, for a single shard of a collection. A SolrCloud replica is more of an instance of the data/index.

An index exists at two levels: the portion of a collection on a single Solr core will have a Lucene index, but collectively the Lucene indexes for the shards of a collection can be referred to as the index of the collection. Each replica is a copy or instance of a portion of the collection's index.

The term slice is sometimes used to refer collectively to all of the cores/replicas of a single shard, or sometimes to a single replica, as it contains only a slice of the full collection data.

-- Jack Krupansky

-----Original Message-----
From: Alexandre Rafalovitch
Sent: Thursday, January 03, 2013 4:42 AM
To: solr-user@lucene.apache.org
Subject: Terminology question: Core vs. Collection vs...

Hello,

I am trying to understand the core Solr terminology. I am looking for correct rather than loose meaning, as I am trying to teach an example that starts from an easy scenario and may scale to a multi-core, multi-machine situation.

Here are the terms that seem to be all overlapping and/or crossing over in my mind at the moment.

1) Index
2) Core
3) Collection
4) Instance
5) Replica (Replica of _what_?)
6) Others?

I tried looking through documentation, but either there is a terminology drift or I am having trouble understanding the distinctions.

If anybody has a clear picture in their mind, I would appreciate a clarification.

Regards,
   Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
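The relationships Jack describes in the message above (a collection made of shards, each shard held by one or more cores that are replicas of each other, each core served by some Solr instance) can be sketched as a small data model. This is only an illustration of the terminology, not Solr code: the class names, field names, and the two methods are hypothetical, and the sketch assumes that an update for one document is routed to a single shard and then applied to all replicas of that shard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    core_name: str  # the Solr core holding this copy of the shard's Lucene index
    node: str       # host:port of the Solr instance serving that core (assumed form)


@dataclass
class Shard:
    name: str
    # One entry means the shard is not replicated; the single core is still its replica.
    replicas: List[Replica] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    shards: List[Shard] = field(default_factory=list)

    def query_targets(self) -> List[Replica]:
        """A query needs only one replica per shard (here simply the first one)."""
        return [shard.replicas[0] for shard in self.shards if shard.replicas]

    def update_targets(self, shard_name: str) -> List[Replica]:
        """An update routed to a shard must reach every replica of that shard."""
        for shard in self.shards:
            if shard.name == shard_name:
                return list(shard.replicas)
        raise KeyError(shard_name)
```

With two shards of two replicas each, `query_targets()` returns two replicas (one per shard), while `update_targets("shard1")` returns both replicas of that shard.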
RE: Re: Terminology question: Core vs. Collection vs...
Thanks again. (And sorry to jump into this convo)

But I had a question on your statement:

On 1/3/2013 08:07 AM Jack Krupansky wrote:

Collection is the more modern term and incorporates the fact that the collection may be sharded, with each shard on one or more cores, with each core being a replica of the other cores within that shard of that collection.

A collection is sharded, meaning it is distributed across cores. A shard itself is not distributed across cores in the same sense. Rather, a shard exists on a single core and is replicated on other cores. Is that right? The way it's worded above, it sounds like a shard can also be sharded...

--- Original Message ---
On 1/3/2013 08:28 AM Jack Krupansky wrote:

A node is a machine in a cluster or cloud (graph). It could be a real machine or a virtualized machine. Technically, you could have multiple virtual nodes on the same physical box. Each Solr replica would be on a different node.

Technically, you could have multiple Solr instances running on a single hardware node, each with a different port. They are simply instances of Solr, although you could consider each Solr instance a node in a Solr cloud as well, a virtual node. So, technically, you could have multiple replicas on the same node, but that sort of defeats most of the purpose of having replicas in the first place - to distribute the data for performance and fault tolerance. But, you could have replicas of different shards on the same node/box for a partial improvement of performance and fault tolerance.

A Solr 'cloud' is really a cluster.

-- Jack Krupansky
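Jack's point about placement (replicas of the same shard sharing a node defeats much of their purpose, while replicas of different shards may share one) can be checked mechanically. The sketch below is a hypothetical helper, not part of Solr: it assumes a placement is given as a mapping from shard name to the list of `host:port` strings of the Solr instances holding that shard's replicas, and it can compare either whole instances or just the physical box (the host part).

```python
from typing import Dict, List


def host_of(node: str) -> str:
    """Return the host part of a 'host:port' node string (assumed format)."""
    return node.split(":", 1)[0]


def shards_with_colocated_replicas(
    placement: Dict[str, List[str]], per_box: bool = False
) -> List[str]:
    """List shards that have two or more replicas in the same place.

    per_box=False compares Solr instances (host:port);
    per_box=True compares only hosts, so two instances on one box count as colocated.
    """
    flagged = []
    for shard, nodes in placement.items():
        keys = [host_of(n) for n in nodes] if per_box else list(nodes)
        if len(set(keys)) < len(keys):  # a duplicate key means colocated replicas
            flagged.append(shard)
    return flagged
```

For example, a shard whose two replicas live on `box1:8983` and `box1:7574` passes the per-instance check but is flagged when `per_box=True`, which matches the distinction between a Solr instance and the hardware it runs on.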
RE: Re: Terminology question: Core vs. Collection vs...
Thanks. I got that part.

A group of shards (and therefore cores) represents a collection, yes. But does a single shard exist only on a single core?

--- Original Message ---
On 1/3/2013 09:03 AM Jack Krupansky wrote:

No, a shard is a subset (or slice) of the collection. Sharding is a way of slicing the original data, before we talk about how the shards get stored and replicated on actual Solr cores. Replicas are instances of the data for a shard.

Sometimes people may loosely speak of a replica as being a shard, but that's just loose use of the terminology.

So, we're not sharding shards, but we are replicating shards.

-- Jack Krupansky
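The two separate steps Jack distinguishes, slicing the data into shards first and only then replicating each shard, can be shown with a toy example. Everything here is an assumption made for illustration: the function names are hypothetical, and the CRC32-modulo hash is a stand-in, not Solr's actual document router.

```python
import zlib
from typing import Dict, List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard index from the document id (toy hash, not Solr's router)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def slice_documents(doc_ids: List[str], num_shards: int) -> Dict[int, List[str]]:
    """Step 1 (sharding): each document lands in exactly one logical shard."""
    shards: Dict[int, List[str]] = {i: [] for i in range(num_shards)}
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return shards


def replicate(shards: Dict[int, List[str]], copies: int) -> Dict[int, List[List[str]]]:
    """Step 2 (replication): every shard gets `copies` identical instances of its data."""
    return {i: [list(docs) for _ in range(copies)] for i, docs in shards.items()}
```

Note that `replicate` never splits a shard further; it only duplicates the shard's data, which is the sense of "we're not sharding shards, but we are replicating shards".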
RE: Re: Terminology question: Core vs. Collection vs...
I think what's confusing about your explanation below is the situation where there is no replication factor. That's possible too, yes? So in that case, is each core of a shard of a collection still referred to as a replica? To me, a replica is a duplicate/backup of a shard's core, not the sharded core itself. Or is there just no difference? Is even a non-replicated core called a replica?

--- Original Message ---
On 1/3/2013 09:08 AM Jack Krupansky wrote:

Oops... let me word that a little more carefully:

...we are replicating the data of each shard.

-- Jack Krupansky
RE: Re: Terminology question: Core vs. Collection vs...
Yes. And it's worth noting that when you have multiple shards in a single node (@deprecated), they are shards of different collections...

--- Original Message ---
On 1/3/2013 09:16 AM Jack Krupansky wrote:

And I would revise node to note that in SolrCloud a node is simply an instance of a Solr server.

And, technically, you can have multiple shards in a single instance of Solr, separating the logical sharding of keys from the distribution of the data.

-- Jack Krupansky
RE: Re: Terminology question: Core vs. Collection vs...
Ah, ok. Good. Makes sense.

I think I will draw all this up in a UML diagram that includes the distinction between the logical terms and the physical terms (and their mapping), as they do get intertwined. I'll post it here when I'm done.

--- Original Message ---
On 1/3/2013 09:19 AM Jack Krupansky wrote:

A single shard MAY exist on a single core, but only if it is not replicated. Generally, a single shard will exist on multiple cores, each a replica of the source data as it comes into the update handler.

-- Jack Krupansky
RE: Re: Terminology question: Core vs. Collection vs...
Great point.

--- Original Message ---
On 1/3/2013 10:42 AM Per Steffensen wrote:

On 1/3/13 4:33 PM, Mark Miller wrote:
> This has pretty much become the standard across other distributed systems and in the literat…err…books.

Hmmm, I'm not sure you are right about that. Maybe more than one distributed system calls them Replica, but there are also a lot that don't. But if you are right, that's at least a good, valid argument for doing it this way, even though I generally prefer good logical naming over following bad naming from the industry :-) Just because there is a lot of crap out there doesn't mean that we also want to make crap. Maybe good logical naming could even be a small entry in the "Why Solr is better than its competitors" list :-)
RE: Re: Terminology question: Core vs. Collection vs...
And based on the previous explanation, there is never a copy of a shard. A shard represents and contains only replicas for itself, replicas being copies of cores within the shard.

--- Original Message ---
On 1/3/2013 11:58 AM Walter Underwood wrote:

A factor is multiplied, so multiplying the leader by a replicationFactor of 1 means you have exactly one copy of that shard.

I think that recycling the term replication within Solr was confusing, but it is a bit late to change that.

wunder

On Jan 3, 2013, at 7:33 AM, Mark Miller wrote:

> This has pretty much become the standard across other distributed systems and in the literat…err…books.
>
> I first implemented it as you mention you'd like, but Yonik correctly pointed out that we were going against the grain.
>
> - Mark
>
> On Jan 3, 2013, at 10:01 AM, Per Steffensen st...@designware.dk wrote:
>
>> For the same reasons that Replica shouldn't be called Replica (it requires too long an explanation to agree that it is an ok name), replicationFactor shouldn't be called replicationFactor as long as it refers to the TOTAL number of cores you get for your Shard. replicationFactor would be an ok name if replicationFactor=0 meant one core, replicationFactor=1 meant two cores, etc., but as long as replicationFactor=1 means one core and replicationFactor=2 means two cores, it is bad naming (you will not get any replication with replicationFactor=1 - WTF!?!?). If we want to insist that you specify the total number of cores, at least use replicaPerShard instead of replicationFactor, or even better rename Replica to Shard-instance and use instancesPerShard instead of replicationFactor.
>>
>> Regards, Per Steffensen
>>
>> On 1/3/13 3:52 PM, Per Steffensen wrote:
>>> Hi
>>>
>>> Here is my version - I do not believe the explanations have been very clear.
>>>
>>> We have the following concepts (here I will try to explain what each concept covers without naming it - it's hard):
>>> 1) Machines (virtual or physical) running Solr server JVMs (one machine can run several Solr server JVMs if you like)
>>> 2) Solr server JVMs
>>> 3) Logical stores where you can add/update/delete data-instances (closest to logical tables in an RDBMS)
>>> 4) Logical slices of a store (closest to non-overlapping logical sets of rows for the logical table in an RDBMS)
>>> 5) Physical instances of slices (a physical (disk/memory) instance of a logical slice). This is where data actually goes on disk - the logical stores and slices above are just non-physical concepts
>>>
>>> Terminology
>>> 1) I believe we have no name for this (except of course machine :-) ), even though Jack claims that this is called a node. Maybe sometimes it is called a node, but I believe node is more often used to refer to a Solr server JVM.
>>> 2) Node
>>> 3) Collection
>>> 4) Shard. Used to be called Slice, but I believe now it is officially called Shard. I agree with that change, because I believe most of the industry also uses the term Shard for this logical/non-physical concept - it just needs to be reflected across documentation and code.
>>> 5) Replica. Used to be called Shard, but I believe now it is officially called Replica. I certainly do not agree with the name Replica, because it suggests that it is a copy of an original, but it isn't. I would prefer Shard-instance here, to avoid the confusion. I understand that you can argue (if you argue long enough) that Replica is a fine name, but you really need the explanation to understand why Replica can be defended as the name for this. It is not immediately obvious what this is as long as it is called Replica. A Replica is basically a Solr Cloud managed Core, and behind every Replica/Core lives a physical Lucene index. So a Replica (=Core) contains/maintains a Lucene index behind the scenes. The term Replica also needs to be reflected across documentation and code.
>>>
>>> Regards, Per Steffensen

--
Walter Underwood
wun...@wunderwood.org
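The replicationFactor semantics argued over in the message above (the value is the TOTAL number of cores per shard, so 1 means no additional copy) come down to simple arithmetic. The sketch below is a hedged illustration: the helper names, the host, and the collection name `products` are hypothetical, while `action=CREATE`, `numShards`, and `replicationFactor` are parameters of Solr's Collections API. The URL is only built here, not sent.

```python
from urllib.parse import urlencode


def total_cores(num_shards: int, replication_factor: int) -> int:
    """replicationFactor counts all cores of a shard, including the first one."""
    return num_shards * replication_factor


def extra_copies_per_shard(replication_factor: int) -> int:
    """Copies beyond the first core: replicationFactor=1 gives no redundancy."""
    return replication_factor - 1


def create_collection_url(base_url: str, name: str,
                          num_shards: int, replication_factor: int) -> str:
    """Build (but do not send) a Collections API CREATE request URL."""
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    })
    return f"{base_url}/admin/collections?{params}"
```

Under this reading, two shards with `replicationFactor=1` give two cores and zero extra copies, which is exactly the case Per objects to; with `replicationFactor=2` the same collection has four cores and one extra copy per shard.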