[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711465#comment-14711465 ] Timothee Maret commented on SLING-2939:
As of today, an embedded implementation using ZK would still be challenging, as the ZK code still invokes System.exit statements (planned to be solved in ZOOKEEPER-575) and AFAIK requires the Netty dependency to be embedded. A dedicated ZK deployment may make sense though, especially considering large deployments involving many 3rd party libraries. I have opened SLING-2939 to track the implementation based on ZK (non embedded).

3rd-party based implementation of discovery.api
---
Key: SLING-2939
URL: https://issues.apache.org/jira/browse/SLING-2939
Project: Sling
Issue Type: Task
Components: Extensions
Affects Versions: Discovery API 1.0.0
Reporter: Stefan Egli

The Sling Discovery API introduces the abstraction of a topology which contains (Sling) clusters and instances, and supports liveliness detection, leader election within a cluster, and property propagation between the instances. As a default and reference implementation, a resource-based, OOTB implementation was created (org.apache.sling.discovery.impl).

Pros and cons of the discovery.impl

Although the discovery.impl supports everything required in discovery.api, it has a few limitations. Here's a list of pros and cons:

Pros
* No additional software required (leverages the repository for intra-cluster communication/storage and HTTP-REST calls for cross-cluster communication)
* Very small footprint
* Perfectly suited for a single cluster or instance and for small, rather stable hub-based topologies

Cons
* Config-/deployment-limitations (aka embedded-limitation): connections between clusters are peer-to-peer and explicit. To span a topology, a number of instances must be made known to each other, and changes in the topology typically require config adjustments to guarantee high availability of the discovery service
** except if a natural hub cluster exists that can serve as connection point for all satellite clusters
** other than that, it is less suited for large and/or dynamic topologies
* Change propagation (for topology parts reported via connectors) is non-atomic and slow, hop-by-hop based
* No guarantee on the order of TopologyEvents sent in individual instances - ie different instances might see different orders of TopologyEvents (ie changes in the topology), but eventually the topology is guaranteed to be consistent
* Robustness of discovery.impl wrt storm situations depends on the robustness of the underlying cluster (not a real negative, but discovery.impl might in theory unveil repository bugs which would otherwise not have been a problem)
* Rather new, little-tested code which might have issues with edge cases wrt network problems
** although partitioning support is not a requirement per se, similar edge cases might exist wrt network delays/timing/crashes

Reusing a suitable 3rd party library

To provide an additional option as implementation of the discovery.api, one idea is to use a suitable 3rd party library.

Requirements

The following is a list of requirements a 3rd party library must support:
* liveliness detection: detect whether an instance is up and running
* stable leader election within a cluster: stable describes the fact that a leader will remain leader until it leaves/crashes, and no new, joining instance shall take over while a leader exists
* stable instance ordering: the list of instances within a cluster is ordered and stable; new, joining instances are put at the end of the list
* property propagation: propagate the properties provided within one instance to everybody in the topology. There are no timing requirements bound to this, but the intention is not for this to be used as messaging but to announce config parameters to the topology
* support large, dynamic clusters: configuration of the new discovery implementation should be easy and support frequent changes in the (large) topology
* no single point of failure: this is obvious, there should of course be no single point of failure in the setup
* embedded or dedicated: this might be a hot topic. Embedding a library has the advantage of not having to install anything additional; a dedicated service on the other hand requires additional handling in deployment. Embedding implies a peer-to-peer setup: nodes communicate peer-to-peer rather than via a centralized service. This IMHO is a negative for large topologies, which would typically span data centers, hence a dedicated service could be seen as an advantage in the end. Due to the need for cross data-center deployments, the transport protocol must be TCP (or HTTP for that matter)
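For illustration (not part of the issue text), here is a minimal sketch of how the "stable instance ordering" and "stable leader election" requirements map onto ZooKeeper's sequential ephemeral znodes; the connect string, paths and payload are hypothetical, and the parent path /sling/instances is assumed to already exist:

{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class OrderedMembershipSketch {
    public static void main(String[] args) throws Exception {
        // The session-bound (ephemeral) znode disappears when the instance
        // dies, which doubles as liveliness detection.
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {});

        // EPHEMERAL_SEQUENTIAL: ZooKeeper appends a monotonically increasing
        // suffix, so a joining instance always sorts to the end of the list.
        String me = zk.create("/sling/instances/instance-",
                "slingId-and-properties".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        List<String> children = zk.getChildren("/sling/instances", false);
        Collections.sort(children); // sequence suffix defines the stable order

        // The lowest sequence number is the leader; it keeps leadership until
        // its session (and hence its znode) goes away, so no joiner takes over.
        boolean leader = me.endsWith(children.get(0));
        System.out.println("instances: " + children + ", leader here: " + leader);
    }
}
{code}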
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487521#comment-14487521 ] Stefan Egli commented on SLING-2939:
Adding another discovery variant which makes use of oak/documentMk/mongoMk's lease mechanism instead of relying on its own heartbeats: SLING-4603
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366785#comment-14366785 ] Philipp Suter commented on SLING-2939:
[~marett] and [~egli]: It could be worth investigating https://github.com/kuujo/copycat besides etcd, to understand if it could be used to embed a Java-based clustering solution that implements Raft. It has some interesting features like active and passive members in a cluster.
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360204#comment-14360204 ] Stefan Egli commented on SLING-2939:
[~ery], thx for pointing out!
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360401#comment-14360401 ] Jordan Zimmerman commented on SLING-2939:
ZooKeeper can be embedded. I believe Hadoop did this originally. I know others have done it.
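For reference, embedding a standalone server is roughly the following sketch (ZooKeeper 3.4.x server API; the data directory and port are hypothetical). Note this starts a single-node server only - a replicated quorum is considerably more involved, and the System.exit concern mentioned above still applies:

{code:java}
import java.io.File;
import java.net.InetSocketAddress;
import org.apache.zookeeper.server.ServerCnxnFactory;
import org.apache.zookeeper.server.ZooKeeperServer;

public class EmbeddedZkSketch {
    public static void main(String[] args) throws Exception {
        File dataDir = new File("target/zk-data"); // snapshot and txn log location
        int tickTimeMs = 2000;
        ZooKeeperServer server = new ZooKeeperServer(dataDir, dataDir, tickTimeMs);
        ServerCnxnFactory cnxnFactory =
                ServerCnxnFactory.createFactory(new InetSocketAddress(2181), 60);
        cnxnFactory.startup(server); // serves clients until shutdown() is called
    }
}
{code}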
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358654#comment-14358654 ] Evgeny Rachinskiy commented on SLING-2939:
JGroups is licensed under the APL starting from version 3.4: http://www.jgroups.org/license.html
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303606#comment-14303606 ] Timothee Maret commented on SLING-2939:
Adding another alternative to consider: Etcd - distributed configuration based on a distributed log.
* First stable release Jan 31 2015
* Open source: https://github.com/coreos/etcd
* Apache 2.0 license
* Based on Raft: https://raftconsensus.github.io/
* Written in Go, Java binding for some core features available
* Dedicated deployments only
* One-cluster deployment possible (AFAIK)
* Security using certificates (client authentication based on local CA)
* HTTP API (over a semantically versioned API)
* Offers primitives to implement leader election
* Is designed for storing configuration data

Required features:
* liveliness detection - yes
* topology - yes
* cluster leader election - yes, based on Raft or a higher-level protocol
* property propagation - yes, that's the primary use case
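As an illustration of the HTTP API (not from the comment above), liveliness can be modelled with a TTL'd key that each instance refreshes periodically - if the instance dies, the key expires and disappears from the topology. A minimal sketch against the v2 keys API, with hypothetical endpoint and key layout:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class EtcdHeartbeatSketch {
    public static void main(String[] args) throws Exception {
        // PUT a key with a 30s TTL; re-issue this request well within the TTL
        // to signal that the instance is still alive.
        URL url = new URL("http://127.0.0.1:2379/v2/keys/sling/instances/instance-1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        byte[] body = "value=alive&ttl=30".getBytes(StandardCharsets.UTF_8);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        System.out.println("etcd responded: " + conn.getResponseCode());
    }
}
{code}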
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725595#comment-13725595 ] kishore gopalakrishna commented on SLING-2939:
Hi Stefan, sorry I missed this update. All the requirements you mentioned can be satisfied through Helix:
* liveliness detection: Helix provides an API to get the LiveInstances and their configuration.
* managing topology structure: the IdealState of the cluster can be set manually/dynamically, and it reflects the topology of the cluster.
* cluster leader election: the leader-standby state model allows you to achieve this.
* property propagation: take a look at the service discovery recipe in Helix: http://helix.incubator.apache.org/recipes/service_discovery.html
Also, eventually we will have Helix backed by in-memory systems like Hazelcast/Infinispan, which will remove the dependency on ZooKeeper (this is OK for some use cases like yours) and also allow the cluster to span multiple data centers. I think for your use case both Curator and Helix will do the job. Let me know if you need additional information.
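To make the LiveInstances point concrete, a minimal sketch (editorial, not from the comment) of reading them through the Helix APIs; the cluster name, instance name and ZooKeeper address are hypothetical, and the cluster is assumed to be set up already:

{code:java}
import java.util.List;
import org.apache.helix.HelixDataAccessor;
import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;

public class HelixLivenessSketch {
    public static void main(String[] args) throws Exception {
        // Connect as a SPECTATOR: observe the cluster without taking part
        // in state transitions.
        HelixManager manager = HelixManagerFactory.getZKHelixManager(
                "sling-cluster", "observer-1", InstanceType.SPECTATOR, "localhost:2181");
        manager.connect();

        // LiveInstances are the ephemeral records Helix keeps per running instance.
        HelixDataAccessor accessor = manager.getHelixDataAccessor();
        List<String> live = accessor.getChildNames(accessor.keyBuilder().liveInstances());
        System.out.println("live instances: " + live);
        manager.disconnect();
    }
}
{code}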
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703033#comment-13703033 ] Stefan Egli commented on SLING-2939:
Hi [~k4j], sure, it would be great to get more info on Helix! Comparing the two, I considered Curator a better fit than Helix re the requirements. Basically the requirements are:
* liveliness detection: be able to detect if an instance is up or not
* manage a topology structure consisting of clusters and instances
* cluster leader election (stable leader, ie the leader stays until it crashes/shuts down)
* property propagation: each instance can announce properties to the topology, so every other instance can read those properties
Cheers, Stefan
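For the stable-leader requirement in particular, Curator's LeaderLatch recipe is a close match: participants queue up behind the current leader, and a joiner never displaces an existing one. A minimal sketch, with hypothetical connect string, latch path and id:

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLeaderSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // All participants register under the same latch path; leadership only
        // moves when the current leader's session ends (crash/shutdown).
        LeaderLatch latch = new LeaderLatch(client, "/sling/leader", "instance-1");
        latch.start();
        latch.await(); // blocks until this instance becomes leader
        System.out.println("leading as: " + latch.getId());
    }
}
{code}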
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702523#comment-13702523 ] kishore gopalakrishna commented on SLING-2939:
Hi, I am Kishore and work on Helix. It looks like Helix was ruled out because it's focused on resource management. I would like to point out that this is not entirely true. I would like to provide more info, but I got lost trying to gather the requirements from the JIRA. It's not clear if the requirement is only discovery and/or leader election. I also saw, in between, the mention of being rack-aware etc. Would be glad to provide more info.
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700549#comment-13700549 ] Stefan Egli commented on SLING-2939:
Reaching sort of a blocker for using ZooKeeper embedded: zookeeper 3.5 - which will contain dynamic reconfiguration, a requirement for embedding it in sling imo - is not released yet. Which renders using it in embedded mode not feasible. Using it in client mode only - thus requiring a dedicated zookeeper cluster to be installed - is the less preferred option though...
--
[0] https://issues.apache.org/jira/browse/ZOOKEEPER/fixforversion/12316644
[1] http://de.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
[2] http://mail-archives.apache.org/mod_mbox/zookeeper-user/201306.mbox/%3C716FCA9F-B5DC-40BA-A650-9BC4A4BA3526%40yahoo.com%3E
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700555#comment-13700555 ] Stefan Egli commented on SLING-2939:
Which brings us back to the following pros/cons overview:
* zookeeper/curator:
** + apache project
** + fulfills requirements
** - except: it cannot be used in embedded mode; requires a dedicated zookeeper-cluster
* jgroups:
** + fulfills most requirements (the properties handling requires some implementation)
** - LGPL license atm
* hazelcast:
** + fulfills requirements (including properties handling)
** - 'Community Edition' - but APL 2.0
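Regarding hazelcast's properties handling, the distributed-map primitive covers the property-propagation requirement almost directly: each instance writes its properties into a shared map that every member can read. A minimal sketch (editorial; map name and keys are hypothetical):

{code:java}
import java.util.Map;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class HazelcastPropertiesSketch {
    public static void main(String[] args) {
        // Joins (or forms) a cluster using the default configuration.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Properties published here become visible to all cluster members.
        Map<String, String> props = hz.getMap("sling-discovery-properties");
        props.put("instance-1.leaderElectionId", "some-config-value");

        System.out.println("cluster members: " + hz.getCluster().getMembers());
    }
}
{code}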
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700950#comment-13700950 ] Stefan Egli commented on SLING-2939:
Re hazelcast: with the concept of 'normal', 'lite' and 'client' members, hazelcast allows a setup similar to zookeeper mixed mode: you can have a central sling-cluster running hazelcast in 'normal' mode, and the satellite clusters/instances could be set to use the hazelcast client and connect to one of the central-sling-cluster instances - all over TCP. The client would automatically connect to another instance (of the central cluster) if one of them fails. If the entire center cluster fails, though, then the topology would fall apart. A sketch of the client side follows below.
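A satellite instance in that setup would join via the hazelcast client roughly as follows (a sketch; the host names are hypothetical). The listed addresses are the central-cluster members the client can fail over between:

{code:java}
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.core.HazelcastInstance;

public class HazelcastSatelliteSketch {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        // TCP addresses of the central sling-cluster instances; if one goes
        // down, the client reconnects to another one automatically.
        config.getNetworkConfig().addAddress("center-1:5701", "center-2:5701");
        HazelcastInstance client = HazelcastClient.newHazelcastClient(config);
        System.out.println("connected as client: " + client.getName());
    }
}
{code}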
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699411#comment-13699411 ] Robert Munteanu commented on SLING-2939: [~egli] - JGroups can function well with UDP or TCP. I actually switched to TCP at some point in my cluster for more reliability, but I guess things have changed in the last 4 years. As for dedicated servers vs embedded: since JGroups is easily embeddable, I solved this problem by running a mini-app which just embedded JGroups and (IIRC) was designated to be the JGroups coordinator. This application, being always available and under low load, was a perfect fit for a coordinator. The details are a bit unclear to me right now, but I can dig them up if needed.
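For reference, embedding JGroups along these lines takes very little code. The following is a minimal sketch, assuming a JGroups 3.x-style API; the class and cluster names are placeholders. JGroups treats the first member of the current view as the coordinator, so a long-lived mini-app that joins first naturally takes that role:
{code:java}
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class CoordinatorSketch {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel(); // default (UDP) protocol stack
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // A new view arrives on every join/leave/crash; the first
                // member of the view is the coordinator.
                System.out.println("members: " + view.getMembers()
                        + ", coordinator: " + view.getMembers().get(0));
            }
        });
        channel.connect("sling-discovery"); // hypothetical cluster name
    }
}
{code}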
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697563#comment-13697563 ] Stefan Egli commented on SLING-2939: [~olli] Agreed, that should be a good example of a Hazelcast implementation.
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697567#comment-13697567 ] Stefan Egli commented on SLING-2939: [~ianeboston] Great points. I agree that JGroups probably requires every member of the cluster to agree, whereas ZooKeeper typically has a rather small set of servers (3, 5, 7) doing the consensus, and thus has an advantage there. Another aspect, I think, is that JGroups is a messaging channel and is excellent at that. What the discovery.api needs, though, is not messaging: it only needs leader consensus and liveliness detection (and a bit of property propagation). But I agree that in a reasonably sized cluster setup JGroups would be a lean fit.
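For context, the leader-consensus and liveliness parts map to the classic ZooKeeper recipe of ephemeral sequential nodes: every instance creates one, the node vanishes when its session dies, and the instance owning the lowest sequence number is the leader. A minimal sketch, assuming the parent path already exists; the connect string and paths are placeholders:
{code:java}
import java.util.Collections;
import java.util.List;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("zk-host:2181", 30000, event -> { });

        // Ephemeral node: disappears automatically when this instance's
        // session dies, which gives liveliness detection for free.
        String me = zk.create("/sling/discovery/member-", new byte[0],
                Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

        // Lowest sequence number wins; new members sort to the end of the
        // list, which also yields a stable instance ordering.
        List<String> members = zk.getChildren("/sling/discovery", false);
        Collections.sort(members);
        boolean leader = me.endsWith(members.get(0));
        System.out.println("leader: " + leader + ", order: " + members);
    }
}
{code}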
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697729#comment-13697729 ] Stefan Egli commented on SLING-2939: An alternative approach could be a configurable mix of embedded ZooKeeper and ZooKeeper clients: given a deployment of a Sling topology that has a natural 'central cluster', the instances of that central cluster could run an embedded ZooKeeper (ideally 2n+1 of them), and the remaining Sling instances would be configured as ZooKeeper clients.
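What 'embedded ZooKeeper' could look like in the simplest case is sketched below, assuming the ZooKeeper 3.4-era server classes; the port and data directory are placeholders. A real central cluster would run a replicated quorum (2n+1 peers) rather than a single standalone server:
{code:java}
import java.io.File;
import java.net.InetSocketAddress;
import org.apache.zookeeper.server.NIOServerCnxnFactory;
import org.apache.zookeeper.server.ZooKeeperServer;

public class EmbeddedZkSketch {
    public static void main(String[] args) throws Exception {
        File dataDir = new File("/tmp/zk-data"); // placeholder
        int tickTime = 2000;

        // Standalone in-process server; a central-cluster instance would
        // instead join a replicated quorum for resilience.
        ZooKeeperServer server = new ZooKeeperServer(dataDir, dataDir, tickTime);
        NIOServerCnxnFactory factory = new NIOServerCnxnFactory();
        factory.configure(new InetSocketAddress(2181), 60); // port, max clients
        factory.startup(server);
    }
}
{code}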
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696787#comment-13696787 ] Stefan Egli commented on SLING-2939: h3. Comparison of candidates
|| Category || Name || Short Description || Pros || Cons || Conclusion ||
| Messaging | [*-JMS|http://en.wikipedia.org/wiki/Java_Message_Service] | Java Message Service. Could be used to communicate state between instances, elect a leader, etc. | * well-established API \\ * HA JMS exists | * no OOTB support for liveliness detection \\ * no OOTB support for leader election | Not suited at all |
| | [JGroups|http://www.jgroups.org/] | JGroups supports the concept of membership, with which leader election comes for free | * OOTB support for liveliness detection \\ * OOTB support for leader election \\ * no central services, embedded by default | * no OOTB support for property storage/propagation | Usable, but property propagation would have to be implemented/added |
| | *-messaging | | | | There are of course many other messaging-based solutions (eg [Apache Kafka|http://kafka.apache.org/]) that could be candidates. But in the end the discovery.api is not about messaging, so a messaging solution is probably not the ideal fit. |
| Distributed caches / in-memory grids | [Hazelcast Community Edition|http://www.hazelcast.com/products-community.jsp] | Supports distributed implementations of various java.util.concurrent objects | * OOTB support for liveliness detection \\ * OOTB support for property storage/propagation \\ * no central services, embedded by default | * no OOTB support for leader election; has to be coded based on locks \\ * 'community edition' \\ * embedded mode only | IMHO geared towards UDP/multicast, but TCP is also possible. Large, cross data-center deployments of Hazelcast via TCP are IMHO similar to [ZooKeeper]-embedded and less ideal, although possible to deploy. |
| | *-memory grid | | | | There are other memory-grid-like solutions out there (eg [Terracotta|http://terracotta.org/], [Coherence|http://www.oracle.com/technetwork/middleware/coherence/overview/index.html]) that could be candidates, and the delineation between them and distributed-coordination-based solutions is less clear than it is vs messaging. What can be said, though, is that any such solution's main strength is distributed memory - and that is not exactly what the discovery.api requires. |
| Distributed coordination | [Apache ZooKeeper|http://zookeeper.apache.org/] | [ZooKeeper] is a centralized yet distributed/highly available configuration service which supports various distributed coordination patterns through _recipes_ | * OOTB support for liveliness detection \\ * OOTB support for property storage/propagation \\ * supports both dedicated and embedded modes | * no OOTB support for leader election; has to be coded based on ephemeral nodes | While [ZooKeeper] fulfills the requirements, it is known for being tricky to deal with (exception handling, session reconnects, leader election must be coded according to the provided recipes). Can be deployed embedded or dedicated. |
| | [Apache Curator|http://curator.incubator.apache.org/] | Built on top of [ZooKeeper], Curator provides default implementations of the recipes documented for [ZooKeeper] | * OOTB support for liveliness detection \\ * OOTB support for property storage/propagation \\ * OOTB support for leader election \\ * supports both dedicated and embedded modes | | Matches all requirements. Can be deployed embedded or dedicated. |
| Cluster management | [Apache Helix|http://helix.incubator.apache.org/] | Built on top of [ZooKeeper], Helix is a cluster management framework for handling distributed resources | * similar to Apache Curator | * similar to Apache Curator | Compared to Apache Curator, Helix has a different focus (management of resources in a cluster), while the discovery.api does not require any of Helix's additional features. In this direct comparison Apache Curator is clearly the better fit. |

* A note on embedding [ZooKeeper]: [ZooKeeper] and its inspiration Google Chubby were designed as a centralized, dedicated, clustered service providing configuration and coordination to a large set of nodes. Typical deployments of [ZooKeeper] are 3, 5 or 7 replicated instances. While you +can+ embed [ZooKeeper] in an application (possible thanks to [ZOOKEEPER-107|https://issues.apache.org/jira/browse/ZOOKEEPER-107]), this mode is IMHO less suited for large, distributed setups like the CQ/Granite case. Each Granite instance would contain an embedded [ZooKeeper] and talk to a set of other Granite instances through the [ZooKeeper] protocols. Especially when deployed across data centers, the embedded mode is not optimal.
* Another note on embedding [ZooKeeper]:
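Since Curator is the candidate that matches all requirements, the following minimal sketch shows what its ready-made leader election recipe looks like, assuming a post-incubation Curator release (org.apache.curator packages); the connect string and election path are placeholders:
{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLeaderSketch {
    public static void main(String[] args) throws Exception {
        // Curator wraps the tricky ZooKeeper parts (retries, reconnects)
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk-host:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Ready-made recipe: stable leader election without hand-rolled
        // ephemeral-node bookkeeping
        LeaderLatch latch = new LeaderLatch(client, "/sling/discovery/leader");
        latch.start();
        latch.await(); // blocks until this instance becomes leader
        System.out.println("this instance is now the leader");
    }
}
{code}
The point of the comparison holds here: the exception handling and session-reconnect details that make raw [ZooKeeper] tricky are handled by the library rather than by the discovery code.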
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696829#comment-13696829 ] Stefan Egli commented on SLING-2939: Earlier feedback from a private discussion:
--
Feedback from [~fmeschbe]: I am all for having a scalable implementation out-of-the-box. And, honestly, if the repository is not able to be the basis for such an implementation, so be it. So, yes, improve the implementation, use proven technology - refactoring the existing implementation if that helps.
--
Feedback from [~ianeboston]: JGroups is embedded by default and adds cluster awareness to whatever it is embedded into, without the need for additional co-ordinating servers. Compared to ZooKeeper, which normally expects dedicated servers, JGroups is very low impact. JGroups also supports site, rack and machine awareness and is used to provide topology information in a number of apps (Infinispan [1], JBoss Application Server [2]) without (IIUC) the need to deploy redundant replicated servers. IIUC the Infinispan data grid scales reasonably well over multiple racks and sites. The only downside of the JBoss-originated components is that you have to read the license carefully, as the higher-level components are often under a variant of the GPL. JGroups is LGPL 2.1 licensed, which may be an issue for Adobe. On the other hand, ZooKeeper and the layers that sit on top of it provide a richer pre-built cluster and topology management environment, at the expense of deployment complexity and dedicated server resources. I don't believe using ZooKeeper as an embedded component is viable at scale. ZooKeeper is Apache licensed, so no issue there. Perhaps seeing how these two subsystems behave in reality on a 100-node cluster of tiny EC2 instances would be a good way of gathering evidence to base a decision on? It might also be worth having a quick look at how ElasticSearch manages topology. It is site-, rack- and machine-aware and also AWS/EC2-aware. The topology management is embedded, and ES has been used in large clusters [3].
[1] https://docs.jboss.org/author/display/ISPN/Getting+Started+Guide+-+Clustered+Cache+in+Java+SE?_sscc=t
[2] https://issues.jboss.org/browse/AS7-3023
[3] http://architects.dzone.com/articles/our-experience-creating-large
--
Feedback from [~rombert]: +1 for JGroups. I've worked with it previously and it's small, embeddable and does the job. My cluster was about 20 machines, but reportedly the primary author has cited a JGroups cluster of 536 nodes [1]. As for the licensing, JGroups is investigating a move to APL 2.0, but that move has not yet been finalized [2].
[1] http://belaban.blogspot.ro/2011/04/largest-jgroups-cluster-ever-536-nodes.html
[2] http://belaban.blogspot.ro/2013/05/jgroups-to-investigate-adopting-apache.html
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696867#comment-13696867 ] Stefan Egli commented on SLING-2939: [~olli]: thanks for the heads-up. I believe Karaf is not such a good fit though, as it is itself an OSGi container (although it would support discovery aspects).
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696876#comment-13696876 ] Stefan Egli commented on SLING-2939: [~ianeboston], [~rombert]: regarding JGroups, I think it is quite a good fit, except for two aspects:
* Large installations would typically be point-to-point rather than UDP (the 536-machine cluster, for example, used UDP multicast). I believe we would like to support Sling deployments across data centers and use discovery between those data centers for certain admin operations. My concern here is how feasible UDP is across data centers.
* I think the decision can be broken down to two deployment models: embedded or dedicated servers. With embedding you have the advantage that no additional services are required, but you would ideally use multicast (thus running into the above concern). With a dedicated service there is the downside of such an additional component, but the scalability of the point-to-point setup, also across data centers, seems better. (Scalability not in terms of pure performance - there multicast is best - but in terms of ease of configuration/setup.) A point-to-point stack is sketched below.
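To make the point-to-point variant concrete, here is a minimal sketch of bringing a JGroups channel up on a TCP stack instead of UDP multicast, assuming a JGroups 3.x-style API; the class name, cluster name and host list are placeholders:
{code:java}
import org.jgroups.JChannel;

public class TcpDiscoveryNode {
    public static void main(String[] args) throws Exception {
        // "tcp.xml" ships with JGroups: a TCP transport plus TCPPING
        // discovery, where member hosts are listed explicitly, eg
        //   -Djgroups.tcpping.initial_hosts=dc1-host[7800],dc2-host[7800]
        // so no multicast is needed across data centers.
        JChannel channel = new JChannel("tcp.xml");
        channel.connect("sling-discovery"); // hypothetical cluster name
        System.out.println("joined view: " + channel.getView());
    }
}
{code}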
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697252#comment-13697252 ] Ian Boston commented on SLING-2939: --- [~egli] JGroups supports UDP multicast, UDP, TCP and tunnelling through firewalls. Configuration documentation and diagrams are at [1]. However, once you are on more than one subnet you cannot use UDP multicast outside that subnet without router support, which is the reason for the other protocols. Using the other protocols to hop between subnets requires that the configuration of the subnets is known. In this respect JGroups is no different from ZooKeeper: both require some level of deployment configuration and complexity to make them work. I think the main difference is that JGroups expects to be embedded in every instance of the cluster and to self-configure leaders, using every member of the cluster to achieve resilience, whereas ZooKeeper (like Chubby) requires centralised ZooKeeper servers that are made resilient through replicas. Probably ZooKeeper is better suited to very large clusters (eg 2500 nodes upwards) and JGroups to smaller clusters, although I have no evidence to back that up, and I am sure both communities would disagree. The reason I mentioned ElasticSearch is that I know it supports very large clusters, and when you deploy it in AWS, you tell it you are running in AWS. I haven't looked into the detail of precisely what it does, but I have talked to people who run it over multiple AWS sites, multi-tenanted, successfully, which makes it worth looking at. Without hard evidence, it might be better to provide a JGroups bundle and a ZooKeeper bundle and find out what the real issues are with them. IIRC the JGroups code required to do this is minimal, as I did something similar for Sling last year. Not certain how much effort is required for ZooKeeper.
[1] http://www.jgroups.org/manual/html/user-advanced.html#d0e2251
[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api
[ https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697526#comment-13697526 ] Oliver Lietz commented on SLING-2939: [~egli] Sure, it's an OSGi container, and therefore worth looking at (not using) their implementation when going for Hazelcast, no? Using Karaf as a replacement for Launchpad is a whole different story.