[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-08-25 Thread Timothee Maret (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711465#comment-14711465
 ] 

Timothee Maret commented on SLING-2939:
---

As of today, an embedded implementation using ZK would still be challenging, as 
the ZK code still invokes System.exit statements (planned to be solved in 
ZOOKEEPER-575) and, AFAIK, requires the Netty dependency to be embedded.
A dedicated ZK deployment may make sense though, especially considering large 
deployments involving many 3rd party libraries.
I have opened SLING-2939 to track the implementation based on ZK (non-embedded).

 3rd-party based implementation of discovery.api
 ---

 Key: SLING-2939
 URL: https://issues.apache.org/jira/browse/SLING-2939
 Project: Sling
  Issue Type: Task
  Components: Extensions
Affects Versions: Discovery API 1.0.0
Reporter: Stefan Egli

 The Sling Discovery API introduces the abstraction of a topology which 
 contains (Sling) clusters and instances, supports liveliness detection, 
 leader election within a cluster and property propagation between the 
 instances. As a default and reference implementation, a resource-based, OOTB 
 implementation was created (org.apache.sling.discovery.impl).

 Pros and cons of the discovery.impl

 Although the discovery.impl supports everything required in discovery.api, it 
 has a few limitations. Here's a list of pros and cons:

 Pros
 * No additional software required (leverages the repository for intra-cluster 
 communication/storage and HTTP-REST calls for cross-cluster communication)
 * Very small footprint
 * Perfectly suited for a single cluster or instance and for small, rather 
 stable hub-based topologies

 Cons
 * Config-/deployment-limitations (aka embedded-limitation): connections 
 between clusters are peer-to-peer and explicit. To span a topology, a number 
 of instances must (be made) known (to) each other; changes in the topology 
 typically require config adjustments to guarantee high availability of the 
 discovery service
   * except if a natural hub cluster exists that can serve as connection 
 point for all satellite clusters
   * other than that, it is less suited for large and/or dynamic topologies
 * Change propagation (for topology parts reported via connectors) is 
 non-atomic and slow, hop-by-hop based
 * No guarantee on the order of TopologyEvents sent in individual instances - ie 
 different instances might see different orders of TopologyEvents (ie changes 
 in the topology), but eventually the topology is guaranteed to be consistent
 * Robustness of discovery.impl wrt storm situations depends on the robustness 
 of the underlying cluster (not a real negative, but discovery.impl might in 
 theory unveil repository bugs which would otherwise not have been a problem)
 * Rather new, little-tested code which might have issues with edge cases wrt 
 network problems
   * although partitioning-support is not a requirement per se, similar 
 edge cases might exist wrt network delays/timing/crashes

 Reusing a suitable 3rd party library

 To provide an additional option as implementation of the discovery.api, one 
 idea is to use a suitable 3rd party library.

 Requirements

 The following is a list of requirements a 3rd party library must support (a 
 sketch after this list illustrates how the first three map onto, e.g., 
 ZooKeeper primitives):
 * liveliness detection: detect whether an instance is up and running
 * stable leader election within a cluster: stable describes the fact that a 
 leader will remain leader until it leaves/crashes, and no new, joining 
 instance shall take over while a leader exists
 * stable instance ordering: the list of instances within a cluster is 
 ordered and stable; new, joining instances are put at the end of the list
 * property propagation: propagate the properties provided within one 
 instance to everybody in the topology. There are no timing requirements bound 
 to this, but the intention is not to be used for messaging but to 
 announce config parameters to the topology
 * support large, dynamic clusters: configuration of the new discovery 
 implementation should be easy and support frequent changes in the (large) 
 topology
 * no single point of failure: this is obvious, there should of course be no 
 single point of failure in the setup
 * embedded or dedicated: this might be a hot topic: embedding a library has 
 the advantage of not having to install anything additional; a dedicated 
 service, on the other hand, requires additional handling in deployment. 
 Embedding implies a peer-to-peer setup: nodes communicate peer-to-peer rather 
 than via a centralized service. This, IMHO, is a negative for large 
 topologies, which would typically span data centers. Hence a dedicated 
 service could be seen as an advantage in the end.
 * due to the need for cross data-center deployments, the transport protocol 
 must be TCP (or HTTP for that matter)
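
To make the first three requirements concrete, here is a minimal, illustrative
sketch of how they map onto ZooKeeper primitives. It is not part of the issue
itself: it assumes a running ensemble and a pre-created /discovery parent
znode, and the class name, paths and slingId are made up:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ClusterView implements Watcher {

    private final ZooKeeper zk;
    private final String myPath;

    public ClusterView(String connectString, String slingId) throws Exception {
        zk = new ZooKeeper(connectString, 30000, this);
        // EPHEMERAL: the node vanishes when this instance's session dies
        // -> liveliness detection.
        // SEQUENTIAL: the suffix is assigned in creation order
        // -> stable instance ordering (new instances go to the end).
        myPath = zk.create("/discovery/instance-",
                slingId.getBytes(StandardCharsets.UTF_8),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
    }

    // Lowest sequence number wins: the leader remains leader until its
    // ephemeral node disappears, and a newly joining instance can never
    // take over from a live leader -> "stable" leader election.
    public boolean isLeader() throws Exception {
        List<String> children = zk.getChildren("/discovery", false);
        Collections.sort(children);
        return myPath.endsWith(children.get(0));
    }

    @Override
    public void process(WatchedEvent event) {
        // react to session/membership changes here, e.g. recompute the topology
    }
}
{code}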

[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-04-09 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487521#comment-14487521
 ] 

Stefan Egli commented on SLING-2939:


Adding another discovery variant which makes use of the oak/documentMk/mongoMk 
lease mechanism instead of relying on its own heartbeats: SLING-4603



[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-03-18 Thread Philipp Suter (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366785#comment-14366785
 ] 

Philipp Suter commented on SLING-2939:
--

[~marett] and [~egli]: it could be worth investigating 
https://github.com/kuujo/copycat besides etcd, to understand whether it could 
be used to embed a Java-based clustering solution that implements Raft. It has 
some interesting features, like active and passive members in a cluster.



[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-03-13 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360204#comment-14360204
 ] 

Stefan Egli commented on SLING-2939:


[~ery], thanks for pointing that out!



[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-03-13 Thread Jordan Zimmerman (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360401#comment-14360401
 ] 

Jordan Zimmerman commented on SLING-2939:
-

ZooKeeper can be embedded. I believe Hadoop did this originally. I know others 
have done it.
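
For illustration, a minimal sketch of what in-process embedding can look like
with ZooKeeper's standalone server classes. Note this covers standalone mode
only - quorum/replicated mode is where the embedding pain discussed in this
thread comes in - and the data directory and port are made up:

{code:java}
import java.io.File;
import java.net.InetSocketAddress;

import org.apache.zookeeper.server.ServerCnxnFactory;
import org.apache.zookeeper.server.ZooKeeperServer;

public class EmbeddedZooKeeper {
    public static void main(String[] args) throws Exception {
        File dataDir = new File("target/zk-data"); // snapshot + txn log dir
        // Standalone, in-process server: snapshot dir, log dir, tick time (ms).
        ZooKeeperServer server = new ZooKeeperServer(dataDir, dataDir, 2000);
        // Bind the client port and attach the server to it.
        ServerCnxnFactory factory = ServerCnxnFactory.createFactory(
                new InetSocketAddress(2181), 100);
        factory.startup(server);
        // ... clients can now connect to localhost:2181 ...
        factory.shutdown();
    }
}
{code}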



[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-03-12 Thread Evgeny Rachinskiy (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358654#comment-14358654
 ] 

Evgeny Rachinskiy commented on SLING-2939:
--

JGroups is licensed under the Apache License (APL) starting from version 3.4: 
http://www.jgroups.org/license.html



[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2015-02-03 Thread Timothee Maret (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14303606#comment-14303606
 ] 

Timothee Maret commented on SLING-2939:
---

Adding another alternative to consider:

Etcd

Distributed configuration based on a distributed log.

* First stable release: Jan 31, 2015
* Open source: https://github.com/coreos/etcd
* Apache 2.0 license
* Based on Raft: https://raftconsensus.github.io/
* Written in Go; Java bindings for some core features are available
* Dedicated deployments only
* One-cluster deployment possible (AFAIK)
* Security using certificates (client authentication based on a local CA)
* HTTP API (semantically versioned)
* Offers primitives to implement leader election
* Is designed for storing configuration data

Required features

* liveliness detection - yes (see the sketch below)
* topology - yes
* cluster leader election - yes, based on Raft or a higher-level protocol
* property propagation - yes, that's the primary use case
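
As a rough illustration of the liveliness part against the etcd v2 HTTP API
(endpoint, key layout and instance id are made up): publish the instance under
a key with a TTL and refresh it periodically; if the instance dies, the key
expires and the rest of the topology sees it disappear:

{code:java}
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class EtcdHeartbeat {

    // PUT /v2/keys/<key> with a ttl: the key expires unless refreshed,
    // which gives exactly the liveliness-detection behaviour needed here.
    public static void heartbeat(String etcdBase, String slingId) throws Exception {
        URL url = new URL(etcdBase + "/v2/keys/discovery/" + slingId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write("value=up&ttl=30".getBytes(StandardCharsets.UTF_8));
        }
        if (conn.getResponseCode() >= 300) {
            throw new IllegalStateException("etcd PUT failed: " + conn.getResponseCode());
        }
        conn.disconnect();
    }

    public static void main(String[] args) throws Exception {
        // refresh well within the 30s TTL, e.g. every 10s, from a scheduler
        heartbeat("http://127.0.0.1:2379", "instance-1");
    }
}
{code}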


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-31 Thread kishore gopalakrishna (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725595#comment-13725595
 ] 

kishore gopalakrishna commented on SLING-2939:
--

Hi Stefan,

Sorry, I missed this update.

All the requirements you mentioned can be satisfied through Helix (the 
liveliness part is sketched below):

* liveliness detection: Helix provides an API to get the LiveInstances and 
their configuration.
* managing topology structure: the IdealState of the cluster can be set 
manually/dynamically, and it reflects the topology of the cluster.
* cluster leader election: the leader-standby state model allows you to 
achieve this.
* property propagation: take a look at the service discovery recipe in Helix: 
http://helix.incubator.apache.org/recipes/service_discovery.html

Also, eventually we will have Helix backed by in-memory systems like Hazelcast 
or Infinispan, which will remove the dependency on ZooKeeper (this is ok for 
some use cases like yours) and also allow the cluster to span multiple data 
centers.

I think for your use case both Curator and Helix will do the job. Let me know 
if you need additional information.
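
A minimal sketch of the liveliness-detection part via a Helix 0.6.x-era API
(cluster/instance names and the ZooKeeper address are made up, and it assumes
the cluster is already set up in Helix):

{code:java}
import java.util.List;

import org.apache.helix.HelixManager;
import org.apache.helix.HelixManagerFactory;
import org.apache.helix.InstanceType;
import org.apache.helix.LiveInstanceChangeListener;
import org.apache.helix.NotificationContext;
import org.apache.helix.model.LiveInstance;

public class HelixDiscovery {
    public static void main(String[] args) throws Exception {
        // SPECTATOR: observe the cluster without taking part in state transitions.
        HelixManager manager = HelixManagerFactory.getZKHelixManager(
                "slingCluster", "observer-1", InstanceType.SPECTATOR, "localhost:2181");
        manager.connect();
        manager.addLiveInstanceChangeListener(new LiveInstanceChangeListener() {
            @Override
            public void onLiveInstanceChange(List<LiveInstance> liveInstances,
                    NotificationContext context) {
                // called whenever an instance joins or dies -> liveliness detection
                for (LiveInstance li : liveInstances) {
                    System.out.println("alive: " + li.getInstanceName());
                }
            }
        });
    }
}
{code}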


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-09 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703033#comment-13703033
 ] 

Stefan Egli commented on SLING-2939:


Hi [~k4j],

Sure, it would be great to get more info on Helix! Regarding the requirements, 
though, I had considered Curator a better fit than Helix.

Basically the requirements are (one way Curator covers the stable-leader part 
is sketched below):
 * liveliness detection: be able to detect if an instance is up or not
 * manage a topology structure consisting of clusters and instances
 * cluster leader election (stable leader, ie the leader stays until it 
crashes/shuts down)
 * property propagation: each instance can announce properties to the topology, 
so every other instance can read those properties

Cheers,
Stefan
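
For the stable-leader requirement specifically, Curator's LeaderLatch recipe
matches the semantics quite directly. A hedged sketch (connect string, latch
path and participant id are made up):

{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class StableLeaderElection {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // Once acquired, leadership is kept until close() or a crash; a newly
        // joining instance never takes over from a live leader ("stable").
        LeaderLatch latch = new LeaderLatch(client, "/discovery/leader", "instance-1");
        latch.start();

        if (latch.await(10, TimeUnit.SECONDS)) {
            System.out.println("leader until close() or session loss");
        }
        latch.close();
        client.close();
    }
}
{code}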

[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-08 Thread kishore gopalakrishna (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702523#comment-13702523
 ] 

kishore gopalakrishna commented on SLING-2939:
--

Hi,

I am Kishore and I work on Helix. It looks like Helix was ruled out because it 
is focused on resource management. I would like to point out that this is not 
entirely true. I would like to provide more info, but I got lost trying to 
gather the requirements from the JIRA. It is not clear whether the requirement 
is only discovery and/or leader election. I also saw, in between, a mention of 
being rack-aware, etc.

Would be glad to provide more info.

[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700549#comment-13700549
 ] 

Stefan Egli commented on SLING-2939:


Reaching sort of a blocker for using ZooKeeper embedded: ZooKeeper 3.5 - which 
will contain dynamic reconfiguration [0][1][2], a requirement for embedding it 
in Sling, IMO - is not released yet.

This renders using it in embedded mode infeasible for now.
Using it in client mode only - thus requiring a dedicated ZooKeeper cluster to 
be installed - is the less preferred option, though...
--
[0] https://issues.apache.org/jira/browse/ZOOKEEPER/fixforversion/12316644
[1] http://de.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
[2] 
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201306.mbox/%3C716FCA9F-B5DC-40BA-A650-9BC4A4BA3526%40yahoo.com%3E


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700555#comment-13700555
 ] 

Stefan Egli commented on SLING-2939:


Which brings us back to the following pros/cons overview:

 * zookeeper/curator:
  + apache project
  + fulfills requirements
  - except: it cannot be used in embedded mode. requires dedicated 
zookeeper-cluster

 * jgroups:
  + fulfills most requirements (the properties handling requires some 
implementation)
  - LGPL license atm

 * hazelcast:
  + fulfills requirements (including properties handling)
  - 'Community Edition' - but APL 2.0
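
To illustrate the hazelcast row: membership, ordering and property propagation
fall out of the core API quite directly. A sketch against the Hazelcast
3.x-style API (the map name and keys are made up):

{code:java}
import java.util.Map;
import java.util.Set;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.Member;

public class HazelcastTopology {
    public static void main(String[] args) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // getMembers() is ordered oldest-first, and Hazelcast treats the
        // oldest member as the cluster "master" - which lines up with the
        // stable-ordering / stable-leader requirements above.
        Set<Member> members = hz.getCluster().getMembers();
        Member leader = members.iterator().next();
        System.out.println("leader: " + leader);

        // Property propagation: a distributed map visible to every member.
        Map<String, String> props = hz.getMap("discovery.properties");
        props.put("instance-1.someConfig", "someValue");
    }
}
{code}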


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-05 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13700950#comment-13700950
 ] 

Stefan Egli commented on SLING-2939:


re hazelcast: with the concepts of 'normal', 'lite' and 'client' nodes, hazelcast 
allows a setup similar to the zookeeper mixed mode: you can have a central 
sling-cluster running hazelcast in 'normal' mode, while the satellite 
clusters/instances are configured to use the hazelcast client and connect to one 
of the central-sling-cluster instances - all over TCP. the client would 
automatically connect to another instance of the central cluster if one of 
them fails. if the entire central cluster fails though, then the topology would 
fall apart.
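
For illustration, a minimal, untested sketch of this mixed setup, assuming 
Hazelcast 3.x-style APIs (hostnames, ports and the cluster layout are 
placeholders, and the 'lite' member variant is omitted):

{code:java}
import com.hazelcast.client.HazelcastClient;
import com.hazelcast.client.config.ClientConfig;
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class MixedModeSketch {

    // Central-cluster member ('normal' mode), joining via TCP/IP instead of multicast.
    static HazelcastInstance startCentralMember() {
        Config config = new Config();
        config.getNetworkConfig().getJoin().getMulticastConfig().setEnabled(false);
        config.getNetworkConfig().getJoin().getTcpIpConfig()
                .setEnabled(true)
                .addMember("central-1:5701")   // placeholder hostnames
                .addMember("central-2:5701");
        return Hazelcast.newHazelcastInstance(config);
    }

    // Satellite instance as a client; fails over to another central member if one dies.
    static HazelcastInstance startSatelliteClient() {
        ClientConfig clientConfig = new ClientConfig();
        clientConfig.getNetworkConfig().addAddress("central-1:5701", "central-2:5701");
        return HazelcastClient.newHazelcastClient(clientConfig);
    }
}
{code}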


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-03 Thread Robert Munteanu (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13699411#comment-13699411
 ] 

Robert Munteanu commented on SLING-2939:


[~egli] - JGroups can function well with UDP or TCP. I actually switched to TCP 
at some point in my cluster for more reliability, but I guess things have 
changed in the last 4 years.

As for dedicated servers vs embedded: since JGroups is easily embeddable, I 
solved this problem by running a mini-app which just embedded JGroups and 
(IIRC) was designated to be the JGroups coordinator. Since this application was 
always available and under low load, it was a perfect fit for the coordinator 
role. The details are a bit unclear to me right now, but I can dig them up if needed.
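
For illustration, a minimal sketch of such an embedding, assuming a JGroups 
3.x-era API (the cluster name is a placeholder; JGroups lists the current 
coordinator first in each view):

{code:java}
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class CoordinatorSketch {
    public static void main(String[] args) throws Exception {
        JChannel channel = new JChannel(); // default protocol stack (UDP)
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // JGroups orders the view; the first member acts as the
                // coordinator, which also yields a stable instance ordering.
                System.out.println("coordinator: " + view.getMembers().get(0));
            }
        });
        channel.connect("sling-discovery"); // placeholder cluster name
        Thread.currentThread().join();      // keep the mini-app alive
    }
}
{code}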


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-02 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697563#comment-13697563
 ] 

Stefan Egli commented on SLING-2939:


[~olli]
Agreed, it should be a good example of a Hazelcast implementation.




[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-02 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697567#comment-13697567
 ] 

Stefan Egli commented on SLING-2939:


[~ianeboston] great points. I agree that JGroups probably requires every member 
of the cluster to agree, while ZooKeeper typically has a rather small set of 
servers (3, 5, 7) which do the consensus, and thus has an advantage there. 
Another aspect, I think, is that JGroups is a messaging channel and is 
excellent at that. What the discovery.api needs, though, is not messaging; it 
only needs leader consensus and liveliness detection (and a bit of property 
propagation). But I agree that in a reasonably sized cluster setup JGroups 
would be a lean fit.


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-02 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697729#comment-13697729
 ] 

Stefan Egli commented on SLING-2939:


An alternative approach could be a configurable mix of embedded ZooKeeper 
servers and ZooKeeper clients: given a deployment of a Sling topology that has 
a natural 'central cluster', the instances of that central cluster could run 
embedded ZooKeeper (ideally 2n+1 of them), while the remaining Sling instances 
are configured as plain ZooKeeper clients.
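
For illustration, a minimal, untested sketch of the client side of such a mixed 
setup, using the plain ZooKeeper client API (connect string and paths are 
placeholders; the parent znode is assumed to exist):

{code:java}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class SatelliteClientSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connect string listing the 2n+1 embedded servers
        // of the central cluster.
        ZooKeeper zk = new ZooKeeper(
                "central-1:2181,central-2:2181,central-3:2181", 30000,
                new Watcher() {
                    @Override
                    public void process(WatchedEvent event) {
                        // react to connection/session state changes here
                    }
                });

        // An ephemeral znode doubles as liveliness detection: ZooKeeper
        // removes it automatically when this instance's session dies.
        zk.create("/discovery/instances/instance-1", new byte[0],
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }
}
{code}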


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696787#comment-13696787
 ] 

Stefan Egli commented on SLING-2939:


h3. Comparison of candidates

*Messaging*

* [\*-JMS|http://en.wikipedia.org/wiki/Java_Message_Service]: the Java Message 
Service; could be used to communicate state between instances, elect a leader etc.
** Pros: well-established API; HA JMS exists
** Cons: no OOTB support for liveliness detection; no OOTB support for leader election
** Conclusion: not suited at all

* [JGroups|http://www.jgroups.org/]: supports the concept of membership, with 
which leader election comes for free.
** Pros: OOTB support for liveliness detection; OOTB support for leader 
election; no central services, embedded by default
** Cons: no OOTB support for property storage/propagation
** Conclusion: usable, but property propagation would have to be implemented/added

* \*-messaging: there are of course many other messaging-based solutions (eg 
[Apache Kafka|http://kafka.apache.org/]) that could be candidates for a 3rd 
party solution. But in the end the discovery.api is not about messaging, so a 
messaging solution is probably not the ideal fit.

*Distributed caches / in-memory grids*

* [Hazelcast Community Edition|http://www.hazelcast.com/products-community.jsp]: 
supports distributed implementations of various java.util.concurrent objects.
** Pros: OOTB support for liveliness detection; OOTB support for property 
storage/propagation; no central services, embedded by default
** Cons: no OOTB support for leader election (has to be coded based on locks); 
'community edition'; embedded mode only
** Conclusion: IMHO geared towards UDP/multicast, but TCP is also possible. A 
large, cross data-center deployment of Hazelcast via TCP is IMHO similar to 
[ZooKeeper]-embedded and less ideal, although possible to deploy.

* \*-memory grid: there are other memory-grid-like solutions out there (eg 
[Terracotta|http://terracotta.org/], 
[Coherence|http://www.oracle.com/technetwork/middleware/coherence/overview/index.html]) 
that could be candidates, and the delineation between such solutions and 
distributed-coordination-based solutions is less clear than vs messaging. What 
can be said, though, is that any such solution's main strength is distributed 
memory - and that is not exactly what the discovery.api requires.

*Distributed coordination*

* [Apache ZooKeeper|http://zookeeper.apache.org/]: a centralized yet 
distributed/highly available configuration service which supports various 
distributed coordination patterns through _recipes_.
** Pros: OOTB support for liveliness detection; OOTB support for property 
storage/propagation; supports both dedicated and embedded modes
** Cons: no OOTB support for leader election (has to be coded based on 
ephemeral nodes)
** Conclusion: while [ZooKeeper] fulfills the requirements, it is known for 
being tricky to deal with (exception handling, session reconnects, leader 
election needs to be coded according to the provided recipes). Can be deployed 
embedded or dedicated.

* [Apache Curator|http://curator.incubator.apache.org/]: built on top of 
[ZooKeeper], Curator provides default implementations of the recipes documented 
in [ZooKeeper].
** Pros: OOTB support for liveliness detection; OOTB support for property 
storage/propagation; OOTB support for leader election; supports both dedicated 
and embedded modes
** Cons: none noted
** Conclusion: matches all requirements. Can be deployed embedded or dedicated. 
(See the sketch after this list.)

*Cluster management*

* [Apache Helix|http://helix.incubator.apache.org/]: built on top of 
[ZooKeeper], Helix is a cluster management framework for handling distributed 
resources.
** Pros/Cons: similar to Apache Curator
** Conclusion: compared to Apache Curator, Helix has a different focus 
(management of resources in a cluster), while the discovery.api does not 
require any of Helix's additional features. In this direct comparison Apache 
Curator is clearly the better fit.
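
As an illustration of the Curator row above, a minimal, untested sketch of 
leader election via Curator's LeaderLatch recipe (connect string, znode path 
and participant id are placeholders, not part of any proposed implementation):

{code:java}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",          // placeholder ensemble
                new ExponentialBackoffRetry(1000, 3)); // retry on connection loss
        client.start();

        // Exactly one participant holds the latch at any time; on leader
        // failure the recipe re-elects among the remaining participants.
        LeaderLatch latch = new LeaderLatch(client, "/discovery/leader", "instance-1");
        latch.start();

        latch.await(); // blocks until this instance becomes leader
        System.out.println("has leadership: " + latch.hasLeadership());

        latch.close(); // relinquish leadership, e.g. on shutdown
        client.close();
    }
}
{code}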
* A note on embedding [ZooKeeper]: [ZooKeeper] and its ancestor Google Chubby 
were designed as a centralized, dedicated, clustered service providing 
configuration and coordination to a large set of nodes. Typical deployments of 
[ZooKeeper] are 3, 5 or 7 replicated instances. While you +can+ embed 
[ZooKeeper] in an application (possible thanks to 
[ZOOKEEPER-107|https://issues.apache.org/jira/browse/ZOOKEEPER-107]), this mode 
is IMHO less suited for large, distributed setups as in the CQ/Granite case. Each 
Granite instance would contain an embedded [ZooKeeper] and talk to a set of 
other Granite instances through the [ZooKeeper] protocols. Especially when 
deployed across data centers, the embedded mode is not optimal.
* Another note on embedding [ZooKeeper]: 

[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696829#comment-13696829
 ] 

Stefan Egli commented on SLING-2939:


Earlier feedback from a private discussion:
--
Feedback from [~fmeschbe]: I am all for having a scalable implementation 
out-of-the-box. And, honestly, if the repository is not able to be the basis 
for such an implementation, so be it. So, yes, improve the implementation, use 
proven technology – refactoring the existing implementation if that helps.
--
Feedback from [~ianeboston]: 

JGroups is embedded by default and adds cluster awareness to whatever it is 
embedded into, without the need for additional coordinating servers. In 
comparison to ZooKeeper, which normally expects dedicated servers, JGroups is 
very low impact. JGroups is also site-, rack- and machine-aware and is used to 
provide topology information in a number of apps (Infinispan [1], JBoss 
Application Server [2]), without (IIUC) the need to deploy redundant replicated 
servers. IIUC the Infinispan data grid scales reasonably well over multiple 
racks and sites. The only downside of the JBoss-originated components is that 
you have to read the license carefully, as the higher-level components are 
often under a variant of the GPL. JGroups is LGPL 2.1 licensed, which may be an 
issue for Adobe.

On the other hand, ZooKeeper and the layers that sit on top of it provide a 
richer pre-built cluster and topology management environment, at the expense of 
deployment complexity and dedicated server resources. I don't believe using 
ZooKeeper as an embedded component is viable at scale. ZooKeeper is Apache 
licensed, so no issue there.

Perhaps seeing how these two subsystems behave in reality on a 100-node cluster 
of tiny EC2 instances would be a good way of gathering evidence to base a 
decision on?

It might also be worth having a quick look at how ElasticSearch manages 
topology. It is site-, rack- and machine-aware and also AWS/EC2 aware. The 
topology management is embedded and ES has been used in large clusters, eg [3].

[1] 
https://docs.jboss.org/author/display/ISPN/Getting+Started+Guide+-+Clustered+Cache+in+Java+SE?_sscc=t
[2] https://issues.jboss.org/browse/AS7-3023
[3] http://architects.dzone.com/articles/our-experience-creating-large
--
Feedback from [~rombert]: 
+1 for JGroups. I've worked with it previously and it's small, embeddable and 
does the job. My cluster was about 20 machines, but reportedly the primary 
author has cited a JGroups cluster of 536 machines [1].

As for the licensing, JGroups is investigating a move to APL 2.0, but that move 
has not yet been finalized [2].

[1] http://belaban.blogspot.ro/2011/04/largest-jgroups-cluster-ever-536-nodes.html
[2] http://belaban.blogspot.ro/2013/05/jgroups-to-investigate-adopting-apache.html


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696867#comment-13696867
 ] 

Stefan Egli commented on SLING-2939:


[~olli]: thanks for the heads-up. I believe Karaf is not such a good fit 
though, as it is itself an OSGi container (although it would support the 
discovery aspects).


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Stefan Egli (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696876#comment-13696876
 ] 

Stefan Egli commented on SLING-2939:


[~ianeboston], [~rombert]: regarding JGroups: I think JGroups is quite a good 
fit, except for two aspects:

 * large installations would typically be point-to-point rather than UDP (the 
536-machine cluster, for example, used UDP multicast). I believe we would 
like to support Sling deployments across data centers and use discovery between 
those data centers for certain admin operations. My concern here is how 
feasible UDP is across data centers.

 * I think the decision can be broken down to two deployment models: embedded 
or dedicated servers. With embedding you have the advantage that no additional 
services are required, but you would ideally use multicast (thus running into 
the above concern). With a dedicated service there is the downside of such an 
additional component, but the scalability of the point-to-point setup, also 
across data centers, seems better (scalability not in terms of pure performance 
- there multicast is best - but in terms of ease of configuration/setup). A 
TCP-based JGroups configuration is sketched below.
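
For illustration, a minimal sketch of such a point-to-point JGroups setup, 
assuming the stock tcp.xml configuration that ships with JGroups (host list and 
cluster name are placeholders):

{code:java}
import org.jgroups.JChannel;

public class TcpChannelSketch {
    public static void main(String[] args) throws Exception {
        // Point-to-point setup: members find each other via a static host
        // list instead of UDP multicast. The system property below is read
        // by the TCPPING section of the stock tcp.xml shipped with JGroups.
        System.setProperty("jgroups.tcpping.initial_hosts",
                "dc1-host1[7800],dc2-host1[7800]"); // placeholder hosts
        JChannel channel = new JChannel("tcp.xml");
        channel.connect("sling-discovery");         // placeholder cluster name
    }
}
{code}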


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Ian Boston (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697252#comment-13697252
 ] 

Ian Boston commented on SLING-2939:
---

[~egli]
JGroups supports UDP multicast, UDP, TCP and tunnelling through firewalls; 
configuration documentation and diagrams are at [1]. However, once you are on 
more than one subnet you can't UDP-multicast outside that subnet without router 
support, which is the reason for the other protocols. Using the other protocols 
to hop between subnets requires that the configuration of the subnets is known. 
In this respect JGroups is no different from ZooKeeper: both require some level 
of deployment configuration and complexity to make them work. 

I think the main difference is that JGroups expects to be embedded on every 
instance in the cluster and to self-configure leaders, reusing every member of 
the cluster to achieve resilience, whereas ZooKeeper (like Chubby) requires 
centralised ZooKeeper servers that are made resilient through replicas. 
Probably ZooKeeper is better suited to very large clusters (eg 2500 nodes 
upwards) and JGroups is better suited to smaller clusters, although I have no 
evidence to back that up, and I am sure both communities would disagree.

The reason I mentioned ElasticSearch is that I know it supports very large 
clusters, and when you deploy it in AWS, you tell it you are running in AWS. I 
haven't looked into the detail of precisely what it does, but I have talked to 
people who run it over multiple AWS sites, multi-tenanted, successfully. Which 
makes it worth looking at.

Without hard evidence, it might be better to provide a JGroups bundle and a 
ZooKeeper bundle and find out what the real issues are with them. IIRC the 
JGroups code required to do this is minimal, as I did something similar for 
Sling last year. I am not certain how much effort would be required for 
ZooKeeper.

[1] http://www.jgroups.org/manual/html/user-advanced.html#d0e2251


[jira] [Commented] (SLING-2939) 3rd-party based implementation of discovery.api

2013-07-01 Thread Oliver Lietz (JIRA)

[ 
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13697526#comment-13697526
 ] 

Oliver Lietz commented on SLING-2939:
-

[~egli]
Sure, it's an OSGi container; that is exactly why it's worth looking at (not 
using) their implementation when going for Hazelcast, no?
Using Karaf as a replacement for Launchpad is a whole different story.
