Thanks Zhen and Greg for your replies! Having a local Helix agent sit between our local C++ service and the Helix controller sounds like a feasible solution.
I want to point out that we have proxies in front of the storage pool that route user requests to different partitions. When a master shard is down, the proxy fails over to a slave (and promotes that slave to master). Based on Zhen's proposal, it seems the Helix agent should also push state transitions to the proxy so it can change its routing accordingly. I have a few more questions:

1. What failure detection does Helix currently use? Who reports errors to ZooKeeper? Waiting for a ZooKeeper timeout is not optimal for us because we need fast failover.
2. Can a participant proactively push updates to ZooKeeper once a failure is detected, so the controller can react quickly? In our use case the proxy is usually the first to detect errors, because requests have short timeouts.
3. Are there any side effects if our C++ service peeks at the ZooKeeper data used by Helix?

Any comments are highly appreciated.

-Neutron

On Fri, Mar 25, 2016 at 10:42 AM, Zhen Zhang <[email protected]> wrote:

> Hi Neutron, following up on the HelixAgent idea. I think we can define two
> protocols:
>
> 1) PING
> Your C/C++ service will implement an endpoint which accepts PING and returns
> OK. This is used for monitoring the liveness of your C/C++ service.
>
> 2) STATE_TRANSITION
> Your C/C++ service will implement another endpoint which accepts
> STATE_TRANSITION(resource, partition, from_state, to_state), executes the
> actual state transition in your service, and returns OK or ERROR depending
> on the execution result. Note that state transitions run in parallel, so
> your endpoint should be async. The actual parallelism can be configured.
>
> For the actual implementation, you can use a TCP endpoint (e.g.
> tcp://localhost:123456) or a RESTful endpoint (e.g. http://localhost/ping
> for ping, and
> http://localhost/state_transition/?resource=..&partition=..&from_state=..&to_state=..
> for state transition).
>
> On the other hand, a Java-based HelixAgent will join the cluster as a
> participant.
> It keeps monitoring your service via the PING endpoint, and
> proxies the state transitions via the STATE_TRANSITION endpoint. Both
> endpoints can be configured on HelixAgent, and we can provide a couple of
> default implementations such as TCP or HTTP.
>
> This idea should be straightforward to implement. Let us know if this works
> for you.
>
> Thanks,
> Jason
>
> On Mon, Mar 21, 2016 at 8:22 PM, Greg Brandt <[email protected]> wrote:
>
>> Hey Neutron, have you considered Helix Agent?
>> http://helix.apache.org/0.7.1-docs/tutorial_agent.html
>>
>> -Greg
>>
>> On Mon, Mar 21, 2016 at 4:10 PM, Neutron sharc <[email protected]>
>> wrote:
>>
>>> Hi Helix team,
>>>
>>> Our distributed storage system consists of a storage pool partitioned
>>> into many shards. Each shard has a master and several slave replicas,
>>> and we run replication to keep them synchronized. We need a way to do
>>> automatic failover and resource rebalancing. It seems Helix can meet
>>> our needs. However, our system is written in pure C/C++. Can you guys
>>> provide a C++ API for participant and spectator so we can integrate
>>> Helix into our system? Thanks a lot!
>>>
>>> -neutron
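[Editor's note] On the C++ side, the STATE_TRANSITION endpoint Zhen sketches could look roughly like the following. This is a minimal sketch only: the parameter names (resource, partition, from_state, to_state) come from the example URL in the thread, but `parseQuery` and `handleStateTransition` are hypothetical helpers, not part of Helix, and a real service would hand the transition to a worker thread since the endpoint should be async.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: split a query string such as
//   resource=MyDB&partition=MyDB_0&from_state=SLAVE&to_state=MASTER
// into a key/value map (no URL-decoding, for brevity).
std::map<std::string, std::string> parseQuery(const std::string& query) {
    std::map<std::string, std::string> params;
    std::stringstream ss(query);
    std::string pair;
    while (std::getline(ss, pair, '&')) {
        auto eq = pair.find('=');
        if (eq != std::string::npos) {
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
    }
    return params;
}

// Hypothetical endpoint handler: validate the request and reply
// "OK" or "ERROR" as the protocol requires. The actual transition
// work would be enqueued to a worker so the endpoint stays async.
std::string handleStateTransition(const std::string& query) {
    auto p = parseQuery(query);
    if (p.count("resource") && p.count("partition") &&
        p.count("from_state") && p.count("to_state")) {
        // ... enqueue the real master/slave transition here ...
        return "OK";
    }
    return "ERROR";
}
```

The PING endpoint would be even simpler (accept the request, return OK), since it only signals liveness to the HelixAgent.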
