Hi,

I am new to ZeroMQ. I am trying to solve a problem with the following
high-level requirements:


Objective: I want to gather real-time data from external data sources and
serve multiple client processes.

***The Data sources***

1. The external data sources provide vendor APIs to interface with them,
but the APIs have a lot of limitations:
     a. The technology used by the APIs might be obsolete and/or
incompatible. For example, one of the sources is DDE on Windows and
another is in Java, while my main system is in Python and on Linux.
     b. The APIs might be unstable and crash frequently.
     c. The APIs might have restrictions on throughput, call frequency,
and other things. For example, one of the APIs allows only 100 messages
per second.

2. The external data sources push data out through the APIs.

3. It is possible to ask the sources to resend all messages they own, but
it is very expensive and can be done only when absolutely necessary.

***The Data***

1. The data can be represented as messages with keys. A new message should
replace the old message with the same key.
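
To illustrate what I mean by replacement, here is a minimal Python sketch.
The field names ("key", "body") and the sample values are placeholders I
made up; the real messages come from the vendor APIs and will look
different.

```python
# Minimal sketch: the latest message for a key replaces the previous one.
# The field names "key" and "body" are hypothetical placeholders.

def apply_message(table, message):
    """Store the message under its key, overwriting any older message."""
    table[message["key"]] = message
    return table


table = {}
apply_message(table, {"key": "A", "body": "old"})
apply_message(table, {"key": "B", "body": "other"})
apply_message(table, {"key": "A", "body": "new"})  # replaces the first "A"

print(len(table))          # 2
print(table["A"]["body"])  # new
```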

***The clients***

1. There are a large number of clients.

2. Clients need to frequently request snapshots of the data with
complicated filters, as well as subscribe to the live updates.


With these requirements, I think an intermediary cache server in the
middle might be adequate. At the center of the cache server is a key-value
hash table. We can build many small, simple data collection processes that
use the vendor APIs to collect external data and push it into the cache
server through REQ/REP. On the client-facing side of the cache server, we
can publish data through PUB/SUB and also provide a snapshot service
through REQ/REP. In fact, this is very similar to the shared key-value cache
example (
http://zguide.zeromq.org/page:all#A-Shared-Key-Value-Cache-Clone-Pattern)
in the guide.
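
A rough sketch of the cache server I have in mind, using pyzmq, is below.
This is only an outline under my own assumptions, not the guide's code: the
port numbers, the JSON message layout, the sequence counter, and the
key-prefix filter (standing in for the real, more complicated filters) are
all made up for illustration.

```python
import json


class Cache:
    """Key-value table plus a sequence counter for ordering updates."""

    def __init__(self):
        self.table = {}
        self.sequence = 0

    def update(self, key, value):
        """Replace the value for key and return the new sequence number."""
        self.sequence += 1
        self.table[key] = value
        return self.sequence

    def snapshot(self, prefix=""):
        """Return the current sequence and the entries whose key starts with prefix."""
        entries = {k: v for k, v in self.table.items() if k.startswith(prefix)}
        return {"sequence": self.sequence, "entries": entries}


def serve(cache,
          ingest_addr="tcp://*:5556",      # hypothetical ports
          snapshot_addr="tcp://*:5557",
          publish_addr="tcp://*:5558"):
    """Run the three sockets in one loop (needs pyzmq; never returns)."""
    import zmq  # imported here so Cache can be tried without pyzmq installed

    ctx = zmq.Context.instance()
    ingest = ctx.socket(zmq.REP)       # collectors push updates here
    ingest.bind(ingest_addr)
    snapshots = ctx.socket(zmq.REP)    # clients request snapshots here
    snapshots.bind(snapshot_addr)
    publisher = ctx.socket(zmq.PUB)    # live updates go out here
    publisher.bind(publish_addr)

    poller = zmq.Poller()
    poller.register(ingest, zmq.POLLIN)
    poller.register(snapshots, zmq.POLLIN)

    while True:
        ready = dict(poller.poll())
        if ingest in ready:
            msg = ingest.recv_json()   # assumed layout: {"key": ..., "value": ...}
            seq = cache.update(msg["key"], msg["value"])
            ingest.send_json({"sequence": seq})
            payload = json.dumps({"sequence": seq, "value": msg["value"]})
            # The key is the first frame so SUB sockets can filter on it.
            publisher.send_multipart([msg["key"].encode(), payload.encode()])
        if snapshots in ready:
            req = snapshots.recv_json()  # assumed layout: {"prefix": ...}
            snapshots.send_json(cache.snapshot(req.get("prefix", "")))


# The table logic can be tried on its own, without any sockets:
cache = Cache()
cache.update("a.1", "x")
cache.update("b.1", "y")
cache.update("a.1", "z")
print(cache.snapshot("a."))  # {'sequence': 3, 'entries': {'a.1': 'z'}}
```

Calling serve(Cache()) would start the loop. The sequence number in each
snapshot and update is there so that a client could discard live updates
that are older than the snapshot it received.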

There is, however, one problem. If the intermediary cache server crashes and
restarts, I am not sure how to recover it. There are two issues:

1. All the data collecting processes need to know that the cache server has
died and restarted, re-fetch all the data they own from the external data
sources (a very expensive operation), and push it to the cache server.

2. Before the cache server is fully recovered, it should not accept any
requests from clients.
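
To make these two requirements concrete, here is one way I could imagine
expressing them in Python, without any sockets. Everything in it is an
assumption on my part: the idea of a random "epoch" ID generated at each
server start, the "resync done" marker, and all the class and method names.
I do not know whether this is the right approach, which is why I am asking.

```python
import uuid


class CacheState:
    """Server side: refuses client requests until every collector has resynced."""

    def __init__(self, expected_collectors):
        self.epoch = uuid.uuid4().hex         # new value on every (re)start
        self.pending = set(expected_collectors)
        self.table = {}

    def is_ready(self):
        return not self.pending

    def on_collector_push(self, collector_id, key, value):
        """Store the update and echo the current epoch back to the collector."""
        self.table[key] = value
        return {"epoch": self.epoch}

    def on_resync_done(self, collector_id):
        """The collector reports that it has finished its full resend."""
        self.pending.discard(collector_id)
        return {"epoch": self.epoch}

    def on_client_snapshot(self):
        if not self.is_ready():
            return {"status": "recovering"}
        return {"status": "ok", "entries": dict(self.table)}


class Collector:
    """Collector side: notices a server restart by comparing epochs."""

    def __init__(self, collector_id):
        self.collector_id = collector_id
        self.known_epoch = None

    def needs_full_resend(self, reply):
        """True when the server's epoch differs from the one last seen."""
        changed = reply["epoch"] != self.known_epoch
        self.known_epoch = reply["epoch"]
        return changed


server = CacheState(["dde", "java"])
dde = Collector("dde")

first = dde.needs_full_resend(server.on_collector_push("dde", "k1", "v1"))
second = dde.needs_full_resend(server.on_collector_push("dde", "k2", "v2"))
print(first, second)                           # True False

print(server.on_client_snapshot()["status"])   # recovering
server.on_resync_done("dde")
server.on_resync_done("java")
print(server.on_client_snapshot()["status"])   # ok

# Simulate a crash and restart: a fresh CacheState gets a fresh epoch.
restarted = CacheState(["dde", "java"])
after_restart = dde.needs_full_resend(restarted.on_collector_push("dde", "k1", "v1"))
print(after_restart)                           # True
```

In this sketch the expensive full resend is triggered only when the epoch
changes, and clients get a "recovering" reply until all expected collectors
have reported that they are done.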

Can anyone give me some hints on how to design this?

Thanks,
Tom Bennett