Dang, thanks for working on this, Pascal. This is one of the use cases I had in mind for J-ZMQ, though it was taking me some time to figure out how to do it in my clumsy way. I had been looking at joebo's fork for a little inspiration, but I got busy with a half dozen other things. I will try to play with this later next week. FWIW, one way to prevent the "miss the first message" problem is to fire up the SUB instances before you send them anything.
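To make the ordering point concrete, here is a toy sketch in plain Python. It is not the ZeroMQ API and not J-ZMQ: ToyPublisher, ToySubscriber and run are made-up names, and the only behaviour modelled is that a publisher hands a message to whoever is attached at send time and keeps nothing for late joiners.

```python
# Toy in-process model of PUB/SUB delivery. NOT the ZeroMQ API;
# all names here are hypothetical and exist only for illustration.
class ToyPublisher:
    """Delivers each message only to subscribers attached at send time."""

    def __init__(self):
        self.subscribers = []

    def attach(self, subscriber):
        self.subscribers.append(subscriber)

    def send(self, message):
        # Nothing is queued for late joiners: if nobody is attached,
        # the message is simply lost.
        for subscriber in self.subscribers:
            subscriber.inbox.append(message)


class ToySubscriber:
    def __init__(self):
        self.inbox = []


def run(subscribe_first):
    """Send two messages, attaching the subscriber before or after the first."""
    pub, sub = ToyPublisher(), ToySubscriber()
    if subscribe_first:
        pub.attach(sub)
    pub.send("tick 1")
    if not subscribe_first:
        pub.attach(sub)
    pub.send("tick 2")
    return sub.inbox


late = run(subscribe_first=False)   # subscriber shows up after the first send
early = run(subscribe_first=True)   # subscriber is up before anything is sent
print(late, early)
```

With real sockets the connection is also set up asynchronously, so the subscriber needs a moment to finish connecting, not just to be started first.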
For what it is worth, I originally wrote these zeromq hooks for use in a ticker plant for my own P/L. I had a better idea while working on a consulting job recently.

"Big data" and "the cloud" are in the news all the time now. Mostly, this is hype, but there are some real business needs involving extracting meaning from data sets which do not fit into core. The existing solutions are generally not real impressive (Hadoop) or not designed for weakly coupled "clouds" (MPI based solutions). Almost none of these "solutions" use ideas which are actually appropriate to big data (Vowpal Wabbit being a rare exception), and I have no inclination to contribute to these tools.

J has several fast and flexible data stores (still learning about Jd; very impressed so far). J is also extremely memory efficient, and can do out of core calculations. I have not done any tests to see if ZMQ can get the data across the pipes well enough to do anything useful on AWS and other such weakly coupled "clouds" favored by business, but I think I can write some useful coarse grained parallel ML tools using ZMQ and J. This is perhaps arrogant of me to think about: my J skills are weak, this is my first ZMQ project, and it's rare I have time to think about big projects like this, but I do know how to build fast and scalable machine learning algorithms. I think this could be a way forward which solves important business problems.

Stuff I don't know yet:

0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when used together. I haven't fired up GDB to find out why yet. It's probably a buffer allocation thing. It's possible there is a show stopper here: I don't know. It seems to work with Kx at least: https://github.com/jaeheum/qzmq
1) Fault tolerance: things are going to crash or blow up memory. Maybe Pascal's task manager is enough for now.
2) Data provisioning: I'm guessing I'll need a framework for provisioning each server with "owned" data, using Jd or JDB. I have to look at how other frameworks do this.
3) Software provisioning: J is pretty simple to set up, but if this is going to scale out to more than a couple of machines, some kind of tool will be needed to accomplish this. I know such tools exist, but I have no way of picking the "right one" at present (suggestions?).
4) Security: many of the existing parallel analytics tools have none. CZMQ seems to provide some, but it looks unpleasant to use compared to the rest of ZMQ. This is low priority, since nobody else bothers with it.

Will I ever actually accomplish this? Probably not real quickly (too many day jobs), but I think it is an exciting potential use for J.

Oddly, this thread does not show up in Nabble, where I usually read the J-lists.

-Scott

> http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
>
> Thanks to Scott for getting this started. It's a work in progress, but it's
> probably more helpful to see the simplest core version first, than just the
> bloated version.
>
> The scheduler is a framework for multitasking several polling (endless)
> loops within a single J instance. The simplest multithreading
> synchronization library is avoiding multithreading altogether, and an in
> process scheduler allows what are semantically separate processes to work
> together without concerning yourself about the possibilities of one process
> writing to a variable that is being read or written to by another process.
>
> It can integrate with other "real" multiprocessing setups by grouping
> together tasks that need tight cooperation. The canonical usefulness is for
> socket programming, which typically involves polling loops for each client
> and server that add testing tedium even with just a single client and
> server. The scheduler eases development and testing of several clients and
> servers all in a single application, and simplifies testing/learning of
> frameworks like ZeroMQ and its J implementation.
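For readers who have not seen an in-process scheduler, the idea in the quoted description can be sketched roughly as below. This is plain Python with generators, not J and not Pascal's code: polling_task, run_scheduler and the queue names are all hypothetical, and the round-robin policy is an assumption made for the sketch.

```python
from collections import deque


def polling_task(name, source, log):
    """An 'endless' polling loop written as a generator.

    Each yield hands control back to the scheduler, so no two tasks
    ever run at the same time and no locks are needed.
    """
    while True:
        if source:
            log.append((name, source.popleft()))
        yield


def run_scheduler(tasks, rounds):
    """Round-robin: give every task one polling step per round."""
    for _ in range(rounds):
        for task in tasks:
            next(task)


# Hypothetical stand-ins for a client and a server polling their sockets.
log = []
client_queue = deque(["req-1", "req-2"])
server_queue = deque(["job-a"])
tasks = [
    polling_task("client", client_queue, log),
    polling_task("server", server_queue, log),
]
run_scheduler(tasks, rounds=3)
print(log)
```

Both loops share the log list without any synchronization, because only one of them is ever running between two yields. That is the sense in which avoiding multithreading is the simplest synchronization library.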
