Re: [Jprogramming] wiki: task scheduler and zeromq intro

Scott Locklin Sat, 23 Nov 2013 16:09:41 -0800

Dang, thanks for working on this, Pascal. This is one of the use cases I had in 
mind for J-ZMQ, though it was taking me some time to figure out how to do it in 
my clumsy way. I had been looking at joebo's fork thing for a little 
inspiration, but I got busy with a half dozen other things. I will try to play 
with this later next week. FWIW, one way to prevent the "miss the first 
message" problem is to fire up the SUB instances before you send them anything.
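
As a rough illustration of why the ordering matters, here is a toy model in
plain Python. It is not ZeroMQ and not the J bindings; ToyPublisher and the
two helper functions are made-up names. The only assumption it encodes is
that a PUB-style sender keeps nothing for subscribers that are not attached
yet.

```python
class ToyPublisher:
    """Toy stand-in for a PUB socket: nothing is queued for absent subscribers."""

    def __init__(self):
        self.inboxes = []

    def attach(self, inbox):
        # Hypothetical helper: register a subscriber's inbox (a plain list).
        self.inboxes.append(inbox)

    def send(self, msg):
        # Deliver only to subscribers attached right now; otherwise the
        # message is simply lost.
        for inbox in self.inboxes:
            inbox.append(msg)


def subscribe_late():
    pub = ToyPublisher()
    inbox = []
    pub.send("tick 0")   # sent before anyone is attached: dropped
    pub.attach(inbox)
    pub.send("tick 1")
    return inbox


def subscribe_first():
    pub = ToyPublisher()
    inbox = []
    pub.attach(inbox)    # subscriber is up before the first send
    pub.send("tick 0")
    pub.send("tick 1")
    return inbox
```

With real ZeroMQ the connection setup is asynchronous, so starting the SUB
side first may still need a short pause or an explicit handshake before the
first send.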



For what it is worth, I originally wrote these zeromq hooks for use in a ticker 
plant for my own P/L. I had a better idea while working on a consulting job 
recently. "Big data" and "the cloud" are in the news all the time now. Mostly, 
this is hype, but there are some real business needs involving extracting 
meaning from data sets which do not fit into core. The existing solutions are 
generally not real impressive (Hadoop) or not designed for weakly coupled 
"clouds" (MPI based solutions). Almost none of these "solutions" use ideas 
which are actually appropriate to big data (Vowpal Wabbit being a rare 
exception), and I have no inclination to contribute to these tools. 


J has several fast and flexible data stores (still learning about Jd; very 
impressed so far). J is also extremely memory efficient, and can do out of core 
calculations. I have not done any tests to see if ZMQ can get the data across 
the pipes well enough to do anything useful on AWS and other such weakly 
coupled "clouds" favored by business, but I think I can write some useful 
coarse grained parallel ML tools using ZMQ and J. This is perhaps arrogant of 
me to think about: my J skills are weak, this is my first ZMQ project, and it's 
rare I have time to think about big projects like this, but I do know how to 
build fast and scalable machine learning algorithms. I think this could be a 
way forward which solves important business problems. 

Stuff I don't know yet:
0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when used 
together. I haven't fired up GDB to find out why yet. It's probably a buffer 
allocation thing. It's possible there is a show stopper here: I don't know. It 
seems to work with Kx at least: https://github.com/jaeheum/qzmq
1) Fault tolerance: things are going to crash or blow up memory. Maybe Pascal's 
task manager is enough for now.
2) Data provisioning: I'm guessing I'll need a framework for provisioning each 
server with "owned" data, using Jd or JDB. I have to look at how other 
frameworks do this.
3) Software provisioning: J is pretty simple to set up, but if this is going to 
scale out to more than a couple of machines, some kind of tool will be needed 
to accomplish this. I know such tools exist, but I have no way of picking the 
"right one" at present (suggestions?). 
4) Security: many of the existing parallel analytics tools have none. 
CZMQ seems to provide some, but it looks unpleasant to use compared to the rest
of ZMQ. This is low priority, since nobody else bothers with it.


Will I ever actually accomplish this? Probably not real quickly (too many day 
jobs), but I think it is an exciting potential use for J.

Oddly, this thread does not show up in Nabble, where I usually read the J-lists.


-Scott


> http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
>
> Thanks to Scott for getting this started.  It's a work in progress, but it's
> probably more helpful to see the simplest core version first than just the
> bloated version.
>
>
> The scheduler is a framework for multitasking several polling (endless)
> loops within a single J instance. The simplest multithreading
> synchronization library is avoiding multithreading altogether, and an
> in-process scheduler allows what are semantically separate processes to work
> together without concerning yourself about the possibilities of one process
> writing to a variable that is being read or written to by another process.
>
> It can integrate with other "real" multiprocessing setups by grouping
> together tasks that need tight cooperation. The canonical usefulness is for
> socket programming, which typically involves polling loops for each client
> and server that add testing tedium even with just a single client and
> server. The scheduler eases development and testing of several clients and
> servers all in a single application, and simplifies testing/learning of
> frameworks like ZeroMQ and its J implementation.
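
A rough sketch of the idea described in the quoted text, in Python rather
than J. This is not Pascal's implementation; poller and run are made-up
names. Each polling loop is a generator that yields after one step, and a
round-robin loop in a single thread takes turns running them, so no two
loops ever touch shared state at the same moment.

```python
from collections import deque


def poller(name, rounds, log):
    """One polling loop; each yield hands control back to the scheduler."""
    for i in range(rounds):
        log.append((name, i))   # stand-in for "check a socket, do some work"
        yield


def run(tasks):
    """Round-robin over the tasks in a single thread until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)           # run the task up to its next yield
        except StopIteration:
            continue             # task finished: do not requeue it
        queue.append(task)


log = []
run([poller("server", 2, log), poller("client", 3, log)])
```

Because only one loop runs at a time, the shared log list needs no locking,
which is the point the quoted paragraph makes about avoiding multithreading.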
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
