Joe - thanks for posting the link to "j-fork-server" - it's something I've wanted to do myself for some time but have not gotten around to it.
On Sun, Nov 24, 2013 at 6:41 PM, Devon McCormick <[email protected]> wrote:

> Most stochastic solutions are very amenable to multi-core.
>
>
> On Sun, Nov 24, 2013 at 5:58 PM, Pascal Jasmin <[email protected]> wrote:
>
>> Probably not news to anyone here, but any algorithm that can be expressed
>> in n ^ x time (polynomial, including cases where x is 1 or less) can, more
>> importantly, be expressed as a multi-core/thread/processor algorithm (of k
>> cores) if the data can be segmented into k parts. (n/k) ^ x can be a
>> significant performance improvement if x > 1, but even if x = 1 or less, a
>> data-partitioned algorithm is valuable, considering that quad-core 3 GHz
>> processors are more affordable than 12 GHz single-core processors.
>>
>> So simple search usually has n/a run time, and is usually partitionable,
>> and it can benefit from multi-core approaches, but there are costs to
>> coordinating the threads and accumulating results.
>>
>> The point, if you are looking for ideas to apply multi-core solutions to,
>> is that you can use simple search as an example, or focus on any other
>> problem that has a data-partitionable solution.
>>
>>
>> ________________________________
>> From: Joe Bogner <[email protected]>
>> To: [email protected]
>> Sent: Saturday, November 23, 2013 9:46:24 PM
>> Subject: Re: [Jprogramming] wiki: task scheduler and zeromq intro
>>
>> Can anyone share specific examples where it was needed to scale out to
>> multiple cores and machines? I am interested in learning about the types of
>> problems this would be applied to. I have read some examples while
>> researching but haven't run into anyone who has.
>>
>> For example, last week I had to create a database of the best 100,000
>> solutions out of 56 billion combinations as part of a work deliverable. I am
>> sure there may have been more elegant solutions, but brute forcing with
>> 4 instances of R and 32 GB of RAM took 3 hours, which was fine.
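Pascal's partition-and-accumulate point and Joe's "best 100,000 out of 56 billion" job share one shape: each worker keeps only its own best k, and a small merge picks the global best k. This works because any item in the overall top k must also be in the top k of whichever partition holds it. A minimal sketch in Python (not J, and not how Joe's R job was actually written); `local_top_k`, `split`, and `partitioned_top_k` are hypothetical names:

```python
import heapq
from itertools import repeat


def local_top_k(chunk, k):
    """Best k scores within one partition (the per-worker step)."""
    return heapq.nlargest(k, chunk)


def split(data, parts):
    """Cut data into `parts` contiguous, roughly equal slices."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` slices get one extra element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def partitioned_top_k(data, k, parts, mapper=map):
    """Global top k = top k of the union of each partition's local top k.

    `mapper` runs local_top_k over the chunks; the builtin map does it
    serially, while an executor's map can spread it over processes.
    """
    local_results = mapper(local_top_k, split(data, parts), repeat(k))
    # Merge step: at most parts * k candidates survive, so this is cheap.
    return heapq.nlargest(k, (x for result in local_results for x in result))
```

As written, the builtin `map` runs the partitions one after another; passing `mapper=pool.map` from a `concurrent.futures.ProcessPoolExecutor` should hand each partition to a separate process. Only `parts * k` candidates travel back to the merge, which keeps the "accumulating results" cost Pascal mentions small when k is much smaller than n.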
>>
>> It might be worthwhile to create a small reproducible example of a
>> problem that would benefit from multiple cores and machines. I could make
>> one up or borrow from somewhere else, but does anyone have any examples that
>> come to mind?
>>
>>
>> On Sat, Nov 23, 2013 at 7:09 PM, Scott Locklin <[email protected]> wrote:
>>
>> Dang, thanks for working on this, Pascal. This is one of the use cases I
>> had in mind for J-ZMQ, though it was taking me some time to figure out how
>> to do it in my clumsy way. I had been looking at joebo's fork thing for a
>> little inspiration, but I got busy with a half dozen other things. I will
>> try to play with this later next week. FWIW, one way to prevent the "miss
>> the first message" problem is to fire up the SUB instances before you send
>> them anything.
>>
>> For what it is worth, I originally wrote these zeromq hooks for use in a
>> ticker plant for my own P/L. I had a better idea while working on a
>> consulting job recently. "Big data" and "the cloud" are in the news all the
>> time now. Mostly this is hype, but there are some real business needs
>> involving extracting meaning from data sets which do not fit into core. The
>> existing solutions are generally either not very impressive (Hadoop) or not
>> designed for weakly coupled "clouds" (MPI-based solutions). Almost none of
>> these "solutions" use ideas which are actually appropriate to big data
>> (Vowpal Wabbit being a rare exception), and I have no inclination to
>> contribute to these tools.
>>
>> J has several fast and flexible data stores (still learning about Jd;
>> very impressed so far). J is also extremely memory efficient, and can do
>> out-of-core calculations.
>> I have not done any tests to see if ZMQ can get the data across the pipes
>> well enough to do anything useful on AWS and other such weakly coupled
>> "clouds" favored by business, but I think I can write some useful
>> coarse-grained parallel ML tools using ZMQ and J. This is perhaps arrogant
>> of me to think about: my J skills are weak, this is my first ZMQ project,
>> and it's rare I have time to think about big projects like this, but I do
>> know how to build fast and scalable machine learning algorithms. I think
>> this could be a way forward which solves important business problems.
>>
>> Stuff I don't know yet:
>>
>> 0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when
>> used together. I haven't fired up GDB to find out why yet. It's probably a
>> buffer allocation thing. It's possible there is a show stopper here: I
>> don't know. It seems to work with Kx at least:
>> https://github.com/jaeheum/qzmq
>>
>> 1) Fault tolerance: things are going to crash or blow up memory. Maybe
>> Pascal's task manager is enough for now.
>>
>> 2) Data provisioning: I'm guessing I'll need a framework for provisioning
>> each server with "owned" data, using Jd or JDB. I have to look at how other
>> frameworks do this.
>>
>> 3) Software provisioning: J is pretty simple to set up, but if this is
>> going to scale out to more than a couple of machines, some kind of tool
>> will be needed to accomplish this. I know such tools exist, but I have no
>> way of picking the "right one" at present (suggestions?).
>>
>> 4) Security: many of the existing parallel analytics tools have none.
>> CZMQ seems to provide some, but it looks unpleasant to use compared to the
>> rest of ZMQ. This is low priority, since nobody else bothers with it.
>>
>> Will I ever actually accomplish this? Probably not very quickly (too many
>> day jobs), but I think it is an exciting potential use for J.
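Scott's tip about starting the SUB instances first addresses what the ZeroMQ guide calls the "slow joiner" problem: a PUB socket only delivers to subscribers that are already attached, and anything sent before that is dropped. The toy model below is plain Python, not the ZeroMQ or J-ZMQ API; `ToyPublisher` and `run` are hypothetical names, and the model captures only that delivery rule:

```python
from collections import deque


class ToyPublisher:
    """Toy in-process stand-in for a PUB socket (NOT the ZeroMQ API).

    It delivers only to subscribers attached at send time; anything sent
    earlier is silently dropped, mirroring PUB/SUB's fire-and-forget rule.
    """

    def __init__(self):
        self.subscribers = []

    def attach(self, inbox):
        self.subscribers.append(inbox)

    def send(self, message):
        # With no subscribers attached, the message simply goes nowhere.
        for inbox in self.subscribers:
            inbox.append(message)


def run(subscribe_first):
    """Send two messages and report what one subscriber actually received."""
    pub = ToyPublisher()
    inbox = deque()
    if subscribe_first:
        pub.attach(inbox)  # subscriber is up before anything is sent
    pub.send("task 1")
    if not subscribe_first:
        pub.attach(inbox)  # late joiner: "task 1" is already gone
    pub.send("task 2")
    return list(inbox)
```

With real ZeroMQ the subscription handshake is also asynchronous, so even a SUB that has already connected can miss the first few messages; a common remedy is an explicit synchronization step (for example a separate request/reply exchange) before publishing work.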
>>
>> Oddly, this thread does not show up in Nabble, where I usually read the
>> J-lists.
>>
>> -Scott
>>
>> > http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
>> >
>> > Thanks to Scott for getting this started. It's a work in progress, but
>> > it's probably more helpful to see the simplest core version first than
>> > just the bloated version.
>> >
>> > The scheduler is a framework for multitasking several polling (endless)
>> > loops within a single J instance. The simplest multithreading
>> > synchronization library is avoiding multithreading altogether, and an
>> > in-process scheduler allows what are semantically separate processes to
>> > work together without concerning yourself about the possibility of one
>> > process writing to a variable that is being read or written to by another
>> > process.
>> >
>> > It can integrate with other "real" multiprocessing setups by grouping
>> > together tasks that need tight cooperation. The canonical use is socket
>> > programming, which typically involves polling loops for each client and
>> > server that add testing tedium even with just a single client and server.
>> > The scheduler eases development and testing of several clients and
>> > servers all in a single application, and simplifies testing/learning of
>> > frameworks like ZeroMQ and its J implementation.
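Pascal's scheduler is written in J; its core idea, cooperative round-robin over endless polling loops inside one process, can be sketched in Python with generators. Everything here (`scheduler`, `server`, `client`, the deque used as a mailbox) is a hypothetical illustration, not the API of the scheduler on the wiki page:

```python
from collections import deque


def scheduler(tasks, max_steps):
    """Round-robin over generator 'tasks' in a single thread.

    Each task does one iteration of its polling loop and then yields
    control, so no two tasks ever touch shared state at the same time.
    Returns the number of time slices that were run.
    """
    ready = deque(tasks)
    steps = 0
    while ready and steps < max_steps:
        task = ready.popleft()
        try:
            next(task)  # run one slice of this task's loop
        except StopIteration:
            continue  # finished tasks drop out of the rotation
        ready.append(task)  # otherwise requeue it at the back
        steps += 1
    return steps


def server(mailbox, log):
    """Endless polling loop: handle whatever is waiting in the mailbox."""
    while True:
        while mailbox:
            log.append("server got " + mailbox.popleft())
        yield


def client(name, mailbox, count):
    """Sends `count` requests, one per time slice, then exits."""
    for i in range(count):
        mailbox.append(f"{name}:{i}")
        yield
```

For example, running `scheduler([server(box, log), client("a", box, 2), client("b", box, 2)], max_steps=20)` with `box = deque()` and `log = []` interleaves both clients with the server without any locks, because only one generator body runs at a time. The trade-off is that a task which never yields stalls everyone, which is why each loop does one bounded slice of work per turn.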
>> >> ----------------------------------------------------------------------
>> >> For information about J forums see http://www.jsoftware.com/forums.htm
>
>
>
> --
> Devon McCormick, CFA
>


--
Devon McCormick, CFA
