Re: [Jprogramming] wiki: task scheduler and zeromq intro

Devon McCormick Sun, 24 Nov 2013 16:07:17 -0800

Joe - thanks for posting the link to "j-fork-server" - it's something I've
wanted to do myself for some time but have not gotten around to it.



On Sun, Nov 24, 2013 at 6:41 PM, Devon McCormick <[email protected]> wrote:

> Most stochastic solutions are very amenable to multi-core.
>
>
> On Sun, Nov 24, 2013 at 5:58 PM, Pascal Jasmin <[email protected]> wrote:
>
>> Probably not news to anyone here, but any algorithm that runs in n ^ x
>> time (polynomial, including cases where x is 1 or less) can, more
>> importantly, be expressed as a multi-core/thread/processor algorithm (on k
>> cores) if the data can be segmented into k parts.  (n/k) ^ x can be a
>> significant performance improvement if x > 1, but even if x = 1 or less, a
>> data-partitioned algorithm is valuable considering the reality that
>> quad-core 3 GHz processors are more affordable than 12 GHz single-core
>> processors.
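
To make the arithmetic in the paragraph above concrete, here is a worked version under two assumptions that are not stated in the post: the k sub-problems are independent (solving each chunk of n/k items solves the whole problem), and coordination cost is ignored. The constant c is only a placeholder.

```latex
\begin{align*}
T_1(n) &= c\,n^{x} && \text{one core, all } n \text{ items} \\
T_k(n) &= c\left(\frac{n}{k}\right)^{x} = \frac{T_1(n)}{k^{x}} && k \text{ cores, one chunk each, in parallel} \\
\frac{T_1(n)}{T_k(n)} &= k^{x} && \text{e.g. } k = 4,\ x = 2 \Rightarrow 16\times; \quad k = 4,\ x = 1 \Rightarrow 4\times
\end{align*}
```

For x < 1 the same formula gives a speedup below k (for example, k = 4 and x = 1/2 gives 2x), so any coordination overhead weighs more heavily in that case.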
>>
>> So simple search usually has n/a run time, and is usually partitionable,
>> and it can benefit from multi-core approaches, but there are costs to
>> coordinating the threads and accumulating results.
>>
>> The point, if you are looking for ideas to apply multi-core solutions to,
>> is that you can do simple search as an example, or focus on any other
>> problem that has a data partitionable solution.
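
As an illustration of the data-partitioned simple search described above, here is a minimal Python sketch. The function names (`split`, `search_chunk`, `partitioned_search`) and the `map_fn` parameter are made up for this example; nothing here comes from the wiki page or from J.

```python
from typing import Callable, List, Sequence, Tuple


def split(data: Sequence[int], k: int) -> List[Tuple[int, Sequence[int]]]:
    """Cut data into k nearly equal contiguous chunks, each tagged with its start offset."""
    size, extra = divmod(len(data), k)
    chunks, start = [], 0
    for i in range(k):
        stop = start + size + (1 if i < extra else 0)
        chunks.append((start, data[start:stop]))
        start = stop
    return chunks


def search_chunk(job: Tuple[int, Sequence[int], int]) -> List[int]:
    """Linear search of one chunk; returns positions relative to the full data."""
    offset, chunk, target = job
    return [offset + i for i, value in enumerate(chunk) if value == target]


def partitioned_search(data: Sequence[int], target: int, k: int,
                       map_fn: Callable = map) -> List[int]:
    """Search k chunks independently, then accumulate the per-chunk results."""
    jobs = [(offset, chunk, target) for offset, chunk in split(data, k)]
    hits: List[int] = []
    for partial in map_fn(search_chunk, jobs):  # accumulating results is the serial part
        hits.extend(partial)
    return hits


if __name__ == "__main__":
    data = [i % 7 for i in range(50)]
    print(partitioned_search(data, 3, k=4))
```

With the default `map_fn=map` everything runs in one process, which is convenient for testing. Passing the `map` method of a `multiprocessing.Pool` instead should spread the chunks over cores, at the cost of copying each chunk to a worker, which is one form of the coordination cost mentioned above.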
>>
>>
>>
>> ________________________________
>> From: Joe Bogner <[email protected]>
>> To: [email protected]
>> Sent: Saturday, November 23, 2013 9:46:24 PM
>> Subject: Re: [Jprogramming] wiki: task scheduler and zeromq intro
>>
>>
>> Can anyone share specific examples where it was necessary to scale out to
>> multiple cores and machines? I am interested in learning about the types of
>> problems this would be applied to. I have read some examples while
>> researching but haven't run into anyone who has done it.
>>
>>
>>
>>
>> For example, last week I had to create a database of the best 100,000
>> solutions out of 56 billion combinations as part of a work deliverable. I am
>> sure there may have been more elegant solutions; however, brute forcing with
>> 4 instances of R and 32 GB of RAM took 3 hours, which was fine.
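
A "best N out of many combinations" job like the one above partitions naturally: each worker keeps only the best N of its own share, and the partial lists are merged at the end. The sketch below is a guess at the general shape, not the actual R code; the scoring function (`sum`) and the tiny candidate set are placeholders.

```python
import heapq
from itertools import combinations


def best_of_chunk(candidates, score, n):
    """Best n candidates of one partition, highest score first."""
    return heapq.nlargest(n, candidates, key=score)


def best_overall(chunks, score, n):
    """Merge per-partition winners; the global top n is always among them."""
    partial = [best_of_chunk(chunk, score, n) for chunk in chunks]
    merged = (candidate for part in partial for candidate in part)
    return heapq.nlargest(n, merged, key=score)


if __name__ == "__main__":
    # 56 combinations standing in for 56 billion; 4 chunks standing in for 4 R instances.
    combos = list(combinations(range(8), 3))
    chunks = [combos[i::4] for i in range(4)]
    for winner in best_overall(chunks, sum, 3):
        print(winner, sum(winner))
```

Each call to `best_of_chunk` touches only its own chunk, so the calls could run in separate processes or on separate machines; only the short winner lists need to be sent back for the final merge.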
>>
>>
>>
>> It might be worthwhile to create a small reproducible example of a
>> problem that would benefit from multiple cores and machines. I could make
>> one up or borrow from somewhere else but does anyone have any examples that
>> come to mind?
>>
>> On Sat, Nov 23, 2013 at 7:09 PM, Scott Locklin <[email protected]> wrote:
>> Dang, thanks for working on this, Pascal. This is one of the use cases I
>> had in mind for J-ZMQ, though it was taking me some time to figure out how
>> to do it in my clumsy way. I had been looking at joebo's fork thing for a
>> little inspiration, but I got busy with a half dozen other things. I will
>> try to play with this later next week. FWIW, one way to prevent the "miss
>> the first message" problem is to fire up the SUB instances before you send
>> them anything.
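
The "miss the first message" problem can be pictured with a toy in-process model. This is not ZeroMQ or its API: `ToyPublisher` is a made-up class that only mimics the one property that matters here, namely that a publisher does not keep messages for subscribers that are not yet attached.

```python
class ToyPublisher:
    """In-process stand-in for a PUB socket: nothing is queued for absent subscribers."""

    def __init__(self):
        self.inboxes = []

    def subscribe(self):
        """Attach a new subscriber and return its inbox (a plain list)."""
        inbox = []
        self.inboxes.append(inbox)
        return inbox

    def send(self, message):
        """Deliver to whoever is attached right now; nobody else ever sees it."""
        for inbox in self.inboxes:
            inbox.append(message)


pub = ToyPublisher()
early = pub.subscribe()      # subscriber started before any send
pub.send("first")
late = pub.subscribe()       # subscriber started after the first send
pub.send("second")

print(early)   # sees both messages
print(late)    # never sees "first"
```

In real ZeroMQ the subscriber also needs some time to finish connecting, which this toy does not model, so starting the SUB side first is necessary but the publisher may still need to wait briefly before its first send.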
>>
>>
>>
>> For what it is worth; I originally wrote these zeromq hooks for use in a
>> ticker plant for my own P/L. I had a better idea while working on a
>> consulting job recently. "Big data" and "the cloud" are in the news all the
>> time now. Mostly, this is hype, but there are some real business needs
>> involving extracting meaning from data sets which do not fit into core. The
>> existing solutions are generally not real impressive (Hadoop) or not
>> designed for weakly coupled "clouds" (MPI based solutions). Almost none of
>> these "solutions" use ideas which are actually appropriate to big data
>> (Vowpal Wabbit being a rare exception), and I have no inclination to
>> contribute to these tools.
>>
>>
>>
>> J has several fast and flexible data stores (still learning about Jd;
>> very impressed so far). J is also extremely memory efficient, and can do
>> out of core calculations. I have not done any tests to see if ZMQ can get
>> the data across the pipes well enough to do anything useful on AWS and
>> other such weakly coupled "clouds" favored by business, but I think I can
>> write some useful coarse grained parallel ML tools using ZMQ and J. This is
>> perhaps arrogant of me to think about: my J skills are weak, this is my
>> first ZMQ project, and it's rare I have time to think about big projects
>> like this, but I do know how to build fast and scalable machine learning
>> algorithms. I think this could be a way forward which solves important
>> business problems.
>>
>>
>> Stuff I don't know yet:
>>
>> 0) Stability: as Pascal noticed, J and ZMQ are sometimes unstable when
>> used together. I haven't fired up GDB to find out why yet. It's probably a
>> buffer allocation thing. It's possible there is a show stopper here: I
>> don't know. It seems to work with Kx at least:
>> https://github.com/jaeheum/qzmq
>>
>> 1) Fault tolerance: things are going to crash or blow up memory. Maybe
>> Pascal's task manager is enough for now.
>>
>> 2) Data provisioning: I'm guessing I'll need a framework for provisioning
>> each server with "owned" data, using Jd or JDB. I have to look at how other
>> frameworks do this.
>>
>> 3) Software provisioning: J is pretty simple to set up, but if this is
>> going to scale out to more than a couple of machines, some kind of tool
>> will be needed to accomplish this. I know such tools exist, but I have no
>> way of picking the "right one" at present (suggestions?).
>>
>> 4) Security: many of the existing parallel analytics tools have none.
>> CZMQ seems to provide some, but it looks unpleasant to use compared to
>> the rest of ZMQ. This is low priority, since nobody else bothers with it.
>>
>>
>>
>> Will I ever actually accomplish this? Probably not real quickly (too many
>> day jobs), but I think it is an exciting potential use for J.
>>
>>
>> Oddly, this thread does not show up in Nabble, where I usually read the
>> J-lists.
>>
>>
>>
>> -Scott
>>
>>
>>
>> > http://www.jsoftware.com/jwiki/PascalJasmin/OOP%20scheduler%20and%20ZeroMQ
>> >
>> > Thanks to Scott for getting this started.  It's a work in progress, but
>> > it's probably more helpful to see the simplest core version first, than
>> > just the bloated version.
>> >
>> > The scheduler is a framework for multitasking several polling (endless)
>> > loops within a single J instance. The simplest multithreading
>> > synchronization library is avoiding multithreading altogether, and an
>> > in-process scheduler allows what are semantically separate processes to
>> > work together without concerning yourself about the possibility of one
>> > process writing to a variable that is being read or written to by
>> > another process.
>> >
>> > It can integrate with other "real" multiprocessing setups by grouping
>> > together tasks that need tight cooperation. The canonical use is for
>> > socket programming, which typically involves polling loops for each
>> > client and server that add testing tedium even with just a single client
>> > and server. The scheduler eases development and testing of several
>> > clients and servers all in a single application, and simplifies
>> > testing/learning of frameworks like ZeroMQ and its J implementation.
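
The idea of running several endless polling loops inside one interpreter can be sketched with Python generators. This is not the J scheduler from the wiki page, only a rough analogue; `run`, `client`, and `server` are hypothetical names, and a plain `deque` stands in for a socket.

```python
from collections import deque


def run(tasks, max_steps):
    """Round-robin scheduler: resume each task until its next yield."""
    queue = deque(tasks)
    for _ in range(max_steps):
        if not queue:
            break
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue              # a finished task is simply dropped
        queue.append(task)        # otherwise it goes to the back of the line


def client(outbox):
    """Endless loop that 'sends' one message per turn."""
    n = 0
    while True:
        outbox.append(f"ping {n}")
        n += 1
        yield                     # hand control back to the scheduler


def server(inbox, log):
    """Endless polling loop that drains whatever has arrived."""
    while True:
        while inbox:
            log.append("got " + inbox.popleft())
        yield


if __name__ == "__main__":
    mailbox, log = deque(), []
    run([client(mailbox), server(mailbox, log)], max_steps=6)
    print(log)
```

Because only one task runs at a time and control changes hands only at a `yield`, the shared `mailbox` needs no locking, which is the point made above about avoiding multithreading altogether.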
>>
>> ----------------------------------------------------------------------
>>
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>>
>
>
>
> --
> Devon McCormick, CFA
>
>


-- 
Devon McCormick, CFA
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
