RE: Top level name proposal - ComputeCluster
I raised this on this list a month ago and received a number of comments. I'm getting close to an initial release, but there has been some internal discussion about the name here. As a result, I'm down to two choices for the name:

Choice 1:
  HPCI - high performance computing interface
  HPCD - high performance computing driver

Choice 2:
  ComputeJobManager::Interface
  ComputeJobManager::Driver

I've moved away from having "cluster" in the name because the package will not necessarily be limited to operating with clusters. The same interface should also work for a set of forked processes on the same host (maybe even threads, if they are safe to use in Perl) and for cloud interfaces. (The forked-processes driver will perhaps be most useful for running tests of the package itself that don't need to be executed on a cluster-capable system.)

Choice 1 is succinct, but it does not identify itself to casual browsers. Is that concern worth making actual users type a much longer package name every time they use it?

(I've got one medium-size coding change and a few small details left, plus adjusting the name throughout once a final name is chosen, so I'm very close to the first release.)

John Macdonald
Software Engineer
Ontario Institute for Cancer Research
MaRS Centre
661 University Avenue
Suite 510
Toronto, Ontario
Canada M5G 0A3
Tel:
Email: john.macdon...@oicr.on.ca
Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization.
From: John Macdonald
Sent: September 4, 2014 10:23 AM
To: module-authors@perl.org
Subject: Top level name proposal - ComputeCluster

Hi,

I wanted to get general comment/consensus about a top level name that I am proposing.

I'm starting to organize a set of modules for managing jobs on a computer cluster. I intend it to work much like DBI: a top level abstract interface that programs can use, actually implemented by drivers that translate the common interface into the interface used by the particular type of compute cluster being accessed. Initially, I will provide a driver for SGE, since that is what we have and use in our lab. (After I have that running, my PI can get me access to a couple of other types of compute cluster so I can add some more.)

For naming, I am planning to use:

  ComputeCluster - top level name - will provide switching functions to create a class of object for a particular cluster type
  ComputeCluster::Role - collection of roles for the generic portion of the commonly available object types
  ComputeCluster::$cluster - cluster-specific implementations of the roles for each type of cluster supported

Does anyone have any alternative suggestions, objections, etc.?

John Macdonald
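The DBI-like layout described above can be sketched as follows. A top-level switching function returns an object for a named cluster type, and each type is backed by a driver that implements one common interface. This is a minimal Python illustration under stated assumptions, not the planned module's API. The names `register_driver`, `Driver`, `LocalDriver` and `connect` are hypothetical. The local driver stands in for the forked-processes driver mentioned for testing, and it runs each job as a child process on the same host.

```python
import subprocess
import sys
from abc import ABC, abstractmethod

# Hypothetical registry: cluster-type name -> driver class.
_DRIVERS = {}


def register_driver(name):
    """Class decorator that makes a driver available under `name`."""
    def decorator(cls):
        _DRIVERS[name] = cls
        return cls
    return decorator


class Driver(ABC):
    """Generic interface that every cluster-specific driver implements."""

    @abstractmethod
    def submit(self, argv):
        """Start a job running the command `argv`; return a job id."""

    @abstractmethod
    def wait(self, job_id):
        """Block until the job finishes; return its exit status."""


@register_driver("local")
class LocalDriver(Driver):
    """Runs each job as a child process on this host (handy for tests)."""

    def __init__(self):
        self._procs = {}
        self._next_id = 1

    def submit(self, argv):
        job_id = self._next_id
        self._next_id += 1
        self._procs[job_id] = subprocess.Popen(argv)
        return job_id

    def wait(self, job_id):
        # Popen.wait() blocks and returns the child's exit code.
        return self._procs.pop(job_id).wait()


def connect(cluster, **options):
    """Top-level switching function: build the driver for `cluster`."""
    try:
        driver_cls = _DRIVERS[cluster]
    except KeyError:
        raise ValueError(f"no driver registered for {cluster!r}") from None
    return driver_cls(**options)


if __name__ == "__main__":
    cluster = connect("local")
    job = cluster.submit([sys.executable, "-c", "raise SystemExit(3)"])
    print("exit status:", cluster.wait(job))
```

An SGE driver would register itself under another name and implement the same `submit`/`wait` pair, for example by calling `qsub` and then polling for completion. That driver is left out here because it needs a live cluster. Calling code changes only the name it passes to `connect`.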
Re: Top level name proposal - ComputeCluster
What about HPC::? BTW, we have SGE at work too so this seems very interesting :-) Cheers! Alex
Re: Top level name proposal - ComputeCluster
So you intend to develop a new pure Perl compute cluster? If you just need to get the job done, why not use Hadoop, whether on a private cluster or AWS? It has a Perl API and it will cheerfully run Perl jobs. Hadoop is an Apache project: open source, free software with a large installed base.

-----Original Message-----
From: John Macdonald john.macdon...@oicr.on.ca
Date: Fri, 5 Sep 2014 13:57:47
To: Fields, Christopher J cjfie...@illinois.edu
Cc: James E Keenan jk...@verizon.net; module-authors@perl.org
Subject: RE: Top level name proposal - ComputeCluster

I'm intending that ComputeCluster (or whatever the final name turns out to be) will be domain-agnostic, at least at the top level interface. However, my lab will be using it for genome analysis pipelines. I suspect a significant proportion of other potential users will also be in this field, as the responses in this discussion already show, so there could be domain-specific submodules, either within this namespace or in other namespaces that simply use this module set.

Chris, Alex, and anyone else who is interested as a potential future user/contributor: feel free to email me outside of this module-authors discussion about how the actual module will develop.

John Macdonald
Re: Top level name proposal - ComputeCluster
There is a Perl interface to Hadoop. I can't look it up right now, but I think it was under Apache::. AWS also offers Hadoop as a service, with at least Perl and PHP interfaces. Under AWS::Hadoop, IIRC.

-----Original Message-----
From: Fields, Christopher J cjfie...@illinois.edu
Date: Fri, 5 Sep 2014 13:47:26
To: John Macdonald john.macdon...@oicr.on.ca
Cc: James E Keenan jk...@verizon.net; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

Yup, I agree. I think Cluster is too generic and can mean a lot of things (I think of cluster analysis myself). Maybe something more distinctive? Is it application- or domain-specific (bioinformatics, etc.)?

A few tools with similar functionality come to mind, and most of them have catchy names. One written in Perl is Clusterflow (not on CPAN but here: https://github.com/ewels/clusterflow/). Another is the biopipe project (completely unmaintained, likely broken, but possibly useful for something): https://github.com/bioperl/bioperl-pipeline. I have thought about retooling the latter to be less reliant on bioperl and more of a stand-alone tool. There are also a couple of Java tools: bpipe (https://code.google.com/p/bpipe/) and nextflow (https://github.com/nextflow-io/nextflow).

And I agree with Alex. As you might guess from my comment on biopipe, our group would be very interested in helping out on this, even if only at the testing phase (we run PBS/Torque locally).

chris
Re: Top level name proposal - ComputeCluster
Doesn't Hadoop have to restart the Perl interpreter for every execution step, i.e. run a script that performs the map or reduce operation?

It seems like a single Perl interpreter could listen on TCP for authenticated subroutines to run in threads, passing them on to idle neighbors when busy. No need for a scheduler? You scale by adding nodes. They glom together by broadcast registration, so there is no single point of failure. They all UDP-broadcast their load averages to each other and keep track of each other from a detached `nice` child process, and that process also acts as a control channel. Load average is the only criterion used for work assignment.

An object API for the cluster lets you give it chains of dependent subroutines. Subroutines can be defined to open a listener, broadcast the UUIDs of the subroutines they are waiting for, and listen until they get all expected results. If that broadcast is missed, the subroutine's return function broadcasts to ask which listener expects its results. Implement map-reduce this way? And whatever...

Mark

On Sep 7, 2014 11:01 AM, Dana Hudes dhu...@hudes.org wrote:
There is a Perl interface to Hadoop. I can't look it up right now, but I think it was under Apache::. AWS also offers Hadoop as a service, with at least Perl and PHP interfaces. Under AWS::Hadoop, IIRC.
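Mark's proposal leaves the networking open, but two pieces of bookkeeping it implies can be sketched on their own. The first picks a peer purely by its most recently broadcast load average. The second is a listener that waits until every expected subroutine UUID has reported back. This Python is a hypothetical sketch, not part of any existing module. The UDP broadcast, TCP transport, authentication and the `nice` control process are all omitted. The staleness cutoff (`stale_after`) is an added assumption, so that a node that stops broadcasting also stops receiving work.

```python
import time
import uuid


class PeerTable:
    """Latest load average broadcast by each peer, with a staleness cutoff."""

    def __init__(self, stale_after=10.0, clock=time.monotonic):
        self.stale_after = stale_after
        self.clock = clock
        self._peers = {}  # peer name -> (load average, time last heard)

    def heard(self, peer, load):
        # Call whenever a load-average broadcast arrives from `peer`.
        self._peers[peer] = (load, self.clock())

    def least_loaded(self):
        # Load average is the only criterion; silent peers are treated as gone.
        now = self.clock()
        live = {peer: load for peer, (load, seen) in self._peers.items()
                if now - seen <= self.stale_after}
        return min(live, key=live.get) if live else None


class ResultCollector:
    """Listener that waits for results from a known set of subroutine UUIDs."""

    def __init__(self, expected_ids):
        self.pending = set(expected_ids)
        self.results = {}

    def deliver(self, task_id, value):
        # Return True only if this listener was waiting for `task_id`;
        # a False return is the cue to ask around for the right listener.
        if task_id not in self.pending:
            return False
        self.pending.remove(task_id)
        self.results[task_id] = value
        return True

    def done(self):
        return not self.pending


if __name__ == "__main__":
    peers = PeerTable()
    peers.heard("node-a", 3.2)
    peers.heard("node-b", 0.7)
    print("assign next job to:", peers.least_loaded())

    ids = [uuid.uuid4() for _ in range(2)]
    collector = ResultCollector(ids)
    for n, task_id in enumerate(ids):
        collector.deliver(task_id, n * n)
    print("all results in:", collector.done())
```

The clock is injectable so the staleness logic can be tested without sleeping. In a real node, `heard` would be driven by the broadcast listener and `deliver` by incoming results.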
Re: Top level name proposal - ComputeCluster
On 09/04/2014 10:23 AM, John Macdonald wrote:
Hi, I wanted to get general comment/consensus about a top level name that I am proposing. I'm starting to organize a set of modules for managing jobs on a computer cluster. I intend it to work much like DBI: a top level abstract interface that programs can use, actually implemented by drivers that translate the common interface into the interface used by the particular type of compute cluster being accessed. Initially, I will provide a driver for SGE, since that is what we have and use in our lab. (After I have that running, my PI can get me access to a couple of other types of compute cluster so I can add some more.) For naming, I am planning to use: ComputeCluster - top level name - will provide switching functions to create a class of object for a particular cluster type

Could that be shortened to simply: Cluster?
RE: Top level name proposal - ComputeCluster
Cluster was my first thought for a name. However, I searched to see what modules already existed, both in case someone had already written a generic cluster module (saving me the bother of starting a new one) and to see which types of cluster already had cluster-specific modules. The word "cluster" came up in a large number of contexts. A tightly connected group of things is a cluster (e.g. nodes in a graph), so I didn't think the simple name would be clear enough. The name Cluster leaves the reader with the immediate question: "Cluster of what?"

John Macdonald

From: James E Keenan [jk...@verizon.net]
Sent: September 5, 2014 7:25 AM
To: module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

On 09/04/2014 10:23 AM, John Macdonald wrote:
For naming, I am planning to use: ComputeCluster - top level name - will provide switching functions to create a class of object for a particular cluster type

Could that be shortened to simply: Cluster?
RE: Top level name proposal - ComputeCluster
I generally think the name has to let people decide quickly whether the module is useful or irrelevant to their purposes. A cryptic TLA (three-letter acronym) makes it easy for people who recognize it to decide that it is what they want. It also makes it hard for people who don't know the TLA to decide that they can ignore it. HPC would be short to type and meaningful to the users, but it would be confusing noise to many others.

John Macdonald

From: Alex Muntada [alex.munt...@gmail.com]
Sent: September 5, 2014 9:04 AM
To: John Macdonald
Cc: module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

What about HPC::? BTW, we have SGE at work too, so this seems very interesting :-)
Cheers!
Alex
RE: Top level name proposal - ComputeCluster
I'm intending that ComputeCluster (or whatever the final name turns out to be) will be domain-agnostic, at least at the top level interface. However, my lab will be using it for genome analysis pipelines. I suspect a significant proportion of other potential users will also be in this field, as the responses in this discussion already show, so there could be domain-specific submodules, either within this namespace or in other namespaces that simply use this module set.

Chris, Alex, and anyone else who is interested as a potential future user/contributor: feel free to email me outside of this module-authors discussion about how the actual module will develop.

John Macdonald

From: Fields, Christopher J [cjfie...@illinois.edu]
Sent: September 5, 2014 9:47 AM
To: John Macdonald
Cc: James E Keenan; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

Yup, I agree. I think Cluster is too generic and can mean a lot of things (I think of cluster analysis myself). Maybe something more distinctive? Is it application- or domain-specific (bioinformatics, etc.)?

A few tools with similar functionality come to mind, and most of them have catchy names. One written in Perl is Clusterflow (not on CPAN but here: https://github.com/ewels/clusterflow/). Another is the biopipe project (completely unmaintained, likely broken, but possibly useful for something): https://github.com/bioperl/bioperl-pipeline. I have thought about retooling the latter to be less reliant on bioperl and more of a stand-alone tool. There are also a couple of Java tools: bpipe (https://code.google.com/p/bpipe/) and nextflow (https://github.com/nextflow-io/nextflow).

And I agree with Alex. As you might guess from my comment on biopipe, our group would be very interested in helping out on this, even if only at the testing phase (we run PBS/Torque locally).

chris

On Sep 5, 2014, at 8:00 AM, John Macdonald john.macdon...@oicr.on.ca wrote:
Cluster was my first thought for a name. However, I searched to see what modules already existed, both in case someone had already written a generic cluster module (saving me the bother of starting a new one) and to see which types of cluster already had cluster-specific modules. The word "cluster" came up in a large number of contexts. A tightly connected group of things is a cluster (e.g. nodes in a graph), so I didn't think the simple name would be clear enough. The name Cluster leaves the reader with the immediate question: "Cluster of what?"
Re: Top level name proposal - ComputeCluster
Hello everyone,

In my opinion, ComputeCluster is way too long for a root module name. Plus, the T9 in my head would type ComputerCluster instead. Maybe something like HighLoad:: or HPC:: would be better as a root name? Not sure, though.

On Fri, Sep 5, 2014 at 4:18 PM, John Macdonald john.macdon...@oicr.on.ca wrote:
I generally think the name has to let people decide quickly whether the module is useful or irrelevant to their purposes. A cryptic TLA (three-letter acronym) makes it easy for people who recognize it to decide that it is what they want. It also makes it hard for people who don't know the TLA to decide that they can ignore it. HPC would be short to type and meaningful to the users, but it would be confusing noise to many others.
RE: Top level name proposal - ComputeCluster
Dana,

I may be wrong here, but I think that Hadoop is one form of compute cluster management software, just as SGE is. I'm aiming to provide a generic interface layer that you can use for writing code to be distributed across a cluster. By changing one parameter (cluster='Hadoop' instead of cluster='SGE'), the same code would run on a different type of cluster. There would be limitations if you used cluster-specific capabilities. DBI code has the same limitations when you replace the underlying database platform, but *most* of the code would be unaffected. (That assumes I get a generic interface definition good enough to balance the requirements and capabilities of different clusters in a single consistent form. :-)

John Macdonald

From: Dana Hudes [dhu...@hudes.org]
Sent: September 5, 2014 10:03 AM
To: John Macdonald
Cc: module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

So you intend to develop a new pure Perl compute cluster? If you just need to get the job done, why not use Hadoop, whether on a private cluster or AWS? It has a Perl API and it will cheerfully run Perl jobs. Hadoop is an Apache project: open source, free software with a large installed base.
-Original Message- From: John Macdonald john.macdon...@oicr.on.ca Date: Fri, 5 Sep 2014 13:57:47 To: Fields, Christopher Jcjfie...@illinois.edu Cc: James E Keenanjk...@verizon.net; module-authors@perl.orgmodule-authors@perl.org Subject: RE: Top level name proposal - ComputeCluster I'm intending that ComputeCluster (or whatever the final name turns out to be) will be domain-agnostic at the top level interface at least. However, my lab will be using it for genome analysis pipelines, and I suspect a significant proportion of the potential other users will also be in this field (as shown by the repsonses on this discussion already) so there could be domain-specific submodules - either within this namespace or in other namespaces simply using this module set. Chris, Alex, and anyone else who is interested as a potential future user/contributor, feel free to email me outside of this module-authors discussion about how the actual module will develop. John Macdonald Software Engineer Ontario Institute for Cancer Research MaRS Centre 661 University Avenue Suite 510 Toronto, Ontario Canada M5G 0A3 Tel: Email: john.macdon...@oicr.on.ca Toll-free: 1-866-678-6427 Twitter: @OICR_news www.oicr.on.ca This message and any attachments may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this message in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this message may not be that of the organization. From: Fields, Christopher J [cjfie...@illinois.edu] Sent: September 5, 2014 9:47 AM To: John Macdonald Cc: James E Keenan; module-authors@perl.org Subject: Re: Top level name proposal - ComputeCluster Yup, I agree. I think Cluster is too generic and can mean a lot of things (I think of cluster analysis myself). 
Maybe something more distinctive? Is it application- or domain-specific (bioinformatics, etc.)? There are a few tools with similar functionality that come to mind, and most of them have catchy names. One written in Perl is Clusterflow (not on CPAN, but here: https://github.com/ewels/clusterflow/). Another is the (completely unmaintained, likely broken, but possibly useful for something) biopipe project: https://github.com/bioperl/bioperl-pipeline. I have thought about retooling the latter to be less reliant on bioperl and more of a stand-alone tool. There are also a couple of Java tools: bpipe (https://code.google.com/p/bpipe/) and nextflow (https://github.com/nextflow-io/nextflow). And I agree with Alex; as you might guess from my comment on biopipe, our group would be very interested in helping out on this, even if it's simply at the testing phase (we run PBS/Torque locally
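John's DBI analogy above (one generic interface, with the concrete driver chosen by a single `cluster` parameter) might look roughly like the following. This is a minimal, hypothetical sketch written in Python for brevity; the proposed module itself is Perl, and `Job`, `Driver`, `SGEDriver`, `LocalDriver`, `DRIVERS` and `connect` are invented names, not part of any released API. The SGE driver only builds a `qsub` command line (using the standard `-N` job-name and `-b y` binary-command options) and submits nothing; a Hadoop driver would, under the same assumptions, just be another registry entry.

```python
# Hypothetical sketch of a DBI-style split: a generic interface whose
# concrete driver is picked by one "cluster" parameter.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Job:
    """Cluster-agnostic description of one unit of work."""
    name: str
    command: list[str]


class Driver:
    """Interface every (hypothetical) cluster driver implements."""

    def command_for(self, job: Job) -> list[str]:
        raise NotImplementedError


class SGEDriver(Driver):
    """Turns a generic Job into a Sun Grid Engine qsub command line."""

    def command_for(self, job: Job) -> list[str]:
        # -N sets the job name; -b y tells qsub the command is a binary
        # rather than a job script. Nothing is submitted here.
        return ["qsub", "-N", job.name, "-b", "y", *job.command]


class LocalDriver(Driver):
    """Returns the command unchanged, to be run directly on this host (handy for tests)."""

    def command_for(self, job: Job) -> list[str]:
        return list(job.command)


# Registry from the user-visible cluster name to a driver class.
DRIVERS: dict[str, type[Driver]] = {
    "SGE": SGEDriver,
    "Local": LocalDriver,
}


def connect(cluster: str) -> Driver:
    """Return a driver for the named cluster type, in the spirit of DBI->connect."""
    try:
        return DRIVERS[cluster]()
    except KeyError:
        raise ValueError(f"no driver registered for cluster {cluster!r}") from None


if __name__ == "__main__":
    job = Job("hello", ["echo", "hello"])
    print(connect("SGE").command_for(job))
    # ['qsub', '-N', 'hello', '-b', 'y', 'echo', 'hello']
```

Code written against `Driver` never names a scheduler, so switching clusters means changing only the string passed to `connect`. Anything needing scheduler-specific options would have to go around the generic `Job` fields, which is the same trade-off as dropping to database-specific SQL under DBI.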
Re: Top level name proposal - ComputeCluster
Probably way off-topic but just a question: should a generic interface target something like DRMAA? http://en.wikipedia.org/wiki/DRMAA That would work across most clusters as it’s a single unified API. (there is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way out of date; at least I could never get it to install)

chris

On Sep 5, 2014, at 9:12 AM, John Macdonald <john.macdon...@oicr.on.ca> wrote:
[...]
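One way Chris's suggestion might fit the driver model, rather than replace it, is a single DRMAA-backed driver covering every scheduler that ships a DRMAA library. The sketch below is hypothetical Python: `JobSpec`, `to_drmaa_attributes` and `apply_to_template` are invented for illustration. The attribute names (`jobName`, `remoteCommand`, `args`, `workingDirectory`, `nativeSpecification`) are those of the job template in the Python `drmaa` package, which binds the DRMAA 1.0 spec, but the code deliberately does not import that package, so it runs without a DRMAA library installed.

```python
# Hypothetical sketch: map a generic job description onto DRMAA 1.0
# job-template attributes (names as used by the Python `drmaa` package).
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class JobSpec:
    """Cluster-agnostic job description (invented for this example)."""
    name: str
    command: str
    args: list[str] = field(default_factory=list)
    workdir: str | None = None
    # Scheduler-specific flags that DRMAA leaves unstandardised; passed
    # through verbatim, so jobs that use them stay tied to one scheduler.
    native: str = ""


def to_drmaa_attributes(spec: JobSpec) -> dict[str, object]:
    """Return the job-template attributes a DRMAA-backed driver would set."""
    attrs: dict[str, object] = {
        "jobName": spec.name,
        "remoteCommand": spec.command,
        "args": list(spec.args),
    }
    if spec.workdir is not None:
        attrs["workingDirectory"] = spec.workdir
    if spec.native:
        attrs["nativeSpecification"] = spec.native
    return attrs


def apply_to_template(template: object, spec: JobSpec) -> None:
    """Copy the attributes onto a job-template object, such as one returned
    by Session.createJobTemplate() in the `drmaa` package."""
    for key, value in to_drmaa_attributes(spec).items():
        setattr(template, key, value)
```

`nativeSpecification` is where scheduler-specific options (for example an SGE resource request) would end up; code relying on it stays tied to one scheduler, which mirrors John's caveat about cluster-specific capabilities.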
RE: Top level name proposal - ComputeCluster
I looked at that a while ago. As you say, it seems way out of date. There were two specs: the old one was quite limited, while the newer one looked very promising. However, the available code was only for the older spec, and there didn't seem to be any progress after that. The spec looked like a committee design, aimed at being implemented by teams from the companies behind each of the target platforms, which made it a big bite to take for a single unaligned developer. Still, I should probably re-read it and see if I can make my design fit their spec as far as possible. Stealing (er, research) is always good.

John Macdonald

From: Fields, Christopher J [cjfie...@illinois.edu]
Sent: September 5, 2014 10:22 AM
To: John Macdonald
Cc: dhu...@hudes.org; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster
[...]
Re: Top level name proposal - ComputeCluster
On Fri, 5 Sep 2014, James E Keenan wrote:

Could that be shortened to simply: Cluster?

If this happens I'm claiming Cluster::Fu... well, I think you know where I'm going with this ;-)

--Arthur Corliss
Live Free or Die