RE: Top level name proposal - ComputeCluster

2014-10-02 Thread John Macdonald
I raised this on the list a month ago and received a number of comments.

I'm getting close to an initial release, but there has been some internal
discussion here about the name.

As a result of the internal discussion, I'm down to two choices for the name:

Choice 1:
HPCI - high performance computing interface
HPCD - high performance computing driver

Choice 2:
ComputeJobManager::Interface
ComputeJobManager::Driver

I've moved away from having "cluster" in the name because the package will not
necessarily be limited to operating with clusters; the same interface should
also work for a bunch of forked processes on the same host (maybe even threads,
if those are safe to use in Perl) and for cloud interfaces.

(The forked processes driver will perhaps be most useful for running tests of 
the package itself that don't need to be executed on a cluster-capable system.)
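
As a rough illustration of the forked-process idea (sketched in Python for
brevity, not the package's actual Perl API), a driver for the local host only
needs to implement the same small interface that a cluster driver would. The
names Driver, LocalDriver, submit and wait are hypothetical, chosen only for
this sketch.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical driver interface; the method names are assumptions."""

    @abstractmethod
    def submit(self, argv):
        """Start a job and return an opaque job handle."""

    @abstractmethod
    def wait(self, handle):
        """Block until the job finishes and return its exit status."""


class LocalDriver(Driver):
    """Runs each job as a child process on the current host."""

    def submit(self, argv):
        return subprocess.Popen(argv)

    def wait(self, handle):
        return handle.wait()


if __name__ == "__main__":
    driver = LocalDriver()
    # Use the running interpreter as a stand-in for a real job command.
    job = driver.submit([sys.executable, "-c", "raise SystemExit(3)"])
    print(driver.wait(job))
```

A test suite could run against LocalDriver on any machine, while a cluster
driver would implement the same two methods on top of its scheduler.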

Choice 1 is succinct, but it is not self-identifying to casual browsers.  Is
that concern worth making actual users type a much longer package name
whenever they use it?

(I've got one medium-size coding change and a few small details, plus adjusting
the name throughout whenever a final name is chosen, so I'm very close to the
first release.)


John Macdonald
Software Engineer

Ontario Institute for Cancer Research
MaRS Centre

661 University Avenue

Suite 510
Toronto, Ontario

Canada M5G 0A3


Tel:

Email: john.macdon...@oicr.on.ca

Toll-free: 1-866-678-6427
Twitter: @OICR_news


www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged 
information for the sole use of the intended recipient. Any review or 
distribution by anyone other than the person for whom it was originally 
intended is strictly prohibited. If you have received this message in error, 
please contact the sender and delete all copies. Opinions, conclusions or other 
information contained in this message may not be that of the organization.


From: John Macdonald
Sent: September 4, 2014 10:23 AM
To: module-authors@perl.org
Subject: Top level name proposal - ComputeCluster

Hi,

I wanted to get general comment/consensus about a top level name that I am
proposing.

I'm starting to organize a set of modules for managing jobs on a computer
cluster.  I intend it to work much like DBI: a top level abstract
interface that programs can use, actually implemented by drivers that translate
the common interface into the interface used by the particular type of compute
cluster that is being accessed.  Initially, I will provide a driver for SGE,
since that is what we have and use in our lab (but after I have that running,
my PI can get me access to a couple of other types of compute cluster so I can
add some more).

For naming, I am planning to use:

ComputeCluster - top level name
  - will provide switching functions to create a class of object for a
    particular cluster type

ComputeCluster::Role - collection of roles for the generic portion of the
  commonly available object types

ComputeCluster::$cluster - cluster-specific implementations of the roles
  for each type of cluster supported

Does anyone have any alternative suggestions, objections, etc.?





Re: Top level name proposal - ComputeCluster

2014-09-07 Thread Alex Muntada
What about HPC::?

BTW, we have SGE at work too so this seems very interesting :-)

Cheers!
Alex


Re: Top level name proposal - ComputeCluster

2014-09-07 Thread Dana Hudes
So you intend to develop a new pure-Perl compute cluster? If you just need to
get the job done, why not use Hadoop, whether on a private cluster or on AWS?
It has a Perl API and it will cheerfully run Perl jobs. Hadoop is an Apache
project, open source free software with a large installed base.

From: Fields, Christopher J [cjfie...@illinois.edu]
Sent: September 5, 2014 9:47 AM
To: John Macdonald
Cc: James E Keenan; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

Yup, I agree.  I think Cluster is too generic and can mean a lot of things (I 
think of cluster analysis myself).  Maybe something more distinctive?  Is it 
application- or domain-specific (bioinformatics, etc)?

There are a few tools with similar functionality that come to mind.  Most of 
them have catchy names; one written in Perl is Clusterflow (not on CPAN but 
here: https://github.com/ewels/clusterflow/).  Another is the (completely 
unmaintained, likely broken, but possibly useful for something) biopipe 
project: https://github.com/bioperl/bioperl-pipeline.  I have thought about 
retooling the latter to be less reliant on bioperl and more a stand-alone tool.

There are a couple Java tools also: bpipe (https://code.google.com/p/bpipe/) 
and nextflow (https://github.com/nextflow-io/nextflow).

And I agree with Alex; as you might guess based on my comment on biopipe, our 
group would be very interested in helping out on this, even if it’s at simply 
the testing phase (we run PBS/Torque locally).

chris


Re: Top level name proposal - ComputeCluster

2014-09-07 Thread Dana Hudes
There exists a Perl interface to Hadoop. I can't look it up right now, but I
think it was under Apache::. AWS also offers Hadoop as a service, with Perl and
PHP interfaces at least, under AWS::Hadoop IIRC.




Re: Top level name proposal - ComputeCluster

2014-09-07 Thread Mark Hedges
Doesn't Hadoop have to restart the perl interpreter for every execution
step, i.e. run a script that performs the map or reduce operation?

It seems like a single perl interpreter could listen on TCP for
authenticated subroutines to run in threads, passing them on to idle
neighbors if busy.  No need for a scheduler? Scale by adding nodes; they
glom together by broadcast registration, no single point of failure. They
all UDP-broadcast their load averages to each other and keep track of each
other from a detached `nice` child process. Load average is the only
criterion used for work assignment. That process acts as a control channel.

The object API for the cluster lets you give it chains of dependent
subroutines. A subroutine can be defined to open a listener, broadcast the
UUIDs of the subroutines it is waiting for, and listen until it gets all
expected results. If a result gets missed, the subroutine return function
broadcasts to ask which listener expects its results. Implement map-reduce
this way? And whatever...
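
The load-average rule above can be sketched in a few lines (Python, for
illustration only). The node names and the pick_node helper are hypothetical;
`loads` stands for the most recent load averages a node has heard from its
neighbors over broadcast.

```python
def pick_node(loads):
    """Return the node with the lowest reported load average.

    `loads` maps node name -> most recently broadcast load average.
    Ties go to the name that sorts first, so the choice is deterministic.
    """
    if not loads:
        raise ValueError("no nodes have reported a load average")
    return min(sorted(loads), key=loads.get)


if __name__ == "__main__":
    reported = {"node-a": 1.75, "node-b": 0.40, "node-c": 0.95}
    print(pick_node(reported))
```

Because every node holds the same table, any of them can make this choice
locally, which is what would let the design do without a central scheduler.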

Mark

Re: Top level name proposal - ComputeCluster

2014-09-05 Thread James E Keenan

On 09/04/2014 10:23 AM, John Macdonald wrote:

For naming, I am planning to use:

 ComputeCluster - top level name
   - will provide switching functions to create a class of object
for a particular cluster type



Could that be shortened to simply:  Cluster ?


RE: Top level name proposal - ComputeCluster

2014-09-05 Thread John Macdonald
Cluster was my first thought for a name, but when I did a search to see what
modules already existed (both in case someone had already written a generic
cluster module, saving me the bother of starting a new one, and to see what
types of cluster had cluster-specific modules written for them) the word
"cluster" came up in a large number of contexts.  Any tightly connected group of
things is a cluster (e.g. nodes in a graph), so I didn't think that the
simple name would be clear enough.  The name Cluster leaves the reader with the
immediate question "Cluster of what?".



RE: Top level name proposal - ComputeCluster

2014-09-05 Thread John Macdonald
I generally think the name has to be useful for people to decide quickly 
whether it is useful or irrelevant to their purposes.  A cryptic TLA makes it 
easy for the people who recognize it to decide that it is what they want, but 
makes it hard for people who don't know the TLA to decide that they can ignore 
it.  HPC would be short to type and meaningful to the users, but would be a 
confusing noise to many others.




RE: Top level name proposal - ComputeCluster

2014-09-05 Thread John Macdonald
I'm intending that ComputeCluster (or whatever the final name turns out to be)
will be domain-agnostic at the top level interface at least.  However, my lab
will be using it for genome analysis pipelines, and I suspect a significant
proportion of the potential other users will also be in this field (as shown by
the responses on this discussion already), so there could be domain-specific
submodules - either within this namespace or in other namespaces simply using
this module set.

Chris, Alex, and anyone else who is interested as a potential future 
user/contributor, feel free to email me outside of this module-authors 
discussion about how the actual module will develop.


Re: Top level name proposal - ComputeCluster

2014-09-05 Thread Konstantin S. Uvarin
Hello everyone,

  In my opinion, ComputeCluster is way too long for a root module name.
Plus, T9 in my head would type ComputerCluster instead.

  Maybe something like HighLoad:: or HPC:: would be better as a root name?

  Not sure though.





RE: Top level name proposal - ComputeCluster

2014-09-05 Thread John Macdonald
Dana, I may be wrong here, but I think that Hadoop is one form of compute
cluster management software, just as SGE is.  I'm aiming to provide a generic
interface layer that you can use for writing code to be distributed across a
cluster.  By changing one parameter, cluster='Hadoop' instead of
cluster='SGE', your same code would run on a different type of cluster.  There
would be limitations if you used cluster-specific capabilities, just as there
are limitations when a program that uses DBI switches to a different
underlying database platform, but *most* of the code would be
unaffected.  (Assuming that I get a good enough generic interface definition,
one that balances the requirements and capabilities of different clusters
well enough in a single consistent form. :-)
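
A minimal sketch of the one-parameter switch described above, written in
Python for brevity rather than in the module's Perl. The connect function, the
driver classes and the strings they return are all hypothetical stand-ins;
neither driver talks to a real scheduler, and the step names are made up.

```python
class SGEDriver:
    """Placeholder driver; a real one would talk to the SGE scheduler."""
    name = "SGE"

    def submit(self, command):
        return f"[{self.name}] queued: {command}"


class HadoopDriver:
    """Placeholder driver; a real one would talk to Hadoop."""
    name = "Hadoop"

    def submit(self, command):
        return f"[{self.name}] queued: {command}"


# Registry mapping the cluster parameter to a driver class.
DRIVERS = {"SGE": SGEDriver, "Hadoop": HadoopDriver}


def connect(cluster):
    """Return a driver instance for the named cluster type."""
    try:
        return DRIVERS[cluster]()
    except KeyError:
        raise ValueError(f"no driver for cluster type {cluster!r}") from None


def run_pipeline(cluster):
    """User code: identical whichever driver is selected."""
    driver = connect(cluster=cluster)
    return [driver.submit(step) for step in ("align", "sort")]


if __name__ == "__main__":
    print(run_pipeline("SGE"))
    print(run_pipeline("Hadoop"))
```

Only the argument to run_pipeline changes between the two calls; code that
relied on a capability only one driver offers would not port this cleanly,
which is the same caveat that applies to DBI.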

John Macdonald
Software Engineer

Ontario Institute for Cancer Research
MaRS Centre

661 University Avenue

Suite 510
Toronto, Ontario

Canada M5G 0A3


Tel:

Email: john.macdon...@oicr.on.ca

Toll-free: 1-866-678-6427
Twitter: @OICR_news


www.oicr.on.ca

This message and any attachments may contain confidential and/or privileged 
information for the sole use of the intended recipient. Any review or 
distribution by anyone other than the person for whom it was originally 
intended is strictly prohibited. If you have received this message in error, 
please contact the sender and delete all copies. Opinions, conclusions or other 
information contained in this message may not be that of the organization.


From: Dana Hudes [dhu...@hudes.org]
Sent: September 5, 2014 10:03 AM
To: John Macdonald
Cc: module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

So you intend to develop a new pure Perl compute cluster? Because if you just 
need to get the job done, why would you not use Hadoop, whether private cluster 
or AWS? It has a Perl API and it will cheerfully run Perl jobs.
Hadoop is an Apache project, open source free software with a large installed 
base.

-Original Message-
From: John Macdonald john.macdon...@oicr.on.ca
Date: Fri, 5 Sep 2014 13:57:47
To: Fields, Christopher J cjfie...@illinois.edu
Cc: James E Keenan jk...@verizon.net; module-authors@perl.org
Subject: RE: Top level name proposal - ComputeCluster

I'm intending that ComputeCluster (or whatever the final name turns out to be) 
will be domain-agnostic at the top level interface at least.  However, my lab 
will be using it for genome analysis pipelines, and I suspect a significant 
proportion of the potential other users will also be in this field (as shown by 
the responses on this discussion already) so there could be domain-specific 
submodules - either within this namespace or in other namespaces simply using 
this module set.

Chris, Alex, and anyone else who is interested as a potential future 
user/contributor, feel free to email me outside of this module-authors 
discussion about how the actual module will develop.

John Macdonald
Software Engineer


From: Fields, Christopher J [cjfie...@illinois.edu]
Sent: September 5, 2014 9:47 AM
To: John Macdonald
Cc: James E Keenan; module-authors@perl.org
Subject: Re: Top level name proposal - ComputeCluster

Yup, I agree.  I think Cluster is too generic and can mean a lot of things (I 
think of cluster analysis myself).  Maybe something more distinctive?  Is it 
application- or domain-specific (bioinformatics, etc)?

There are a few tools with similar functionality that come to mind.  Most of 
them have catchy names; one written in Perl is Clusterflow (not on CPAN but 
here: https://github.com/ewels/clusterflow/).  Another is the (completely 
unmaintained, likely broken, but possibly useful for something) biopipe 
project: https://github.com/bioperl/bioperl-pipeline.  I have thought about 
retooling the latter to be less reliant on bioperl and more a stand-alone tool.

There are a couple Java tools also: bpipe (https://code.google.com/p/bpipe/) 
and nextflow (https://github.com/nextflow-io/nextflow).

And I agree with Alex; as you might guess based on my comment on biopipe, our 
group would be very interested in helping out on this, even if it’s simply at 
the testing phase (we run PBS/Torque locally

Re: Top level name proposal - ComputeCluster

2014-09-05 Thread Fields, Christopher J
Probably way off-topic but just a question: should a generic interface target 
something like DRMAA?  

http://en.wikipedia.org/wiki/DRMAA

That would work across most clusters as it’s a single unified API.

(there is a DRMAA module, Schedule::DRMAAc, but I believe it’s XS-based and way 
out of date; at least I could never get it to install)

chris


RE: Top level name proposal - ComputeCluster

2014-09-05 Thread John Macdonald
I looked at that a while ago.

As you say, it seems way out of date.  There were two specs - the old one was 
quite limited, the newer one looked very promising.  However, the available 
code was only for the older spec and there didn't seem to be any progress after 
that.  The spec looked like a committee design aimed at being implemented by 
teams from the companies for each of the target platforms, which made it a big 
bite to take for a single unaligned developer.

However, I should probably re-read it and see if I can get my design to fit 
their spec to the extent possible.  Stealing (er, research) is always good.

John Macdonald
Software Engineer




Re: Top level name proposal - ComputeCluster

2014-09-05 Thread Arthur Corliss

On Fri, 5 Sep 2014, James E Keenan wrote:

 Could that be shortened to simply:  Cluster ?

If this happens I'm claiming Cluster::Fu... well, I think you know where I'm
going with this ;-)

--Arthur Corliss
  Live Free or Die