Re: [R] Distributed computing with R

2004-06-05 Thread A.J. Rossini

No -- the point is that they are mostly orthogonal solutions, not
mutually exclusive.  "Mostly" implies that sometimes 1 + 1 = 0.5
(i.e., negative interactions can happen if you do not think through
what each is doing for scheduling/job transfer/migration).

For example, you can use SGE, OpenMOSIX, and SNOW-on-PVM (or other
message passing library) all together.

SGE and OpenMOSIX might not be too happy, since they are trying to do
the same thing at different levels, but it would work (perhaps
inefficiently).

best,
-tony
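
One way to keep a scheduler and a snow-style cluster from working against each other is to size the worker pool from the scheduler's allocation rather than hard-coding it. The sketch below is an illustration, not something posted in this thread. It assumes the `parallel` package that ships with current R (which provides the snow-style `makeCluster`/`parSapply` interface), assumes the job runs under SGE with the `NSLOTS` environment variable set, and uses a hypothetical helper named `slots_from_env`. Workers are started on the local host only.

```r
# Sketch under stated assumptions: `parallel` (bundled with current R)
# stands in for snow; NSLOTS is assumed to be set by the scheduler.
library(parallel)

# Hypothetical helper: read the granted slot count from the environment,
# falling back to a small default when the variable is missing or invalid.
slots_from_env <- function(var = "NSLOTS", default = 2L) {
  raw <- Sys.getenv(var, unset = NA)
  n <- suppressWarnings(as.integer(raw))
  if (is.na(n) || n < 1L) default else n
}

# Start exactly as many local workers as the scheduler granted.
n_workers <- slots_from_env()
cl <- makeCluster(n_workers)

# Toy workload: the sum 1 + 2 + ... + i for each task i.
task_sums <- parSapply(cl, 1:4, function(i) sum(seq_len(i)))

# Release the workers so the scheduler's accounting stays accurate.
stopCluster(cl)

print(task_sums)
```

Outside a scheduler the helper falls back to two workers, so the same script can be tried on a workstation first.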


Paul Gilbert [EMAIL PROTECTED] writes:

> Tony
>
> Thanks, this categorization has cleared up a few things I have found
> confusing. But should I read this to mean that SNOW would not run on
> a system or kernel level parallel setup?
>
> Thanks,
> Paul Gilbert

-- 
[EMAIL PROTECTED]    http://www.analytics.washington.edu/
Biomedical and Health Informatics   University of Washington
Biostatistics, SCHARP/HVTN  Fred Hutchinson Cancer Research Center
UW (Tu/Th/F): 206-616-7630 FAX=206-543-3461 | Voicemail is unreliable
FHCRC  (M/W): 206-667-7025 FAX=206-667-4812 | use Email

CONFIDENTIALITY NOTICE: This e-mail message and any attachme...{{dropped}}

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] Distributed computing with R

2004-06-03 Thread Paul Gilbert
Tony
Thanks, this categorization has cleared up a few things I have found
confusing. But should I read this to mean that SNOW would not run on
a system or kernel level parallel setup?

Thanks,
Paul Gilbert


Re: [R] Distributed computing with R

2004-06-03 Thread Roger D. Peng
snow works well on an openMosix system, and is actually quite 
convenient since you don't have to worry about which process is going 
to which computer.  The kernel migrates the processes automatically 
(usually).

-roger
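
To see where the worker processes of a snow-style cluster actually live, each worker can be asked to report its host and process ID. This is a minimal sketch and not code from this thread: it assumes the `parallel` package bundled with current R in place of snow, and starts two workers on the local machine. On a plain socket cluster the output reflects the placement you chose; under kernel-level migration the placement is the kernel's decision and may not be visible this way.

```r
# Sketch only: `parallel` (bundled with current R) stands in for snow.
library(parallel)

# Two local worker processes; the count is an arbitrary choice.
cl <- makeCluster(2)

# Run the same function once on every worker and collect the answers.
where <- clusterCall(cl, function() {
  list(host = Sys.info()[["nodename"]], pid = Sys.getpid())
})

stopCluster(cl)

# Each worker is a separate R process, so the PIDs differ from each
# other and from the PID of the controlling session.
worker_pids <- vapply(where, function(w) w$pid, integer(1))
print(where)
```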


RE: [R] Distributed computing with R

2004-06-02 Thread Stephen C. Upton
Saroj,

Haven't had any experience with R in a distributed computing environment,
but have used Condor (http://www.cs.wisc.edu/condor/) with several
applications. It's free, has a public license for use, is easy to use, but
is not open source. Since you can batch up R code, this might be a
reasonable option. 

HTH
steve
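
"Batching up" R code for a queueing system usually means writing a script that handles one slice of the work and is told which slice on the command line. The sketch below shows that pattern under stated assumptions; it is not from this thread. The helper names `chunk_indices` and `run_task`, the toy data, and the argument order (task number, then task count) are all hypothetical.

```r
# Hypothetical helper: indices of chunk i when n items are split into
# n_chunks contiguous, near-equal pieces.
chunk_indices <- function(n, n_chunks, i) {
  stopifnot(i >= 1, i <= n_chunks)
  groups <- ceiling(seq_len(n) * n_chunks / n)
  which(groups == i)
}

# Hypothetical unit of work: sum this task's slice of the data.
run_task <- function(task_id, n_tasks, values = seq_len(100)) {
  idx <- chunk_indices(length(values), n_tasks, task_id)
  sum(values[idx])
}

# Task number and task count arrive as command-line arguments;
# fall back to "task 1 of 4" for a dry run without arguments.
args <- commandArgs(trailingOnly = TRUE)
ids <- suppressWarnings(as.integer(args[1:2]))
if (length(args) < 2 || anyNA(ids)) ids <- c(1L, 4L)

# The batch system captures standard output as the job's result.
cat(sprintf("task %d of %d: partial sum = %d\n",
            ids[1], ids[2], run_task(ids[1], ids[2])))
```

Saved as, say, `task.R` (a made-up name), the third of four jobs would be run as `Rscript task.R 3 4`, with the scheduler supplying the task number for each job.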

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Saroj Mohapatra
Sent: Wednesday, June 02, 2004 7:12 PM
To: [EMAIL PROTECTED]
Subject: [R] Distributed computing with R


Dear all,

We have been using R for data analysis for a few months and find it
useful. We are planning to acquire a high-end dedicated system for
microarray data analysis and are thinking of a distributed environment.
I would appreciate it if someone could send some pointers on how to
choose a proper hardware configuration, software (R or other software,
esp. MATLAB), issues in setting up the cluster, etc. Does anyone here
have experience with R on a cluster? Does it provide significant
benefits in processing time? Is setting up the cluster more difficult
than using R on it?

Thanks.

Saroj K Mohapatra, MD
Research Associate
Tainsky Lab
Karmanos Cancer Institute
Wayne State University School of Medicine
110 E. Warren, Room 311
Detroit MI 48201
313-833-0715 x2424
[EMAIL PROTECTED]



Re: [R] Distributed computing with R

2004-06-02 Thread Armin Roehrl
If you do some programming, you might want to look at MPI.
R extensions for MPI exist (Rmpi).
It all depends a lot on what kind of usage you envisage for your cluster.
OpenPBS is also a good batch system. Maybe you also want to
look at Mosix, which is a modified Linux kernel.
Depending on what your ultimate computing resources are,
maybe also look at the Globus Toolkit.
Parallel programming is fun. The world is inherently parallel!
Ciao,
   -Armin.

Armin Roehrl, http://www.approximity.com
We manage risk


Re: [R] Distributed computing with R

2004-06-02 Thread A.J. Rossini

Also see SNOW (which simplifies parallel programming, sits on top of
rpvm, Rmpi, or a socket-based system).

Depends on whether you want parallelism on the:

1. User-level -- the libraries such as PVM, LAM-MPI, etc. will help,
   and there are various packages which provide an API to those.

2. System-level -- then Condor, Sun Grid Engine / Maui scheduler, and
   similar queueing/batching/allocation daemons will help
   (computational grid software is usually a generalization of
   this which adds authentication and resource allocation).

3. Kernel-level -- then OpenMOSIX, BPROC, etc. will help.

They are mostly orthogonal.  Mostly... :-).

best,
-tony
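
For the user-level case, a snow-style session looks roughly like the sketch below. It is an illustration, not code from this thread: it assumes the `parallel` package bundled with current R (which provides the snow-style interface over sockets) instead of snow with rpvm or Rmpi, and the worker count and the `square` function are made up for the example.

```r
# Sketch only: `parallel` (bundled with current R) stands in for snow;
# the socket transport needs no PVM or MPI installation.
library(parallel)

# Start two worker processes on the local machine over sockets.
cl <- makeCluster(2, type = "PSOCK")

# Hypothetical per-task function.
square <- function(x) x^2

# Split the inputs across the workers and collect a list of results.
squares <- parLapply(cl, 1:8, square)

# Always release the workers when finished.
stopCluster(cl)

print(unlist(squares))
```

The calling code does not mention the transport, which is the sense in which snow "sits on top of" the message-passing layer: switching the cluster type should leave the `parLapply` call unchanged.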





Re: [R] Distributed computing with R

2004-06-02 Thread Roger D. Peng
I would suggest installing PVM or LAM-MPI and using the R 
packages `snow' and `rpvm' (or `Rmpi').  I've found the `snow' 
package very simple to use and useful for quick and dirty 
solutions.  I've used `snow' with an openMosix setup and on a 
simple cluster of workstations without any scheduler.  openMosix 
is nice because you don't have to worry about which process goes 
where but that's not to say it doesn't have its own difficulties.
Overall, my experience with parallel computing in R has been a 
little clunky but that's mostly because the problems I work on 
don't benefit much from such a setup.

-roger