Re: [Rd] Process to Incorporate Functions from {parallely} into base R's {parallel} package

2020-11-11 Thread Martin Maechler
> Duncan Murdoch 
> on Sat, 7 Nov 2020 15:44:32 -0500 writes:

> If these are easy changes, maybe someone will incorporate
> them.  You'll make the argument stronger for doing that if
> you can explain why it's better to do that than to keep
> them in parallelly.

> Duncan Murdoch

Thank you, Duncan, Henrik, and James Joseph.

From reading, I agree that this is something worth updating in
R's own `parallel` (and I have tried it and checked that it does
not break our own 'make check-all').

Henrik (or anyone): Is there a small repr.ex. I could add to
parallel/tests/*.R which will show the advantage of allowing an
empty 'user' here?

Martin Maechler


> On 07/11/2020 1:39 p.m., Henrik Bengtsson wrote:
>> FWIW, there are indeed a few low hanging bug fixes in
>> 'parallelly' that should be easy to incorporate into
>> 'parallel' without adding extra maintenance.  For
>> example, in parallel::makePSOCKcluster(), it is not
>> possible to disable SSH option '-l USER' so that it can
>> be set in ~/.ssh/config.  The remote user name will be
>> the user name of your local machine and if you try to set
>> user=NULL, you'll end up with an invalid SSH call.  The
>> current behavior means that you are forced to specify the
>> remote user name in your R code.  All it takes to fix
>> this is to update:
>> 
>> cmd <- paste(rshcmd, "-l", user, machine, cmd)
>> 
>> to something like:
>> 
>> cmd <- paste(rshcmd, if (length(user) == 1L) paste("-l",
>> user), machine, cmd)
>> 
>> This is one example of what I've patched in
>> parallelly::makeClusterPSOCK() over the years.  Another
>> is the use of reverse tunneling in SSH - that completely
>> avoids the need to know and specify your public IP and
>> to reconfigure the firewalls from the remote server back
>> to your local machine so that the worker can connect back
>> to your local machine.  Not many users have the
>> permission to reconfigure firewalls and it's also
>> extremely tedious.  Reverse SSH tunneling is super
>> simple; all you need to do is something like:
>> 
>> rshopts <- c(sprintf("-R %d:%s:%d", rscript_port, master,
>> port), rshopts)
>> 
>> /Henrik
>> 
>> On Fri, Nov 6, 2020 at 4:37 PM Duncan Murdoch
>>  wrote:
>>> 
>>> On 06/11/2020 4:47 p.m., Balamuta, James Joseph wrote:
>>>> Hi all,
>>>>
>>>> Henrik Bengtsson has done some fantastic work with
>>>> {future} and, more importantly, greatly improved
>>>> constructing and deconstructing a parallelized
>>>> environment within R. It was with great joy that I saw
>>>> Henrik slowly split off some functionality of {future}
>>>> into the {parallelly} package. Reading over the package’s
>>>> README, he states:
>>>>
>>>>> The functions and features added to this package are
>>>>> written to be backward compatible with the parallel
>>>>> package, such that they may be incorporated there
>>>>> later.  The parallelly package comes with an open
>>>>> invitation for the R Core Team to adopt all or parts
>>>>> of its code into the parallel package.
>>>>
>>>> https://github.com/HenrikBengtsson/parallelly
>>>>
>>>> I’m wondering what the appropriate process would be to
>>>> slowly merge some functions from {parallelly} into the
>>>> base R {parallel} package. Should this be done with
>>>> targeted issues on Bugzilla for different fields Henrik
>>>> has identified? Or would an omnibus patch bringing in
>>>> all suggested modifications be preferred? Or is it best
>>>> to discuss via the list-serv appropriate contributions?
>>> 
>>> One way is to convince R Core that incorporating this
>>> into the parallel package would
>>> 
>>> - make less work for them, or - add a lot to R that
>>> couldn't happen if it was a contributed package.
>>> 
>>> The fact that it's good isn't a good reason to put it
>>> into a base package, which would largely mean
>>> transferring Henrik's workload to R Core.  There are
>>> lots of good packages, and their maintainers should
>>> continue to maintain them.
>>> 
>>> Duncan Murdoch
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Process to Incorporate Functions from {parallely} into base R's {parallel} package

2020-11-07 Thread Duncan Murdoch
If these are easy changes, maybe someone will incorporate them.  You'll 
make the argument stronger for doing that if you can explain why it's 
better to do that than to keep them in parallelly.


Duncan Murdoch

On 07/11/2020 1:39 p.m., Henrik Bengtsson wrote:

FWIW, there are indeed a few low hanging bug fixes in 'parallelly'
that should be easy to incorporate into 'parallel' without adding
extra maintenance.  For example, in parallel::makePSOCKcluster(), it
is not possible to disable SSH option '-l USER' so that it can be set
in ~/.ssh/config.  The remote user name will be the user name of your
local machine and if you try to set user=NULL, you'll end up with an
invalid SSH call.  The current behavior means that you are forced to
specify the remote user name in your R code.  All it takes to fix
this is to update:

   cmd <- paste(rshcmd, "-l", user, machine, cmd)

to something like:

   cmd <- paste(rshcmd, if (length(user) == 1L) paste("-l", user), machine, cmd)

This is one example of what I've patched in
parallelly::makeClusterPSOCK() over the years.  Another is the use of
reverse tunneling in SSH - that completely avoids the need to know and
specify your public IP and to reconfigure the firewalls from the remote
server back to your local machine so that the worker can connect back
to your local machine.  Not many users have the permission to
reconfigure firewalls and it's also extremely tedious.  Reverse SSH
tunneling is super simple; all you need to do is something like:

rshopts <- c(sprintf("-R %d:%s:%d", rscript_port, master, port), rshopts)

/Henrik

On Fri, Nov 6, 2020 at 4:37 PM Duncan Murdoch  wrote:


On 06/11/2020 4:47 p.m., Balamuta, James Joseph wrote:

Hi all,

Henrik Bengtsson has done some fantastic work with {future} and, more 
importantly, greatly improved constructing and deconstructing a parallelized 
environment within R. It was with great joy that I saw Henrik slowly split off 
some functionality of {future} into the {parallelly} package. Reading over the 
package’s README, he states:


The functions and features added to this package are written to be backward 
compatible with the parallel package, such that they may be incorporated there 
later.
The parallelly package comes with an open invitation for the R Core Team to 
adopt all or parts of its code into the parallel package.


https://github.com/HenrikBengtsson/parallelly

I’m wondering what the appropriate process would be to slowly merge some 
functions from {parallelly} into the base R {parallel} package. Should this be 
done with targeted issues on Bugzilla for different fields Henrik has 
identified? Or would an omnibus patch bringing in all suggested modifications 
be preferred? Or is it best to discuss via the list-serv appropriate 
contributions?


One way is to convince R Core that incorporating this into the parallel
package would

   - make less work for them, or
   - add a lot to R that couldn't happen if it was a contributed package.

The fact that it's good isn't a good reason to put it into a base
package, which would largely mean transferring Henrik's workload to R
Core.  There are lots of good packages, and their maintainers should
continue to maintain them.

Duncan Murdoch



Re: [Rd] Process to Incorporate Functions from {parallely} into base R's {parallel} package

2020-11-07 Thread Henrik Bengtsson
FWIW, there are indeed a few low hanging bug fixes in 'parallelly'
that should be easy to incorporate into 'parallel' without adding
extra maintenance.  For example, in parallel::makePSOCKcluster(), it
is not possible to disable SSH option '-l USER' so that it can be set
in ~/.ssh/config.  The remote user name will be the user name of your
local machine and if you try to set user=NULL, you'll end up with an
invalid SSH call.  The current behavior means that you are forced to
specify the remote user name in your R code.  All it takes to fix
this is to update:

  cmd <- paste(rshcmd, "-l", user, machine, cmd)

to something like:

  cmd <- paste(rshcmd, if (length(user) == 1L) paste("-l", user), machine, cmd)
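
To make the difference concrete, here is a minimal sketch that wraps the two lines above in two small functions. The function names ssh_call_old() and ssh_call_new(), and the values "node1", "alice" and "Rscript", are made up for illustration; they are not part of 'parallel' or 'parallelly', and the real code builds a much longer command.

```r
# Hypothetical helpers (illustration only), contrasting the two lines above.
ssh_call_old <- function(rshcmd, user, machine, cmd) {
  # "-l" is always emitted, even when 'user' is NULL
  paste(rshcmd, "-l", user, machine, cmd)
}

ssh_call_new <- function(rshcmd, user, machine, cmd) {
  # "-l USER" is emitted only when exactly one user name is given;
  # otherwise if() yields NULL and no "-l" flag reaches the command
  paste(rshcmd, if (length(user) == 1L) paste("-l", user), machine, cmd)
}

# With user = NULL the old form keeps a dangling "-l", so ssh would read
# the machine name as the login name; the new form omits the flag and
# leaves the user name to ~/.ssh/config.
old_call <- ssh_call_old("ssh", NULL, "node1", "Rscript")
new_call <- ssh_call_new("ssh", NULL, "node1", "Rscript")
print(old_call)
print(new_call)

# An explicit user still produces "-l USER"
print(ssh_call_new("ssh", "alice", "node1", "Rscript"))
```

The conditional relies on if() without an else branch returning NULL, which contributes no text to the pasted command.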

This is one example of what I've patched in
parallelly::makeClusterPSOCK() over the years.  Another is the use of
reverse tunneling in SSH - that completely avoids the need to know and
specify your public IP and to reconfigure the firewalls from the remote
server back to your local machine so that the worker can connect back
to your local machine.  Not many users have the permission to
reconfigure firewalls and it's also extremely tedious.  Reverse SSH
tunneling is super simple; all you need to do is something like:

rshopts <- c(sprintf("-R %d:%s:%d", rscript_port, master, port), rshopts)
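
As a rough sketch of what that line produces, the snippet below fills in placeholder values. The port number 11000, the host "localhost" and the pre-existing option "-o BatchMode=yes" are assumptions chosen for illustration; only the sprintf() line itself comes from the text above.

```r
# Placeholder values (assumptions for illustration only)
rscript_port <- 11000L        # port the worker connects to on the remote side
master       <- "localhost"   # master host, as seen from the local machine
port         <- 11000L        # port the master listens on locally
rshopts      <- c("-o", "BatchMode=yes")  # any SSH options already in use

# Prepend the reverse-tunnel option: ssh's -R takes port:host:hostport,
# i.e. remote port 'rscript_port' is forwarded to 'master:port'
rshopts <- c(sprintf("-R %d:%s:%d", rscript_port, master, port), rshopts)
print(rshopts)
```

With such a tunnel in place, the worker only needs to connect to the forwarded port on its own side, so the public IP of the local machine never has to appear in the R code.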

/Henrik

On Fri, Nov 6, 2020 at 4:37 PM Duncan Murdoch  wrote:
>
> On 06/11/2020 4:47 p.m., Balamuta, James Joseph wrote:
> > Hi all,
> >
> > Henrik Bengtsson has done some fantastic work with {future} and, more 
> > importantly, greatly improved constructing and deconstructing a 
> > parallelized environment within R. It was with great joy that I saw Henrik 
> > slowly split off some functionality of {future} into the {parallelly} package. 
> > Reading over the package’s README, he states:
> >
> >> The functions and features added to this package are written to be 
> >> backward compatible with the parallel package, such that they may be 
> >> incorporated there later.
> >> The parallelly package comes with an open invitation for the R Core Team 
> >> to adopt all or parts of its code into the parallel package.
> >
> > https://github.com/HenrikBengtsson/parallelly
> >
> > I’m wondering what the appropriate process would be to slowly merge some 
> > functions from {parallelly} into the base R {parallel} package. Should this 
> > be done with targeted issues on Bugzilla for different fields Henrik has 
> > identified? Or would an omnibus patch bringing in all suggested 
> > modifications be preferred? Or is it best to discuss via the list-serv 
> > appropriate contributions?
>
> One way is to convince R Core that incorporating this into the parallel
> package would
>
>   - make less work for them, or
>   - add a lot to R that couldn't happen if it was a contributed package.
>
> The fact that it's good isn't a good reason to put it into a base
> package, which would largely mean transferring Henrik's workload to R
> Core.  There are lots of good packages, and their maintainers should
> continue to maintain them.
>
> Duncan Murdoch


Re: [Rd] Process to Incorporate Functions from {parallely} into base R's {parallel} package

2020-11-06 Thread Duncan Murdoch

On 06/11/2020 4:47 p.m., Balamuta, James Joseph wrote:

Hi all,

Henrik Bengtsson has done some fantastic work with {future} and, more 
importantly, greatly improved constructing and deconstructing a parallelized 
environment within R. It was with great joy that I saw Henrik slowly split off 
some functionality of {future} into the {parallelly} package. Reading over the 
package’s README, he states:


The functions and features added to this package are written to be backward 
compatible with the parallel package, such that they may be incorporated there 
later.
The parallelly package comes with an open invitation for the R Core Team to 
adopt all or parts of its code into the parallel package.


https://github.com/HenrikBengtsson/parallelly

I’m wondering what the appropriate process would be to slowly merge some 
functions from {parallelly} into the base R {parallel} package. Should this be 
done with targeted issues on Bugzilla for different fields Henrik has 
identified? Or would an omnibus patch bringing in all suggested modifications 
be preferred? Or is it best to discuss via the list-serv appropriate 
contributions?


One way is to convince R Core that incorporating this into the parallel 
package would


 - make less work for them, or
 - add a lot to R that couldn't happen if it was a contributed package.

The fact that it's good isn't a good reason to put it into a base 
package, which would largely mean transferring Henrik's workload to R 
Core.  There are lots of good packages, and their maintainers should 
continue to maintain them.


Duncan Murdoch



[Rd] Process to Incorporate Functions from {parallely} into base R's {parallel} package

2020-11-06 Thread Balamuta, James Joseph
Hi all,

Henrik Bengtsson has done some fantastic work with {future} and, more 
importantly, greatly improved constructing and deconstructing a parallelized 
environment within R. It was with great joy that I saw Henrik slowly split off 
some functionality of {future} into the {parallelly} package. Reading over the 
package’s README, he states:

> The functions and features added to this package are written to be backward 
> compatible with the parallel package, such that they may be incorporated 
> there later.
> The parallelly package comes with an open invitation for the R Core Team to 
> adopt all or parts of its code into the parallel package.

https://github.com/HenrikBengtsson/parallelly

I’m wondering what the appropriate process would be to slowly merge some 
functions from {parallelly} into the base R {parallel} package. Should this be 
done with targeted issues on Bugzilla for different fields Henrik has 
identified? Or would an omnibus patch bringing in all suggested modifications 
be preferred? Or is it best to discuss via the list-serv appropriate 
contributions?

Best,

JJB
