Re: [R] Parallel assignments and goto

2018-02-27 Thread Thomas Mailund
I did try assign. That was the slowest version from what my profiling could 
tell, as far as I recall, which really surprised me. I had expected it to be 
the fastest. The second slowest was using the [[ operator on environments. Or 
it might be the reverse for those two. They were both slower than the other 
versions I posted here.

Cheers

On 27 Feb 2018, 17.16 +0100, Bert Gunter , wrote:
> No clue, but see ?assign perhaps if you have not done so already.
>
> -- Bert
>
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along and 
> sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> > On Tue, Feb 27, 2018 at 6:51 AM, Thomas Mailund  
> > wrote:
> > > Interestingly, the <<- operator is also a lot faster than using a 
> > > namespace explicitly, and only slightly slower than using <- with local 
> > > variables, see below. But, surely, both must at some point insert values 
> > > in a given environment — either the local one, for <-, or an enclosing 
> > > one, for <<- — so I guess I am asking if there is a more low-level 
> > > assignment operation I can get my hands on without diving into C?
> > >
> > >
> > > factorial <- function(n, acc = 1) {
> > >     if (n == 1) acc
> > >     else factorial(n - 1, n * acc)
> > > }
> > >
> > > factorial_tr_manual <- function (n, acc = 1)
> > > {
> > >     repeat {
> > >         if (n <= 1)
> > >             return(acc)
> > >         else {
> > >             .tailr_n <- n - 1
> > >             .tailr_acc <- acc * n
> > >             n <- .tailr_n
> > >             acc <- .tailr_acc
> > >             next
> > >         }
> > >     }
> > > }
> > >
> > > factorial_tr_automatic_1 <- function(n, acc = 1) {
> > >     .tailr_n <- n
> > >     .tailr_acc <- acc
> > >     callCC(function(escape) {
> > >         repeat {
> > >             n <- .tailr_n
> > >             acc <- .tailr_acc
> > >             if (n <= 1) {
> > >                 escape(acc)
> > >             } else {
> > >                 .tailr_n <<- n - 1
> > >                 .tailr_acc <<- n * acc
> > >             }
> > >         }
> > >     })
> > > }
> > >
> > > factorial_tr_automatic_2 <- function(n, acc = 1) {
> > >     .tailr_env <- rlang::get_env()
> > >     callCC(function(escape) {
> > >         repeat {
> > >             if (n <= 1) {
> > >                 escape(acc)
> > >             } else {
> > >                 .tailr_env$.tailr_n <- n - 1
> > >                 .tailr_env$.tailr_acc <- n * acc
> > >                 .tailr_env$n <- .tailr_env$.tailr_n
> > >                 .tailr_env$acc <- .tailr_env$.tailr_acc
> > >             }
> > >         }
> > >     })
> > > }
> > >
> > > microbenchmark::microbenchmark(factorial(1000),
> > >                                factorial_tr_manual(1000),
> > >                                factorial_tr_automatic_1(1000),
> > >                                factorial_tr_automatic_2(1000))
> > > Unit: microseconds
> > >                            expr     min      lq      mean   median        
> > > uq      max neval
> > >                 factorial(1000) 884.137 942.060 1076.3949 977.6235 
> > > 1042.5035 2889.779   100
> > >       factorial_tr_manual(1000) 110.215 116.919  130.2337 118.7350  
> > > 122.7495  255.062   100
> > >  factorial_tr_automatic_1(1000) 179.897 183.437  212.8879 187.8250  
> > > 195.7670  979.352   100
> > >  factorial_tr_automatic_2(1000) 508.353 534.328  601.9643 560.7830  
> > > 587.8350 1424.260   100
> > >
> > > Cheers
> > >
> > > On 26 Feb 2018, 21.12 +0100, Thomas Mailund , 
> > > wrote:
> > > > Following up on this attempt of implementing the tail-recursion 
> > > > optimisation — now that I’ve finally had the chance to look at it again 
> > > > — I find that non-local return implemented with callCC doesn’t actually 
> > > > incur much overhead once I do it more sensibly. I haven’t found a good 
> > > > way to handle parallel assignments that isn’t vastly slower than simply 
> > > > introducing extra variables, so I am going with that solution. However, 
> > > > I have now run into another problem involving those local variables — 
> > > > and assigning to local variables in general.
> > > >
> > > > Consider again the factorial function and three different ways of 
> > > > implementing it using the tail recursion optimisation:
> > > >
> > > > factorial <- function(n, acc = 1) {
> > > >     if (n == 1) acc
> > > >     else factorial(n - 1, n * acc)
> > > > }
> > > >
> > > > factorial_tr_manual <- function (n, acc = 1)
> > > > {
> > > >     repeat {
> > > >         if (n <= 1)
> > > >             return(acc)
> > > >         else {
> > > >             .tailr_n <- n - 1
> > > >             .tailr_acc <- acc * n
> > > >             n <- .tailr_n
> > > >             acc <- .tailr_acc
> > > >             next
> > > >         }
> > > >     }
> > > > }
> > > >
> > > > 

Re: [R] Parallel assignments and goto

2018-02-27 Thread Bert Gunter
No clue, but see ?assign perhaps if you have not done so already.

-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Feb 27, 2018 at 6:51 AM, Thomas Mailund 
wrote:

> Interestingly, the <<- operator is also a lot faster than using a
> namespace explicitly, and only slightly slower than using <- with local
> variables, see below. But, surely, both must at some point insert values in
> a given environment — either the local one, for <-, or an enclosing one,
> for <<- — so I guess I am asking if there is a more low-level assignment
> operation I can get my hands on without diving into C?
>
>
> factorial <- function(n, acc = 1) {
> if (n == 1) acc
> else factorial(n - 1, n * acc)
> }
>
> factorial_tr_manual <- function (n, acc = 1)
> {
> repeat {
> if (n <= 1)
> return(acc)
> else {
> .tailr_n <- n - 1
> .tailr_acc <- acc * n
> n <- .tailr_n
> acc <- .tailr_acc
> next
> }
> }
> }
>
> factorial_tr_automatic_1 <- function(n, acc = 1) {
> .tailr_n <- n
> .tailr_acc <- acc
> callCC(function(escape) {
> repeat {
> n <- .tailr_n
> acc <- .tailr_acc
> if (n <= 1) {
> escape(acc)
> } else {
> .tailr_n <<- n - 1
> .tailr_acc <<- n * acc
> }
> }
> })
> }
>
> factorial_tr_automatic_2 <- function(n, acc = 1) {
> .tailr_env <- rlang::get_env()
> callCC(function(escape) {
> repeat {
> if (n <= 1) {
> escape(acc)
> } else {
> .tailr_env$.tailr_n <- n - 1
> .tailr_env$.tailr_acc <- n * acc
> .tailr_env$n <- .tailr_env$.tailr_n
> .tailr_env$acc <- .tailr_env$.tailr_acc
> }
> }
> })
> }
>
> microbenchmark::microbenchmark(factorial(1000),
>factorial_tr_manual(1000),
>factorial_tr_automatic_1(1000),
>factorial_tr_automatic_2(1000))
> Unit: microseconds
>expr min  lq  mean   median
>  uq  max neval
> factorial(1000) 884.137 942.060 1076.3949 977.6235
> 1042.5035 2889.779   100
>   factorial_tr_manual(1000) 110.215 116.919  130.2337 118.7350
>  122.7495  255.062   100
>  factorial_tr_automatic_1(1000) 179.897 183.437  212.8879 187.8250
>  195.7670  979.352   100
>  factorial_tr_automatic_2(1000) 508.353 534.328  601.9643 560.7830
>  587.8350 1424.260   100
>
> Cheers
>
> On 26 Feb 2018, 21.12 +0100, Thomas Mailund ,
> wrote:
> > Following up on this attempt of implementing the tail-recursion
> optimisation — now that I’ve finally had the chance to look at it again — I
> find that non-local return implemented with callCC doesn’t actually incur
> much overhead once I do it more sensibly. I haven’t found a good way to
> handle parallel assignments that isn’t vastly slower than simply
> introducing extra variables, so I am going with that solution. However, I
> have now run into another problem involving those local variables — and
> assigning to local variables in general.
> >
> > Consider again the factorial function and three different ways of
> implementing it using the tail recursion optimisation:
> >
> > factorial <- function(n, acc = 1) {
> > if (n == 1) acc
> > else factorial(n - 1, n * acc)
> > }
> >
> > factorial_tr_manual <- function (n, acc = 1)
> > {
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > .tailr_n <- n - 1
> > .tailr_acc <- acc * n
> > n <- .tailr_n
> > acc <- .tailr_acc
> > next
> > }
> > }
> > }
> >
> > factorial_tr_automatic_1 <- function(n, acc = 1) {
> > callCC(function(escape) {
> > repeat {
> > if (n <= 1) {
> > escape(acc)
> > } else {
> > .tailr_n <- n - 1
> > .tailr_acc <- n * acc
> > n <- .tailr_n
> > acc <- .tailr_acc
> > }
> > }
> > })
> > }
> >
> > factorial_tr_automatic_2 <- function(n, acc = 1) {
> > .tailr_env <- rlang::get_env()
> > callCC(function(escape) {
> > repeat {
> > if (n <= 1) {
> > escape(acc)
> > } else {
> > .tailr_env$.tailr_n <- n - 1
> > .tailr_env$.tailr_acc <- n * acc
> > .tailr_env$n <- .tailr_env$.tailr_n
> > .tailr_env$acc <- .tailr_env$.tailr_acc
> > }
> > }
> > })
> > }
> >
> > The factorial_tr_manual function is how I would 

Re: [R] Parallel assignments and goto

2018-02-27 Thread Thomas Mailund
Interestingly, the <<- operator is also a lot faster than using a namespace 
explicitly, and only slightly slower than using <- with local variables, see 
below. But, surely, both must at some point insert values in a given 
environment — either the local one, for <-, or an enclosing one, for <<- — so I 
guess I am asking if there is a more low-level assignment operation I can get 
my hands on without diving into C?


factorial <- function(n, acc = 1) {
    if (n == 1) acc
    else factorial(n - 1, n * acc)
}

factorial_tr_manual <- function (n, acc = 1)
{
    repeat {
        if (n <= 1)
            return(acc)
        else {
            .tailr_n <- n - 1
            .tailr_acc <- acc * n
            n <- .tailr_n
            acc <- .tailr_acc
            next
        }
    }
}

factorial_tr_automatic_1 <- function(n, acc = 1) {
    .tailr_n <- n
    .tailr_acc <- acc
    callCC(function(escape) {
        repeat {
            n <- .tailr_n
            acc <- .tailr_acc
            if (n <= 1) {
                escape(acc)
            } else {
                .tailr_n <<- n - 1
                .tailr_acc <<- n * acc
            }
        }
    })
}

factorial_tr_automatic_2 <- function(n, acc = 1) {
    .tailr_env <- rlang::get_env()
    callCC(function(escape) {
        repeat {
            if (n <= 1) {
                escape(acc)
            } else {
                .tailr_env$.tailr_n <- n - 1
                .tailr_env$.tailr_acc <- n * acc
                .tailr_env$n <- .tailr_env$.tailr_n
                .tailr_env$acc <- .tailr_env$.tailr_acc
            }
        }
    })
}

microbenchmark::microbenchmark(factorial(1000),
                               factorial_tr_manual(1000),
                               factorial_tr_automatic_1(1000),
                               factorial_tr_automatic_2(1000))
Unit: microseconds
                           expr     min      lq      mean   median        uq    
  max neval
                factorial(1000) 884.137 942.060 1076.3949 977.6235 1042.5035 
2889.779   100
      factorial_tr_manual(1000) 110.215 116.919  130.2337 118.7350  122.7495  
255.062   100
 factorial_tr_automatic_1(1000) 179.897 183.437  212.8879 187.8250  195.7670  
979.352   100
 factorial_tr_automatic_2(1000) 508.353 534.328  601.9643 560.7830  587.8350 
1424.260   100

Cheers

On 26 Feb 2018, 21.12 +0100, Thomas Mailund , wrote:
> Following up on this attempt of implementing the tail-recursion optimisation 
> — now that I’ve finally had the chance to look at it again — I find that 
> non-local return implemented with callCC doesn’t actually incur much overhead 
> once I do it more sensibly. I haven’t found a good way to handle parallel 
> assignments that isn’t vastly slower than simply introducing extra variables, 
> so I am going with that solution. However, I have now run into another 
> problem involving those local variables — and assigning to local variables in 
> general.
>
> Consider again the factorial function and three different ways of 
> implementing it using the tail recursion optimisation:
>
> factorial <- function(n, acc = 1) {
>     if (n == 1) acc
>     else factorial(n - 1, n * acc)
> }
>
> factorial_tr_manual <- function (n, acc = 1)
> {
>     repeat {
>         if (n <= 1)
>             return(acc)
>         else {
>             .tailr_n <- n - 1
>             .tailr_acc <- acc * n
>             n <- .tailr_n
>             acc <- .tailr_acc
>             next
>         }
>     }
> }
>
> factorial_tr_automatic_1 <- function(n, acc = 1) {
>     callCC(function(escape) {
>         repeat {
>             if (n <= 1) {
>                 escape(acc)
>             } else {
>                 .tailr_n <- n - 1
>                 .tailr_acc <- n * acc
>                 n <- .tailr_n
>                 acc <- .tailr_acc
>             }
>         }
>     })
> }
>
> factorial_tr_automatic_2 <- function(n, acc = 1) {
>     .tailr_env <- rlang::get_env()
>     callCC(function(escape) {
>         repeat {
>             if (n <= 1) {
>                 escape(acc)
>             } else {
>                 .tailr_env$.tailr_n <- n - 1
>                 .tailr_env$.tailr_acc <- n * acc
>                 .tailr_env$n <- .tailr_env$.tailr_n
>                 .tailr_env$acc <- .tailr_env$.tailr_acc
>             }
>         }
>     })
> }
>
> The factorial_tr_manual function is how I would implement the function 
> manually while factorial_tr_automatic_1 is what my package used to come up 
> with. It handles non-local returns, because this is something I need in 
> general. Finally, factorial_tr_automatic_2 accesses the local variables 
> explicitly through the environment, which is what my package currently 
> produces.
>
> The difference between supporting non-local returns and not is tiny, but 
> explicitly accessing variables through their environment costs me about a 
> factor of five — something that surprised me.
>
> > 

Re: [R] Parallel assignments and goto

2018-02-26 Thread Thomas Mailund
Following up on this attempt of implementing the tail-recursion optimisation — 
now that I’ve finally had the chance to look at it again — I find that 
non-local return implemented with callCC doesn’t actually incur much overhead 
once I do it more sensibly. I haven’t found a good way to handle parallel 
assignments that isn’t vastly slower than simply introducing extra variables, 
so I am going with that solution. However, I have now run into another problem 
involving those local variables — and assigning to local variables in general.

Consider again the factorial function and three different ways of implementing 
it using the tail recursion optimisation:

factorial <- function(n, acc = 1) {
    if (n == 1) acc
    else factorial(n - 1, n * acc)
}

factorial_tr_manual <- function (n, acc = 1)
{
    repeat {
        if (n <= 1)
            return(acc)
        else {
            .tailr_n <- n - 1
            .tailr_acc <- acc * n
            n <- .tailr_n
            acc <- .tailr_acc
            next
        }
    }
}

factorial_tr_automatic_1 <- function(n, acc = 1) {
    callCC(function(escape) {
        repeat {
            if (n <= 1) {
                escape(acc)
            } else {
                .tailr_n <- n - 1
                .tailr_acc <- n * acc
                n <- .tailr_n
                acc <- .tailr_acc
            }
        }
    })
}

factorial_tr_automatic_2 <- function(n, acc = 1) {
    .tailr_env <- rlang::get_env()
    callCC(function(escape) {
        repeat {
            if (n <= 1) {
                escape(acc)
            } else {
                .tailr_env$.tailr_n <- n - 1
                .tailr_env$.tailr_acc <- n * acc
                .tailr_env$n <- .tailr_env$.tailr_n
                .tailr_env$acc <- .tailr_env$.tailr_acc
            }
        }
    })
}

The factorial_tr_manual function is how I would implement the function manually 
while factorial_tr_automatic_1 is what my package used to come up with. It 
handles non-local returns, because this is something I need in general. 
Finally, factorial_tr_automatic_2 accesses the local variables explicitly 
through the environment, which is what my package currently produces.

The difference between supporting non-local returns and not is tiny, but 
explicitly accessing variables through their environment costs me about a 
factor of five — something that surprised me.

> microbenchmark::microbenchmark(factorial(1000),
+                                factorial_tr_manual(1000),
+                                factorial_tr_automatic_1(1000),
+                                factorial_tr_automatic_2(1000))
Unit: microseconds
                           expr     min       lq     mean   median
                factorial(1000) 756.357 810.4135 963.1040 856.3315
      factorial_tr_manual(1000) 104.838 119.7595 198.7347 129.0870
 factorial_tr_automatic_1(1000) 112.354 125.5145 211.6148 135.5255
 factorial_tr_automatic_2(1000) 461.015 544.7035 688.5988 565.3240
       uq      max neval
 945.3110 4149.099   100
 136.8200 4190.331   100
 152.9625 5944.312   100
 600.5235 7798.622   100

The simple solution, of course, is to not do that, but then I can’t handle 
expressions inside calls to “with”. And I would really like to, because then I 
can combine tail recursion with pattern matching.

I can define linked lists and a length function on them like this:

library(pmatch)
llist := NIL | CONS(car, cdr : llist)

llength <- function(llist, acc = 0) {
    cases(llist,
          NIL -> acc,
          CONS(car, cdr) -> llength(cdr, acc + 1))
}

The tail-recursion I get out of transforming this function looks like this:

llength_tr <- function (llist, acc = 0) {
    .tailr_env <- rlang::get_env()
    callCC(function(escape) {
        repeat {
            if (!rlang::is_null(..match_env <- test_pattern(llist,
                                                            NIL)))
                with(..match_env, escape(acc))

            else if (!rlang::is_null(..match_env <-
                                     test_pattern(llist, CONS(car, cdr
                with(..match_env, {
                    .tailr_env$.tailr_llist <- cdr
                    .tailr_env$.tailr_acc <- acc + 1
                    .tailr_env$llist <- .tailr_env$.tailr_llist
                    .tailr_env$acc <- .tailr_env$.tailr_acc
                })
        }
    })
}

Maybe not the prettiest code, but you are not supposed to actually see it, of 
course.

There is not much gain in speed

Unit: milliseconds
                   expr      min       lq     mean   median       uq
    llength(test_llist) 70.74605 76.08734 87.78418 85.81193 94.66378
 llength_tr(test_llist) 45.16946 51.56856 59.09306 57.00101 63.07044
      max neval
 182.4894   100
 166.6990   100

but you don’t run out of stack space

> llength(make_llist(1000))
Error: evaluation nested too deeply: infinite recursion / options(expressions=)?
Error during wrapup: C stack usage  7990648 is too close to the 

Re: [R] Parallel assignments and goto

2018-02-14 Thread Fox, John
Dear Thomas,

This looks like a really interesting project, and I don't think that anyone 
responded to your message, though I may be mistaken.

I took at a look at implementing parallel assignment, and came up with:

passign <- function(..., envir=parent.frame()){
exprs <- list(...)
vars <- names(exprs)
exprs <- lapply(exprs, FUN=eval, envir=envir)
for (i in seq_along(exprs)){
assign(vars[i], exprs[[i]], envir=envir)
}
}

For example,

> fun <- function(){
+ a <- 10
+ passign(a=1, b=a + 2, c=3)
+ cat("a =", a, " b =", b, " c =", c, "\n")
+ }
> fun()
a = 1  b = 12  c = 3

This proves to be faster than what you tried, but still much slower than using 
a local variable (or variables) -- see below. I wouldn't be surprised if 
someone can come up with a faster implementation, but I suspect that the 
overhead of function calls will be hard to overcome. BTW, a version of my 
passign() that uses mapply() in place of a for loop (not shown) is even slower.

> factorial_tr_3 <- function (n, acc = 1) {
+ repeat {
+ if (n <= 1) 
+ return(acc)
+ else {
+ passign(n = n - 1, acc = acc * n)
+ next
+ }
+ }
+ }

> microbenchmark::microbenchmark(factorial(100),
+ factorial_tr_1(100),
+ factorial_tr_2(100),
+ factorial_tr_3(100))
Unit: microseconds
expr   minlq   mean medianuq   
max neval cld
  factorial(100)55.00969.290   100.4507   104.5515   131.174   
228.496   100 a  
 factorial_tr_1(100)10.22711.63714.496713.753015.515
89.565   100 a  
 factorial_tr_2(100) 21523.751 23038.417 24477.1734 24058.3635 25041.988 
45814.136   100   c
 factorial_tr_3(100)   806.789   861.797   914.3651   879.9565   925.444  
2139.329   100  b

Best,
 John

-
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
Web: socialsciences.mcmaster.ca/jfox/




> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas
> Mailund
> Sent: Sunday, February 11, 2018 10:49 AM
> To: r-help@r-project.org
> Subject: [R] Parallel assignments and goto
> 
> Hi guys,
> 
> I am working on some code for automatically translating recursive functions 
> into
> looping functions to implemented tail-recursion optimisations. See
> https://github.com/mailund/tailr
> 
> As a toy-example, consider the factorial function
> 
> factorial <- function(n, acc = 1) {
> if (n <= 1) acc
> else factorial(n - 1, acc * n)
> }
> 
> I can automatically translate this into the loop-version
> 
> factorial_tr_1 <- function (n, acc = 1) {
> repeat {
> if (n <= 1)
> return(acc)
> else {
> .tailr_n <- n - 1
> .tailr_acc <- acc * acc
> n <- .tailr_n
> acc <- .tailr_acc
> next
> }
> }
> }
> 
> which will run faster and not have problems with recursion depths. However,
> I’m not entirely happy with this version for two reasons: I am not happy with
> introducing the temporary variables and this rewrite will not work if I try to
> over-scope an evaluation context.
> 
> I have two related questions, one related to parallel assignments — i.e.
> expressions to variables so the expression uses the old variable values and 
> not
> the new values until the assignments are all done — and one related to
> restarting a loop from nested loops or from nested expressions in `with`
> expressions or similar.
> 
> I can implement parallel assignment using something like rlang::env_bind:
> 
> factorial_tr_2 <- function (n, acc = 1) {
> .tailr_env <- rlang::get_env()
> repeat {
> if (n <= 1)
> return(acc)
> else {
> rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> next
> }
> }
> }
> 
> This reduces the number of additional variables I need to one, but is a 
> couple of
> orders of magnitude slower than the first version.
> 
> > microbenchmark::microbenchmark(factorial(100),
> +factorial_tr_1(100),
> +factorial_tr_2(100))
> Unit: microseconds
>  expr  min   lq   meanmedian   uq 
>  max neval
>   factorial(100)   53.978   60.543   77.76203   71.0635   85.947  180.251 
>   100
>  factorial_tr_1(100)9.0229.903   11.52563   11.0430   11.984   28.464
> 100
>  factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463
> 8177.635   100
> 
> 
> Is there another way to do parallel assignments that doesn’t cost this much in
> running time?
> 
> My other problem is the use of `next`. I would like to combine tail-recursion
> optimisation with pattern matching as in https://github.com/mailund/pmatch
> where I can, for example, define a linked list like this:
> 
> devtools::install_github("mailund/pmatch”)
> library(pmatch)
> llist := 

Re: [R] Parallel assignments and goto

2018-02-11 Thread Thomas Mailund
I admit I didn’t know about Recall, but you are right, there is no direct 
support for this tail-recursion optimisation. For good reasons — it would break 
a lot of NSE. I am not attempting to solve tail-recursion optimisation for all 
cases. That wouldn’t work by just rewriting functions. It might be doable with 
JIT or something like that, but my goal is less ambitious.

Using local, though, might be an approach. I will play around with that 
tomorrow.

Cheers

On 11 Feb 2018, 18.19 +0100, David Winsemius , wrote:
>
> > On Feb 11, 2018, at 7:48 AM, Thomas Mailund  
> > wrote:
> >
> > Hi guys,
> >
> > I am working on some code for automatically translating recursive functions 
> > into looping functions to implemented tail-recursion optimisations. See 
> > https://github.com/mailund/tailr
> >
> > As a toy-example, consider the factorial function
> >
> > factorial <- function(n, acc = 1) {
> > if (n <= 1) acc
> > else factorial(n - 1, acc * n)
> > }
> >
> > I can automatically translate this into the loop-version
> >
> > factorial_tr_1 <- function (n, acc = 1)
> > {
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > .tailr_n <- n - 1
> > .tailr_acc <- acc * acc
> > n <- .tailr_n
> > acc <- .tailr_acc
> > next
> > }
> > }
> > }
> >
> > which will run faster and not have problems with recursion depths. However, 
> > I’m not entirely happy with this version for two reasons: I am not happy 
> > with introducing the temporary variables and this rewrite will not work if 
> > I try to over-scope an evaluation context.
> >
> > I have two related questions, one related to parallel assignments — i.e. 
> > expressions to variables so the expression uses the old variable values and 
> > not the new values until the assignments are all done — and one related to 
> > restarting a loop from nested loops or from nested expressions in `with` 
> > expressions or similar.
> >
> > I can implement parallel assignment using something like rlang::env_bind:
> >
> > factorial_tr_2 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > repeat {
> > if (n <= 1)
> > return(acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > next
> > }
> > }
> > }
> >
> > This reduces the number of additional variables I need to one, but is a 
> > couple of orders of magnitude slower than the first version.
> >
> > > microbenchmark::microbenchmark(factorial(100),
> > + factorial_tr_1(100),
> > + factorial_tr_2(100))
> > Unit: microseconds
> > expr min lq mean median uq max neval
> > factorial(100) 53.978 60.543 77.76203 71.0635 85.947 180.251 100
> > factorial_tr_1(100) 9.022 9.903 11.52563 11.0430 11.984 28.464 100
> > factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 
> > 8177.635 100
> >
> >
> > Is there another way to do parallel assignments that doesn’t cost this much 
> > in running time?
> >
> > My other problem is the use of `next`. I would like to combine 
> > tail-recursion optimisation with pattern matching as in 
> > https://github.com/mailund/pmatch where I can, for example, define a linked 
> > list like this:
> >
> > devtools::install_github("mailund/pmatch”)
> > library(pmatch)
> > llist := NIL | CONS(car, cdr : llist)
> >
> > and define a function for computing the length of a list like this:
> >
> > list_length <- function(lst, acc = 0) {
> > force(acc)
> > cases(lst,
> > NIL -> acc,
> > CONS(car, cdr) -> list_length(cdr, acc + 1))
> > }
> >
> > The `cases` function creates an environment that binds variables in a 
> > pattern-description that over-scopes the expression to the right of `->`, 
> > so the recursive call in this example have access to the variables `cdr` 
> > and `car`.
> >
> > I can transform a `cases` call to one that creates the environment 
> > containing the bound variables and then evaluate this using `eval` or 
> > `with`, but in either case, a call to `next` will not work in such a 
> > context. The expression will be evaluated inside `bind` or `with`, and not 
> > in the `list_lenght` function.
> >
> > A version that *will* work, is something like this
> >
> > factorial_tr_3 <- function (n, acc = 1)
> > {
> > .tailr_env <- rlang::get_env()
> > .tailr_frame <- rlang::current_frame()
> > repeat {
> > if (n <= 1)
> > rlang::return_from(.tailr_frame, acc)
> > else {
> > rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
> > rlang::return_to(.tailr_frame)
> > }
> > }
> > }
> >
> > Here, again, for the factorial function since this is easier to follow than 
> > the list-length function.
> >
> > This solution will also work if you return values from inside loops, where 
> > `next` wouldn’t work either.
> >
> > Using `rlang::return_from` and `rlang::return_to` implements the right 
> > semantics, but costs me another order of magnitude in running time.
> >
> > microbenchmark::microbenchmark(factorial(100),
> > factorial_tr_1(100),
> > factorial_tr_2(100),
> > factorial_tr_3(100))
> > Unit: 

Re: [R] Parallel assignments and goto

2018-02-11 Thread David Winsemius

> On Feb 11, 2018, at 7:48 AM, Thomas Mailund  wrote:
> 
> Hi guys,
> 
> I am working on some code for automatically translating recursive functions 
> into looping functions to implemented tail-recursion optimisations. See 
> https://github.com/mailund/tailr
> 
> As a toy-example, consider the factorial function
> 
> factorial <- function(n, acc = 1) {
>if (n <= 1) acc
>else factorial(n - 1, acc * n)
> }
> 
> I can automatically translate this into the loop-version
> 
> factorial_tr_1 <- function (n, acc = 1) 
> {
>repeat {
>if (n <= 1) 
>return(acc)
>else {
>.tailr_n <- n - 1
>.tailr_acc <- acc * acc
>n <- .tailr_n
>acc <- .tailr_acc
>next
>}
>}
> }
> 
> which will run faster and not have problems with recursion depths. However, 
> I’m not entirely happy with this version for two reasons: I am not happy with 
> introducing the temporary variables and this rewrite will not work if I try 
> to over-scope an evaluation context.
> 
> I have two related questions, one related to parallel assignments — i.e. 
> expressions to variables so the expression uses the old variable values and 
> not the new values until the assignments are all done — and one related to 
> restarting a loop from nested loops or from nested expressions in `with` 
> expressions or similar.
> 
> I can implement parallel assignment using something like rlang::env_bind:
> 
> factorial_tr_2 <- function (n, acc = 1) 
> {
>.tailr_env <- rlang::get_env()
>repeat {
>if (n <= 1) 
>return(acc)
>else {
>rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>next
>}
>}
> }
> 
> This reduces the number of additional variables I need to one, but is a 
> couple of orders of magnitude slower than the first version.
> 
>> microbenchmark::microbenchmark(factorial(100),
> +factorial_tr_1(100),
> +factorial_tr_2(100))
> Unit: microseconds
> expr  min   lq   meanmedian   uq  
> max neval
>  factorial(100)   53.978   60.543   77.76203   71.0635   85.947  180.251  
>  100
> factorial_tr_1(100)9.0229.903   11.52563   11.0430   11.984   28.464  
>  100
> factorial_tr_2(100) 5870.565 6109.905 6534.13607 6320.4830 6756.463 8177.635  
>  100
> 
> 
> Is there another way to do parallel assignments that doesn’t cost this much 
> in running time?
> 
> My other problem is the use of `next`. I would like to combine tail-recursion 
> optimisation with pattern matching as in https://github.com/mailund/pmatch 
> where I can, for example, define a linked list like this:
> 
> devtools::install_github("mailund/pmatch”)
> library(pmatch)
> llist := NIL | CONS(car, cdr : llist)
> 
> and define a function for computing the length of a list like this:
> 
> list_length <- function(lst, acc = 0) {
>  force(acc)
>  cases(lst,
>NIL -> acc,
>CONS(car, cdr) -> list_length(cdr, acc + 1))
> }
> 
> The `cases` function creates an environment that binds variables in a 
> pattern-description that over-scopes the expression to the right of `->`, so 
> the recursive call in this example have access to the variables `cdr` and 
> `car`.
> 
> I can transform a `cases` call to one that creates the environment containing 
> the bound variables and then evaluate this using `eval` or `with`, but in 
> either case, a call to `next` will not work in such a context. The expression 
> will be evaluated inside `bind` or `with`, and not in the `list_lenght` 
> function.
> 
> A version that *will* work, is something like this
> 
> factorial_tr_3 <- function (n, acc = 1) 
> {
>.tailr_env <- rlang::get_env()
>.tailr_frame <- rlang::current_frame()
>repeat {
>if (n <= 1) 
>rlang::return_from(.tailr_frame, acc)
>else {
>rlang::env_bind(.tailr_env, n = n - 1, acc = acc * n)
>rlang::return_to(.tailr_frame)
>}
>}
> }
> 
> Here, again, for the factorial function since this is easier to follow than 
> the list-length function.
> 
> This solution will also work if you return values from inside loops, where 
> `next` wouldn’t work either.
> 
> Using `rlang::return_from` and `rlang::return_to` implements the right 
> semantics, but costs me another order of magnitude in running time.
> 
> microbenchmark::microbenchmark(factorial(100),
>   factorial_tr_1(100),
>   factorial_tr_2(100),
>   factorial_tr_3(100))
> Unit: microseconds
>expr   min lqmean medianuq 
>max neval
>  factorial(100)52.47960.264093.4306967.513083.925   
> 2062.481   100
> factorial_tr_1(100) 8.875 9.652549.1959510.694511.217   
> 3818.823   100
>