On 2014-12-24 at 09:55:46 -0500, Chris Marshall wrote:
> Very cool!  Thanks for expanding the space of perl and PDL computation!  In
> your work, did you determine anything PDL3 would need to do a better job to
> support using R from perl?
> 

Sure, there were a couple things that would have been nice to have:

For Data::Frame,

- It's a small thing, but a way to "plug-in" to the stringification for
  PDL subclasses would make implementing subclasses easier. Right
  now, PDL's `string` method is a bit of a black box because it
  stringifies all the elements at once. Instead, I had to write my own
  string1d function [^stringifiable].

- Make a hash-based PDL the default. While using the `initialize` function
  combined with `FOREIGNBUILDARGS` is an easy way to get PDL working
  with Moo[se], it is extra code [^moo-hash-pdl].

- It might be useful to annotate all functions that do not change the
  values of elements. I need this for enum-like data, where I want the
  levels (the possible values of the enum) to be copied over to new
  enum-like PDLs. For now, I wrap the following methods:

      around qw(slice uniq dice) => sub { ... };

  but I'm not sure if that covers everything [^around-enum].

  My thoughts on this: perhaps the PDL class has too many methods by
  default. There should be a way to pare that down using roles, but
  deciding what goes in each role does not seem straightforward to me at
  this time.

For Statistics::NiceR,

- The way that R stores data is inside a SEXP C structure. You can reach
  inside and get at the data by using a macro which points to the memory
  address like:

       SEXP r_sexp_integer, r_sexp_real;
       INTEGER(r_sexp_integer)[ idx ] /* access the int32_t value at idx */

       REAL(r_sexp_real)[ idx ] /* access the double value at idx */

  Currently, I'm just using memcpy() to get the R data into a PDL. I
  haven't used pdl_wrap() on the R data yet, but I plan to soon. But
  what I'm wondering is: can I change the way PDL allocates data so that
  it will create R's SEXP C structure in the background, perhaps
  limited to a scope? This might be YAGNI, but it might have
  implications for things like GPU support. Instead of having to
  explicitly create GPU arrays all the time, there should be a way of
  indicating that a piece of code will be using a different allocator
  than usual.

- Speaking of different allocation types, it might be useful to look at
  how other tools extend their built-in types. I'll give some R
  examples:

  - R's bigmemory <http://cran.r-project.org/web/packages/bigmemory/index.html>,
    <http://www.stat.yale.edu/~mjk56/temp/bigmemory-vignette.pdf>,
    <http://2013.hpcs.ca/wp-content/uploads/2013/07/HPCS2013-Parallel-Work-with-R.pdf>.

    Not only does this support mmap'ed files (like PDL::IO::{FastRaw,FlexRaw}),
    but it also has associated packages that provide specialised
    versions of things like linear regression (in biglm) and k-means
    clustering (in biganalytics).

  - R's GMP <http://cran.r-project.org/web/packages/gmp/index.html>.

    It's a wrapper for the GMP library for big integers/rationals, but
    it also lets you create matrices of big numbers which can be used
    for solving a system of equations (solve.bigz).
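
As a rough illustration of the scoped-allocator idea from the first
Statistics::NiceR item above, here is a minimal C sketch. It is not PDL or R
code: every name in it (`allocator`, `push_allocator`, `array_new`, and so on)
is hypothetical, and the "foreign" allocator only mimics a container that owns
a header in front of the data, the way an R vector or a GPU buffer might. The
point is just that array creation asks for "the current allocator", so a block
of code can switch allocators without changing its array-creation calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator interface; not part of PDL or R. */
typedef struct {
    const char *name;
    void *(*alloc)(size_t nbytes);
    void  (*release)(void *ptr);
} allocator;

/* Default allocator: plain malloc/free. */
static void *default_alloc(size_t nbytes) { return malloc(nbytes); }
static void  default_release(void *ptr)   { free(ptr); }

/* Stand-in for a foreign container: a header stored just before the data.
 * The union keeps the data that follows suitably aligned for any type. */
typedef union {
    size_t      nbytes;
    max_align_t align_;
} block_header;

static void *foreign_alloc(size_t nbytes) {
    block_header *h = malloc(sizeof *h + nbytes);
    if (h == NULL) return NULL;
    h->nbytes = nbytes;
    return h + 1;                      /* data starts right after the header */
}

static void foreign_release(void *ptr) {
    if (ptr != NULL) free((block_header *)ptr - 1);
}

const allocator DEFAULT_ALLOCATOR = { "default", default_alloc, default_release };
const allocator FOREIGN_ALLOCATOR = { "foreign", foreign_alloc, foreign_release };

/* A small stack of allocators models "limited to a scope". */
#define MAX_SCOPES 8
static const allocator *scope_stack[MAX_SCOPES];
static int scope_depth = 0;

const allocator *current_allocator(void) {
    return scope_depth > 0 ? scope_stack[scope_depth - 1] : &DEFAULT_ALLOCATOR;
}

void push_allocator(const allocator *a) {
    assert(scope_depth < MAX_SCOPES);
    scope_stack[scope_depth++] = a;
}

void pop_allocator(void) {
    assert(scope_depth > 0);
    scope_depth--;
}

/* An array remembers which allocator created it, so it can be released
 * correctly even after the scope that created it has ended. */
typedef struct {
    double          *data;
    size_t           n;
    const allocator *owner;
} array;

array array_new(size_t n) {
    const allocator *a = current_allocator();
    array arr = { a->alloc(n * sizeof(double)), n, a };
    return arr;
}

void array_free(array *arr) {
    arr->owner->release(arr->data);
    arr->data = NULL;
    arr->n = 0;
}
```

Usage would look like `push_allocator(&FOREIGN_ALLOCATOR); array x =
array_new(100); pop_allocator();`. Anything created between the push and the
pop lives in foreign memory, and everything else keeps using the default.
Recording the owner in each array is one possible design choice; it avoids
having to keep the scope open until the array is freed.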


[^stringifiable]: Role that lets elements stringify themselves
                  <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Stringifiable.pm>.

[^moo-hash-pdl]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Factor.pm>
                 has the following code:

    use Moo;
    extends 'PDL';
    around new => sub {
        my $orig = shift;
        my ($class, @args) = @_;
        # snip...
        unshift @args, _data => $enum;
        my $self = $orig->($class, @args);
        # snip...
    };

    sub FOREIGNBUILDARGS {
        my ($self, %args) = @_;
        ( $args{_data} );
    }

    sub initialize {
        bless { PDL => PDL::null() }, shift;
    }

[^around-enum]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Enumerable.pm#L46>.

Cheers,
- Zaki Mughal


> --Chris
> 
> On Tue, Dec 23, 2014 at 9:19 PM, Zakariyya Mughal <[email protected]>
> wrote:
> 
> > Hi everyone,
> >
> > I have (finally) uploaded modules for working with the R interpreter
> > with Perl. The CPAN links are below, but to get a taste of what the API
> > looks like, check out my blog post <
> > http://enetdown.org/dot-plan/posts/2014/12/24/a_fast_and_natural_interface_to_R_from_Perl/
> > >.
> >
> > - Statistics::NiceR <http://p3rl.org/Statistics::NiceR>
> > - Data::Frame <http://p3rl.org/Data::Frame>
> >
> > I'd love to have feedback on how to improve them.
> >
> > Regards and happy hacking,
> > - Zaki Mughal
> >
> > _______________________________________________
> > Perldl mailing list
> > [email protected]
> > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
> >

