Thanks for the detailed and referenced response!  More below.

--Chris


On Wed, Dec 24, 2014 at 6:31 PM, Zakariyya Mughal <[email protected]>
wrote:

> On 2014-12-24 at 09:55:46 -0500, Chris Marshall wrote:
> > Very cool!  Thanks for expanding the space of perl and PDL computation!
> > In your work, did you determine anything PDL3 would need to do a better
> > job to support using R from perl?
> >
>
> Sure, there were a couple things that would have been nice to have:
>
> For Data::Frame,
>
> - It's a small thing, but a way to "plug-in" to the stringification for
>   PDL subclasses would make implementing subclasses easier. Right
>   now, PDL's `string` method is a bit of a black-box because it
>   stringifies all the elements at once. Instead, I had to write my own
>   string1d function [^stringifiable].
>

Good point.  This is one of the reasons to start PDL3 from the core out so
that things like stringification can be handled naturally in subclasses.
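
As a rough illustration of the kind of per-element hook I have in mind,
here is a minimal C sketch.  Every name in it (elem_formatter,
format_plain, format_level, stringify_1d) is made up for the example and
is not an existing PDL or Data::Frame API; it only shows the array walker
delegating each element to a replaceable formatter instead of
stringifying everything at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-element hook (not a PDL API): format one element into
 * buf, which holds size bytes, snprintf-style. */
typedef int (*elem_formatter)(char *buf, size_t size, int value,
                              const void *ctx);

/* Default behaviour: print the raw number. */
static int format_plain(char *buf, size_t size, int value, const void *ctx)
{
    (void)ctx;
    return snprintf(buf, size, "%d", value);
}

/* "Subclass" behaviour: treat the value as an index into a table of level
 * names passed through ctx.  The caller must keep value within the table. */
static int format_level(char *buf, size_t size, int value, const void *ctx)
{
    const char *const *levels = ctx;
    return snprintf(buf, size, "%s", levels[value]);
}

/* Append src to the NUL-terminated string in out, truncating to fit. */
static void append(char *out, size_t size, const char *src)
{
    size_t used = strlen(out);
    if (used + 1 < size)
        strncat(out, src, size - used - 1);
}

/* Stringify a 1-D array as "[a b c]", asking the hook for each element. */
static void stringify_1d(char *out, size_t size, const int *data, size_t n,
                         elem_formatter fmt, const void *ctx)
{
    char elem[64];
    assert(size > 0);
    out[0] = '\0';
    append(out, size, "[");
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            append(out, size, " ");
        fmt(elem, sizeof elem, data[i], ctx);
        append(out, size, elem);
    }
    append(out, size, "]");
}
```

With codes {0, 2, 1} and level names {"low", "mid", "high"}, calling
stringify_1d with format_plain yields "[0 2 1]", and with format_level
(passing the level table as ctx) it yields "[low high mid]".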


> - Make a hash-based PDL the default. While using the `initialize` function
>   combined with `FOREIGNBUILDARGS` is an easy way to get PDL working
>   with Moo[se], it is extra code [^moo-hash-pdl].
>

Yes, and maybe a tweak to the logic for PDL-2.x so that it works seamlessly
with PDL3 stuff.


> - It might be useful to have annotations of all functions that do not
>   change the values of elements. I am using that for enum-like data
>   where I want the levels (the possible values of the enum) to be copied
>   over to new enum-like PDLs. So I wrap the following methods:
>
>       around qw(slice uniq dice) => sub { ... };
>
>   but I'm not sure if that covers everything [^around-enum].
>

Interesting.  It seems like this is meta-info support for PDL-specific
operations.  I wonder if there are other examples where similar function
meta-data is being used for this type of problem?


>
>   My thoughts on this: perhaps the PDL class has too many methods by
>   default. There should be a way to pare that down using roles, but
>   deciding what goes in each role does not seem straightforward to me at
>   this time.
>

The current PDL module takes a kitchen-sink approach to computation: all
of the standard features are always imported and made available.  There is
an existing feature request (or bug) to fix the import/export handling of
PDL so that it matches expected usage for perl modules.  It seems to me
that the kitchen-sink set of modules is more appropriate for interactive
use than for programming.  We could add support for that in the PDL
shells.  On the other hand, with as many functions as PDL has, it could be
a bit of a pain if things are spun out into namespaces that are too small.


>
> For Statistics::NiceR,
>
> - The way that R stores data is inside a SEXP C structure. You can reach
>   inside and get at the data by using a macro which points to the memory
>   address like:
>
>        SEXP r_sexp_integer, r_sexp_real;
>        INTEGER(r_sexp_integer)[ idx ] /* access the int32_t value at idx */
>
>        REAL(r_sexp_real)[ idx ] /* access the double value at idx */
>
>   Currently, I'm just using memcpy() to get the R data into a PDL. I
>   haven't used pdl_wrap() on the R data yet, but I plan to soon. But
>   what I'm wondering is: can I change the way PDL allocates data so that
>   it will create the R's SEXP C structure in the background — perhaps
>   limited to a scope? This might be YAGNI, but it might have
>   implications for things like GPU support. Instead of having to
>   explicitly create GPU arrays all the time, there should be a way of
>   indicating that a piece of code will be using a different allocator
>   than usual.
>

The aspects of PDL3 relevant to this are:
* Improved type support, including general, user-defined types
* Making PDL data and computation usable from both C and Perl

PDL-2.x already has some of this in a hand-rolled form.

I took a quick look at the R internals and SEXP stuff.  It looks a lot like
the R flavor of Perl's SV*.
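
To make the scoped-allocator idea a bit more concrete, here is a minimal
C sketch under stated assumptions.  None of these names (nd_allocator,
nd_array, foreign_vec, current_allocator) exist in PDL or R; foreign_vec
is only a stand-in for "a foreign header followed by the data" and does
not reproduce the real SEXP layout.  The point is just that the array
code asks whichever allocator is current for its storage and remembers
who owns it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface; not part of PDL or R. */
typedef struct {
    void *(*alloc)(size_t nbytes);
    void  (*release)(void *data);
} nd_allocator;

/* Stand-in for a foreign container: a header followed by the payload. */
typedef struct {
    size_t length;     /* number of doubles in the payload */
    double payload[];  /* flexible array member */
} foreign_vec;

/* Recover the header from a pointer to its payload. */
static foreign_vec *foreign_header(void *data)
{
    return (foreign_vec *)((char *)data - offsetof(foreign_vec, payload));
}

static void *foreign_alloc(size_t nbytes)
{
    foreign_vec *v = malloc(sizeof *v + nbytes);
    if (v == NULL)
        return NULL;
    v->length = nbytes / sizeof(double);
    return v->payload;
}

static void foreign_release(void *data) { free(foreign_header(data)); }

static void *plain_alloc(size_t nbytes) { return malloc(nbytes); }
static void plain_release(void *data) { free(data); }

static const nd_allocator plain_allocator = { plain_alloc, plain_release };
static const nd_allocator foreign_allocator = { foreign_alloc, foreign_release };

/* The allocator in effect; a caller swaps it for the length of a scope. */
static const nd_allocator *current_allocator = &plain_allocator;

typedef struct {
    double *data;
    size_t n;
    const nd_allocator *owner;  /* whoever allocated must also release */
} nd_array;

static nd_array nd_new(size_t n)
{
    nd_array a = { current_allocator->alloc(n * sizeof(double)), n,
                   current_allocator };
    assert(a.data != NULL);
    return a;
}

static void nd_free(nd_array *a)
{
    a->owner->release(a->data);
    a->data = NULL;
}
```

A caller would save current_allocator, point it at foreign_allocator,
create arrays with nd_new, and then restore the saved value.  Arrays
created inside that scope carry the foreign header in front of their
data, so they could be handed to the other side without a memcpy().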


> - Speaking of different allocation types, it might be useful to look at
>   how other tools extend their built-in types. I'll give some R
>   examples:
>
>   - R's bigmemory
>     <http://cran.r-project.org/web/packages/bigmemory/index.html>,
>     <http://www.stat.yale.edu/~mjk56/temp/bigmemory-vignette.pdf>,
>     <http://2013.hpcs.ca/wp-content/uploads/2013/07/HPCS2013-Parallel-Work-with-R.pdf>.
>
>     Not only does this support mmap'ed files (like
>     PDL::IO::{FastRaw,FlexRaw}), but they also have associated packages
>     that have specialised versions of things like linear regression (in
>     biglm) and k-means clustering (in biganalytics).
>
>   - R's GMP <http://cran.r-project.org/web/packages/gmp/index.html>.
>
>     It's a wrapper for the GMP library for big integers/rationals, but
>     it also lets you create matrices of big numbers which can be used
>     for solving a system of equations (solve.bigz).
>

Thanks for the references.  I would like to see PDL's type support
improved, and having use cases to exercise ideas for the new architecture
against is a help.



> [^stringifiable]: Role that lets elements stringify themselves
>                   <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Stringifiable.pm>.
>
> [^moo-hash-pdl]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Factor.pm>
>                  has the following code:
>
>     use Moo;
>     extends 'PDL';
>     around new => sub {
>         my $orig = shift;
>         my ($class, @args) = @_;
>         # snip...
>         unshift @args, _data => $enum;
>         my $self = $orig->($class, @args);
>         # snip...
>     };
>
>     sub FOREIGNBUILDARGS {
>         my ($self, %args) = @_;
>         ( $args{_data} );
>     }
>
>     sub initialize {
>         bless { PDL => PDL::null() }, shift;
>     }
>
> [^around-enum]: <https://github.com/zmughal/p5-Data-Frame/blob/master/lib/PDL/Role/Enumerable.pm#L46>.
>
> Cheers,
> - Zaki Mughal
>
>
> > --Chris
> >
> > On Tue, Dec 23, 2014 at 9:19 PM, Zakariyya Mughal <[email protected]>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > I have (finally) uploaded modules for working with the R interpreter
> > > with Perl. The CPAN links are below, but to get a taste of what the API
> > > looks like, check out my blog post
> > > <http://enetdown.org/dot-plan/posts/2014/12/24/a_fast_and_natural_interface_to_R_from_Perl/>.
> > >
> > > - Statistics::NiceR <http://p3rl.org/Statistics::NiceR>
> > > - Data::Frame <http://p3rl.org/Data::Frame>
> > >
> > > I'd love to have feedback on how to improve them.
> > >
> > > Regards and happy hacking,
> > > - Zaki Mughal
> > >
> > > _______________________________________________
> > > Perldl mailing list
> > > [email protected]
> > > http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
> > >
>
