Re: [Perldl] PDL::Tiny --- what should be in it?

2015-01-11 Thread David Mertens
Hey Chris,

I think if we try to extract and (minimally) generalize the Prima object
system, we'll give Stevan a highly performant C-based object system upon
which to build p5-mop. If ever there was a time to introduce a minimal C
object system into the Perl core, p5-mop would be it.

As an added bonus, if we get involved, we can lend more manpower to an
effort that has usually been a two-man show. This
will increase the likelihood that p5-mop gets fully implemented, and bring
some more awareness to PDL.

But then again, I've talked about extracting Prima's object system before
and haven't yet managed it. Eo, by contrast, is already written: a known and
tested quantity.

It just strikes me as a profound coincidence that p5-mop still hasn't been
finalized, and we're bandying about the notion of a new C-based object
system for PDL. I could be wrong, but it seems to me that this is a moment
to be seized. Why not merge forces?

David



-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
___
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl


Re: [Perldl] PDL::Tiny --- what should be in it?

2015-01-11 Thread Chris Marshall

Hi David-

I think if we start trying to get something in the perl5 core we'll 
rediscover the pain that Stevan Little found.  Right now my thought was 
to use the existing perl5 MOP (i.e., Mo[o[se]]) to generate the PDL::Tiny 
classes and to use that information to generate the C object 
binding/implementation.  I'm looking at the Enlightenment Object model 
as a starting point for the C object model to avoid re-inventing the 
wheel.  One nice thing there is that the EO library can be called from 
either C or *real* C++ code, so you can have the best of both worlds 
without the problem of forcing the use of a specific C++ compiler 
everywhere.


--Chris




Re: [Perldl] PDL::Tiny --- what should be in it?

2015-01-08 Thread David Mertens
Wrong! There's another! https://github.com/stevan/p5-mop-again-seriously-wtf,
which appears to be the most recent (and follows the older
https://github.com/stevan/p5-mop-XS).

And in case you wondered why Stevan Little's work on p5-mop hit a weird
stall:
http://blogs.perl.org/users/stevan_little/2014/05/on-prototyping-in-public-part-duex.html

Re: [Perldl] PDL::Tiny --- what should be in it?

2015-01-08 Thread David Mertens
Link: https://github.com/stevan/p5-mop-redux

Re: [Perldl] PDL::Tiny --- what should be in it?

2015-01-08 Thread David Mertens
Hey Chris, porters,

I was thinking again about this project. One thing that occurs to me is
that p5mop-redux, Stevan Little's attempt at creating something like Moose
that could be pushed into core Perl, has been stalled for many months. I am
not sure if p5mop-redux has a very good C API; indeed, I am not sure if it
has a C object API at all. I wonder if we might consider stepping in and
lending a hand to help build a C object API, which would serve as the
foundation for the mop.

If we had a solid C object system with the high potential of getting pushed
into the core, we would be in excellent shape to create the next generation
of PDL.

Thoughts?
David


Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-15 Thread Chris Marshall
Agreed.

The need to avoid cache-busting code and poor performance is one motivation
for JIT compiling: it avoids memory sloshing from function
pointer->pointer->pointer chains.  Implementing benchmarks and performance
metrics alongside the new development will be essential to avoiding
unnecessary performance bottlenecks and to determining the right level to
compute at...

--Chris

Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-15 Thread David Mertens
Something that I think will be critical, especially if we start
JIT-compiling stuff or allowing for subclassing, is that customized code
could take a performance hit if it causes code cache misses. I
recently came across a great explanation here:
http://igoro.com/archive/gallery-of-processor-cache-effects/

One of the files in the Perl interpreter's core code is called pp_hot.c.
According to comments at the top of the file, these functions are
consolidated into a single C (and later object) file to "encourage CPU
cache hits on hot code." If we create more and more code paths that get
executed, we increase the time spent loading the machine code into the L1
cache, and we also increase the likelihood of evicting parts of pp_hot and
other important execution paths.

David


Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-15 Thread David Mertens
FWIW, it looks like Julia views are like affine slices in PDL. As I have
said before, almost nothing out there has the equivalent of non-contiguous,
non-strided support like we get with which, where, and their ilk. GSL
vectors do not, either. Matlab only supports it as a temporary object, and
eliminates it after the line has executed. Not sure about Numpy here.

David

On Mon, Dec 15, 2014 at 11:32 AM, Chris Marshall 
wrote:
>
> > On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal <
> zaki.mug...@gmail.com> wrote:
> >
> > ...snip...
> >
> > ## Levels of measurement
> >
> >   When using R, one of the nice things it does is warn or give
> >   an error when you try to do an operation that would be invalid on a
> certain
> >   type of data. One such type of data is categorical data, which R calls
> >   factors and for which I made a subclass of PDL called PDL::Factor.
> Some of
> >   this behaviour is inspired by the statistical methodology of levels of
> >   measurement . I
> believe
> >   SAS even explicitly allows assigning levels of measurement to variables.
>
> +1, it would be nice if new PDL types supported varying
> levels of computation including by levels of measurement
>
> > ...snip...
> >
> >   `NA` is R's equivalent of `BAD` values. For `mean()` this makes sense
> for
> >   categorical data. For logical vectors, it does something else:
>
> I would like to see more generalized support for bad value computations
> since in some cases BAD is used for missing, in others BAD is used
> for invalid,...
>
> > ...snip...
> >
> >   Thinking in terms of levels of measurement can help with another
> experiment
> >   I'm doing, which is based around tracking the units of measure used for
> numerical
> >   things in Perl. Code is here <
> https://github.com/zmughal/units-experiment/blob/master/overload_override.pl
> >.
> >
> >   What I do there is use Moo roles to add a unit attribute to numerical
> types
> >   (Perl scalars, Number::Fraction, PDL, etc.) and whenever they go
> through an
> >   operation by either operator overloading or calling a function such as
> >   `sum()`, the unit will be carried along with it and be manipulated
> >   appropriately (you can take the mean of Kelvin, but not degrees
> Celsius). I
> >   know that units of measure are messy to implement, but being able to
> support
> >   auxiliary operations like this will go a long way to making PDL
> flexible.
>
> Yes!  The use of method modifiers offers some powerful development
> tools to implement various high level features.  I'm hoping that
> it can be used to augment core functionality to support many of
> the more powerful or flexible features such as JIT compiling, GPU
> computation, distributed computation,...
> >
> >   [Has anyone used udunits2? I made an Alien package for it. It's on
> CPAN.]
> >
> > ## DataShape and Blaze
>
> This looks a lot like what the PDL::Tiny core is shaping up to be.
> Another goal of PDL::Tiny is flexibility so that PDL can use and
> be used by/from other languages.
>
> >   I think it would be beneficial to look at the work being done by the
> Blaze
> >   project  with its DataShape specification
> >   . The idea behind it is to be able to
> use the
> >   various array-like APIs without having to worry what is going on in the
> >   backend  be it with a CPU-based, GPU-based, SciDB, or even a SQL
> server.
> >
> > ## Julia
> >
> >   Julia has been doing some amazing things with how they've grown out
> their
> >   language. I was looking to see if they have anything similar to the
> dataflow
> >   in PDL and I came across ArrayViews <
> https://github.com/JuliaLang/ArrayViews.jl>.
> >   It may be enlightening to see how they compose this feature onto
> already
> >   existing n-d arrays as opposed to how PDL does it.
> >
> >   I do not know what tradeoffs that brings, but it is a starting point
> to think
> >   about. I think similar approaches can be made to support sparse arrays.
>
> Julia views look a lot like what we call slices.
>
> >   In fact, one of Julia's strengths is how they use multimethods to
> handle new
> >   types with ease. See "The Design Impact of Multiple Dispatch"
> >    >
> >   for examples. [Perl 6 has built-in multimethods]
>
> Multi-methods may be a good way to support some of the new PDL
> capabilities in a way that can be expanded by plugins, at runtime,
> ...
>
>
> > ## MATLAB subclassing
> >
> > ...snip...
> >
> > ## GPU and threading
> >
> >   I think it would be best to offload GPU support to other libraries, so
> it
> >   would be good to extract what is common between libraries like
> >
> >   - MAGMA ,
> >   - ViennaCL ,
> >   - Blaze-lib  ,
> >   - VXL ,
> >   - Spark 

Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-15 Thread Chris Marshall
> On Sun, Dec 14, 2014 at 11:56 PM, Zakariyya Mughal 
wrote:
>
> ...snip...
>
> ## Levels of measurement
>
>   When using R, one of the nice things it does is warn or give
>   an error when you try to do an operation that would be invalid on a
certain
>   type of data. One such type of data is categorical data, which R calls
>   factors and for which I made a subclass of PDL called PDL::Factor. Some
of
>   this behaviour is inspired by the statistical methodology of levels of
>   measurement . I
believe
>   SAS even explicitly allows assigning levels of measurement to variables.

+1, it would be nice if new PDL types supported varying
levels of computation, including by levels of measurement

> ...snip...
>
>   `NA` is R's equivalent of `BAD` values. For `mean()` this makes sense
for
>   categorical data. For logical vectors, it does something else:

I would like to see more generalized support for bad value computations
since in some cases BAD is used for missing, in others BAD is used
for invalid,...

> ...snip...
>
>   Thinking in terms of levels of measurement can help with another
experiment
I'm doing which is based around tracking the units of measure used for
numerical
>   things in Perl. Code is here <
https://github.com/zmughal/units-experiment/blob/master/overload_override.pl
>.
>
>   What I do there is use Moo roles to add a unit attribute to numerical
types
>   (Perl scalars, Number::Fraction, PDL, etc.) and whenever they go
through an
>   operation by either operator overloading or calling a function such as
>   `sum()`, the unit will be carried along with it and be manipulated
>   appropriately (you can take the mean of Kelvin, but not degrees
Celsius). I
>   know that units of measure are messy to implement, but being able to
support
>   auxiliary operations like this will go a long way to making PDL
flexible.

Yes!  The use of method modifiers offers some powerful development
tools to implement various high level features.  I'm hoping that
it can be used to augment core functionality to support many of
the more powerful or flexible features such as JIT compiling, GPU
computation, distributed computation,...
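
As a sketch of the kind of hook a method modifier provides (Compute::Backend
and its sum() are made-up names for illustration; Class::Method::Modifiers is
the CPAN module Moo itself uses for around()):

```perl
package Compute::Backend;
use strict;
use warnings;
use Class::Method::Modifiers;   # the module Moo uses for around/before/after

sub new { bless { calls => [] }, shift }

# A plain core implementation...
sub sum { my ($self, @vals) = @_; my $t = 0; $t += $_ for @vals; $t }

# ...augmented from outside, without touching its body.  This is where a
# JIT or GPU dispatch decision could be slotted in.
around sum => sub {
    my ( $orig, $self, @vals ) = @_;
    push @{ $self->{calls} }, scalar @vals;   # stand-in for "pick a backend"
    return $self->$orig(@vals);
};

package main;
my $b = Compute::Backend->new;
print $b->sum( 1, 2, 3 ), "\n";   # 6, with the modifier having run first
```

The core sum() stays oblivious to the augmentation, which is exactly the
property we want for layering optional features on a minimal core.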
>
>   [Has anyone used udunits2? I made an Alien package for it. It's on
CPAN.]
>
> ## DataShape and Blaze

This looks a lot like what the PDL::Tiny core is shaping up to be.
Another goal of PDL::Tiny is flexibility so that PDL can use and
be used by/from other languages.

>   I think it would be beneficial to look at the work being done by the
Blaze
>   project  with its DataShape specification
>   . The idea behind it is to be able to use
the
>   various array-like APIs without having to worry about what is going on in the
>   backend, be it with a CPU-based, GPU-based, SciDB, or even a SQL server.
>
> ## Julia
>
>   Julia has been doing some amazing things with how they've grown out
their
>   language. I was looking to see if they have anything similar to the
dataflow
>   in PDL and I came across ArrayViews <
https://github.com/JuliaLang/ArrayViews.jl>.
>   It may be enlightening to see how they compose this feature onto already
>   existing n-d arrays as opposed to how PDL does it.
>
>   I do not know what tradeoffs that brings, but it is a starting point to
think
>   about. I think similar approaches can be made to support sparse arrays.

Julia views look a lot like what we call slices.
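
A small example of the correspondence (assumes PDL-2.x; the comments show how
a 1-D ndarray stringifies):

```perl
use strict;
use warnings;
use PDL;   # PDL-2.x

my $x    = sequence(10);        # [0 1 2 3 4 5 6 7 8 9]
my $view = $x->slice('2:4');    # a dataflow-connected view, not a copy
$view .= 0;                     # running assignment flows back to $x
print $x, "\n";                 # [0 1 0 0 0 5 6 7 8 9]
```

Like a Julia ArrayView, the slice shares storage with its parent; PDL's
dataflow additionally keeps the two bound in both directions.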

>   In fact, one of Julia's strengths is how they use multimethods to
handle new
>   types with ease. See "The Design Impact of Multiple Dispatch"
>   
>   for examples. [Perl 6 has built-in multimethods]

Multi-methods may be a good way to support some of the new PDL
capabilities in a way that can be expanded by plugins, at runtime,
...


> ## MATLAB subclassing
>
> ...snip...
>
> ## GPU and threading
>
>   I think it would be best to offload GPU support to other libraries, so
it
>   would be good to extract what is common between libraries like
>
>   - MAGMA ,
>   - ViennaCL ,
>   - Blaze-lib  ,
>   - VXL ,
>   - Spark ,
>   - Torch ,
>   - Theano ,
>   - Eigen , and
>   - Armadillo .
>
>   Eigen is interesting in particular because it has support for storing
in both
>   row-major and column-major data <
http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.

We would benefit by supporting the commonalities needed to work
with other GPU computation libraries.  I'm not sure that all
PDL computations can be run efficiently if processed at the
library call level.  We may want our own JIT for performance.

>   Another source of inspiration would 

Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-15 Thread Chris Marshall
PDL::Tiny would essentially provide the core types, computation, and
framework for the next-gen PDL3 implementation.  Since it is focused on the
core of what and how PDL would or should work, it gives us a laboratory to
quickly implement and refine these ideas.  Here are my thoughts on the
initial development:

   - Use github.com for faster code development rather than stability
   - Start with perl-level Moo OO structure for PDL
      - Roles and architecture are critical for new PDL3
      - We could even begin with perl-only implementations
      - Types, Arrays, Units, ...
      - Can work on JIT-PP code generation
      - What are the key dimensions of PDL computation?
      - Indexing, threadloops, ...
   - Should be upgradable to full PDL capabilities
      - PDL-2.x via has-a support?
      - PDL3 as-is implementation
      - Other options for KISS and lightweight compute
      - What about testing and backward compatibility
         - Test against PDL-2.x t/ possible?
         - Options for test-driven development
         - Performance evaluation/metrics
   - It should be possible to evaluate options for C-OO and C-PDL
      - Enlightenment Object model looks promising
      - Want PDL from perl and PDL from C to be equivalent
      - Would enable multi-threaded and parallel/GPU compute

Exactly how tiny is possible or desired could be a result of the
development.
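
For instance, the Moo starting point might look like the following (every name
here is an illustrative placeholder rather than a proposed API; assumes Moo
from CPAN and pure-perl array storage):

```perl
package PDL::Tiny::Role::Dims;   # hypothetical role name
use Moo::Role;

has dims => ( is => 'ro', default => sub { [] } );

sub ndims { scalar @{ $_[0]->dims } }
sub nelem { my $n = 1; $n *= $_ for @{ $_[0]->dims }; $n }

package PDL::Tiny;               # hypothetical class name
use Moo;
with 'PDL::Tiny::Role::Dims';

# Pure-perl flat storage as the simplest starting implementation;
# a C-backed storage role could replace this later.
has data => ( is => 'rw', default => sub { [] } );

package main;
my $pdl = PDL::Tiny->new( dims => [ 2, 3 ], data => [ 1 .. 6 ] );
printf "%d dims, %d elements\n", $pdl->ndims, $pdl->nelem;
```

Swapping the storage role while keeping the Dims role is the kind of
composition the outline above is aiming at.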

--Chris

On Sun, Dec 14, 2014 at 4:22 PM, David Mertens 
wrote:
>
> Hey Chris,
>
> What exactly is the aim of this project? Is this a 90% reimplementation of
> PDL? If so, I would really like to have a well thought-out C API, so that I
> can easily create new PDLs from my C or C-like code. I would also really
> like to be able to call PDL functions from C.
>
> Whether those belong in a Tiny module I cannot say. It depends on what
> you're trying to make tiny. :-)
>
> David
>
> On Sun, Dec 14, 2014 at 11:31 AM, Chris Marshall 
> wrote:
>
>> To support POGL2 development (updating Perl OpenGL bindings to APIs 3.x,
>> 4.x, and the ES variants) and as a start at the PDL3 core implementation,
>> I'm preparing a PDL::Tiny module and am looking for your input on what you
>> think should or should not be in it.  Here are my general thoughts so far:
>>
>>- The basic PDL::Tiny object starts with Moo
>>   - This allows full meta-object programming via Moose
>>   - Interoperable with state of the art perl OO programming
>>   - KISS principle is satisfied
>>- Additional capabilities would be added via Roles
>>   - Data allocation
>>   - Data types
>>   - Computation support
>>   - Threading/vectorization
>>- PDL::Tiny should interoperate with PDL-2.x
>>   - Using PDL::Objects support
>>   - Allows PDL-2.x and PDL3 options
>>   - Perhaps a pure-perl implementation
>>
>>
>> This will give a concrete platform with which to develop PDL3 concepts.
>> In addition, I plan to set up a github project for this effort so I can
>> come up to speed with that platform and to encourage rapid development.
>>
>> I welcome your thoughts and suggestions
>>
>> Regards,
>> Chris (with my PDL3 and POGL2 hats on)
>>
>>
>>
>>
>>
>
> --
>  "Debugging is twice as hard as writing the code in the first place.
>   Therefore, if you write the code as cleverly as possible, you are,
>   by definition, not smart enough to debug it." -- Brian Kernighan
>
___
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl


Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-14 Thread Zakariyya Mughal
Hi everyone,

The following are some suggestions. I'd love to help work on them. Sorry for
the length.

- Zaki Mughal

---

I announced adding data frames for PDL several months back
 and my
intention to embed R in Perl. Embedding R in Perl is actually complete now and
just about ready for CPAN 

thanks to the help of the gang on #inline 
.

In order to build the data frames and match R types, I created several
subclasses of PDL that handle a subset of PDL functions, but I haven't figured
out a way to wrap all of PDL's functionality systematically. I have several
thoughts on this.

## Levels of measurement

  When using R, one of the nice things it does is warn or give
  an error when you try to do an operation that would be invalid on a certain
  type of data. One such type of data is categorical data, which R calls
  factors and for which I made a subclass of PDL called PDL::Factor. Some of
  this behaviour is inspired by the statistical methodology of levels of
  measurement . I believe
  SAS even explicitly allows assigning levels of measurement to variables.

  For example, if I try to apply the mean() function to all the columns of the
  Iris data set, I get this warning:

  ```r
  lapply( iris, mean )
  #> $Sepal.Length
  #> [1] 5.84
  #>
  #> $Sepal.Width
  #> [1] 3.057333
  #>
  #> $Petal.Length
  #> [1] 3.758
  #>
  #> $Petal.Width
  #> [1] 1.199333
  #>
  #> $Species
  #> [1] NA
  #>
  #> Warning message:
  #> In mean.default(X[[5L]], ...) :
  #>   argument is not numeric or logical: returning NA
  ```

  `NA` is R's equivalent of `BAD` values. For `mean()` this makes sense for
  categorical data. For logical vectors, it does something else:

  ```r
  which_setosa <- iris$Species == 'setosa' # this is a logical
  mean( which_setosa )
  #> [1] 0.333
  ```

  This means 1/3 of the logical data was true, which may be useful for `mean()`
  to return in that case.

  Thinking in terms of levels of measurement can help with another experiment
  I'm doing which is based around tracking the units of measure used for numerical
  things in Perl. Code is here 
.

  What I do there is use Moo roles to add a unit attribute to numerical types
  (Perl scalars, Number::Fraction, PDL, etc.) and whenever they go through an
  operation by either operator overloading or calling a function such as
  `sum()`, the unit will be carried along with it and be manipulated
  appropriately (you can take the mean of Kelvin, but not degrees Celsius). I
  know that units of measure are messy to implement, but being able to support
  auxiliary operations like this will go a long way to making PDL flexible.
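
A dependency-free sketch of the unit-carrying idea via plain operator
overloading (Unit::Num is a made-up name; the linked experiment does this with
Moo roles layered over existing numeric types):

```perl
package Unit::Num;
use strict;
use warnings;
use overload
    '+'  => \&_add,
    '*'  => \&_mul,
    '""' => sub { my $s = shift; "$s->{val} $s->{unit}" };

sub new {
    my ( $class, $val, $unit ) = @_;
    return bless { val => $val, unit => $unit }, $class;
}

# Addition only makes sense for matching units...
sub _add {
    my ( $x, $y ) = @_;
    die "unit mismatch: $x->{unit} vs $y->{unit}\n"
        unless $x->{unit} eq $y->{unit};
    return __PACKAGE__->new( $x->{val} + $y->{val}, $x->{unit} );
}

# ...while multiplication composes them.
sub _mul {
    my ( $x, $y ) = @_;
    return __PACKAGE__->new( $x->{val} * $y->{val}, "$x->{unit}*$y->{unit}" );
}

package main;
my $d = Unit::Num->new( 3, 'm' );
my $e = Unit::Num->new( 4, 'm' );
print $d + $e, "\n";   # 7 m
```

Adding metres to seconds dies with "unit mismatch", which is the level-of-
measurement check carried down into the arithmetic itself.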

  [Has anyone used udunits2? I made an Alien package for it. It's on CPAN.]

## DataShape and Blaze

  I think it would be beneficial to look at the work being done by the Blaze
  project  with its DataShape specification
  . The idea behind it is to be able to use the
  various array-like APIs without having to worry about what is going on in the
  backend, be it with a CPU-based, GPU-based, SciDB, or even a SQL server.

## Julia

  Julia has been doing some amazing things with how they've grown out their
  language. I was looking to see if they have anything similar to the dataflow
  in PDL and I came across ArrayViews 
.
  It may be enlightening to see how they compose this feature onto already
  existing n-d arrays as opposed to how PDL does it.

  I do not know what tradeoffs that brings, but it is a starting point to think
  about. I think similar approaches can be made to support sparse arrays.

  In fact, one of Julia's strengths is how they use multimethods to handle new
  types with ease. See "The Design Impact of Multiple Dispatch" 

  for examples. [Perl 6 has built-in multimethods]
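
To show the flavor in Perl, a dependency-free sketch that dispatches on the
runtime types of both arguments, the way Julia's multimethods do (a real
system might use the CPAN module Class::Multimethods instead):

```perl
use strict;
use warnings;

# Dispatch table keyed on the type signature of both operands.
my %add_impl = (
    'ARRAY,ARRAY'  => sub { [ map { $_[0][$_] + $_[1][$_] } 0 .. $#{ $_[0] } ] },
    'ARRAY,SCALAR' => sub { [ map { $_ + $_[1] } @{ $_[0] } ] },
);

sub multi_add {
    my ( $x, $y ) = @_;
    my $sig  = join ',', map { ref($_) || 'SCALAR' } $x, $y;
    my $impl = $add_impl{$sig} or die "no multi_add method for ($sig)\n";
    return $impl->( $x, $y );
}

print "@{ multi_add( [ 1, 2, 3 ], 10 ) }\n";     # 11 12 13
print "@{ multi_add( [ 1, 2 ], [ 3, 4 ] ) }\n";  # 4 6
```

New type combinations plug in by adding a table entry, which is the
plugin-at-runtime expandability mentioned above.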

## MATLAB subclassing

  I use MATLAB daily. I came across this area of the documentation that talks
  about how to subclass. 


  Some of the information in there is good for knowing how *not* to implement
  things, but there is also some discussion on what is necessary for the
  storage types that might be worth looking at.

  [By the way, I have downloaded all of MATLAB File Central's code and I could
  do some analysis on the functions used there if that would be helpful.]

## GPU and threading

  I think it would be best to offload GPU support to other libraries, so it
  would be good to extract what is common between libraries like

  - MAGMA ,
  - ViennaCL ,
  - Blaze-lib ,
  - VXL ,
  - Spark ,
  - Torch ,
  - Theano ,
  - Eigen , and
  - Armadillo .

  Eigen is interesting in particular because it has support for storing in both
  row-major and column-major data
  <http://eigen.tuxfamily.org/dox-devel/group__TopicStorageOrders.html>.

Re: [Perldl] PDL::Tiny --- what should be in it?

2014-12-14 Thread David Mertens
Hey Chris,

What exactly is the aim of this project? Is this a 90% reimplementation of
PDL? If so, I would really like to have a well thought-out C API, so that I
can easily create new PDLs from my C or C-like code. I would also really
like to be able to call PDL functions from C.

Whether those belong in a Tiny module I cannot say. It depends on what
you're trying to make tiny. :-)

David

On Sun, Dec 14, 2014 at 11:31 AM, Chris Marshall 
wrote:
>
> To support POGL2 development (updating Perl OpenGL bindings to APIs 3.x,
> 4.x, and the ES variants) and as a start at the PDL3 core implementation,
> I'm preparing a PDL::Tiny module and am looking for your input on what you
> think should or should not be in it.  Here are my general thoughts so far:
>
>- The basic PDL::Tiny object starts with Moo
>   - This allows full meta-object programming via Moose
>   - Interoperable with state of the art perl OO programming
>   - KISS principle is satisfied
>- Additional capabilities would be added via Roles
>   - Data allocation
>   - Data types
>   - Computation support
>   - Threading/vectorization
>- PDL::Tiny should interoperate with PDL-2.x
>   - Using PDL::Objects support
>   - Allows PDL-2.x and PDL3 options
>   - Perhaps a pure-perl implementation
>
>
> This will give a concrete platform with which to develop PDL3 concepts.
> In addition, I plan to set up a github project for this effort so I can
> come up to speed with that platform and to encourage rapid development.
>
> I welcome your thoughts and suggestions
>
> Regards,
> Chris (with my PDL3 and POGL2 hats on)
>
>
>
>
>

-- 
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
___
Perldl mailing list
Perldl@jach.hawaii.edu
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl