Re: [Pdl-devel] Faster PDL Development Cycle---But How?

Zakariyya Mughal Tue, 25 Aug 2015 10:43:52 -0700

On 2015-08-24 at 23:48:51 +0000, Chris Marshall wrote:
> PDL Developers-
> 
>      With the addition of two active and highly motivated PDL developers
> (Zakariyya Mughal and Guggle "Ed" Worth) we've made significant progress
> in cleaning up the PDL distribution itself and the development process
> itself.  PDL is now run through test builds automatically on git commit
> via the Travis-CI framework of github.  Many perl platforms and PDL
> configuration options are exercised.  PDL-2.013 was the best tested
> pre-release release ever.
> 
>      The current process we've been working toward is to make
> PDL development faster and more responsive by breaking up the current
> monolithic PDL distribution into a lean core (roughly the current
> PDL::Core, PDL::PP, and PDL::Slices) and spinning off the other modules
> for IO, Graphics, and Library interfaces as their own CPAN releases.
> This would enable the separate module/distributions to have a faster
> development-test-release cycle since that process would not be held up by
> the testing of the full PDL distribution with all its subcomponents,
> even if they are completely independent/unrelated to the separate module
> changes being made.
> 
>      We're ready to make the split, but there is a catch...  How can we
> have the rapid agile development needed to bring the next generation
> PDL3 possible _without_ losing the "PDL just works" that has been one of
> the primary focus of PDL-2.x development since I volunteered as release
> manager circa PDL-2.4.3 [sic]?
> 
>      There has been some discussion, largely on #pdl, about how to best
> proceed.  One idea is to move to a constant release mode which could be
> expedited by adding co-maints to PDL.  I've not acted on that largely
> because I feel that PDL just working, easy to get and start to use, is
> essential to survive as a minority numeric computation engine (compared
> with R, NumPy, Octave/MATLAB).  How can we grow market share if it takes
> a perl expert to start using PDL?
> 
>      That said, I think the "big split" is the best way forward for PDL
> to grow and thrive.  The ideas for the PDL3 core engine show great
> promise for the kind of dynamic development as occurred when Karl first
> conceived and implemented the idea that would become PDL.
> Unfortunately, my experience with rapid sequential releases is a sort of
> "churn" where it is difficult to know if you'll be able to get a working
> module at any given release.  So what to do...
> 
>      One idea I had is change the stable PDL release distribution into a
> PDL bundle.  That would be the "stable PDL" that would be easy to get
> and install.  The sub-modules would then be able to have independent
> development forming the "experimental PDL" track.  Another way, a bit
> more crude, would be to make a fixed "stable PDL" release that would be
> the one to install.  Maybe we could use specific version information to
> work with cpan, cpanm,...
> 
>      Here's where we need your input for discussion and consensus.
> Please feel free to comment on any of the above, or to offer your own
> thoughts.  The goal is to select the preferred approach for modern PDL
> development and move out on it.  I would like to complete this discussion
> process within the next two weeks.  At that point we should be able to
> make a specific plan for any final comments with the agile development
> to begin shortly after.
> 
> Let the discussions begin!


Hello Chris,

First off, thank you for starting this conversation.

Ed and I have been working on and off as time permits on preparing for
the split. The work we've been doing hasn't really generated much
traffic on the pdl-devel mailing list, but the #pdl IRC channel and the
PDLPorters GitHub organisation tell a very different story. There is a lot going
on there every few days. The discussion on those two mediums is a little
more agile than the mailing list or SourceForge and helps with formulating

I highly recommend joining both by watching the repositories in
PDLPorters and following the IRC channel, either by joining with a client
or by tracking the backlog at <http://irclog.perlgeek.de/pdl/>.

I'd like to summarise some of what we came up with on GitHub/IRC:

 1. A split is necessary to make not only releases easier, but also
    development. We have reduced the time required to build PDL across
    multiple environments to a little over 1 hour.

    This is still too long when you have perhaps 1.5 hours of tuits that
    day. So the work inevitably gets spread out over weeks.

    A split would help decrease this friction.

 2. Making `cpanm PDL` always work has always been part of the plan.
    Improving the PDL devops has helped with that. The plan is to
    continue doing that.

    But large refactors such as this split can be quite daunting. We
    can't be sure we will stick the landing right the first time. But
    the job needs to move forward or it will fail via analysis paralysis
    even before it has begun.

 3. Ed and I have been thinking about releasing a more agile, friendly
    fork of PDL under the PDLA namespace (for PDL Agile). The
    repositories will continue to live under the PDLPorters GitHub
    organisation.

    We will start by applying the split. This will be followed by
    improving code coverage, fixes to the 64-bit indexing, formalising
    the badvalue semantics for more functions, and bug-fixes.

    We plan on making sure that libraries such as PDL-Stats, PDL-IO-CSV,
    etc. remain compatible with this library. I believe there is a way
    to do this without making changes to the original code (via a subref
    in @INC).
 
 4. The modules that come from the split will each be improved so that
    they are easy to install on their own. We already have plans to
    write Alien::Base modules for all of them.

 5. In parallel with this, we will begin reaching out to distribution
    packagers. PDL has not been updated on many of them (some of which
    are on 2.4.x). This is already on the wishlist at
    <https://github.com/PDLPorters/pdl/issues/139>.

 6. The current PDL distribution will remain as it is. Bugfixes will
    continue on PDL and they will be backported from PDLA. This approach
    has worked well for IPython/Jupyter (which underwent a split earlier
    this summer)[^jupyter-split]. Backporting fixes was a large part
    of what they had to go through.

 7. Eventually, after we are sure that PDLA has maintained
    compatibility with PDL, the changes of PDLA will replace the
    current PDL repository.
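
As a rough illustration of the "subref in @INC" idea from item 3, here is
a minimal sketch. It is not the actual PDLA shim: the module name
PDLA::Demo, its greet method, and the inheritance-based aliasing are all
assumptions made up for the example. The hook intercepts a request for
PDL/Foo.pm and returns generated source that loads PDLA::Foo instead.

```perl
use strict;
use warnings;

# Stand-in for a module in the hypothetical PDLA namespace, defined inline
# so the sketch is self-contained. PDLA::Demo and greet() are made-up names.
{
    package PDLA::Demo;
    sub greet { return "hello from " . __PACKAGE__ }
}
BEGIN { $INC{'PDLA/Demo.pm'} = __FILE__ }   # mark it as already loaded

# @INC hook: when something asks for PDL/Foo.pm, hand back generated source
# that loads PDLA::Foo and makes PDL::Foo inherit from it (an assumption
# about how the aliasing could work, not the real design).
unshift @INC, sub {
    my ($self, $file) = @_;
    return unless $file =~ m{\APDL/(.+)\.pm\z};   # decline everything else
    (my $tail = $1) =~ s{/}{::}g;                 # "IO/CSV" becomes "IO::CSV"
    my $src = "package PDL::$tail; require PDLA::$tail; "
            . "our \@ISA = ('PDLA::$tail'); 1;\n";
    open my $fh, '<', \$src or die "in-memory open failed: $!";
    return $fh;                                   # require compiles this handle
};

require PDL::Demo;                # resolved by the hook, not by a file on disk
print PDL::Demo->greet(), "\n";   # prints "hello from PDLA::Demo"
```

Inheritance only covers method calls; function-style calls such as
PDL::Demo::greet() would need the symbols copied across as well, so a
real shim would have to do more than this.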

Finally, I also have some ideas for PDL3 that I will post in about a
month's time. One of the top priorities on the feature list of PDL3's C
API needs to be the ability to do optimisations such as loop fusion. I
need to ponder on how to combine this with the Moo-like metaprogramming
that we envision. The Julia developers seem to be working on this, but
there are still big unresolved questions on the issue tracker.
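
For anyone unfamiliar with the term, loop fusion can be illustrated in
plain Perl. This is only a sketch of the concept, not of PDL's C API; the
arrays @x and @y and the hypotenuse calculation are made up for the
example. The unfused version makes one pass per operation and allocates a
temporary array each time, while the fused version does all the work in a
single pass.

```perl
use strict;
use warnings;

my @x = (3, 6, 5);
my @y = (4, 8, 12);

# Unfused: each step is its own loop with its own temporary array, roughly
# how a chain of whole-array operations behaves when evaluated one by one.
my @x2      = map { $_ * $_ } @x;
my @y2      = map { $_ * $_ } @y;
my @sum     = map { $x2[$_] + $y2[$_] } 0 .. $#x;
my @unfused = map { sqrt($_) } @sum;

# Fused: one loop over the indices, no intermediate arrays.
my @fused = map { sqrt($x[$_] ** 2 + $y[$_] ** 2) } 0 .. $#x;

print "@unfused\n";   # prints "5 10 13"
print "@fused\n";     # prints "5 10 13"
```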

By the way, I think it might be better to avoid putting a number in the
name of this next major version of PDL. It's a personal opinion that
stems from marketing issues that are similar to what happened with
Osborne 1 <https://en.wikipedia.org/wiki/Osborne_effect> and somewhat
with Perl 6. This isn't a strongly held opinion, but I feel that it is
worth bringing up.

[^jupyter-split]: http://blog.jupyter.org/2015/04/15/the-big-split/

Cheers,
- Zaki Mughal

> 
> --Chris

> ------------------------------------------------------------------------------

> _______________________________________________
> pdl-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/pdl-devel

