On 8/21/2013 12:29 PM, F.R. wrote:
> Hi all,
>
> In an effort to do some serious cleaning up of a hopelessly cluttered
> working environment, I developed a modular data transformation system
> that pretty much stands. I am very pleased with it. I expect huge time
> savings. I would share it if I had a sense that there is interest out
> there, and I would appreciate comments. Here's a description. I named
> the module TX:

You appear to have developed a framework for creating data-flow networks. Others exist, including Python itself and things built on top of Python, like yours. I am not familiar with the others built on Python, but I would not be surprised if yours occupies its own niche. It is easy enough to share on PyPI.

> The nucleus of the TX system is a Transformer class, a wrapper for any
> kind of transformation functionality. The Transformer takes input as a
> calling argument and returns it transformed. This design allows the
> assembly of transformation chains, either by nesting calls or, better,
> by using the class Chain, derived from 'Transformer' and 'list'.
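The post does not include the TX code, but a minimal sketch of the stated design might look like the following. Everything beyond the names Transformer and Chain (the constructor signatures, the wrapped-function attribute, the example transformers) is an assumption for illustration, not the actual module:

```python
class Transformer:
    """Wraps any transformation; call it with input, get it back transformed.
    (Hypothetical sketch; the real TX class is not shown in the post.)"""
    def __init__(self, function):
        self.function = function

    def __call__(self, data):
        return self.function(data)


class Chain(Transformer, list):
    """A sequence of Transformers that is itself a Transformer, so Chains nest."""
    def __init__(self, *transformers):
        list.__init__(self, transformers)

    def __call__(self, data):
        # Feed the output of each Transformer into the next.
        for transformer in self:
            data = transformer(data)
        return data


double = Transformer(lambda items: [x * 2 for x in items])
increment = Transformer(lambda items: [x + 1 for x in items])
chain = Chain(double, increment)
print(chain([1, 2, 3]))  # [3, 5, 7]
```

Because Chain subclasses Transformer, a Chain can appear inside another Chain, which matches the "Chains nest" claim below.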

Python 3 is built around iterables and iterators. Iterables generalize the notion of a list to any structure that can be accessed sequentially. A collection can be either concrete, existing all at once in some memory, or abstract, with members created as needed.

One can think of there being two types of iterator. One merely presents the items of a collection one at a time. The other transforms items one at a time.
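Both kinds are easily written as generator functions. A sketch of the distinction:

```python
def source():
    """A 'source' iterator: presents the members of an abstract
    collection one at a time; the collection never exists in memory."""
    n = 0
    while n < 5:
        yield n
        n += 1


def doubled(items):
    """A 'transforming' iterator: consumes any iterable and yields
    each item transformed, one at a time."""
    for item in items:
        yield item * 2


print(list(doubled(source())))  # [0, 2, 4, 6, 8]
```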

The advantage of 'lazy' collections is that they scale up much better to processing, say, a billion items. If your framework keeps the input list and all intermediate lists, as you seem to say, then it is memory-constrained. Python (mostly) shifted from lists to iterables as the common data-interchange type partly for this reason.
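The memory difference is easy to demonstrate: a list comprehension materializes every item at once, while the equivalent generator expression occupies a small, fixed amount of memory no matter how many items flow through it.

```python
import sys

concrete = [x * x for x in range(100_000)]   # every item exists at once
lazy = (x * x for x in range(100_000))       # items created on demand

print(sys.getsizeof(concrete))      # hundreds of kilobytes
print(sys.getsizeof(lazy))          # a few hundred bytes, regardless of length
print(sum(lazy) == sum(concrete))   # True -- same values, far less memory
```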

You are right that keeping data around can help debugging. Without that, each iterator must be properly tested if its operation is not transparent.

> A Chain consists
> of a sequence of Transformers and is functionally equivalent to an
> individual Transformer. A high degree of modularity results: Chains
> nest.

Because iterators are also iterables, they nest. A transforming iterator does not care whether its input is a concrete non-iterator iterable, a source iterator representing an abstract collection, or another transformer.
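The same transforming generator accepts all three kinds of input without modification:

```python
def doubled(items):
    """A transforming iterator; accepts any iterable."""
    for item in items:
        yield item * 2


# A concrete non-iterator iterable (a list)...
print(list(doubled([1, 2, 3])))           # [2, 4, 6]
# ...a source iterator over an abstract collection...
print(list(doubled(iter(range(3)))))      # [0, 2, 4]
# ...or another transformer, giving the nesting described above:
print(list(doubled(doubled([1, 2, 3]))))  # [4, 8, 12]
```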

> Another consequence is that many transformation tasks can be
> handled with a relatively modest library of a few basic prefabricated
> Transformers from which many different Chains can be assembled on the
> fly.

This is precisely the idea of the itertools module. I suspect that itertools.tee is equivalent to Tx.split (from the deleted code). Application areas need more specialized iterators; there are many in various stdlib modules.
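For reference, itertools.tee splits one iterator into independent branches that can feed separate downstream transformers; whether that actually matches Tx.split is a guess, since the TX code is not shown:

```python
import itertools

# Split one stream into two independent branches.
left, right = itertools.tee(iter([1, 2, 3]), 2)

# Each branch can feed its own transforming iterator.
evens = (x * 2 for x in left)
negations = (-x for x in right)

print(list(evens))      # [2, 4, 6]
print(list(negations))  # [-1, -2, -3]
```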

> A custom Transformer to bridge an eventual gap is quickly written
> and tested, because the task likely is trivial.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
