On 8/21/2013 12:29 PM, F.R. wrote:
> Hi all,
>
> In an effort to do some serious cleaning up of a hopelessly cluttered
> working environment, I developed a modular data transformation system
> that pretty much stands. I am very pleased with it. I expect huge time
> savings. I would share it if I had a sense that there is interest out
> there, and I would appreciate comments. Here's a description. I named
> the module TX:

You appear to have developed a framework for creating data-flow networks. Others exist, including Python itself and things built on top of Python, like yours. I am not familiar with the others built on Python, but I would not be surprised if yours occupies its own niche. It is easy enough to share on PyPI.

> The nucleus of the TX system is a Transformer class, a wrapper for any
> kind of transformation functionality. The Transformer takes input as a
> calling argument and returns it transformed. This design allows the
> assembly of transformation chains, either by nesting calls or, better,
> by using the class Chain, derived from 'Transformer' and 'list'.
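The post does not include the TX code, but a minimal sketch of the stated design might look like the following. Everything beyond the names Transformer and Chain (the constructor signatures, the wrapped-function attribute, the example transformers) is an assumption for illustration, not the actual module:

```python
class Transformer:
    """Wraps any transformation; call it with input, get it back transformed.
    (Hypothetical sketch; the real TX class is not shown in the post.)"""
    def __init__(self, function):
        self.function = function

    def __call__(self, data):
        return self.function(data)


class Chain(Transformer, list):
    """A sequence of Transformers that is itself a Transformer, so Chains nest."""
    def __init__(self, *transformers):
        list.__init__(self, transformers)

    def __call__(self, data):
        # Feed the output of each Transformer into the next.
        for transformer in self:
            data = transformer(data)
        return data


double = Transformer(lambda items: [x * 2 for x in items])
increment = Transformer(lambda items: [x + 1 for x in items])
chain = Chain(double, increment)
print(chain([1, 2, 3]))  # [3, 5, 7]
```

Because Chain subclasses Transformer, a Chain can appear inside another Chain, which matches the "Chains nest" claim below.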

Python 3 is built around iterables and iterators. Iterables generalize the notion of a list to any structure that can be accessed sequentially. A collection can be either concrete, existing all at once in some memory, or abstract, with members created as needed.

One can think of there being two types of iterator. One merely presents the items of a collection one at a time. The other transforms items one at a time.
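Both kinds are easily written as generator functions. A sketch of the distinction:

```python
def source():
    """A 'source' iterator: presents the members of an abstract
    collection one at a time; the collection never exists in memory."""
    n = 0
    while n < 5:
        yield n
        n += 1


def doubled(items):
    """A 'transforming' iterator: consumes any iterable and yields
    each item transformed, one at a time."""
    for item in items:
        yield item * 2


print(list(doubled(source())))  # [0, 2, 4, 6, 8]
```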

The advantage of 'lazy' collections is that they scale up much better to processing, say, a billion items. If your framework keeps the input list and all intermediate lists, as you seem to say, then it is memory-constrained. Python (mostly) shifted from lists to iterables as the common data-interchange type partly for this reason.
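The memory difference is easy to demonstrate: a list comprehension materializes every item at once, while the equivalent generator expression occupies a small, fixed amount of memory no matter how many items flow through it.

```python
import sys

concrete = [x * x for x in range(100_000)]   # every item exists at once
lazy = (x * x for x in range(100_000))       # items created on demand

print(sys.getsizeof(concrete))      # hundreds of kilobytes
print(sys.getsizeof(lazy))          # a few hundred bytes, regardless of length
print(sum(lazy) == sum(concrete))   # True -- same values, far less memory
```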

You are right that keeping data around can help debugging. Without that, each iterator must be properly tested if its operation is not transparent.

> A Chain consists
> of a sequence of Transformers and is functionally equivalent to an
> individual Transformer. A high degree of modularity results: Chains
> nest.

Because iterators are also iterables, they nest. A transforming iterator does not care whether its input is a concrete non-iterator iterable, a source iterator representing an abstract collection, or another transformer.
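The same transforming generator accepts all three kinds of input without modification:

```python
def doubled(items):
    """A transforming iterator; accepts any iterable."""
    for item in items:
        yield item * 2


# A concrete non-iterator iterable (a list)...
print(list(doubled([1, 2, 3])))           # [2, 4, 6]
# ...a source iterator over an abstract collection...
print(list(doubled(iter(range(3)))))      # [0, 2, 4]
# ...or another transformer, giving the nesting described above:
print(list(doubled(doubled([1, 2, 3]))))  # [4, 8, 12]
```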

> Another consequence is that many transformation tasks can be
> handled with a relatively modest library of a few basic prefabricated
> Transformers from which many different Chains can be assembled on the
> fly.

This is precisely the idea of the itertools module. I suspect that itertools.tee is equivalent to Tx.split (from the deleted code). Application areas need more specialized iterators; there are many in various stdlib modules.
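For reference, itertools.tee splits one iterator into independent branches that can feed separate downstream transformers; whether that actually matches Tx.split is a guess, since the TX code is not shown:

```python
import itertools

# Split one stream into two independent branches.
left, right = itertools.tee(iter([1, 2, 3]), 2)

# Each branch can feed its own transforming iterator.
evens = (x * 2 for x in left)
negations = (-x for x in right)

print(list(evens))      # [2, 4, 6]
print(list(negations))  # [-1, -2, -3]
```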

> A custom Transformer to bridge an eventual gap is quickly written
> and tested, because the task likely is trivial.

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
