Re: [Pytables-users] Main differences between PyTables and Relational

Anthony Scopatz Wed, 25 Apr 2012 15:20:10 -0700

Hello Alvaro,

Thanks for writing this up.  I think this would go nicely in our docs if
you are willing to let us add it ;).


My one comment is that in your NOTE you say that "*
SQL tables do not play well with Numpy containers.
*"  I think that this would be better phrased as saying that when
converting SQL Tables to/from NumPy record arrays or PyTables Tables you
are adding or removing the ordering of rows and columns.  Additionally
going from NumPy / PyTables to SQL requires that you perform a unique() or
set() operation on the rows.

So it isn't that they *can't* play nicely together, but rather you have to
understand how they do.   Thanks again.

Be Well
Anthony

On Wed, Apr 25, 2012 at 4:41 PM, Alvaro Tejero Cantero <alv...@minin.es>wrote:

> * Hello list,
> The relational model has a strong foundation and I have spent a few hours
> thinking about what in PyTables is structurally different from it. Here are
> my thoughts. I would be delighted if you could add/comment/correct on these
> ideas. This could eventually help people with a relational/SQL background
> who want to know how to best use the idioms of PyTables for their data
> modeling
>
> ---
> I make a distinction between relational and SQL (see CJ Date’s "SQL and
> relational theory" for more on that). From a purely structural point of
> view, the following differences are apparent:
>
>    1. relations vs. sequences. Relations are sets (i.e. not ordered) of
>    tuples (again, not ordered).
>       1. rows: In PyTables, every container has an implicit row number,
>       meaning there is always a candidate key and order matters. Although
>       strictly an implementation-level concern, row numbers in PyTables are 
> not
>       stored but computed, thanks to the in-disk adjacency of the records. 
> This
>       is important for large datasets, where adding storage of row numbers 
> means
>       roughly a doubling of diskspace.
>       2. columns: In PyTables columns are ordered. That is not the case
>       in a purely relational system but it is the case in SQL.
>    2. Flat tablespace vs. hierarchical tablespace. SQL tables live in a
>    global namespace. PyTables objects can be put inside Groups. Each approach
>    can be mapped onto the other by name mangling. Groups in PyTables are like
>    tables of tables -- for each node in a group there is a full table (or
>    another group...). This introduces a possible ambiguity in data modeling:
>
> Consider a table of car parts, one column is Part ID and the other is
> Model ID, indicating in what car models a particular part is built in. In
> PyTables you can construct the same table /or/ create a /models group and
> create one table per model consisting of a single column of Part IDs e.g.
> /model/sedan, /model/cabrio... etc. The same is possible in a relational
> setting (dividing the tables according to one attribute, and naming them
> according to the attribute value, e.g. model_sedan, model_cabrio...). The
> defining difference is that the interface to manipulate that list is the
> same (it is a table) whereas in PyTables one listing is a Table object and
> the other is a list of Nodes, and the API for both is a bit different.
>
>    1. Attributes of tables and integrity. Any Node (Groups and Tables
>    included) can receive a limited amount of metadata in PyTables, by using
>    the attached attributeset. In SQL, metadata is limited to some keywords, to
>    be used upon table creation, that establish constraints on the columns that
>    have a functional significance. A prime example of this is identifying
>    foreign keys. SQL allows to use this information structurally at the time
>    of joins, whereas in PyTables one is free to implement this or any other
>    navigational scheme in a customized way using the attributes of the table.
>
> When designing such a scheme it has to be remembered that PyTables tables
> have always an implicit column containing the row numbers, and this is
> likely to be used as a key.
>
> NOTE: I intentionally excluded here implementation issues whenever they
> are not related to structural ones, e.g. SQL tables do not play well with
> Numpy containers and are thus ill-suited for big data with Python. Another
> example would be all the features related to transactions/concurrency and
> authorization, which are orthogonal to the data model.
>
> *
>
> -á.
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users

Re: [Pytables-users] Main differences between PyTables and Relational

Reply via email to