Hi Anthony,


>> How does hierarchical help here? do you create a 'singer_name'/song
>> table? or a 'genre name'/song ?. Most of the time the physical layout
>> in the form of a hierarchy is just an annoyance.
>
> I have to say that I disagree.  The hierarchical features make it so that
> the data maps well to both Python objects and file systems.  I feel that
> both of these are more natural to work with than having to construct a
> query of joins, groupbys, etc which reconstructs my data. So while this
> is just my opinion, I feel that hierarchies are much more natural to work
> with.
>

I try to see the difference; I think I could be helped by an example.
SQL gives you views to hide the data layout. You will just have 'a
table' that you can read by rows. Nothing else. What happens in the
case outlined before if I want to group songs by genre? if I encode
the relation in the hierarchy, I will have to write the code that
generates the view, but since there are many ways to create the
hierarchy, the code will be specific to one data layout... I'd love to
be wrong here.


>> > Also, my sense is that there would be a fair bit of overhead in this
>> > interface
>> > layer, which might not get you the speed boost you desire.  I could be
>> > wrong
>> > about this though.
>>
>> I think you're right in the wrapping of the results via the Python
>> interface to SQLite. I suspect you're not about the queries executed
>> in the virtual table, because that is left for you to implement and
>> thus you could turn the query terms (that are handed over to you) into
>> in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html)
>
>
> This was informed by my experience with SQLAlchemy which in some
> situations added an excessively long computation times.  With the PyTables
> infrastructure, we would at least have the option of writing the
> performance
> critical parts in C or Cython...

You are talking here about writing specific /queries/ in Cython? just
for clarification.

>> > If I saw a proof-of-concept implementation, I may grok better the
>> > purpose.
>> > Do you have any code to share?
>>
>> No, but I have an example ER diagram which is only part of what I
>> need.  You are welcome to have a look at it[2]

> Sorry the text in this image is too small for me to read.

I uploaded a larger version on the same location.

> Writing data-specific relational layers for your applications on top of
> HDF5 with PyTables is not hard (IMHO).  Add in the features of NumPy
> to perform in-memory manipulations and you have pretty much everything
> that you need.  I think this is why we don't have formal implementation
> of the SQ Language in PyTables.

Yes, fair enough. There is just no canonical way of doing it.

What do you think of storing in the .attrs something like "This column
of this table matches (in the sense of foreign key) that column in
that table" ?

Or would you store these relations in a global repository of sorts - a
specific table?

> I guess what i don't understand still is why - if you wanted to do this -
> would
> you use the SQLite vtabs?  This seems to have the worst of the SQL world
> in terms of vendor lock in, compatibility with other SQL implementations,
> etc.

That is true. I did a quick search and couldn't find if/how e.g
Postgres has such an extensibility mechanism.

At the same time, there's some commonality between SQLite and
PyTables: single-file, no concurrency approach. If you need a
full-fledged RDBMS with authorization etc. you are in another league
and some abstractions may be difficult to map to PyTables.

And RDBMS have received, recently, features that are of great interest
for scientific users - for example, indexes optimized for spatial or
interval queries[7][8].

> Instead, why not just write a SQLAlchemy dialect [6] that is backed by
> PyTables?

I considered this. I don't know how difficult it is. Do you think that
this would be the way to go for implementing a thin relational layer
on top of PyTables?

As I have no practical experience with SQLAlchemy, I cannot foresee
e.g. those performance drops that you were pinpointing above.

> Yes, this isn't 'self-contained' in that we know have a dependency on
> SQLAlchemy.
> However, if done right this would be an *optional* dependency.  Are there
> reasons
> to not do this that I am missing?  I think that including something like
> this as a
> subpackage in PyTables is more reasonable than interfacing with SQLite
> in specific.

More reasonable in a general sense, I don't know. The mirror statement
would be to say that adding support for Numpy containers to the Python
database adapter would be a reasonable thing to do.


> Thanks for fielding my questions here.

A pleasure. I am trying to wrap my head around all the possibilities
here. I think a documented PyTables use-case for a moderately complex
scientific database could do a lot for its story.

Cheers,

Álvaro.

[7] http://www.sqlite.org/rtree.html
[8] http://www.postgresql.org/docs/8.1/static/gist.html

> Be Well
> Anthony
>
>
> [6] http://docs.sqlalchemy.org/en/latest/#dialect-documentation
>
>>
>>
>> Cheers,
>>
>> Álvaro.
>> --
>> [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html
>> [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link
>> to HDF5 data, or other tables with the real measurements, white tables
>> are computed).
>> [3] http://www.scidb.org/
>> [4] See p.26-29 and 32
>>
>> http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf
>> [5]
>> https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826
>>
>>
>> > Be Well
>> > Anthony
>> >
>> > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero
>> > <alv...@minin.es>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> The topic of introducing some kind of relational management in
>> >> PyTables comes up with certain frequency.
>> >>
>> >> Would it be possible to combine the virtues of RDBMS and hdf5's speed
>> >> via a mechanism such as SQLite Virtual Tables?
>> >>
>> >> http://www.sqlite.org/vtab.html
>> >>
>> >> I wonder if the required x* functions could be written for PyTables,
>> >> or if it being in Python is an obstacle to this kind of interfacing
>> >> with SQLite.
>> >>
>> >> Something like that would be a truly powerful solution in use cases
>> >> that don't require concurrency.
>> >>
>> >> Cheers,
>> >>
>> >> -á.
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------------------------------
>> >> For Developers, A Lot Can Happen In A Second.
>> >> Boundary is the first to Know...and Tell You.
>> >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>> >> http://p.sf.net/sfu/Boundary-d2dvs2
>> >> _______________________________________________
>> >> Pytables-users mailing list
>> >> Pytables-users@lists.sourceforge.net
>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >
>> >
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > For Developers, A Lot Can Happen In A Second.
>> > Boundary is the first to Know...and Tell You.
>> > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>> > http://p.sf.net/sfu/Boundary-d2dvs2
>> > _______________________________________________
>> > Pytables-users mailing list
>> > Pytables-users@lists.sourceforge.net
>> > https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> For Developers, A Lot Can Happen In A Second.
>> Boundary is the first to Know...and Tell You.
>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>> http://p.sf.net/sfu/Boundary-d2dvs2
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
>
> ------------------------------------------------------------------------------
> For Developers, A Lot Can Happen In A Second.
> Boundary is the first to Know...and Tell You.
> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
> http://p.sf.net/sfu/Boundary-d2dvs2
> _______________________________________________
> Pytables-users mailing list
> Pytables-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>

------------------------------------------------------------------------------
For Developers, A Lot Can Happen In A Second.
Boundary is the first to Know...and Tell You.
Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
http://p.sf.net/sfu/Boundary-d2dvs2
_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users

Reply via email to