Re: [Pytables-users] SQLite Virtual Tables

Alvaro Tejero Cantero Fri, 13 Apr 2012 04:42:21 -0700

Hi Anthony,

> I can see how the virtual table interface could be made to work with
> PyTables,
> but I guess I don't understand why you would want to.  It seems like in this
> case you are querying using SQL rather than the more expressive Python.

Yes, you'd be querying using SQL.
SQL is a documented declarative syntax for queries over relations.
Python offers many procedural routes to achieve e.g. joins, all of
them custom. If (a == b) | (c == d) is more expressive to you than
WHERE a=b OR c=d, then you can use SQLAlchemy [1], which wraps SQL in
a Pythonic query syntax.
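
As a rough illustration of that comparison, here is a minimal sketch
using only Python's built-in sqlite3 module. The table name t, its
columns and the sample rows are assumptions made up for this example;
it runs the declarative WHERE clause and the equivalent hand-written
Python filter over the same data.

```python
import sqlite3

# Made-up sample rows with columns (a, b, c, d).
rows = [(1, 1, 2, 3), (1, 2, 3, 3), (1, 2, 3, 4)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (a INTEGER, b INTEGER, c INTEGER, d INTEGER)")
db.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Declarative form: state the condition and let the engine evaluate it.
sql_result = db.execute(
    "SELECT a, b, c, d FROM t WHERE a = b OR c = d ORDER BY rowid"
).fetchall()

# Procedural form: the same condition written by hand in Python.
python_result = [r for r in rows if r[0] == r[1] or r[2] == r[3]]
```

Both forms select the same rows; the difference only starts to matter
once joins across several tables are involved.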

> Moreover, you'd be sacrificing all of the 'H' in HDF5 features to obtain
> this.

What is the benefit of 'H'ierarchical that you have in mind? To me
hierarchy seems less expressive than general relations. After all,
file systems are hierarchical too, and you still choose HDF5 (losing
the panoply of filesystem-based tools along the way). So clearly, the
distinctive benefit of HDF5 does not lie in its hierarchical
character.

Take a list of e.g. songs with a foreign key 'singer' pointing at one
row in the table of singers, and a foreign key 'genre' pointing at the
genre_songs table, which in turn points to 'genres' (an n:m
relationship).

How does a hierarchy help here? Do you create a 'singer_name'/song
table? Or a 'genre_name'/song one? Most of the time the physical layout
in the form of a hierarchy is just an annoyance.
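
For concreteness, a minimal sketch of how the songs example could be
declared relationally, using Python's built-in sqlite3 module. All
table names, column names and sample rows here are assumptions
invented for the illustration; genre_songs is the link table that
carries the n:m relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE genres  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE songs (
    id     INTEGER PRIMARY KEY,
    title  TEXT NOT NULL,
    singer INTEGER NOT NULL REFERENCES singers(id)
);
-- Link table: one row per (song, genre) pair, so a song can have
-- several genres and a genre can have several songs.
CREATE TABLE genre_songs (
    song  INTEGER NOT NULL REFERENCES songs(id),
    genre INTEGER NOT NULL REFERENCES genres(id),
    PRIMARY KEY (song, genre)
);
""")

# Placeholder data, not real songs.
conn.executemany("INSERT INTO singers VALUES (?, ?)",
                 [(1, "singer A"), (2, "singer B")])
conn.executemany("INSERT INTO genres VALUES (?, ?)",
                 [(1, "jazz"), (2, "blues")])
conn.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                 [(1, "song one", 1), (2, "song two", 2), (3, "song three", 2)])
conn.executemany("INSERT INTO genre_songs VALUES (?, ?)",
                 [(1, 1), (2, 1), (2, 2), (3, 2)])


def songs_by_genre(conn, genre_name):
    """Return (title, singer name) pairs for one genre, sorted by title."""
    return conn.execute(
        """
        SELECT s.title, si.name
        FROM songs AS s
        JOIN genre_songs AS gs ON gs.song = s.id
        JOIN genres AS g ON g.id = gs.genre
        JOIN singers AS si ON si.id = s.singer
        WHERE g.name = ?
        ORDER BY s.title
        """,
        (genre_name,),
    ).fetchall()
```

Neither a per-singer nor a per-genre grouping is baked into the
storage layout; either view is just a different query over the same
tables.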

> Also, my sense is that there would be a fair bit of overhead in this
> interface
> layer, which might not get you the speed boost you desire.  I could be wrong
> about this though.

I think you're right about the overhead of wrapping the results via
the Python interface to SQLite. I suspect you're not right about the
queries executed in the virtual table, because that part is left for
you to implement, and thus you could turn the query terms (which are
handed over to you) into in-kernel expressions if you so wish
(http://www.sqlite.org/vtab.html).
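
A rough sketch of that translation step, under stated assumptions: the
query terms are modelled here as plain (column, operator, value)
tuples that are all combined with AND, and only numeric values are
handled. The function name, the tuple layout and the operator table
are hypothetical; this is not the SQLite C interface and it does not
call PyTables.

```python
# Hypothetical mapping from SQL comparison operators to the operators
# used in a PyTables-style condition string.
SQL_TO_CONDITION_OP = {
    "=": "==",
    "!=": "!=",
    "<": "<",
    "<=": "<=",
    ">": ">",
    ">=": ">=",
}


def constraints_to_condition(constraints):
    """Turn (column, sql_op, value) terms into one AND-ed condition string."""
    parts = []
    for column, sql_op, value in constraints:
        # Accept only plain identifiers so nothing else ends up in the expression.
        if not column.isidentifier():
            raise ValueError(f"not a valid column name: {column!r}")
        # Numeric values only; bool is excluded although it subclasses int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"unsupported value type: {type(value).__name__}")
        if sql_op not in SQL_TO_CONDITION_OP:
            raise ValueError(f"unsupported operator: {sql_op!r}")
        parts.append(f"({column} {SQL_TO_CONDITION_OP[sql_op]} {value!r})")
    return " & ".join(parts)
```

A string such as "(a == 1) & (b > 2.5)" is the kind of condition that
could then be handed to an in-kernel query (e.g. Table.where in
PyTables), so the filtering itself would not go through per-row Python
code.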

> If I saw a proof-of-concept implementation, I may grok better the purpose.
> Do you have any code to share?

No, but I have an example ER diagram which is only part of what I
need. You are welcome to have a look at it [2] and tell me how you'd
manage to support the jungle of relationships there with the H of
HDF5. In SQL I have a syntax to declare all those relationships. In
HDF5 I must settle on one hierarchical cut of those relations and,
since it won't be enough, implement the relational layer on top of it,
perhaps using attrs to store paths everywhere. It can be done, but
the out-of-the-box support for this is at this point next to nil
(maybe integrating something like recarray.joinby [5] would be
useful?).

It looks to me, at this moment, that as soon as the data model gets
complicated HDF5 is in trouble, and as soon as very large, contiguous,
read-only datasets are involved RDBMSs are in trouble (subsetting,
speed). Since this is not a happy situation, several people are
interested in combining the strengths of both [3][4], and my e-mail
was just highlighting that there may be a way forward that could yield
a self-contained, clear, understandable package for the scenarios
where PyTables is most often deployed (single-user).

Or am I not seeing something obvious?

Cheers,

Álvaro.
--
[1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html
[2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link
to HDF5 data, or other tables with the real measurements, white tables
are computed).
[3] http://www.scidb.org/
[4] See p.26-29 and 32
http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf
[5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826

> Be Well
> Anthony
>
> On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero <alv...@minin.es>
> wrote:
>>
>> Hi,
>>
>> The topic of introducing some kind of relational management in
>> PyTables comes up with certain frequency.
>>
>> Would it be possible to combine the virtues of RDBMS and hdf5's speed
>> via a mechanism such as SQLite Virtual Tables?
>>
>> http://www.sqlite.org/vtab.html
>>
>> I wonder if the required x* functions could be written for PyTables,
>> or if it being in Python is an obstacle to this kind of interfacing
>> with SQLite.
>>
>> Something like that would be a truly powerful solution in use cases
>> that don't require concurrency.
>>
>> Cheers,
>>
>> -á.
>>
>>
>> ------------------------------------------------------------------------------
>> For Developers, A Lot Can Happen In A Second.
>> Boundary is the first to Know...and Tell You.
>> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
>> http://p.sf.net/sfu/Boundary-d2dvs2
>> _______________________________________________
>> Pytables-users mailing list
>> Pytables-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
