Hi Anthony,

> I can see how the virtual table interface could be made to work with
> PyTables,
> but I guess I don't understand why you would want to. It seems like in this
> case you are querying using SQL rather than the more expressive Python.

Yes, you'd be querying using SQL. SQL is a documented declarative syntax for queries over relations. Python offers many procedural routes to achieve e.g. joins, all of them custom. If (a == b) | (c == d) is more expressive to you than WHERE a=b OR c=d, then you can use SQLAlchemy [1], which wraps SQL in a Pythonic query syntax.

> Moreover, you'd be sacrificing all of the 'H' in HDF5 features to obtain
> this.

What is the benefit of 'H'ierarchical that you have in mind? To me hierarchy seems less expressive than general relations. After all, file systems are hierarchical and you're still going to HDF5 (and losing the panoply of filesystem-based tools with it). So clearly, the differential benefit of HDF5 is not in its hierarchical character.

Take a list of e.g. songs with a foreign key 'singer' pointing at one row in the table of singers, and a foreign key 'genre' pointing at the genre_songs table, which in turn points to 'genres' (an n:m relationship). How does hierarchy help here? Do you create a 'singer_name'/song table, or a 'genre_name'/song one? Most of the time the physical layout in the form of a hierarchy is just an annoyance.

> Also, my sense is that there would be a fair bit of overhead in this
> interface
> layer, which might not get you the speed boost you desire. I could be wrong
> about this though.

I think you're right about the wrapping of the results via the Python interface to SQLite. I suspect you're not right about the queries executed in the virtual table, because that part is left for you to implement, and thus you could turn the query terms (which are handed over to you) into in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html).

> If I saw a proof-of-concept implementation, I may grok better the purpose.
> Do you have any code to share?

No, but I have an example ER diagram which is only part of what I need.
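
(This is not the virtual-table proof of concept asked about. It is only a minimal sketch of the songs example above in plain SQLite through Python's sqlite3 module, to show what declaring the relationships looks like. The table layout, the sample rows and the helper songs_by_genre are made up for illustration and do not come from my actual schema.)

```python
import sqlite3

# In-memory database; nothing here touches HDF5 or PyTables.
conn = sqlite3.connect(":memory:")

# Hypothetical schema for the songs/singers/genres example.
conn.executescript("""
    CREATE TABLE singers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE genres  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE songs (
        id     INTEGER PRIMARY KEY,
        title  TEXT NOT NULL,
        singer INTEGER NOT NULL REFERENCES singers(id)
    );
    -- Link table carrying the n:m relationship between songs and genres.
    CREATE TABLE genre_songs (
        song  INTEGER NOT NULL REFERENCES songs(id),
        genre INTEGER NOT NULL REFERENCES genres(id),
        PRIMARY KEY (song, genre)
    );
""")

# Made-up sample rows.
conn.executemany("INSERT INTO singers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany("INSERT INTO genres VALUES (?, ?)", [(1, "folk"), (2, "jazz")])
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?)",
    [(1, "Dawn", 1), (2, "Dusk", 2), (3, "Noon", 1)],
)
conn.executemany(
    "INSERT INTO genre_songs VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 2)],
)


def songs_by_genre(connection, genre_name):
    """Return (title, singer name) pairs for one genre, sorted by title."""
    query = """
        SELECT songs.title, singers.name
        FROM songs
        JOIN singers     ON singers.id = songs.singer
        JOIN genre_songs ON genre_songs.song = songs.id
        JOIN genres      ON genres.id = genre_songs.genre
        WHERE genres.name = ?
        ORDER BY songs.title
    """
    return connection.execute(query, (genre_name,)).fetchall()


folk = songs_by_genre(conn, "folk")
jazz = songs_by_genre(conn, "jazz")
print(folk)
print(jazz)
```

The same song shows up under several genres without being stored twice, and neither 'singer' nor 'genre' has to be chosen as the top of a hierarchy.
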
You are welcome to have a look at the diagram [2] and tell me how you'd support the jungle of relationships there with the H of HDF5. In SQL I have a syntax to declare all those relationships. In HDF5 I must decide on one hierarchical cut of those relations and, since it won't be enough, implement the relational layer on top of it, perhaps using attrs to store paths everywhere. It can be done, but the out-of-the-box support for this is at this point next to nil (maybe integrating something like join_by from numpy.lib.recfunctions [5] would be useful?).

It looks to me, at this moment, that as soon as the data model gets complicated HDF5 is in trouble, and as soon as very large, contiguous, read-only datasets are involved RDBMSs are in trouble (subsetting, speed). Since this is not a happy situation, several people are interested in combining the strengths of both [3][4], and my e-mail was just highlighting that there may be a way forward that yields a self-contained, clear, understandable package for the scenarios where PyTables is most often deployed (single-user).

Or am I not seeing something obvious?

Cheers,

Álvaro.

--
[1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html
[2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link to HDF5 data, or other tables with the real measurements; white tables are computed)
[3] http://www.scidb.org/
[4] See p. 26-29 and 32 of http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf
[5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826

> Be Well
> Anthony
>
> On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero <alv...@minin.es>
> wrote:
>>
>> Hi,
>>
>> The topic of introducing some kind of relational management in
>> PyTables comes up with certain frequency.
>>
>> Would it be possible to combine the virtues of RDBMS and hdf5's speed
>> via a mechanism such as SQLite Virtual Tables?
>>
>> http://www.sqlite.org/vtab.html
>>
>> I wonder if the required x* functions could be written for PyTables,
>> or if it being in Python is an obstacle to this kind of interfacing
>> with SQLite.
>>
>> Something like that would be a truly powerful solution in use cases
>> that don't require concurrency.
>>
>> Cheers,
>>
>> -á.

_______________________________________________
Pytables-users mailing list
Pytables-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/pytables-users