[sqlite] Announcing the Madis project

Elefterios Stamatogiannakis Tue, 23 Feb 2010 09:28:41 -0800

Madis is an extensible relational database system built upon the SQLite 
database, with extensions written in Python (via the APSW SQLite 
wrapper). It is developed at:


http://madis.googlecode.com

Because Madis is built on the SQLite core, its database format is exactly 
the same as SQLite’s. This means that all SQLite databases are directly 
usable with Madis.

Madis makes it possible to quickly develop (in Python) and test new 
relational functions and virtual tables. SQL syntax extensions were also 
created to simplify queries that use these functions.

Madis' main goal is to promote the handling of data-related tasks within 
an extended relational model. In doing so, it promotes the database from 
a support role (storing and retrieving data) to a full data processing 
system in its own right. Madis already includes functions for file 
import/export, keyword analysis, data mining tasks, fast indexing, 
pivoting, statistics and workflow execution.

Some of Madis' functionality:

- Programmable (in Python) row functions (via APSW):

mterm> select detectlang('Il en est des livres comme du feu de nos foyers');
french
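
For readers who have not written a row function before, here is a 
minimal sketch of the idea. It assumes Python's standard sqlite3 module 
instead of APSW (so it runs without extra packages), and wordcount is a 
hypothetical function made up for illustration, not one of Madis' own.

```python
import sqlite3


def wordcount(text):
    """Return the number of whitespace-separated words in text."""
    if text is None:
        return None  # keep SQL NULL as NULL
    return len(str(text).split())


conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL function.
# (Assumption: stdlib sqlite3 is used here; Madis itself goes through APSW.)
conn.create_function("wordcount", 1, wordcount)

row = conn.execute(
    "select wordcount('Il en est des livres comme du feu de nos foyers')"
).fetchone()
print(row[0])
```

Once registered, the function can be used anywhere SQL accepts a scalar 
expression, for example in a where clause.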

- Programmable (in Python) aggregate functions (via APSW):

mterm> select concatterms(a)
from (select "term1+term2" as a UNION select "term2 term3" as a);
term1+term2 term2 term3
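
An aggregate function is a class with a per-row step and a final 
result. The sketch below again assumes the standard sqlite3 module 
rather than APSW, and jointerms is a hypothetical stand-in that only 
mimics the example above; it is not the concatterms implementation.

```python
import sqlite3


class JoinTerms:
    """Aggregate that joins all non-NULL input values with single spaces."""

    def __init__(self):
        self.parts = []

    def step(self, value):
        # Called once for every input row of the group.
        if value is not None:
            self.parts.append(str(value))

    def finalize(self):
        # Called once per group; sorting makes the result independent
        # of the order in which rows arrive.
        return " ".join(sorted(self.parts))


conn = sqlite3.connect(":memory:")
# Hypothetical name "jointerms"; registered as a one-argument aggregate.
conn.create_aggregate("jointerms", 1, JoinTerms)

result = conn.execute(
    "select jointerms(a) from "
    "(select 'term1+term2' as a union select 'term2 term3' as a)"
).fetchone()[0]
print(result)
```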

- Programmable (in Python) virtual tables (via APSW):

mterm> select * from file('./demo/continents.tsv') limit 2;
Asia|AF
Europe|AL
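
The standard sqlite3 module does not expose the virtual table API, so 
the following is only an approximation of what file() achieves: it 
copies a tab-separated file into an ordinary table instead of reading 
it lazily. The load_tsv helper is hypothetical; the c1, c2, ... column 
names follow the ones used in the join queries later in this message, 
and the sample rows are the two shown above.

```python
import csv
import os
import sqlite3
import tempfile


def load_tsv(conn, table, path):
    """Copy a tab-separated file into a new table with columns c1, c2, ..."""
    with open(path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle, delimiter="\t"))
    if not rows:
        raise ValueError("empty file: cannot infer the number of columns")
    width = max(len(r) for r in rows)
    columns = ", ".join(f"c{i + 1}" for i in range(width))
    conn.execute(f'create table "{table}" ({columns})')
    marks = ", ".join("?" for _ in range(width))
    # Pad short rows with NULL so every row has `width` values.
    padded = [r + [None] * (width - len(r)) for r in rows]
    conn.executemany(f'insert into "{table}" values ({marks})', padded)


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny sample file so the sketch is self-contained.
    path = os.path.join(tmp, "continents.tsv")
    with open(path, "w", encoding="utf-8") as out:
        out.write("Asia\tAF\nEurope\tAL\n")
    conn = sqlite3.connect(":memory:")
    load_tsv(conn, "continents", path)

rows = conn.execute("select * from continents limit 2").fetchall()
for row in rows:
    print("|".join(row))
```

A real virtual table avoids the up-front copy and can be used directly 
in the from clause, which is what the APSW-based file() does.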

- Multisets (functions that return multiple rows/columns):

mterm> select * from table1;
a                  | b
-----------------------------------
'car wood bike'    | 'first group'
'car car wood'     | 'first group'
'car wood'         | 'first group'
'car wood ice'     | 'first group'
'ice'              | 'second group'
'car ice'          | 'second group'
'car cream'        | 'second group'
'icecream ice car' | 'second group'

mterm> select b, freqitemsets(a, 'threshold:2', 'noautothres:1', 'maxlen:2')
   from table1 group by b;
b            | itemset_id | itemset_length | itemset_frequency | item
---------------------------------------------------------------------
first group  | 1          | 1              | 4                 | wood
first group  | 2          | 1              | 4                 | car
first group  | 3          | 2              | 4                 | car
first group  | 3          | 2              | 4                 | wood
second group | 1          | 1              | 3                 | ice
second group | 2          | 1              | 3                 | car
second group | 3          | 2              | 2                 | car
second group | 3          | 2              | 2                 | ice
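
To make the numbers above easier to check, here is a brute-force sketch 
in plain Python that recounts them from table1. It is not the algorithm 
freqitemsets uses. It assumes that threshold is the minimum number of 
rows an itemset must appear in, that maxlen is the largest itemset 
size, and that a term repeated within one row counts once; the 
noautothres option is ignored. The frequent_itemsets name is 
hypothetical.

```python
from collections import Counter
from itertools import combinations

# The rows of table1 as shown above: (a, b).
table1 = [
    ("car wood bike", "first group"),
    ("car car wood", "first group"),
    ("car wood", "first group"),
    ("car wood ice", "first group"),
    ("ice", "second group"),
    ("car ice", "second group"),
    ("car cream", "second group"),
    ("icecream ice car", "second group"),
]


def frequent_itemsets(rows, threshold=2, maxlen=2):
    """Per group, count itemsets of 1..maxlen terms and keep those that
    occur in at least `threshold` rows (assumed meaning of the options)."""
    counts = {}
    for text, group in rows:
        items = sorted(set(text.split()))  # repeated terms count once per row
        counter = counts.setdefault(group, Counter())
        for size in range(1, maxlen + 1):
            counter.update(combinations(items, size))
    return {
        group: {s: n for s, n in counter.items() if n >= threshold}
        for group, counter in counts.items()
    }


result = frequent_itemsets(table1)
for group, itemsets in result.items():
    ordered = sorted(itemsets.items(), key=lambda kv: (len(kv[0]), kv[0]))
    for itemset, freq in ordered:
        print(group, len(itemset), freq, " ".join(itemset))
```

The counts agree with the output above; itemset ids and row order are 
not reproduced.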

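- On-the-fly multidimensional indexing (the cache virtual table):

The index is based on k-d trees and is extremely fast with queries 
involving multiple constraints. In the example below, the same join 
drops from 2 sec 40 msec to 71 msec once the second file is cached.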

mterm> select country.c2, continent.c1
   from file('countries.tsv') as country,
        file('continents.tsv') as continent
   where country.c1=continent.c2;
Aruba|Americas
Antigua and Barbuda|Americas
United Arab Emirates|Asia
Afghanistan|Asia
. . . . . . . . .
Query executed in 0 min 2 sec 40 msec

mterm> select country.c2, continent.c1
   from file('countries.tsv') as country,
        (CACHE file 'continents.tsv') as continent
   where country.c1=continent.c2;
Aruba|Americas
Antigua and Barbuda|Americas
United Arab Emirates|Asia
Afghanistan|Asia
. . . . . . . . .
Query executed in 0 min 0 sec 71 msec
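
For readers unfamiliar with k-d trees, the sketch below shows why such 
an index suits queries with several constraints: each tree level splits 
on a different column, so whole subtrees can be skipped. This is a 
generic textbook-style sketch, not the cache virtual table's code; the 
build and range_search names and the sample points are hypothetical.

```python
def build(points, depth=0):
    """Build a k-d tree as nested tuples: (point, left, right)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the columns
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build(points[:mid], depth + 1),      # values <= median on this axis
        build(points[mid + 1:], depth + 1),  # values >= median on this axis
    )


def range_search(node, low, high, depth=0, found=None):
    """Collect points p with low[i] <= p[i] <= high[i] on every axis i."""
    if found is None:
        found = []
    if node is None:
        return found
    point, left, right = node
    axis = depth % len(point)
    if all(lo <= v <= hi for lo, v, hi in zip(low, point, high)):
        found.append(point)
    # Descend only into subtrees that can still contain matches.
    if low[axis] <= point[axis]:
        range_search(left, low, high, depth + 1, found)
    if point[axis] <= high[axis]:
        range_search(right, low, high, depth + 1, found)
    return found


# Hypothetical two-column data; the query constrains both columns at once.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(points)
matches = sorted(range_search(tree, low=(3, 1), high=(8, 5)))
print(matches)
```

An equality constraint such as the join condition above is the special 
case where low and high are equal on that column.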

- Workflows:

mterm> exec flow file 'workflow.sql';

The above query uses Madis' inverted SQL syntax.

- Pivoting:

http://madis.googlecode.com/svn/publish/row.html#pivoting

--

All of the above functionality has been created via row/aggregate/vtable 
Python extensions (APSW exposes these through a very nice API) and the 
aforementioned SQL syntax extensions.

In practice, Madis has proven to be very fast in data analysis tasks 
and in the development of data processing workflows.

A little note:

The high quality of APSW's and SQLite's code has helped immensely in 
developing Madis. We have strained both of these projects as much as we 
could, and they coped beautifully. We literally had queries that spanned 
multiple pages and still executed in seconds.


_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
