2007/7/1, Trevor Talbot <[EMAIL PROTECTED]>:

On 7/1/07, Mov GP 0 <[EMAIL PROTECTED]> wrote:

> I'm searching for a database backend for a .NET project that involves a
> semantic database backend. The problem is that I want to store any
> generic kind of custom .NET datatype and not only SQL primitives.

>    CREATE TABLE @TableName
>    (
>       Object INT,
>       Data @Datatype
>    )
>
> The @Datatype parameter is a special object that implements something
> like an "ISQLiteDatatype" interface that provides information about the
> class that is needed for getting information about the used datatype.
> This includes:
>
>    -   how to serialize (use also ISerializable)
>    -   how to sort (use also IComparable)
>    -   is it possible to do byte ordering with the serialized data?
>    -   minimal and maximal data size and length?
>    -   etc.
>
> SQL is not capable of handling this, but I think it should be possible
> using the API. At least in MS SQL Server it is possible, but you can't
> create new datatypes on the fly; you have to upload and register the
> datatype's class with the server as a DLL.

I'm not sure I understand what you're looking for here.  The SQL
interface (for any DBMS) is not going to act like a .NET object; that
behavior is contained within your application.  What exactly are you
trying to gain with custom data types within the DB?  Are you trying
to standardize an interface for other applications to use when talking
to the same DB?


The implementation of this is also possible with C++. I just see it from the
.NET perspective because I'm a .NET developer. Currently I have no knowledge
of how SQLite works, because I've never used it. I'm looking at several
possible database candidates, including Oracle and MS SQL Server, and want
to learn about the strengths and weaknesses of each.

> SQLite might be better suited for this issue, because as part of the
> application it can access the sorting and serialisation functions
> directly rather than importing DLLs. But the question is how I can do
> that.

SQLite is an embedded database engine, so when you need to make
customizations like this, nobody other than your application is going
to be using the DB.  Since you're not modifying fundamental DBMS
functionality, you don't need to do anything to SQLite; just build a
wrapper in your application that uses BLOBs to store everything, and
build the rest of your application on top of that wrapper.

The same approach will work with pretty much any other DBMS as well.
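That wrapper idea can be sketched briefly. The following uses Python's
built-in sqlite3 module standing in for a .NET data layer; the table layout
and the choice of pickle as the serializer are illustrative assumptions, not
anything SQLite prescribes:

```python
import pickle
import sqlite3

# Hypothetical application-side wrapper: objects are serialized to BLOBs
# before they reach SQLite, which stores only the opaque bytes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")

def store(obj):
    # pickle stands in for whatever serializer the application chooses
    cur = conn.execute("INSERT INTO objects (data) VALUES (?)",
                       (pickle.dumps(obj),))
    return cur.lastrowid

def load(oid):
    row = conn.execute("SELECT data FROM objects WHERE id = ?",
                       (oid,)).fetchone()
    return pickle.loads(row[0])

oid = store({"lat": 48.2, "long": 16.4})
print(load(oid))
```

The rest of the application only ever calls store() and load(), so the DBMS
underneath could be swapped without touching the object model.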


What I want is more efficient sorting and searching. Therefore I want tables
that are optimized for sorting, searching, and storing a specific datatype.
Using only BLOBs is a possible solution. But searching, e.g., for a number
within a table of unsorted, variable-size (from 1 B to 2 GiB) BLOBs is not
as efficient as searching in a specialized fixed-size integer table.

As an example, think about sorting geographical coordinates. In a
classical RDBMS this would look like:

  CREATE TABLE Pos
  (
     Pos_ID INT,
     lat FLOAT,
     long FLOAT
  )

It is very easy and efficient to sort this table by latitude and longitude.
You can also create such a table automatically using an XML Schema. But when
you have an OWL ontology you can't create the table automatically anymore.
You would have to write specialized proxy classes that use SPARQL queries
against the data to produce tables. The queries would be specialized for
specific schemas.

Now we come to the possibility of storing this as BLOBs. Because RDF can use
any datatype, you would have to store everything from booleans to images,
music, and videos. You can't build any good sorting and searching algorithms
for that mess. Searching for a specific number would require scanning the
whole database serially, resulting in O(N) search time, where N is the size
of the whole database. The real advantage of using BLOBs is the speed of
inserting data into the table, because no lookup is needed to determine
where to store the data.
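To make that cost concrete, here is a small sketch (Python/sqlite3, with
pickle as an illustrative serializer) of why a search inside opaque BLOBs
degenerates to a full scan: no index can see inside the bytes, so every row
must be deserialized and compared:

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (id INTEGER PRIMARY KEY, data BLOB)")
conn.executemany("INSERT INTO objects (data) VALUES (?)",
                 [(pickle.dumps(v),) for v in (3.5, "text", 42, 7.25)])

# The BLOBs are opaque to SQLite, so finding the number 42 means
# deserializing every row in turn: an O(N) scan over the whole table.
matches = [oid for oid, blob in conn.execute("SELECT id, data FROM objects")
           if pickle.loads(blob) == 42]
print(matches)
```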

So my idea is to partition the data into separate tables. This would also
produce, e.g., a float table. Because you can sort such a table, you can
also do a binary search. The search time is then about O(log N), where N is
the number of values rather than the size of the whole database.
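This typed-table idea can be tried directly in SQLite: with a per-type table
and an ordinary index, range searches walk the index's B-tree instead of
scanning every row. The table and index names below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One table per primitive type; here a float table. The index gives
# SQLite a B-tree, so range lookups are O(log N) instead of a full scan.
conn.execute("CREATE TABLE floats (object INTEGER, value REAL)")
conn.execute("CREATE INDEX idx_floats_value ON floats (value)")
conn.executemany("INSERT INTO floats VALUES (?, ?)",
                 [(i, i / 10.0) for i in range(1000)])

# EXPLAIN QUERY PLAN confirms the index is used for the range search.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT object FROM floats WHERE value BETWEEN 1.0 AND 2.0"
).fetchall()
print(plan)
```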

You could realize such a thing by creating a table for each primitive
datatype by hand and serializing the objects into these elementary datatypes
plus the semantics of the object. But because RDF has a very extensible
datatype system, an approach that also handles user-defined datatypes would
boost the performance of the serialization and deserialization process.

Further speedup would be possible by partitioning the tables of the
primitive types even further. E.g., there might be one partition that holds
floats for latitude and another partition holding floats for longitude. This
approach might reduce the search time almost to that of a search in a
specialized relational table.

In the example of coordinates, the RDBMS will still be faster because you
get longitude and latitude with only one search rather than having to make
multiple searches. As an example I will use the search for points located in
a specific area.

In the RDBMS you have to (1) simply access the points table, (2) [presort by
longitude and] search for all points whose longitude is between two values
and save the result in a temporary table, and (3) [presort by latitude and]
search for all of the resulting points whose latitude is in a given range.
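In SQLite the three steps collapse into a single query, with the
"presorting" provided by indexes. A sketch with made-up sample points:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Pos (Pos_ID INTEGER, lat REAL, long REAL)")
conn.executemany("INSERT INTO Pos VALUES (?, ?, ?)",
                 [(1, 48.2, 16.4), (2, 52.5, 13.4), (3, 47.1, 15.4)])
# Indexes on lat and long let SQLite satisfy the range predicates
# without re-sorting the table on every query.
conn.execute("CREATE INDEX idx_pos_lat ON Pos (lat)")
conn.execute("CREATE INDEX idx_pos_long ON Pos (long)")

# Both range conditions in one statement: one search, both coordinates.
rows = conn.execute(
    "SELECT Pos_ID FROM Pos "
    "WHERE long BETWEEN 13.0 AND 17.0 AND lat BETWEEN 47.0 AND 49.0"
).fetchall()
print(rows)  # points 1 and 3 fall inside the box
```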

In the semantic database you have to (1) search for all objects that are
points and store them in a specific table, (2) query all of the latitudes of
the points (jumping between tables each time) and keep only the points with
a latitude in the given range, and (3) query all longitudes (again jumping
between tables each time) and keep only the points with a longitude in
another given range.
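The table-jumping can be written out as joins. This is a minimal sketch of
one possible semantic layout: a statements table naming each property, plus
a per-type value table (only floats here). The table and column names are my
own illustration, not taken from any standard triple store:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE statements (subject INTEGER, property TEXT, value_id INTEGER);
CREATE TABLE floats (value_id INTEGER PRIMARY KEY, value REAL);
CREATE INDEX idx_floats_value ON floats (value);
""")

data = [(1, "lat", 48.2), (1, "long", 16.4),
        (2, "lat", 52.5), (2, "long", 13.4)]
for vid, (subj, prop, val) in enumerate(data, start=1):
    conn.execute("INSERT INTO floats VALUES (?, ?)", (vid, val))
    conn.execute("INSERT INTO statements VALUES (?, ?, ?)", (subj, prop, vid))

# Each property lookup is a join back into the value table -- the
# "jumping between tables" described above, expressed as SQL.
rows = conn.execute("""
    SELECT DISTINCT s1.subject
    FROM statements s1 JOIN floats f1 ON f1.value_id = s1.value_id
    JOIN statements s2 ON s2.subject = s1.subject
    JOIN floats f2 ON f2.value_id = s2.value_id
    WHERE s1.property = 'lat'  AND f1.value BETWEEN 47 AND 49
      AND s2.property = 'long' AND f2.value BETWEEN 13 AND 17
""").fetchall()
print(rows)
```

The index on the float values keeps each range predicate logarithmic, but
the extra joins are exactly the overhead the relational layout avoids.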

But while RDBMSs are very efficient at sorting and searching, they are not
nearly as flexible. What I want is a hybrid of the two solutions that keeps
the flexibility of a semantic database but gains speed through presorting
and partitioning of the data.

The question is whether I can use SQLite to realize such a database layout.


ys, MovGP0
