Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Salvatore Insalaco
2008/5/22 Marc Weber [EMAIL PROTECTED]:
 I'd like to illustrate two different ideas using a small example:
 (A)
data CD = CD { title :: String, tracks :: [ Track ] }
data Track = Track { track :: String, cd :: CD }
data PDB = PDB { cds :: Set CD, tracks :: Set Track }

 because it's not using foreign ids but kind of pointers I'll call this
 the pointer method

This doesn't look like a relational structure at all in Haskell.
Let's take the CD and Track relations. In a relational database you
have something like:
CD (1, 'Querying about you')
Track (1, 'Inserting some love', 1)
Track (2, 'Updating my feelings', 1)
Track (3, 'Deleting my hopes', 1)

In an imperative language you can do something similar in memory using
objects (you can in haskell to with IORefs and so on, but let's stay
on data). You get something like:

0x000 CD('Querying about you')
0x004 Track('Inserting some love, 0x004)
...

In Haskell when you say:
data Track = Track { track :: String, cd :: CD }

You are not storing in Track a reference, a pointer or something
similar to a CD, you are storing a *value* (low level you probably
have a pointer, but you have not pointer semantics). As you noticed,
you cannot update the CD title without changing each Track. That's a
way to store information, and a good way too, but it's not a
relational structure by any extent.

If you want to use this structure for your relational data you need two things:
1) Something that will convert from a value-based representation of
data to something relational (aka ORM in the OO world... a FRM? VRM?).
2) A relational storage (internal or external).

If you want to use normal Haskell ADT, are you sure that a
relational storage is what you want? Keeping that in memory doesn't
give you some advantages of relational databases (e.g. uniform
representation), and the impedance between the functional and the
relational world is not easy to manage.

Maybe I misunderstood what you are trying to accomplish, and you only
want to do a generic data structure with fast lookups on the content
of the items? Or do you really need relational semantics?

Salvatore
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
On Thu, May 22, 2008 at 08:16:54AM +0200, Salvatore Insalaco wrote:
 2008/5/22 Marc Weber [EMAIL PROTECTED]:
  I'd like to illustrate two different ideas using a small example:
  (A)
 data CD = CD { title :: String, tracks :: [ Track ] }
 data Track = Track { track :: String, cd :: CD }
 data PDB = PDB { cds :: Set CD, tracks :: Set Track }
 
  because it's not using foreign ids but kind of pointers I'll call this
  the pointer method
 
 This doesn't look like a relational structure at all in Haskell.
 Let's take the CD and Track relations. In a relational database you
 have something like:
 CD (1, 'Querying about you')
 Track (1, 'Inserting some love', 1)
 Track (2, 'Updating my feelings', 1)
 Track (3, 'Deleting my hopes', 1)
 
 In an imperative language you can do something similar in memory using
 objects (you can in haskell to with IORefs and so on, but let's stay
 on data). You get something like:
 
 0x000 CD('Querying about you')
 0x004 Track('Inserting some love, 0x004)
You are right. But ghc does represent those things as pointers.. :)
So it is indeed unless you ask ghc to not use (eg by using strict fields
etc), correct? IORefs are not that good because you can't read them
within STM. But you are right: Using IORefs was my first idea.

 similar to a CD, you are storing a *value* (low level you probably
 have a pointer, but you have not pointer semantics). As you noticed,
 you cannot update the CD title without changing each Track. That's a
 way to store information, and a good way too, but it's not a
 relational structure by any extent.
I agree it's not in general and higly implementation dependant
Maybe I'm totally wrong. I imagine ghc having some internal
representation of data types which come close to 

struct CD {
  Track * tracks;
  title ** char;
} cd;

struct Track {
  cd * CD;
  title ** char;
} track;

(does'nt compile but you get the idea. My C knowldge is good enough for
reading only).

So if you start seeing the whole database as list of connected structs
vio pointers adding / deleting/ inserting is quite the same as adding
some nodes to a Data.Map. You replace the nodes you have to replace and
finally get a poiter pointing to te new database state. As long as you
don't loose the original pointer you can easily rollpack.
Consider 

let x = Cd ...
forkIO $ ( do something with x } -- (1)
print x -- (2) 

How can ghc know when running line (2) that (1) hasen't changed the
record? I see two solutions:
a) give the forked process a copy (Then my design will collapse)
   but this is expensive to copy data without knowing you ned to
b) use pointers and replace x ony on updating. Thus if (1) changes the
   title a new struct wil be created poiting to the old list but a new
   title String. line (2) doesn't have to care at all.
I'm guessing that analyzing a) when values have to be copied is
complicated - thus b) is implemented ? quicksilver has told me so as
well - But I'm not sure that's why I'm asking.

 If you want to use normal Haskell ADT, are you sure that a
 relational storage is what you want? Keeping that in memory doesn't
 give you some advantages of relational databases (e.g. uniform
 representation), and the impedance between the functional and the
 relational world is not easy to manage.
 
 Maybe I misunderstood what you are trying to accomplish, and you only
 want to do a generic data structure with fast lookups on the content
 of the items?
Exactly.. Of course I only want a generic data structure with fast
lookups and content of items.. But I don't want to write the insert
delete and update functions for each table again and again..
Does this already exist?

Marc Weber
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
On Wed, May 21, 2008 at 06:07:15PM -0700, Dan Weston wrote:
  Consider SQLite [1], which is a software library that implements a 
[..]
  It has a C API which you can wrap as needed with the FFI, and you wouldn't 
  need more than a dozen or so functions to start with (it understands SQL 
  too).
So it has kind of API enabling me inserting rows without using SQL?
I still have to do some marshalling to / from C and synchronize db
layout and haskell data types.

Marc Weber
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Ketil Malde
Salvatore Insalaco [EMAIL PROTECTED] writes:

 This doesn't look like a relational structure at all in Haskell.

I believe you are abusing terminology here.  'Relation' refers to a
table (since it represents a subset of AxBxC.., i.e. a relation), not
to references between tables.

Mutability and mutability of references is of course important in most
relational databases, but I'm not convinced an immutable database
wouldn't be interesting and useful in a functional programming
language. 

I've always (well not that I use them often) been annoyed at RDBMS
lack of discriminated unions.

The TH based approach by HAppS looks cool, but I think simply a
slightly more general Data.Map (supporting multiple indices, search
by named field and so on) could be a useful thing.

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
On Thu, May 22, 2008 at 10:56:03AM +0200, Ketil Malde wrote:
 Salvatore Insalaco [EMAIL PROTECTED] writes:
 
  This doesn't look like a relational structure at all in Haskell.
 
 I believe you are abusing terminology here.  'Relation' refers to a
Yes. Sorry. I thought the relational in relational databases refers to
references between tables. But you a right a relation is a set of rows
forming a table. Thanks for clarifying. 

Thanks Marc
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Salvatore Insalaco
 Consider

 let x = Cd ...
 forkIO $ ( do something with x } -- (1)
 print x -- (2)

 How can ghc know when running line (2) that (1) hasen't changed the
 record? I see two solutions:
 a) give the forked process a copy (Then my design will collapse)
   but this is expensive to copy data without knowing you ned to
 b) use pointers and replace x ony on updating. Thus if (1) changes the
   title a new struct wil be created poiting to the old list but a new
   title String. line (2) doesn't have to care at all.

GHC knows that because in Haskell isn't possible to update x. x is
not a variable, it's a binding.
To put it simply: with IORefs (and STRefs, MVars, ...) you have
references to values that you can change (inside their respective
monads), much like variables, but data declarations are values, not
references to values (even if GHC stores them as pointers you cannot
treat them as such), so you cannot update it.

So, in your example, you have more or less a relation (CD) where all
the columns are part of primary key (and so they are not mutable).

Salvatore
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
Hi Salvatore

On Thu, May 22, 2008 at 11:01:01AM +0200, Salvatore Insalaco wrote:
  Consider
 
  let x = Cd ...
  forkIO $ ( do something with x } -- (1)
  print x -- (2)
 
  How can ghc know when running line (2) that (1) hasen't changed the
  record? I see two solutions:
  a) give the forked process a copy (Then my design will collapse)
but this is expensive to copy data without knowing you ned to
  b) use pointers and replace x ony on updating. Thus if (1) changes the
title a new struct wil be created poiting to the old list but a new
title String. line (2) doesn't have to care at all.
 
 GHC knows that because in Haskell isn't possible to update x. x is
 not a variable, it's a binding.
 To put it simply: with IORefs (and STRefs, MVars, ...) you have
 references to values that you can change (inside their respective
 monads), much like variables, but data declarations are values, not
 references to values (even if GHC stores them as pointers you cannot
 treat them as such), so you cannot update it.
Sorry - maybe I'm unable to express using the correct terminology..
So I'll just give a small example how I think it could magically work?

  data CD = CD { title :: String, tracks :: [ Track ] }
  data Track = Track { track :: String, cd :: CD }
  data PDB = PDB { cds :: Set CD, tracks :: Set Track }

Let's fill the database with 1 track and a cd:

0x3 = pointer to DB rec
0x1: adress of CD
0x5: adress of Track
0x4, 0x9, 0x9: start adress of linked list connected by pointers..
In the final solution should use finger trees or such to speed up
deletion / replacing elements

0x3 database:
0x8 cds :
tuple1 0x1 : (0x6 My song) (0x4 [ 0x5, ... ])
  ^ pointer to str^ pointer to track list, 0x5 
= pointer to track

0x9 tracks:
tuple1 0x5 : ( 0x7 track 1) 0x1 
^ reference to cd

Now I query the track, and update it (replacing the title)..
It's a little bit tricky, because when updating the track I need to
update the cd as well (circular referency). All new pointers are
starting from 0x20

So in haskell it would look like this:
let updatedCd = 0x22  CD (0x6 My song)  (0x20 ( 0x23 : ...)
updatedTrack = 0x23 Track ( 0x21 updated track title ) 0x22
in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 
(updatedTrack:otherTracks))

Now my new address to access the database is 0x25. So pretty every
adress has been changed but 0x6, ..., otherCds and otherTracks
A query running using db 0x3 will not notice any change on its snapshot.
Are these actions called rebinding?

Of course if you have a lot of relations writing this
let 
in
will become tedious and error prone.. That's why I'd like to use
template haskell to automatically derive it.

Thanks for listening

Marc Weber
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Salvatore Insalaco
2008/5/22 Marc Weber [EMAIL PROTECTED]:
 So in haskell it would look like this:
let updatedCd = 0x22  CD (0x6 My song)  (0x20 ( 0x23 : ...)
updatedTrack = 0x23 Track ( 0x21 updated track title ) 0x22
in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 
 (updatedTrack:otherTracks))

Mmmm I don't think that this is a good way to go.
Let me do a counter-example:

data A = A String
data B = B String [A]
data C = C String [B]
data D = D String [C]

Suppose to have some As, Bs, Cs, Ds in your database. Now you want to
update the String of A. As you cannot update stuff in Haskell
mantaining the same pointer, you've got a new A. So you must find
all Bs that had this A in their list, and update that.
Unfortunately lists are not mutable too, so you are creating a new
list; so you need to create new containing Bs too. But then you must
change Cs... and so on.
A little change like changing the String in A requires updating the whole DB.

Salvatore
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
On Thu, May 22, 2008 at 12:48:42PM +0200, Salvatore Insalaco wrote:
 2008/5/22 Marc Weber [EMAIL PROTECTED]:
  So in haskell it would look like this:
 let updatedCd = 0x22  CD (0x6 My song)  (0x20 ( 0x23 : ...)
 updatedTrack = 0x23 Track ( 0x21 updated track title ) 0x22
 in (0x27) DB (0x24 (updatedCd:otherCds)) (0x25 
  (updatedTrack:otherTracks))
 
 Mmmm I don't think that this is a good way to go.
 Let me do a counter-example:
 
 data A = A String
 data B = B String [A]
 data C = C String [B]
 data D = D String [C]
 
 A little change like changing the String in A requires updating the whole 
 DB.
You're right. Very bad idea unless you only insert once a year and only
have queries the whole day.
The only way to fix this is by separating relational data from record
data.
 data A = Map RecordDataA RelationalDataA
 data B = Map RecordDataB RelationalDataB

So when changing a field in RecordDataA only the relational data B would
have to be updated.. but I see that that's not that good either.
Fine. Then the only way to go is using uniq ids as keys the way it's
already done everywhere

Thanks
Marc W
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Marc Weber
On Thu, May 22, 2008 at 03:34:36PM +0200, Marc Weber wrote:
 On Thu, May 22, 2008 at 09:11:28AM -0400, Isaac Dupree wrote:
   to whoever in this thread hasn't realized it:
   Map String (Map Int Foo) == Map (String,Int) Foo
   (at least to an approximation)
There is another difference if you want to query ,=,,=
say String = city and Int = age.
Now take Map (Int, String) rec and use this index to filter all tuples
having an age = 80 and beeing city

Marc
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-22 Thread Isaac Dupree

Marc Weber wrote:

On Thu, May 22, 2008 at 03:34:36PM +0200, Marc Weber wrote:

On Thu, May 22, 2008 at 09:11:28AM -0400, Isaac Dupree wrote:

 to whoever in this thread hasn't realized it:
 Map String (Map Int Foo) == Map (String,Int) Foo
 (at least to an approximation)

There is another difference if you want to query ,=,,=
say String = city and Int = age.
Now take Map (Int, String) rec and use this index to filter all tuples
having an age = 80 and beeing city


there are two problems:
Data.Map doesn't have a very good API for that (splitLookup is about the 
best you can get for ranges)
Whether tupled or not, the order of the two indices matters (Int,String) 
vs. (String,Int) for what you can look up efficiently.  It's essentially 
a binary tree either way (Map x (Map y rec)) or (Map (x,y) rec), sorted 
in the same order.  (tuples sort by lexicographical order)


-Isaac
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] relational data representation in memory using haskell?

2008-05-21 Thread Marc Weber
I'm kind of stuck that's why I'm posting here to ask wether this makes
sense at all, maybe someone else has already done it?

What I'd like to have:

Some way representing relational data which is typically stored in
databases such as Postgresql..

Rewriting something like Postgresql in haskell would take ages..
So I'd be satisfied with having in memory representation only (this
would fit the HAppS state system very well .. :)
Why ?
* type safety
* less conversions compared to SQL data
* no need to switch processes, parse SQL etc so maybe it's even faster?
  (a small benchmark showed that inserting 2 Ints into a list was 8
   times faster than using MySQL parsing 2 INSERT INTO x (1)
   statements )


I'd like to illustrate two different ideas using a small example:
(A)
data CD = CD { title :: String, tracks :: [ Track ] }
data Track = Track { track :: String, cd :: CD }
data PDB = PDB { cds :: Set CD, tracks :: Set Track }

because it's not using foreign ids but kind of pointers I'll call this
the pointer method

using uniq ids it would look like this:
(B)
data CD = CD { id : Int,  title :: String, tracks :: [Int ] }
data Track = Track { trackId :: Int,  track :: String, cd :: Int }
data IDB = IDB { cds :: Map Int CD, tracks :: Map Int Track }
I will call it I DB (I = using ids)

PDB: pro : * less work when doing joins (no need to look foreign rows up)
 con : * you need uniq ids or such when serializing to disk
   * When updating a track you'll also have to update the pointer
 stored in cds. and if you had another table shelfs.. this
 had to be updated as well..

IDB: the other way round


I find the idea not using any lookups when using joins appealing.

Of course having a simple
data Table = Table Map UniqId Rec 

isn't enough, sometimes you need more than one index or even a multi index:
data Table = Table { byId :: Map Int Rec
 , byNameAndAge :: Map String (Map Int (Set Rec)) }

Note that I've used Set here as well because this index does'nt have to
be uniq! starting to write an
insertTable :: Table - Rec - Table
more than twice is getting tedious..

Of course you can start using some type hackery to insert a rec
into all maps automatically.. but you'll get into trouble making
the type system apply the best index not the first matching one.
(I bet this could be done using HList etc somehow as well.. )
So my current attempt is defining the database using some data types and
make template haskell derive those insertIntoTable and update functions.

I've added the draft below. But before continuing spending much time on
it I'd like to get your advice: Is there a chance that it will pay off?

Some general considerations:
haskell solution con:
haskell can get close to C but in general it may be 10 times slower 
when
not caring too much about design or writing low level (see recent thread
about md5 or one where David Roundy has said something about a matrix 
thread:
only 10 times slower?)

Using a garbage collector on database data (some hundred MB)
might not be the optimal way because I feel you can tell exactly
when you no longer need a piece of allocated memory here?
So some time might be wasted.

projects tend to run longer as expected.. And if data no longer
fits into memory .. :(... - bad performance
I think systems such as postgresql do scale much better if you
have some gbs of data and only use the most recent X records
frequently.. So maybe you'll have to spend time later which
you've won by using a haskell relational data representation in
memory only.. Another solution: use clusters - I don't have any
experience.
 
pro:
much more safety (STM, type system ..) there are less
possibilities making compared to C / PHP etc

Do you also think (A) is more interesting because some load (looking up
foreign keys) is moved on insert / delete and update operations taking
less time in but are called more frequently thus maybe reducing peak
load on queries?

Of course some time would have to be spend on queries wich might
look like this:
let queryresult = $(query ( tables + constraints + relations ) ) db
automatically generating the query function taking into account expected
index cardinality etc..

Any comments, suggestions, links to existing solutions (except coddfish,
haskelldb) ?

Marc Weber


draft
= types represeting tables and db 
module RDMH.Types where
import Language.Haskell.TH
import Language.Haskell.TH.Syntax

data Uniqueness = Uniq | NotUniq deriving (Show, Eq)
data ModifyMode = InsertOnly | UpdateInsert | UpdateInsertDelete deriving 
(Show, Eq)

type TypeS = String -- a name of a data type (data A = ..)

data Index = I {
uniqueness :: Uniqueness
, key :: Exp  -- a fuction rec - 

Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-21 Thread Marc Weber
On Wed, May 21, 2008 at 05:05:21PM -0700, Jeremy Shaw wrote:
 At Thu, 22 May 2008 01:04:24 +0200,
 Marc Weber wrote:
 
  Some way representing relational data which is typically stored in
  databases such as Postgresql..
  
  Rewriting something like Postgresql in haskell would take ages..
  So I'd be satisfied with having in memory representation only (this
  would fit the HAppS state system very well .. :)
 
 Are you familiar with the HAppS IxSet library? 
Yes - not with all that sybwith-class stuff though.
There are some issues:
its dynamic : doesn't this waste some CPU cycles?
no multi indexes..
maybe some space leaks because the data type containing the Maps is
build after each filter maybe leaving unevaluating chunks - Saizan has
told me about it on HAppS.. And you can't extend it to the degree I'd
like to (eg throw a query at it and let the system figure out which
indexes to use)
And last but not least: It does'nt support relations at all yet.
So all the effort adding / checking foreign keys etc has to be done
anyway.

Thanks Marc
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] relational data representation in memory using haskell?

2008-05-21 Thread Dan Weston
Consider SQLite [1], which is a software library that implements a 
self-contained, serverless, zero-configuration, transactional SQL 
database engine.


It is embeddable, can reside completely in memory (including the data), 
and can be saved and restored to disk when needed. It neatly fills the 
niche between maps and a client/server database model.


It has a C API which you can wrap as needed with the FFI, and you 
wouldn't need more than a dozen or so functions to start with (it 
understands SQL too).


[1] http://www.sqlite.org/

Marc Weber wrote:

On Wed, May 21, 2008 at 05:05:21PM -0700, Jeremy Shaw wrote:

At Thu, 22 May 2008 01:04:24 +0200,
Marc Weber wrote:


Some way representing relational data which is typically stored in
databases such as Postgresql..

Rewriting something like Postgresql in haskell would take ages..
So I'd be satisfied with having in memory representation only (this
would fit the HAppS state system very well .. :)
Are you familiar with the HAppS IxSet library? 

Yes - not with all that sybwith-class stuff though.
There are some issues:
its dynamic : doesn't this waste some CPU cycles?
no multi indexes..
maybe some space leaks because the data type containing the Maps is
build after each filter maybe leaving unevaluating chunks - Saizan has
told me about it on HAppS.. And you can't extend it to the degree I'd
like to (eg throw a query at it and let the system figure out which
indexes to use)
And last but not least: It does'nt support relations at all yet.
So all the effort adding / checking foreign keys etc has to be done
anyway.

Thanks Marc
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe