Re: [HACKERS] Ad-hoc table type?

2008-10-06 Thread Decibel!

On Sep 29, 2008, at 6:16 AM, [EMAIL PROTECTED] wrote:

The hstore module, as I said,
looks really cool, I've contemplated something like it. I have a  
module
provides a set of accessors for an XML text column that works  
similarly,
but it parses the XML on each access and the application has to  
create the

XML. (I have XML creation modules for Java, PHP, C++, and standard C
bindings.)



Yeah, ad-hoc storage is always a huge problem in databases. For  
years the only way to do it was with EAV, which is tricky at best.


In my experience, there typically isn't an un-bounded set of possible  
attribute names. It's usually fairly constrained, but the problem is  
that you never know when a new one will just pop up.


It's very common right now for people to use either XML or YAML to  
deal with this. That has it's own set of problems.


There's a few major improvements to be had here:

1: We should have a flexible storage mechanism that can either be  
used with it's own native syntax, or can interface to other hash  
formats such XML or YAML. Of course, both XML and YAML allow an  
obscene amount of nesting, etc, but generally people are only using  
these in a very simple form to emulate a hash table. It would be  
interesting to allow casting hstore to and from other proprietary  
hash formats as well, such as perl hashes.


2: Storage of attribute names can quickly become *very* expensive.  
Even with short 6-10 character names, you can easily end up using  
half the storage for just attribute names. I'd like to see hstore  
support storing attribute names in a lookup table, or using some  
other means to reduce the storage overhead.


3: Related to #2, storing numbers stinks because you end up burning 1  
byte per digit. Some concept of data type for an attribute would  
improve this.


Sadly, I don't have time to work on any of this. But these things are  
issues to my company, and we do have money. ;)

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Ad-hoc table type?

2008-09-29 Thread tomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On Sun, Sep 28, 2008 at 09:24:48PM -0700, David E. Wheeler wrote:
 On Sep 28, 2008, at 17:46, Tom Lane wrote:

 BTW, I think it is (or should be) possible to create an index on
 hstore-'mycol', so at least one of the reasons why you should *need*
 to switch to a real database column seems bogus.

[...]

 I'm not sure what that means. Can you create normal btree or hash indexes 
 on hstore columns? And is the index useful for both `@` and `?`?

That means that those operations are supported by a GiST (or GIN) index,
that is:

  find the records where col contains 'foo = 1, bar = 2'

is supported by the index. Likewise for is contained in and has key.
It's a bit like having mini-indexes on all keys (although I guess not
that efficient). Pretty cool, I'd say.

Regards
- -- tomás
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.6 (GNU/Linux)

iD4DBQFI4HnHBcgs9XrR2kYRAgmiAJ0U9UD8KqX5vLXOGBlW+WwPzzIpEQCY1caS
F4Uug9QD6e0Jw18EvNm28g==
=f8q5
-END PGP SIGNATURE-

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-29 Thread pgsql
 What you're talking about is a document based database like
 StrokeDB, CouchDB. With hstore you don't need to parse content of
 'aggregate' column, it provides necessary methods. Also, we tried
 to speedup selects using indexes. Probably, we need to refresh our
 interest to hstore, do you have any actual proposals ?

Proposals, not at this point. I'm trying to decide (a) if I have the time
and (b) do I do it with Postgres or SQLite. The hstore module, as I said,
looks really cool, I've contemplated something like it. I have a module
provides a set of accessors for an XML text column that works similarly,
but it parses the XML on each access and the application has to create the
XML. (I have XML creation modules for Java, PHP, C++, and standard C
bindings.)

It is more a conflict of data ideology, IMHO. There is a class of data
that is logically on the same level as other data, but is forced into a
secondary storage methodology. It isn't a pressing need as there are work
arounds, but don't you think a cleaner interface make sense? Also, what is
the overhead for the secondary storage mechanism? I think it would make
the life of application developers easier.



 Oleg

 On Sun, 28 Sep 2008, [EMAIL PROTECTED] wrote:

 [EMAIL PROTECTED] writes:
 Something like this:

 create adhoc table foo ();

 insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

 In an ad-hoc table type, when an insert is made, and a column is not
 found, then a new varchar column is added.

 I know the idea has a lot of holes, and is probably a bad idea, but it
 answers an important problem of easily mapping programmatic types to a
 database.

 Seems like a table with one contrib/hstore column might be more
 relevant
 to this guy's idea of how to do database design.


 That's actually a very cool module, I hadn't seen it before. I've
 considered writing something like it, but more XML centric, but I'm not
 sure it answers the concept.

 I'm not sure if you have dealt with web site sessions and object
 persistence crap, but its a pain to get up and running and improving
 performance is a drag. Web guys tend to know very little about databases
 and tend, sadly, not to be very inquisitive about such things.

 Web session and user attribute objects are typically stored in a
 database
 as XML, JSON, or some other aggregated format in a single column
 (hstore).
 That works great for when you just need to access the data by the key,
 but
 if you want to use the data outside the web application for something
 like OLAP, you have to decide which attributes reside in the aggregate
 column or get promoted to a full fledged column. That's why you'll see
 tables with username, passwdhash, email, etc. in addition to an
 aggregated
 column of things like screen template, age, etc.

 So, how do you have a table of a generally arbitrary number of columns
 without creating some sort of aggregate column?  With an aggregate
 column,
 the data isn't on the same level as real column data, so you need to
 parse
 the aggregate to extract a value, and you have to do that for each
 value.
 On top of that, you then have to explain your aggregate strategy to the
 web guys.

 Being able to insert arbitrary named values, and extracting them
 similarly, IMHO works better and more naturally than some external
 aggregate system built on a column. I know it is a little outside the
 box thinking, what do you think?



   Regards,
   Oleg
 _
 Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
 Sternberg Astronomical Institute, Moscow University, Russia
 Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
 phone: +007(495)939-16-83, +007(495)939-23-83

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-29 Thread David E. Wheeler

On Sep 28, 2008, at 23:46, [EMAIL PROTECTED] wrote:

I'm not sure what that means. Can you create normal btree or hash  
indexes

on hstore columns? And is the index useful for both `@` and `?`?


That means that those operations are supported by a GiST (or GIN)  
index,

that is:

 find the records where col contains 'foo = 1, bar = 2'

is supported by the index. Likewise for is contained in and has  
key.

It's a bit like having mini-indexes on all keys (although I guess not
that efficient). Pretty cool, I'd say.


Yeah, that does sound good. I look forward to having an excuse for  
playing with this type…


Best,

David


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
I was in a discussion with someone about the difference between ad-hoc
storage systems and SQL. Yes, I know, I was rolling my eyes as well. One
thing did strike me though was the idea that a table could contain a
variable number of columns.

Something like this:

create adhoc table foo ();

insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

In an ad-hoc table type, when an insert is made, and a column is not
found, then a new varchar column is added.

I know the idea has a lot of holes, and is probably a bad idea, but it
answers an important problem of easily mapping programmatic types to a
database.

Anyone think its interesting?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Something like this:

 create adhoc table foo ();

 insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

 In an ad-hoc table type, when an insert is made, and a column is not
 found, then a new varchar column is added.

 I know the idea has a lot of holes, and is probably a bad idea, but it
 answers an important problem of easily mapping programmatic types to a
 database.

Seems like a table with one contrib/hstore column might be more relevant
to this guy's idea of how to do database design.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
 [EMAIL PROTECTED] writes:
 Something like this:

 create adhoc table foo ();

 insert into foo (name, rank, serial) values ('joe', 'sargent', '42');

 In an ad-hoc table type, when an insert is made, and a column is not
 found, then a new varchar column is added.

 I know the idea has a lot of holes, and is probably a bad idea, but it
 answers an important problem of easily mapping programmatic types to a
 database.

 Seems like a table with one contrib/hstore column might be more relevant
 to this guy's idea of how to do database design.


That's actually a very cool module, I hadn't seen it before. I've
considered writing something like it, but more XML centric, but I'm not
sure it answers the concept.

I'm not sure if you have dealt with web site sessions and object
persistence crap, but its a pain to get up and running and improving
performance is a drag. Web guys tend to know very little about databases
and tend, sadly, not to be very inquisitive about such things.

Web session and user attribute objects are typically stored in a database
as XML, JSON, or some other aggregated format in a single column (hstore).
That works great for when you just need to access the data by the key, but
if you want to use the data outside the web application for something
like OLAP, you have to decide which attributes reside in the aggregate
column or get promoted to a full fledged column. That's why you'll see
tables with username, passwdhash, email, etc. in addition to an aggregated
column of things like screen template, age, etc.

So, how do you have a table of a generally arbitrary number of columns
without creating some sort of aggregate column?  With an aggregate column,
the data isn't on the same level as real column data, so you need to parse
the aggregate to extract a value, and you have to do that for each value.
On top of that, you then have to explain your aggregate strategy to the
web guys.

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works better and more naturally than some external
aggregate system built on a column. I know it is a little outside the
box thinking, what do you think?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Being able to insert arbitrary named values, and extracting them
 similarly, IMHO works better and more naturally than some external
 aggregate system built on a column. I know it is a little outside the
 box thinking, what do you think?

I'm failing to see the point.  Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo?

* What datatype should it have?  (Always varchar is just lame.)

* Should it have an index?  If so, should it be unique?

* If you keep doing this, you'll soon find yourself reading out
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit.

If you've expended enough thought to be sure that the column is not just
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from member of an hstore column to
real database column is pretty painful, but I don't see that allow
columns to spring into existence solves that in any meaningful way.
Is there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore-'mycol', so at least one of the reasons why you should *need*
to switch to a real database column seems bogus.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Mark Mielke

Not that I'm agreeing with the direction but just as a thinking experiment:

Tom Lane wrote:

[EMAIL PROTECTED] writes:
  

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works better and more naturally than some external
aggregate system built on a column. I know it is a little outside the
box thinking, what do you think?



I'm failing to see the point.  Allowing columns to spring into existence
without any forethought seems to me to be all minuses and no pluses
worth mentioning.

* What if the column name is just a typo?
  


If it's a field in a data structure from a language such as Java, it's 
not a typo.



* What datatype should it have?  (Always varchar is just lame.)
  


SQLite uses always varchar and it doesn't seem to be a problem. For 
simpler numbers like 0, the text form can be more compact, and the 
database may be portable across different hardware architectures.



* Should it have an index?  If so, should it be unique?
  


It might be cool for indexes to automatically appear as they become 
beneficial (and removed as they become problematic). Unique is a 
constraint which should be considered separate from whether it should be 
an index or not. I don't know if it would be useful or not.



* If you keep doing this, you'll soon find yourself reading out
unbelievably wide tables (lots of columns), which won't be especially
easy or efficient to process on either the backend or the client side.
Plus you might run into the max-columns-per-tuple limit.
  


Introduce variable field-order for tuples? Only provide values if 
non-null? :-)



If you've expended enough thought to be sure that the column is not just
a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
command to tell the database the results of your genius.

I do see the point that switching from member of an hstore column to
real database column is pretty painful, but I don't see that allow
columns to spring into existence solves that in any meaningful way.
Is there some other way we could address such conversions?

BTW, I think it is (or should be) possible to create an index on
hstore-'mycol', so at least one of the reasons why you should *need*
to switch to a real database column seems bogus.
  


I find the Oracle nested table and data structure support enticing 
although I do not have experience with it. It seems like it might be a 
more mature implementation of hstore? If hstore had everything that was 
required in terms of performance or flexibility, we wouldn't need fixed 
columns at all?


But yes - I tend to agree that the object persistent layer can be hidden 
away behind something like the Java object persistence model, 
automatically doing alter table or providing a configured mapping from a 
description file. This isn't a problem that needs to be solved at the 
database layer.


Cheers,
mark

--
Mark Mielke [EMAIL PROTECTED]



Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread pgsql
 [EMAIL PROTECTED] writes:
 Being able to insert arbitrary named values, and extracting them
 similarly, IMHO works better and more naturally than some external
 aggregate system built on a column. I know it is a little outside the
 box thinking, what do you think?

 I'm failing to see the point.  Allowing columns to spring into existence
 without any forethought seems to me to be all minuses and no pluses
 worth mentioning.

 * What if the column name is just a typo?

In an automated system like PHP, Java, etc. that's not too likely.


 * What datatype should it have?  (Always varchar is just lame.)

varchar or text is not just lame, SQLite used to do that exclusively.
One could argue that XML is nothing more than text.


 * Should it have an index?  If so, should it be unique?

The answer to that is, well, no, not unless the dba generates one or it is
declared. Just like any other column. All the rules that apply to create
table and alter table add column just apply naturally as would be
expected.

create adhoc table userdata(username varchar, email varchar, primary
key(email));


 * If you keep doing this, you'll soon find yourself reading out
 unbelievably wide tables (lots of columns), which won't be especially
 easy or efficient to process on either the backend or the client side.
 Plus you might run into the max-columns-per-tuple limit.

Well, I fully understand that it is not a general purpose unlimited
width sort of thing. In a programing environment, the target environment
for this type of feature, it is unlikely to be a run-away problem.


 If you've expended enough thought to be sure that the column is not just
 a typo, ISTM that you can afford to enter an ALTER TABLE ADD COLUMN
 command to tell the database the results of your genius.

Like I said, if you've never dealt with a live web site, maintained by a
team of web dudes, working furiously to keep their job and get paid,
your only hope to keep up with Oh! I needed to add the 'time to live' of
the session into the session data is to use an aggregate storage system.


 I do see the point that switching from member of an hstore column to
 real database column is pretty painful, but I don't see that allow
 columns to spring into existence solves that in any meaningful way.
 Is there some other way we could address such conversions?

Every other solution creates a second tier of data storage. You either
deal with data elements at the table level, or you create a roll your
own aggregate mechanism, or make a HUGE table of user,name,value table
and force a join and index scan for every select.  (A million users, 5-10
attributes each is an expensive join.)

 BTW, I think it is (or should be) possible to create an index on
 hstore-'mycol', so at least one of the reasons why you should *need*
 to switch to a real database column seems bogus.

Oh, yea, function indexes work great. I think you did that right?

For what its worth, I don't expect you to jump all over this. It really is
a divergence from classic SQL design. I'm not even sure I like it. In
fact, I don't like it, but the argument that you are being forced to
create a second class data storage mechanism or a relational join for data
that is logically in a single relation does cause one to ponder the
problem.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread David E. Wheeler

On Sep 28, 2008, at 17:46, Tom Lane wrote:


BTW, I think it is (or should be) possible to create an index on
hstore-'mycol', so at least one of the reasons why you should *need*
to switch to a real database column seems bogus.


The docs say:

  titleIndexes/title

  para
   typehstore/ has index support for literal@gt;/ and  
literal?/
   operators.  You can use either GiST or GIN index types.  For  
example:

  /para
  programlisting
CREATE INDEX hidx ON testhstore USING GIST(h);

CREATE INDEX hidx ON testhstore USING GIN(h);
  /programlisting

I'm not sure what that means. Can you create normal btree or hash  
indexes on hstore columns? And is the index useful for both `@` and `? 
`?


Thanks,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Ad-hoc table type?

2008-09-28 Thread Oleg Bartunov

What you're talking about is a document based database like
StrokeDB, CouchDB. With hstore you don't need to parse content of 
'aggregate' column, it provides necessary methods. Also, we tried 
to speedup selects using indexes. Probably, we need to refresh our 
interest to hstore, do you have any actual proposals ?


Oleg

On Sun, 28 Sep 2008, [EMAIL PROTECTED] wrote:


[EMAIL PROTECTED] writes:

Something like this:



create adhoc table foo ();



insert into foo (name, rank, serial) values ('joe', 'sargent', '42');



In an ad-hoc table type, when an insert is made, and a column is not
found, then a new varchar column is added.



I know the idea has a lot of holes, and is probably a bad idea, but it
answers an important problem of easily mapping programmatic types to a
database.


Seems like a table with one contrib/hstore column might be more relevant
to this guy's idea of how to do database design.



That's actually a very cool module, I hadn't seen it before. I've
considered writing something like it, but more XML centric, but I'm not
sure it answers the concept.

I'm not sure if you have dealt with web site sessions and object
persistence crap, but its a pain to get up and running and improving
performance is a drag. Web guys tend to know very little about databases
and tend, sadly, not to be very inquisitive about such things.

Web session and user attribute objects are typically stored in a database
as XML, JSON, or some other aggregated format in a single column (hstore).
That works great for when you just need to access the data by the key, but
if you want to use the data outside the web application for something
like OLAP, you have to decide which attributes reside in the aggregate
column or get promoted to a full fledged column. That's why you'll see
tables with username, passwdhash, email, etc. in addition to an aggregated
column of things like screen template, age, etc.

So, how do you have a table of a generally arbitrary number of columns
without creating some sort of aggregate column?  With an aggregate column,
the data isn't on the same level as real column data, so you need to parse
the aggregate to extract a value, and you have to do that for each value.
On top of that, you then have to explain your aggregate strategy to the
web guys.

Being able to insert arbitrary named values, and extracting them
similarly, IMHO works better and more naturally than some external
aggregate system built on a column. I know it is a little outside the
box thinking, what do you think?




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers