-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Feel free to redirect me to another list. I am only posting/writing  
this here because y'all have been so helpful in the past, and the  
guys that write Rose::DB are definitely database experts in my  
opinion. Plus this is the only database list I am on.

I am a sysadmin who is good at perl, and then I got handed the role  
of developer. I got a lot better at perl, but then I got handed the  
role of DBA. I'm trying to be good at that. This project is a few  
orders of magnitude over my head, but I like the challenge...and  
being employed.

I just have a quick question about database design.

Based on the praise for postgresql from certain members of this  
mailing list, my reading the postgre manual/(relevant parts of) Korry  
Douglas' book, and my blind hope that postgre handles HUGE tables  
better than mysql appears to, I am testing a switch of my product to  
postgresql. Being a perl programmer, the way I think is a little bit  
weird, and postgre's support for array's attracts me very much; I  
want to use them for the sake of using them, but I'm not sure its the  
best option for what I'm trying to do.

Disclaimer: this is not a Rose::DB related question (well, there is  
one included below somewhere). Again, tell me to F-O if this is the  
inappropriate place to ask...

So here's the scoop (slimmed down for relevancy):

table 1 'stuff':
        data TEXT,
        accounts INTEGER[]
        date_added TIMESTAMP

table 2 'accounts':
        id SERIAL PRIMARY KEY
        name    VARCHAR,
        email BLAH
        regex BLAH
        etc. BLAH

if it were legal, stuff.accounts would REFERENCE accounts.id, but  
apparently i cannot have an array of foreign keys. so it goes.

raw data is parsed (by a perl script, duh) into the 'stuff' table.  
when it looks like something that a certain account would be  
interested in (based on accounts.regex), then their id (accounts.id)  
is included in the array for for that row (stuff.accounts). make sense?

the stuff table is huge. the actual requirements are "infinite rows  
spanning an infinite amount of disk space," but i'm working with my  
manager to limit those requirements just a bit :-) zero or more  
clients will be interested in a row from the 'stuff' table, which is  
why i am using an array. the whole point is for indexing. i index  
that array, i can get quickly get rows back based on accounts.id's.  
the way i am thinking, the queries to find 'stuff' rows that interest  
an account will be:

SELECT * FROM stuff WHERE (1 = ANY(account))

where accounts.id = 1 for that particular account.

now, there will be hundreds of this account. is that a good way to go?

here's what i was doing before.

stuff.accounts was a TEXT field that was comprised of accounts.name's  
enclosed in %-signs. i don't know much, but i think for some magical  
reason the %-signs helped the mysql regex engine (there are no %  
signs in account.name) anchor its search. so for an account with an  
accounts.name of 'blah', the query was:

SELECT * FROM stuff WHERE (accounts IS NOT NULL AND accounts REGEXP  
'blah')

and stuff.accounts might have looked like: '%blah%some_other_name% 
whatever%'.

if this makes any sense and i don't sound like a crazy person who  
should not be administering a database, my question for you guys  
is... is the (1 = ANY(account)) call efficient? I am dealing with  
tens of millions of rows here in the stuff table, and I need to get  
data out of them _fast_. a lot of the rows will have no value for the  
stuff.accounts array field.

which way seems better, the one where i am doing regular expression  
matching based on a text field, or the one in which i am essentially  
"grepping" through an array for an ID number that may or may not be  
there?

- - -Neal

p.s. obligatory rose::db question so i don't feel like a jerk:

given then accounts INTEGER[] way of doing this, will rose db do this:

@accounts = $stuff->accounts;

and convert the ARRAY[id,id,id] into a perl array?

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (Darwin)

iD8DBQFFrctsecZZ/XfYgxIRAjrBAJ9kd24r3cIL4pfg74HxiC6BJQpCUQCfQnKX
CJU78TeC1/gsUePNpyjU1ik=
=556M
-----END PGP SIGNATURE-----

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
Rose-db-object mailing list
Rose-db-object@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/rose-db-object

Reply via email to