We have a sql server database which can have something like 50 meg of 
records of any one type. The way that the database was originally designed 
with activerecord/sql is that every time you get a couple of million 
records or so, it creates a new shard or a new table. 
In effect for one record type we might have 20 or 30 different tables (or 
shards) that all have the same schema. When you need to look up
a particular record, you get it's sched field and see which shard (tables) 
it is in and query those shards. This also involves a manual step
in that someone has to monitor the system and go to the web server and 
click on "create new shard" whenever there are too many records in
the currently active shard which is where newly created records go. 


 We are looking at a new approach that involves clustered indexes or non 
clustered indexes built on some of the fields we typically 
query on. It then occurred to me however that if we created a parent record 
for each sched and had the records use a foriegn key to that
parent, perhaps we could get similar performance for our data and we could 
not have to worry about the shards ? 

 This seems like more of a rails activerecord approach and I am very 
familiar with that way of doing things, but somewhat less familiar with 
larger data sets such as what I am describing, though when I thought about 
it, I do recall that there are large data sets used by some 
companies, though I forget if there was other strategies such as caching 
and such used as well. 

 We currently have 4 different sets of sharded record types. One of them 
looks something like below for where clauses and group/order.
Where I have a * it indicates that in the Activerecord find(:all) call it 
may or may not be passed, that is what is in the conditions part can vary. 
Such a lookup can return many thousands of records all having the same 
sched, script etc. 


 where clause: sched, *script

 group/order: message_id, timeslice, script, label, scale, scaled_units, 
*type


 My first pass was to create a clustered indexd on sched, script, and 
label. However I am wondering if there are easier approaches such as I 
indicated above that might make more sense ? 




-- 
You received this message because you are subscribed to the Google Groups "Ruby 
on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msg/rubyonrails-talk/-/CGYEOJIPS5YJ.
For more options, visit https://groups.google.com/groups/opt_out.


Reply via email to