Re: [web2py] Re: tags and speed

Niphlod Thu, 18 Oct 2012 16:21:41 -0700

old mean bad thing about normalized vs denormalized model.
Don't know what you read in the cookbok, but here's the thing.
You have a game pacman that is tagged as arcade, and a game invaders tagged 
as horror


First things first: you may want to change "horror" to "needs parent 
around" later in the future. That's why you may want to create an external 
tags table and reference it in your games one.
tags table
1 arcade
2 horror

game_name tags
pacman   1
invaders  2

To display
pacman arcade
invaders horror
you just need a simple join. Changing record 2 of the tags table allows you 
to have
pacman arcade
invaders needs parents around

Next problem on the line: you want multiple tags for a single game 
(invaders needs both arcade and horror tags). 
Welcome to normalization hell. Books have been written about it in the last 
60-70 years, with terms like 3NF and Cardinality all around ^_^

Let's take this by examples

1) you change the tags table like this
id tag_name game_id
1 arcade 1
1 arcade 2
2 horror 2
getting tags for a single game is fast, change "horror" to "needs parents 
around" is fast, getting all possible tags is enough fast (depending on the 
number of rows of the tags table)
but
fetch the right tags for every game can be slow

2) you create a games table like this 
id game_name tags
1 pacman |arcade|
2 invaders |arcade|horror|
getting tags for a single game is fast, updating a tag for a game is quite 
fast
but
changing all "horror" to "needs parents around" can be fast only if done 
outside web2py with a manual replace, getting all possible tags can be slow

3) you create a games table like this
id game_name tags_id
1 pacman |1|
2 invaders |1|2|
getting tags for a single game is fast, updating a tag for a game is quite 
fast, changing "horrors" is fast
but
changing all "horror" to "needs parents around" can be fast only if done 
outside web2py with a manual replace, getting all possible tags can be slow

So, with no need to change tag names from "horror" to "needs parents 
around", I'd say the right way for displaying your table is 2) 
(list:string), else 3) (list:reference). 
If you need to "suggest" previously entered tags (i.e. you don't want to 
end having "arcade", "arcady", "arcadian", "'rcade" spat all around), I'd 
still go for 2) + one separate table (or a cached list) holding all 
previously entered tags, just to speedup the "suggestion" phase (hoping 
users won't screw up).
 
Method 1) is faster only when:
- tags are more than the games
- "cleaning normalization" logics are heavy
- you need to update tag names often
- the only need is displaying tags for a single game (direct query on a 
single table, the tags one)
- you need to compute something like "how many times the arcade tag is 
applied to all my games" (i.e. for a tag cloud)

There are several other methods, and all of them in the end "do the job". 
You just need to see what are your requirements and choose carefully. Your 
agenda is:
- spend the less time possible with queries you have to invoke often
- "pay the price" of your model in high computations for other things (or, 
let's save the results of those computations externally once every hour)
 


--

Re: [web2py] Re: tags and speed

Reply via email to