Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

Domas Mituzas Tue, 24 Nov 2009 21:40:03 -0800

Hi,
> That's just not at all true, not for PostgreSQL at least.  Say you
> have 100 million records in the page table, of which 20 million are
> is_redirect=1 and 80 million are is_redirect=0.  Say the average size
> of a record is 100 bytes, so on average 80 records fit in one page.
> The table is not clustered, or it is clustered on something other than
> is_redirect.  If you run select * from page where is_redirect=1 from a
> table which is not partitioned, you have to access pretty much all
> 1.25 million pages.  If you partition the table on is_redirect, you
> only have to access 250,000 pages.


But.... who will partition based on is_redirect? If it is for one-off  
task, you can just create two separate tables and do 'manual  
partitioning' even in sqlite :)
Even though your is_redirect queries may become faster, you just added  
*2 cost for every other index operation (as partitions require you to  
loop over all indexes for all the lookups not satisfied by  
partitioning key).

> I have no idea how this works on MySQL, or if it works on MySQL at
> all.  In MySQL, you could achieve the same thing through clustering,
> however.

Right, indexing can be used to achieve the result, without making  
other selects slower (kind of). Thats what indexing is for :)

> http://dev.mysql.com/doc/refman/5.1/en/partitioning-pruning.html
>
> "When the optimizer can make use of partition pruning in performing a
> query, execution of the query can be an order of magnitude faster than
> the same query against a nonpartitioned table containing the same
> column definitions and data. "

This is obvious, dropping a partition is faster than DELETE that has  
to go and maintain the index. I'm not talking about DELETE operations,  
but SELECTs, you seem to fail at reading there :-)

> a=# insert into category values ('Apple');
> ERROR:  duplicate key value violates unique constraint "category_pkey"

And how do native language collations work? (and since which version  
are they supported per-database? are they supported per-schema? ;-)
I got somewhat incorrect results once I used lt_LT.UTF-8 for my  
'initdb' - and default collation was providing incorrect order too, as  
well as unique constraints were not enforcing dictionary-order rules.

postgres=# create database xx encoding 'utf8';
CREATE DATABASE
postgres=# \c xx;
You are now connected to database "xx".
xx=# create table t1 (a varchar(255) primary key);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index  
"t1_pkey" for table "t1"
CREATE TABLE
xx=# insert into t1 values ('a');
INSERT 0 1
xx=# insert into t1 values ('b');
INSERT 0 1
xx=# insert into t1 values ('ą');
INSERT 0 1
xx=# select * from t1 order by a;
  a
---
  a
  b
  ą
(3 rows)

You guys seem to talk about stuff you never used and never really  
understood. Good for you, probably much easier that way.

Domas
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Size of DB/table of enwiki after import into MySQL

Reply via email to