mr. maas,
psychic me ........ you are a man.
i created an index, no change. but i already knew that because one of
the cases where an index is never used is when
> The key used to fetch the rows is not the same as the one used in the
> > ORDER BY:
>
> which is the case here.
(index creation stuff follows)
i'm just going to bite the bullet for a few months and do a massive code
rehaul. move all the matching code preferably to the sql-server. or
pre-compiled C code for the php [ make an extension ]. i think the mb_
functions in php are very slow ( reportedly ) ... so move those
somewhere else is a good idea. maybe the 'select ... like' clauses to
the sql-server
maybe the improvements you suggest will make a difference, maybe not,
but when i think about how fast mysql runs its
mysqlimport --local --fields-terminated-by='*X*X' dict_explicit
french_english french_english
compared with
mysql < a.sql (bunches of insert statements)
that's the kind of improvement i am hoping for. more than 500 queeries
a second i am getting now, i need to really really improve that if i am
going to parse html on the fly with the technology i have.
mysql> describe korean_english;
+-----------+---------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default |
Extra |
+-----------+---------------------+------+-----+---------------------+----------------+
| wordid | int(11) | | PRI | NULL |
auto_increment |
| word | varchar(130) | YES | MUL | NULL |
|
| syn | varchar(190) | YES | | NULL |
|
| def | blob | YES | | NULL |
|
| posn | int(2) | YES | | 1 |
|
| pos | varchar(13) | YES | | 1 |
|
| submitter | varchar(25) | YES | | NULL |
|
| doe | datetime | | | 0000-00-00 00:00:00 |
|
| wordsize | tinyint(3) unsigned | YES | | NULL |
|
+-----------+---------------------+------+-----+---------------------+----------------+
9 rows in set (0.00 sec)
mysql> explain select
-> word,def,wordid,pos,posn,wordsize,syn from korean_english where
word
-> like '운전할 %' order by wordsize desc;
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| 1 | SIMPLE | korean_english | range | word_idx | word_idx |
391 | NULL | 1 | Using where; Using filesort |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
1 row in set (0.01 sec)
mysql> CREATE INDEX wordsize_index USING BTREE ON
korean_english(wordsize);
Query OK, 205265 rows affected (11.16 sec)
Records: 205265 Duplicates: 0 Warnings: 0
mysql> explain select word,def,wordid,pos,posn,wordsize,syn from
korean_english where word like '운전할 %' order by wordsize desc;e);
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| 1 | SIMPLE | korean_english | range | word_idx | word_idx |
391 | NULL | 1 | Using where; Using filesort |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)
mysql>
2006-03-13 (월), 14:04 +0100, Jochem Maas 쓰시길:
> ...
>
> >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word
> >>>like '운전할 %' order by wordsize desc
> >>
> >>oh would you look at this....
> >>you're ordering by WORDSIZE.
> >>stick an index on WORDSIZE!!!
> >
> >
> >
> > http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
> >
> >
> > In some cases, MySQL cannot use indexes to resolve the ORDER BY,
>
> note 'In Some Cases'.
>
> so stick an index on WORDSIZE and find out.
>
> > although it still uses indexes to find the rows that match the WHERE
> > clause. These cases include the following:
> >
> > The key used to fetch the rows is not the same as the one used in the
> > ORDER BY:
> >
> > SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
> >
> > 2)
> >
> >
> >>you have an iceballs' chance in hell that I'm going to even read
> >>the 500+ lines of code that followed here ... let alone try to
> >>optimize it. ;-)
> >>
> >
> >
> > it was just a 40 line summary of 550 lines of code ........
>
> ah, talk about being caught out :-)
> regardless a 40line summary won't cut it either - you have to take
> the block as a whole.
>
> >
> >
> >
> > questions i may deem myself to answer:
> >
> >>>) i get this:
> >>>[EMAIL PROTECTED] mysql]# tail -f /var/lib/mysql/mysqld_query.log > out
> >>>
> >>>[EMAIL PROTECTED] mysql]# cat out | wc -l
> >>>15910 <<< ----------- that's line count
> >>
> >>how many queries?
> >
> > 15, 910 queries
>
> so roughly (15 * 60) seconds to run 15,000+
> queries and do the processing? that doesn't actually
> sound so bad.
>
> >
> >
> >
> > things already taken care of:
> > 1)
> >
> >>> 9795 Query select
> >>>word,def,wordid,pos,posn,wordsize,syn from korean_english where word
> >>>like '운전할' order by wordsize desc
> >>
> >>in cases when you are not using the wildcard tokens (percentage signs)
> >>try changing the query to use something like:
> >>
> >> ... word = '운전할' ...
> >
> >
> > your suggestion and a line from the query_log match exactly.
> >
> >
> > 2)
> >
> >><snip>
> >>
> >>>then, it sends each token to CallmatchThis (line 14) which calls
> >>>matchThis (line 27 - 47 below)
> >>>matchThis may be called twice (2 sql queeries)
> >>>(line 51) select * where word = '$token' and another (take that, but if
> >>>it's not there .... issue the next sql )
> >>>(line 55) select * where word like '$token%';
> >>
> >>Dont do "SELECT *" - always explicitly specify the fields you want.
> >>
> >
> > that was just a paraphrase. the previous email points to the line number
> > of the code summary
>
> never paraphrase code - you only end up with smart ass comments like mine!
>
> have you reordered you fields in the db yet? adn made as many VARCHARs as
> possible into CHARs?
>
> >
> >
> >>
> >>
> >>>------------------------------------------+
> >>>| korean_english | CREATE TABLE `korean_english` (
> >>> `wordid` int(11) NOT NULL auto_increment,
> >>> `word` varchar(130) default NULL,
> >>> `syn` varchar(190) default NULL,
> >>> `def` blob,
> >>> `posn` int(2) default '1',
> >>> `pos` varchar(13) default '1',
> >>> `submitter` varchar(25) default NULL,
> >>> `doe` datetime NOT NULL default '0000-00-00 00:00:00',
> >>> `wordsize` tinyint(3) unsigned default NULL,
> >>> PRIMARY KEY (`wordid`),
> >>> KEY `word_idx` (`word`),
> >>> KEY `wordid_idx` (`wordid`)
> >>>) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
> >>
> >>reorder the fields so that the VARCHARS are at the end of the
> >>table (and the BLOB field at the very, very end of the table).
> >>
> >>also change VARCHARs to CHARs where you can.
> >>
> >>
> >>>+----------------+-----------------------------------------------------------------------------------------------------------------------------------------
> >>>-----------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>---------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> 1 function MainLoop()
> >
> >
>
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php