Re: [PHP-DB] Building indexes

2010-03-20 Thread listread

This all helps.   I think I need to spend some time experimenting.

Thanks for your help!

- Ron

On 3/19/2010 3:56 PM, Bastien Koert wrote:

[snip]
   

Here's an example of a tough query (there are lots of Smiths!).

SELECT voter1.County, voter1.`First Name`, voter1.`Middle Name`,
voter1.`Last Name`, voter1.`Residential ZipCode`,voter1.`House Number`,
voter1.`Mailing Address`, voter1.`Street Name`, voter1.`Residential City`,
voter1.`Voter ID`, voter1.`Congressional Distri`,voter1.idVoter,
voter1.County FROM voter1 WHERE voter1.`County` LIKE '%' AND voter1.`Last
Name` LIKE 'Smith%' AND voter1.`First Name` LIKE 'John%' AND
voter1.`Residential ZipCode` LIKE '%' LIMIT 0, 10;

Do you think the LIKE '%' hurts for the fields we don't have search
criteria?  We do that to keep the query definition simple and flexible.
 

[/snip]

Yes, those hurt and will cause you endless grief. A much better course
of action is to build the SQL dynamically to only query on the fields
where you have values to start with. If I alter your example

SELECT voter1.County, voter1.`First Name`, voter1.`Middle Name`,
  voter1.`Last Name`, voter1.`Residential ZipCode`,voter1.`House Number`,
  voter1.`Mailing Address`, voter1.`Street Name`, voter1.`Residential City`,
  voter1.`Voter ID`, voter1.`Congressional Distri`,voter1.idVoter,
  voter1.County FROM voter1
WHERE
voter1.`Last Name` LIKE 'Smith%' AND voter1.`First Name` LIKE 'John%'
LIMIT 0, 10;

This produces a much cleaner SQL and simpler query for the DB to use.

Also, I tend to prefer not using spaces in the field names. I prefer
to have an underscore to avoid any issues if I ever have to move
databases


   



--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] Building indexes

2010-03-19 Thread listread

On 3/18/2010 4:59 PM, Chris wrote:
 What do your queries end up looking like?

Here's an example of a tough query (there are lots of Smiths!).

SELECT voter1.County, voter1.`First Name`, voter1.`Middle Name`, 
voter1.`Last Name`, voter1.`Residential ZipCode`,voter1.`House Number`, 
voter1.`Mailing Address`, voter1.`Street Name`, voter1.`Residential 
City`, voter1.`Voter ID`, voter1.`Congressional Distri`,voter1.idVoter, 
voter1.County FROM voter1 WHERE voter1.`County` LIKE '%' AND 
voter1.`Last Name` LIKE 'Smith%' AND voter1.`First Name` LIKE 'John%' 
AND voter1.`Residential ZipCode` LIKE '%' LIMIT 0, 10;


Do you think the LIKE '%' hurts for the fields we don't have search 
criteria?  We do that to keep the query definition simple and flexible.


More upfront db prep can be justified since we have to do two or three 
hundred thousand queries in the course of validating all the signatures 
on a petition.


Thanks!

- Ron


listread wrote:

Chris,

I just assumed that everyone on this list was using MySQL...  That's 
what we're using (v. 5.1.45 GA) with InnoDB as the engine.


Most people are but there are lots of types of databases out there :)

(I just read your tutorial at 
http://www.designmagick.com/article/16/  It was very helpful - I look 
forward to checking out more of your articles.)


If it is just as efficient to use multiple separate indexes, that 
would make index building less complicated on our large db.


It is, though wildcard searches can't always use indexes.

If you do

field like 'abcdef%';

then an index can potentially be used because the db (mysql or 
otherwise) can look at the start of the string to see if it matches. 
The longer the string the more likely an index can be used (eg doing 
field like 'a%' probably won't use an index, it'll end up being 
quicker to scan the actual data).


If you do

field like '%abcdef%';

then an index can't be used since abcdef could appear anywhere in the 
string.


Without a large dataset, it hard to truly test a system and if you 
have a large dataset, like we do, it takes quite a while to build 
indexes.


Definitely, it's the best way to test and also the hardest since 
rebuilding the db takes so long.


Our project is a petition signature validation suite.  Since many of 
the handwritten names and addresses on petition sheets are difficult 
to read, the user needs to be able to do some fuzzy searching.   
Sometimes it's easier to read the address than it is the name.   The 
zip code is usually easy to read.  We almost always need to use LIKE 
queries, since some part of the name or address is typically hard to 
read.  (We try to use as many of the leading characters as we can and 
wildcard the remaining.)


I'd suggest fulltext but that won't work with innodb, only myisam. You 
could do something like keep the addresses and names in a separate 
myisam table just for searching, though that means a whole new import 
process and also means you'd end up having to do two queries (maybe a 
subquery or join, you'd have to test) - one do to full text search and 
one to get the rest of the data based on the result of the first.


What do your queries end up looking like?




--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] Building indexes

2010-03-19 Thread Bastien Koert
[snip]
 Here's an example of a tough query (there are lots of Smiths!).

 SELECT voter1.County, voter1.`First Name`, voter1.`Middle Name`,
 voter1.`Last Name`, voter1.`Residential ZipCode`,voter1.`House Number`,
 voter1.`Mailing Address`, voter1.`Street Name`, voter1.`Residential City`,
 voter1.`Voter ID`, voter1.`Congressional Distri`,voter1.idVoter,
 voter1.County FROM voter1 WHERE voter1.`County` LIKE '%' AND voter1.`Last
 Name` LIKE 'Smith%' AND voter1.`First Name` LIKE 'John%' AND
 voter1.`Residential ZipCode` LIKE '%' LIMIT 0, 10;

 Do you think the LIKE '%' hurts for the fields we don't have search
 criteria?  We do that to keep the query definition simple and flexible.
[/snip]

Yes, those hurt and will cause you endless grief. A much better course
of action is to build the SQL dynamically to only query on the fields
where you have values to start with. If I alter your example

SELECT voter1.County, voter1.`First Name`, voter1.`Middle Name`,
 voter1.`Last Name`, voter1.`Residential ZipCode`,voter1.`House Number`,
 voter1.`Mailing Address`, voter1.`Street Name`, voter1.`Residential City`,
 voter1.`Voter ID`, voter1.`Congressional Distri`,voter1.idVoter,
 voter1.County FROM voter1
WHERE
voter1.`Last Name` LIKE 'Smith%' AND voter1.`First Name` LIKE 'John%'
LIMIT 0, 10;

This produces a much cleaner SQL and simpler query for the DB to use.

Also, I tend to prefer not using spaces in the field names. I prefer
to have an underscore to avoid any issues if I ever have to move
databases


-- 

Bastien

Cat, the other other white meat

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] Building indexes

2010-03-18 Thread listread

Chris,

I just assumed that everyone on this list was using MySQL...  That's 
what we're using (v. 5.1.45 GA) with InnoDB as the engine.


(I just read your tutorial at http://www.designmagick.com/article/16/  
It was very helpful - I look forward to checking out more of your articles.)


If it is just as efficient to use multiple separate indexes, that would 
make index building less complicated on our large db.


Without a large dataset, it hard to truly test a system and if you have 
a large dataset, like we do, it takes quite a while to build indexes.


Our project is a petition signature validation suite.  Since many of the 
handwritten names and addresses on petition sheets are difficult to 
read, the user needs to be able to do some fuzzy searching.   Sometimes 
it's easier to read the address than it is the name.   The zip code is 
usually easy to read.  We almost always need to use LIKE queries, since 
some part of the name or address is typically hard to read.  (We try to 
use as many of the leading characters as we can and wildcard the remaining.)


For this part of the project joins are not needed.   We could eliminate 
most of the 55 columns, which would reduce the size of the db, but I 
don't know what that would do to speed...


Does any of this affect the approach you would suggest?

Thanks!!

- Ron




On 3/17/2010 11:56 PM, Chris wrote:

listread wrote:

Greetings!

We are working on a 4 million record db with about 55 fields.  We 
need to build indexes to speed queries, but it seems to take way too 
long.


I'm assuming your using mysql but that's just a guess.

What database? (mysql, oracle, postgres, firebird, sqlite, other)
If it's mysql, what db engine?


Here are some questions:

1) If a query uses two or three fields is it best to prepare an index 
with those two or three fields, or will the query be just as 
efficient if we build three separate index, one for each field?


Relates to above, but assuming mysql then it depends on what version.

multicolumn indexes work well in all mysql versions. separate indexes 
can be used and combined in newer versions (v5 I think it came in).


2) Is the index building process any slower, per index, if you are 
building more than one index at a time?


Most indexing I think takes an exclusive lock on the table, so index 1 
gets created, index 2 has to wait it's turn until index 1 finishes.


I wrote an article on this a while ago, might give you some more 
insights: http://www.designmagick.com/article/16/





--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] Building indexes

2010-03-18 Thread Chris

listread wrote:

Chris,

I just assumed that everyone on this list was using MySQL...  That's 
what we're using (v. 5.1.45 GA) with InnoDB as the engine.


Most people are but there are lots of types of databases out there :)

(I just read your tutorial at http://www.designmagick.com/article/16/  
It was very helpful - I look forward to checking out more of your 
articles.)


If it is just as efficient to use multiple separate indexes, that would 
make index building less complicated on our large db.


It is, though wildcard searches can't always use indexes.

If you do

field like 'abcdef%';

then an index can potentially be used because the db (mysql or 
otherwise) can look at the start of the string to see if it matches. The 
longer the string the more likely an index can be used (eg doing field 
like 'a%' probably won't use an index, it'll end up being quicker to 
scan the actual data).


If you do

field like '%abcdef%';

then an index can't be used since abcdef could appear anywhere in the 
string.


Without a large dataset, it hard to truly test a system and if you have 
a large dataset, like we do, it takes quite a while to build indexes.


Definitely, it's the best way to test and also the hardest since 
rebuilding the db takes so long.


Our project is a petition signature validation suite.  Since many of the 
handwritten names and addresses on petition sheets are difficult to 
read, the user needs to be able to do some fuzzy searching.   Sometimes 
it's easier to read the address than it is the name.   The zip code is 
usually easy to read.  We almost always need to use LIKE queries, since 
some part of the name or address is typically hard to read.  (We try to 
use as many of the leading characters as we can and wildcard the 
remaining.)


I'd suggest fulltext but that won't work with innodb, only myisam. You 
could do something like keep the addresses and names in a separate 
myisam table just for searching, though that means a whole new import 
process and also means you'd end up having to do two queries (maybe a 
subquery or join, you'd have to test) - one do to full text search and 
one to get the rest of the data based on the result of the first.


What do your queries end up looking like?

--
Postgresql  php tutorials
http://www.designmagick.com/


--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP-DB] Building indexes

2010-03-17 Thread listread

Greetings!

We are working on a 4 million record db with about 55 fields.  We need 
to build indexes to speed queries, but it seems to take way too long.  
Here are some questions:


1) If a query uses two or three fields is it best to prepare an index 
with those two or three fields, or will the query be just as efficient 
if we build three separate index, one for each field?
2) Is the index building process any slower, per index, if you are 
building more than one index at a time?


Does anyone have any suggestions about building indexes?

Thanks!

- Ron

--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP-DB] Building indexes

2010-03-17 Thread Chris

listread wrote:

Greetings!

We are working on a 4 million record db with about 55 fields.  We need 
to build indexes to speed queries, but it seems to take way too long.


I'm assuming your using mysql but that's just a guess.

What database? (mysql, oracle, postgres, firebird, sqlite, other)
If it's mysql, what db engine?


Here are some questions:

1) If a query uses two or three fields is it best to prepare an index 
with those two or three fields, or will the query be just as efficient 
if we build three separate index, one for each field?


Relates to above, but assuming mysql then it depends on what version.

multicolumn indexes work well in all mysql versions. separate indexes 
can be used and combined in newer versions (v5 I think it came in).


2) Is the index building process any slower, per index, if you are 
building more than one index at a time?


Most indexing I think takes an exclusive lock on the table, so index 1 
gets created, index 2 has to wait it's turn until index 1 finishes.


I wrote an article on this a while ago, might give you some more 
insights: http://www.designmagick.com/article/16/


--
Postgresql  php tutorials
http://www.designmagick.com/


--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php