Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Lupus Michaelis

Ashley Sheridan a écrit :

What's wrong with using the wildcards that are built into most SQL
variants?


  Performance issues. Like is an operator to avoid when possible.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Ashley Sheridan
On Fri, 2009-09-04 at 17:00 +0200, Lupus Michaelis wrote:
 Ashley Sheridan a écrit :
  What's wrong with using the wildcards that are built into most SQL
  variants?
 
Performance issues. Like is an operator to avoid when possible.
 
 -- 
 Mickaël Wolff aka Lupus Michaelis
 http://lupusmic.org
 
You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...

Thanks,
Ash
http://www.ashleysheridan.co.uk




--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Lupus Michaelis

Ashley Sheridan a écrit :


You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...


  It's why I speak about « avoiding » and not « bannishing ». Like can 
be usefull, I used to use it. But it is not the a good answer to all 
problems. The problem with like operator is it can't use the index (or 
in a very limited way). So I try to warn about it.


  So said, I never submit an all-retrieving method. I know it isn't the 
solution too.


--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Eddie Drapkin
On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelismickael+...@lupusmic.org wrote:
 Ashley Sheridan a écrit :

 You'll have far greater performance issues if you retrieve all those
 records and attempt to do the same thing inside of PHP...

  It's why I speak about « avoiding » and not « bannishing ». Like can be
 usefull, I used to use it. But it is not the a good answer to all problems.
 The problem with like operator is it can't use the index (or in a very
 limited way). So I try to warn about it.

  So said, I never submit an all-retrieving method. I know it isn't the
 solution too.

 --
 Mickaël Wolff aka Lupus Michaelis
 http://lupusmic.org

 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php



So far, in this thread, there've been a few solutions:
1) LIKE in SQL.
2) REGEXP in SQL.
3) PCRE in PHP
4) Other fetch all methods in PHP.

The one thing that I'm seeing as a consistent agreement is that the
performance hit of whichever of the aforementioned measures is going
to be enough to be considering something else.   I briefly mentioned -
I apologize for the brevity of that email because I was in a hurry -
that a legitimate full text search engine is the right solution to
this problem.  The only problem with deploying a full text search
engine is going to be the difficulty in the deployment and perhaps
issues if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).

What a full text search engine gives you is flexibility in your
searches, such that the initial question, when I read it, I thought
Oh, someone will tell him to use Sphinx or Solr as both have special
filters for word seperation and would handle this without any special
instruction.  Instead, this is never even brought up!

Why was using a full text search engine to do this sort of thing - not
to mention the other benefits that it would bring (responsiveness and
flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
rejected so offhandedly?  I can't actually think of a better way to do
this without requiring a whole heap of overhead, either processing or
programming.

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Robert Cummings

Eddie Drapkin wrote:

On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelismickael+...@lupusmic.org wrote:

Ashley Sheridan a écrit :


You'll have far greater performance issues if you retrieve all those
records and attempt to do the same thing inside of PHP...

 It's why I speak about « avoiding » and not « bannishing ». Like can be
usefull, I used to use it. But it is not the a good answer to all problems.
The problem with like operator is it can't use the index (or in a very
limited way). So I try to warn about it.

 So said, I never submit an all-retrieving method. I know it isn't the
solution too.

--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php




So far, in this thread, there've been a few solutions:
1) LIKE in SQL.
2) REGEXP in SQL.
3) PCRE in PHP
4) Other fetch all methods in PHP.

The one thing that I'm seeing as a consistent agreement is that the
performance hit of whichever of the aforementioned measures is going
to be enough to be considering something else.   I briefly mentioned -
I apologize for the brevity of that email because I was in a hurry -
that a legitimate full text search engine is the right solution to
this problem.  The only problem with deploying a full text search
engine is going to be the difficulty in the deployment and perhaps
issues if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).

What a full text search engine gives you is flexibility in your
searches, such that the initial question, when I read it, I thought
Oh, someone will tell him to use Sphinx or Solr as both have special
filters for word seperation and would handle this without any special
instruction.  Instead, this is never even brought up!

Why was using a full text search engine to do this sort of thing - not
to mention the other benefits that it would bring (responsiveness and
flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
rejected so offhandedly?  I can't actually think of a better way to do
this without requiring a whole heap of overhead, either processing or
programming.


I've just sort of stopped in on this thread... but why isn't MySQL's 
FULLTEXT engine being considered?


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Tommy Pham
- Original Message 
 From: Robert Cummings rob...@interjinn.com
 To: Eddie Drapkin oorza...@gmail.com
 Cc: Lupus Michaelis mickael+...@lupusmic.org; php-general@lists.php.net
 Sent: Friday, September 4, 2009 1:36:08 PM
 Subject: Re: [PHP] Searching on AlphaNumeric Content Only
 
 Eddie Drapkin wrote:
  On Fri, Sep 4, 2009 at 12:02 PM, Lupus
  Michaeliswrote:
  Ashley Sheridan a écrit :
 
  You'll have far greater performance issues if you retrieve all those
  records and attempt to do the same thing inside of PHP...
   It's why I speak about « avoiding » and not « bannishing ». Like can be
  usefull, I used to use it. But it is not the a good answer to all problems.
  The problem with like operator is it can't use the index (or in a very
  limited way). So I try to warn about it.
 
   So said, I never submit an all-retrieving method. I know it isn't the
  solution too.
 
  --
  Mickaël Wolff aka Lupus Michaelis
  http://lupusmic.org
 
  --
  PHP General Mailing List (http://www.php.net/)
  To unsubscribe, visit: http://www.php.net/unsub.php
 
 
  
  So far, in this thread, there've been a few solutions:
  1) LIKE in SQL.
  2) REGEXP in SQL.
  3) PCRE in PHP
  4) Other fetch all methods in PHP.
  
  The one thing that I'm seeing as a consistent agreement is that the
  performance hit of whichever of the aforementioned measures is going
  to be enough to be considering something else.   I briefly mentioned -
  I apologize for the brevity of that email because I was in a hurry -
  that a legitimate full text search engine is the right solution to
  this problem.  The only problem with deploying a full text search
  engine is going to be the difficulty in the deployment and perhaps
  issues if you're on shared hosting (but then again I am of the opinion
  that those who choose to run with shared hosting dig their own graves
  in more ways than one).
  
  What a full text search engine gives you is flexibility in your
  searches, such that the initial question, when I read it, I thought
  Oh, someone will tell him to use Sphinx or Solr as both have special
  filters for word seperation and would handle this without any special
  instruction.  Instead, this is never even brought up!
  
  Why was using a full text search engine to do this sort of thing - not
  to mention the other benefits that it would bring (responsiveness and
  flexibility in searching, speed, decreased use of MySQL, etc. etc.) -
  rejected so offhandedly?  I can't actually think of a better way to do
  this without requiring a whole heap of overhead, either processing or
  programming.
 
 I've just sort of stopped in on this thread... but why isn't MySQL's 
 FULLTEXT engine being considered?

If I remember correctly, it's only on available on MyISAM table type.  I don't
think the OP mentioned that he's using MyISAM or MySQL for that matter.

 
 Cheers,
 Rob.
 -- 
 http://www.interjinn.com
 Application and Templating Framework for PHP
 
 -- 
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Paul M Foster
On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:

 On Fri, Sep 4, 2009 at 12:02 PM, Lupus
 Michaelismickael+...@lupusmic.org wrote:

 if you're on shared hosting (but then again I am of the opinion
 that those who choose to run with shared hosting dig their own graves
 in more ways than one).

Any time you or someone else would like to backstop me in setting up a
dedicated server on rackspace or somewhere else, for free or really
cheap, you let me know! Otherwise, those of us with less than complete
expertise in server setup are stuck with shared hosting. ;-}

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Robert Cummings

Paul M Foster wrote:

On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:


On Fri, Sep 4, 2009 at 12:02 PM, Lupus
Michaelismickael+...@lupusmic.org wrote:



if you're on shared hosting (but then again I am of the opinion
that those who choose to run with shared hosting dig their own graves
in more ways than one).


Any time you or someone else would like to backstop me in setting up a
dedicated server on rackspace or somewhere else, for free or really
cheap, you let me know! Otherwise, those of us with less than complete
expertise in server setup are stuck with shared hosting. ;-}


For a few hundred bucks a year you can get a VPS (Virtual Private 
Server), which gives you root access. once you get a couple of clients 
on it, it will pay for itself year after year. If you have clients that 
don't use much resources, you can put a few on and even make a profit on 
the hosting alone.


Cheers,
Rob.
--
http://www.interjinn.com
Application and Templating Framework for PHP

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Paul M Foster
On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:

 Paul M Foster wrote:
 On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:

 On Fri, Sep 4, 2009 at 12:02 PM, Lupus
 Michaelismickael+...@lupusmic.org wrote:

 if you're on shared hosting (but then again I am of the opinion
 that those who choose to run with shared hosting dig their own graves
 in more ways than one).

 Any time you or someone else would like to backstop me in setting up a
 dedicated server on rackspace or somewhere else, for free or really
 cheap, you let me know! Otherwise, those of us with less than complete
 expertise in server setup are stuck with shared hosting. ;-}

 For a few hundred bucks a year you can get a VPS (Virtual Private
 Server), which gives you root access. once you get a couple of clients
 on it, it will pay for itself year after year. If you have clients that
 don't use much resources, you can put a few on and even make a profit on
 the hosting alone.

Oh sure. Well aware of it. The problem is not finding a VPS or working
with root access or anything like that. The problem is expertise. POP3,
SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
outward facing servers, etc. It's more expertise than most people have,
including me. The servers I run are internal and don't have to deal with
the rigors of the internet, and only serve a couple of people. Setting
up multiple domains under an Apache server is black magic to me, for
instance. And then there's backups, and
what-do-I-do-if-the-server-fails, etc. Sheesh.

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-04 Thread Tommy Pham
- Original Message 
 From: Paul M Foster pa...@quillandmouse.com
 To: php-general@lists.php.net
 Sent: Friday, September 4, 2009 9:15:08 PM
 Subject: Re: [PHP] Searching on AlphaNumeric Content Only
 
 On Fri, Sep 04, 2009 at 08:15:41PM -0400, Robert Cummings wrote:
 
  Paul M Foster wrote:
  On Fri, Sep 04, 2009 at 04:22:18PM -0400, Eddie Drapkin wrote:
 
  On Fri, Sep 4, 2009 at 12:02 PM, Lupus
  Michaeliswrote:
 
  if you're on shared hosting (but then again I am of the opinion
  that those who choose to run with shared hosting dig their own graves
  in more ways than one).
 
  Any time you or someone else would like to backstop me in setting up a
  dedicated server on rackspace or somewhere else, for free or really
  cheap, you let me know! Otherwise, those of us with less than complete
  expertise in server setup are stuck with shared hosting. ;-}
 
  For a few hundred bucks a year you can get a VPS (Virtual Private
  Server), which gives you root access. once you get a couple of clients
  on it, it will pay for itself year after year. If you have clients that
  don't use much resources, you can put a few on and even make a profit on
  the hosting alone.
 
 Oh sure. Well aware of it. The problem is not finding a VPS or working
 with root access or anything like that. The problem is expertise. POP3,
 SMTP, SSH, HTTP, DNS, firewall security while still allowing access to
 outward facing servers, etc. It's more expertise than most people have,
 including me. The servers I run are internal and don't have to deal with
 the rigors of the internet, and only serve a couple of people. Setting
 up multiple domains under an Apache server is black magic to me, for
 instance. And then there's backups, and
 what-do-I-do-if-the-server-fails, etc. Sheesh.

I stand corrected.  You paid for peace of mind :D


 
 Paul
 
 -- 
 Paul M. Foster
 
 -- 
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ashley Sheridan
On Wed, 2009-09-02 at 21:30 -0700, Ben Dunlap wrote:
 Is there is a way to search only for the alphanumeric content of
  field in a db?  I have an itemID field that contains item #'s that include
  dashes, forward slashes, etc, and I want people to be able to search for an
  item # even if they don't enter the punctuation exactly.
 
 Not sure if there's anything specifically PHP-ish that will help you
 here, but I would be inclined to start by storing a stripped-down
 version of the item ID (alphanumeric characters only) in a separate
 column in the database table.
 
 Then, when a user enters some search data, I would remove
 non-alphanumeric characters, if any, from the user's input, and then
 search the stripped column with this normalized version of the input.
 
 If you want even fuzzier matching (inadvertent transpositions or an
 omitted character or two OK, for example), you might read about
 Levenshtein distance:
 
 http://en.wikipedia.org/wiki/Levenshtein_distance
 
 PHP has a levenshtein function but you'll have to figure out a way to
 use it efficiently with your data set. Or, if Levenshtein isn't quite
 right for your needs, the article above might at least point you in a
 useful direction.
 
 Ben
 

What's wrong with using the wildcards that are built into most SQL
variants?

SELECT * FROM table WHERE item_id LIKE '%#abcdef'

Will select all records where the item_id field ends in '#abcdef'

Thanks,
Ash
http://www.ashleysheridan.co.uk




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi

Which DB?
If it is MySQL, as example, you can simply use REGEXP syntax ^[a-zA-Z0-9]+$ 
via SELECT

Regards

 From: sono...@fannullone.us
 To: php-general@lists.php.net
 Date: Wed, 2 Sep 2009 20:47:15 -0700
 Subject: [PHP] Searching on AlphaNumeric Content Only
 
   Is there is a way to search only for the alphanumeric content of  
 field in a db?  I have an itemID field that contains item #'s that  
 include dashes, forward slashes, etc, and I want people to be able to  
 search for an item # even if they don't enter the punctuation exactly.
 
   Here's an example:  let's say there is an itemID of 4D-2448-7PS but  
 someone omits the dashes and searches on 4D24487PS.  Is it possible in  
 PHP to have the find be successful, even if the search criteria  
 doesn't exactly match what's stored in the field?
 
   If this is possible, I'd appreciate it if someone could just point me  
 in the right direction so I can read up on it.
 
 Thanks,
 Frank
 
 -- 
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php
 

_
Drag n’ drop—Get easy photo sharing with Windows Live™ Photos.

http://www.microsoft.com/windows/windowslive/products/photos.aspx

RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi


 Indeed you could do it via a regexp, but that uses up quite some memory. 
 Every time you do a SELECT. You can simply add a table column with the 
 stripped value and let the table update itself (with an ON UPDATE ON 
 INSERT trigger, which takes the input value for the itemID and strips it 
 once).
 
 When doing this on inputting the value into the database, you save 
 yourself the pain (and performance) of doing it on every SELECT-query.

Excuse me? Somebody suggested a PHP loop to solve a query problem and you are 
saying that REGEXP should not be used?
MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the 
first one if nothing changed inside the table.

At the same time an internal REGEXP is faster than everything else has to move 
out and be parsed after via, probably, the same REGEXP engine. Try some bench.

This problem, imho, is a non-problem, at least not a PHP problem.

How MySQL optimizes internally REGEXPs is not PHP problem as well.

It's like to create a loop to read byte after byte because file_get_contents 
could be memory greedy (if you do that with 1 Gb of file you are you doing 
wrong in any case, logs need to be split as example) or avoid MATCH AGAINST in 
query if we have too many rows because of performances problem (table could be 
slipt as well to optimize performances) ... and these practices to avoid native 
solutions are a bit hilarious, imho.

Regards

_
Drag n’ drop—Get easy photo sharing with Windows Live™ Photos.

http://www.microsoft.com/windows/windowslive/products/photos.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap
 Excuse me? Somebody suggested a PHP loop to solve a query problem and you are 
 saying that REGEXP should not be used?
 MySQL caches queries and 100 SELECT with a REGEXP will cost zero after the 
 first one if nothing changed inside the table.

Even if the REGEXP has to change with every query?

Performance aside, I think REGEXP() could be used here, but not in the
way you've suggested. As the OP has described his table, your regex
(^[a-zA-Z0-9]+$) won't match any rows, because all of his product
IDs have non-alphanumeric characters in them.

Suppose this table:

pk  | prod_id
1   | 07-ABCD-98
2   | 98-ZCXQ-21

And now suppose the OP's scenario, where a user tries to search on
product id, but enters 07ABCD98.

If the aim is to use REGEXP() to return row 1, I suppose you could
intersperse the search string with .? sequences and end up with this
query:

SELECT * FROM table WHERE prod_id REGEXP '^0.?7.?A.?B.?C.?D.?9.?8$'

I think just stripping the alphanumeric characters would end up being
more flexible, though.

-Ben

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap
 What's wrong with using the wildcards that are built into most SQL
 variants?

 SELECT * FROM table WHERE item_id LIKE '%#abcdef'

 Will select all records where the item_id field ends in '#abcdef'

That works if you know the user is always going to enter the last 7
characters of the product id, but that's not how the OP characterized
the problem. The OP talked about search strings where multiple
characters had been omitted from different parts of the product id.

Ben

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi


 Even if the REGEXP has to change with every query?

Ben, it does not matter, this is not a PHP problem but a DB 
structure/select/insert/update problem.
Whatever REGEXP you use, a REGEXP is what you need to solve this problem, 
certainly not a PHP loop over each row with operations for each rows.

These things are OK if you do not know REGEXP or REGEXP MySQL syntax, but in 
this case you should ask for the correct REGEXP rather than talk about 
performances, obviously slower outside MySQL and via a runtime interpreted 
language as PHP is, or other solutions which aim is to end up with something 
that just emulate a select with REGEXP.

Did you get my point? Finally, when I say you I mean generally speaking :)

Regards

_
With Windows Live, you can organize, edit, and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/products/photo-gallery-edit.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread sono-io
	Thanks to everyone who has responded.  After reading everyone's  
response, I think I have a very simple way to solve my problem.


	Using my original example, if someone wants to find item #  
4D-2448-7PS, no matter what they type in, I'll take the input, strip  
out all non-alphanumeric characters to make it 4D24487PS, add the  
wildcard character between each of the remaining characters like so,  
4*D*2*4*4*8*7*P*S, and then do the search.


	Still being new at this, it seems to be the simplest approach, or is  
my thinking flawed?  This also keeps me from having to add another  
field in the db to search on.


	BTW, this solution needs to work with any db, even ASCII files, so it  
has to happen in PHP.


Thanks again,
Frank

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ashley Sheridan
On Thu, 2009-09-03 at 12:12 -0700, sono...@fannullone.us wrote:
   Thanks to everyone who has responded.  After reading everyone's  
 response, I think I have a very simple way to solve my problem.
 
   Using my original example, if someone wants to find item #  
 4D-2448-7PS, no matter what they type in, I'll take the input, strip  
 out all non-alphanumeric characters to make it 4D24487PS, add the  
 wildcard character between each of the remaining characters like so,  
 4*D*2*4*4*8*7*P*S, and then do the search.
 
   Still being new at this, it seems to be the simplest approach, or is  
 my thinking flawed?  This also keeps me from having to add another  
 field in the db to search on.
 
   BTW, this solution needs to work with any db, even ASCII files, so it  
 has to happen in PHP.
 
 Thanks again,
 Frank
 
For speed you might want to consider an extra field in the DB in the
future. If the database gets larger, or your query needs to join several
tables together, then things will take a noticeable speed hit. I had a
similar issue myself where I had to search for names based on
mis-spellings of them. In the end I searched with metaphone tags on an
extra field in the DB set up for that purpose, but it was the only way
to do it that didn't affect the speed of the site.

Thanks,
Ash
http://www.ashleysheridan.co.uk




-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Eddie Drapkin
On Thu, Sep 3, 2009 at 3:17 PM, Ashley Sheridana...@ashleysheridan.co.uk 
wrote:
 On Thu, 2009-09-03 at 12:12 -0700, sono...@fannullone.us wrote:
       Thanks to everyone who has responded.  After reading everyone's
 response, I think I have a very simple way to solve my problem.

       Using my original example, if someone wants to find item #
 4D-2448-7PS, no matter what they type in, I'll take the input, strip
 out all non-alphanumeric characters to make it 4D24487PS, add the
 wildcard character between each of the remaining characters like so,
 4*D*2*4*4*8*7*P*S, and then do the search.

       Still being new at this, it seems to be the simplest approach, or is
 my thinking flawed?  This also keeps me from having to add another
 field in the db to search on.

       BTW, this solution needs to work with any db, even ASCII files, so it
 has to happen in PHP.

 Thanks again,
 Frank

 For speed you might want to consider an extra field in the DB in the
 future. If the database gets larger, or your query needs to join several
 tables together, then things will take a noticeable speed hit. I had a
 similar issue myself where I had to search for names based on
 mis-spellings of them. In the end I searched with metaphone tags on an
 extra field in the DB set up for that purpose, but it was the only way
 to do it that didn't affect the speed of the site.

 Thanks,
 Ash
 http://www.ashleysheridan.co.uk




 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php



Has anyone considered deploying an actual search engine (Solr, Sphinx,
etc.), as they will take care of the stripping, stemming, spelling
corrections, etc?

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Tommy Pham
- Original Message 
 From: sono...@fannullone.us sono...@fannullone.us
 To: PHP General List php-general@lists.php.net
 Sent: Thursday, September 3, 2009 12:12:40 PM
 Subject: Re: [PHP] Searching on AlphaNumeric Content Only
 
 Thanks to everyone who has responded.  After reading everyone's response, 
 I 
 think I have a very simple way to solve my problem.
 
 Using my original example, if someone wants to find item # 4D-2448-7PS, 
 no 
 matter what they type in, I'll take the input, strip out all non-alphanumeric 
 characters to make it 4D24487PS, add the wildcard character between each of 
 the 
 remaining characters like so, 4*D*2*4*4*8*7*P*S, and then do the search.

The correct wildcard syntax to work in any DB (Oracle, MySQL, MSSQL, etc), is % 
and not * if I remember correctly.  Searching like this is ok but won't be 
efficient when you have a lot of rows.  As for external file processing txt, 
csv, etc... I recommend you create a separate mechanism for it since each 
storage medium is meant for different purposes.  txt (both delimited and fix 
formatted) and csv are usually meant for importing/exporting between various 
RDBMS types and different companies.  They're not mean for fast searching of 
data.  I suggest you think about the amount of the data you have to deal with 
1st and how often will the search be done on that data.  It's probably easier 
and faster just to import the ascii into db and do you search on db if you have 
to work with any ascii.

As for adding another field to the db, perhaps your project just started?  If 
so, wouldn't it be better to do it with the future in mind so later you won't 
have to go back and redesign the db and modify the codes because now you have 
over 100k rows to search and the search occurs just about every other hits?  
That time you now have could be used for code optimizing for better 
performance, add more features/functionalities to the site, etc... :)  Trust 
me, searching the db table with over 200k rows and return the results with 
multi-table joins based 1 criteria isn't fun.  Keep in mind that you shouldn't 
keep the users waiting more than 5 seconds.  Only exception to that rule is 
data mining where you'll have millions of rows to work with ;)  Then it's no 
longer your problem.  It's the DBA :D

Regards,
Tommy

 
 Still being new at this, it seems to be the simplest approach, or is my 
 thinking flawed?  This also keeps me from having to add another field in the 
 db 
 to search on.
 
 BTW, this solution needs to work with any db, even ASCII files, so it has 
 to 
 happen in PHP.
 
 Thanks again,
 Frank
 
 --PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Paul M Foster
On Thu, Sep 03, 2009 at 12:12:40PM -0700, sono...@fannullone.us wrote:

   Thanks to everyone who has responded.  After reading everyone's
 response, I think I have a very simple way to solve my problem.

   Using my original example, if someone wants to find item #
 4D-2448-7PS, no matter what they type in, I'll take the input, strip
 out all non-alphanumeric characters to make it 4D24487PS, add the
 wildcard character between each of the remaining characters like so,
 4*D*2*4*4*8*7*P*S, and then do the search.

Your expression, if used to directly search in your SQL table, won't
work. The '*' character isn't a valid wildcard for SQL. In PostgreSQL,
the wildcard for any number of characters is '%', and for a single
character is '_'. I don't know that MySQL understands this same
convention. And who knows about Oracle.

As others have mentioned, it would be ideal (though not very
normalized) to create a new table column which contains the
alphanumerics without the punctuation characters ('-'). In nearly any
SQL dialect, you could do a simple SELECT using LIKE to find your item,
if you're searching on this extra field.

If you want do the searching in PHP, then it becomes more complicated.
You'll have to strip out the dashes from the user input, and then query
all the keys from your table, and test them using a regular expression.
As mentioned before, this is time-consuming for a large table.

Here's something else to consider: Could there ever be two items which
only differ by the placement of their dashes? Like 4D-2448-7PS versus
4D2-44-87PS? If not, then you should store the item number without
punctuation, and use that as the primary key on your table. Have an
extra field which shows the item number with dashes. You can use this
extra field in printing inventory labels or whatever (I don't recall the
context of your original post).

Paul

-- 
Paul M. Foster

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Andrea Giammarchi

stripping, stemming, spelling corrections ?
 ... uhm, that's probably why they invented regular expressions, isn't it?

As I said, at the end of the day, this will be a manual slow, potentially wrong 
implementation of what we already have and use on daily basis.

But obviously, everybody is free to create his own problems, no doubts about 
that.

Regards

 Has anyone considered deploying an actual search engine (Solr, Sphinx,
 etc.), as they will take care of the stripping, stemming, spelling
 corrections, etc?


_
With Windows Live, you can organize, edit, and share your photos.
http://www.microsoft.com/middleeast/windows/windowslive/products/photo-gallery-edit.aspx

Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-03 Thread Ben Dunlap

 stripping, stemming, spelling corrections ?
  ... uhm, that's probably why they invented regular expressions, isn't it?

 As I said, at the end of the day, this will be a manual slow, potentially 
 wrong implementation of what we already have and use on daily basis.

If you've got a regular-expression-based method in mind that simply
nails the OP's problem, please share. I'm still not seeing how
regular expressions is a sufficient answer to the OP's problem,
which is basically fuzzy search.

My sense is that regular expressions are for situations where you
basically know just what you're searching for, but don't really know
where it falls in your search space.

The OP, on the other hand, is building a system where he won't know
just what he's searching for -- all he'll know is that his search key
is sort of like the thing he actually needs to find.

You might be able to squeeze this problem, or at least some part of
it, into a regex-based solution, but I don't think it's a natural fit.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-02 Thread German Geek
Hi,

It's definitely possible to do when you do it in PHP, but not sure about on
the database side. You could read all records into memory and then iterate
over it with something like:

$toSearch = 4D24487PS
$charsToIgnore = array('-','+',...);

foreach ($items as $k=$item) {
  $itemVal = str_replace($charsToIgnore, '', $item);
  if (strcmp(str_replace($charsToIgnore, '', $toSearch), $itemVal) == 0) {
$return = $item;
break;
  }
}

This however might use a lot of memory, but if your DB is a manageable size
it should be ok. You can probably optimise it by iterating over a db result
set instead of reading everything into an array.

Cheers,
Tim
++Tim Hinnerk Heuer++

http://www.ihostnz.com

2009/9/3 sono...@fannullone.us

Is there is a way to search only for the alphanumeric content of
 field in a db?  I have an itemID field that contains item #'s that include
 dashes, forward slashes, etc, and I want people to be able to search for an
 item # even if they don't enter the punctuation exactly.

Here's an example:  let's say there is an itemID of 4D-2448-7PS but
 someone omits the dashes and searches on 4D24487PS.  Is it possible in PHP
 to have the find be successful, even if the search criteria doesn't exactly
 match what's stored in the field?

If this is possible, I'd appreciate it if someone could just point
 me in the right direction so I can read up on it.

 Thanks,
 Frank

 --
 PHP General Mailing List (http://www.php.net/)
 To unsubscribe, visit: http://www.php.net/unsub.php




Re: [PHP] Searching on AlphaNumeric Content Only

2009-09-02 Thread Ben Dunlap
        Is there is a way to search only for the alphanumeric content of
 field in a db?  I have an itemID field that contains item #'s that include
 dashes, forward slashes, etc, and I want people to be able to search for an
 item # even if they don't enter the punctuation exactly.

Not sure if there's anything specifically PHP-ish that will help you
here, but I would be inclined to start by storing a stripped-down
version of the item ID (alphanumeric characters only) in a separate
column in the database table.

Then, when a user enters some search data, I would remove
non-alphanumeric characters, if any, from the user's input, and then
search the stripped column with this normalized version of the input.

If you want even fuzzier matching (inadvertent transpositions or an
omitted character or two OK, for example), you might read about
Levenshtein distance:

http://en.wikipedia.org/wiki/Levenshtein_distance

PHP has a levenshtein function but you'll have to figure out a way to
use it efficiently with your data set. Or, if Levenshtein isn't quite
right for your needs, the article above might at least point you in a
useful direction.

Ben

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php